You can write data from pandas.DataFrame
and pandas.Series
to CSV files using the to_csv()
method. This method also allows appending to an existing CSV file. By altering the delimiter, the data can be saved as a TSV (Tab-separated values) file.
- pandas.DataFrame.to_csv — pandas 2.0.3 documentation
- pandas.Series.to_csv — pandas 2.0.3 documentation
Not all arguments are covered in this article. For a comprehensive understanding of all arguments, please refer to the official documentation linked above.
Contents
- Write to CSV file with to_csv()
- Write only specific columns: columns
- Write column/row names: header, index
- Encoding: encoding
- Delimiter: sep
- Write mode (write/overwrite or append): mode
- Handle missing values NaN: na_rep
- Format for floating-point numbers (float): float_format
- Write with custom formats
The pandas.read_csv()
function allows you to read CSV files and load them into DataFrame
objects.
- pandas: Read CSV into DataFrame with read_csv()
The sample code in this article uses pandas version 2.0.3
. Consider the following DataFrame
as an example.
import pandas as pdprint(pd.__version__)# 2.0.3df = pd.read_csv('data/src/sample_pandas_normal.csv', index_col=0).head(3)print(df)# age state point# name # Alice 24 NY 64# Bob 42 CA 92# Charlie 18 CA 70
source: pandas_to_csv.py
The following examples use DataFrame
but are equally applicable to Series
.
Write to CSV file with to_csv()
to_csv()
is provided as a method for both pandas.DataFrame
and pandas.Series
.
By specifying the file path as the first argument, the data can be written to that path in CSV format.
df.to_csv('data/dst/to_csv_out.csv')
source: pandas_to_csv.py
name,age,state,pointAlice,24,NY,64Bob,42,CA,92Charlie,18,CA,70
source: to_csv_out.csv
The path can be either absolute or relative. For how to check and change the current directory, see the following article.
- Get and change the current working directory in Python
Write only specific columns: columns
To write only specific columns, specify a list of column names to the columns
argument.
df.to_csv('data/dst/to_csv_out_columns.csv', columns=['age', 'point'])
source: pandas_to_csv.py
name,age,pointAlice,24,64Bob,42,92Charlie,18,70
source: to_csv_out_columns.csv
By default, columns
is set to None
, and all columns are included in the output.
Write column/row names: header
, index
You can control whether to write column names (columns
) and row names (index
) by setting the header
and index
arguments to True
or False
.
df.to_csv('data/dst/to_csv_out_header_index.csv', header=False, index=False)
24,NY,6442,CA,9218,CA,70
source: to_csv_out_header_index.csv
By default, both are set to True
, and both columns
(header) and index
are included in the output, as demonstrated in the previous examples.
Encoding: encoding
The encoding
argument allows you to define the encoding of the output CSV file. By default, this is utf-8
. You can change the encoding as needed, for example, to encoding='shift_jis'
or encoding='cp932'
.
Delimiter: sep
The sep
argument allows you to set the delimiter. The default delimiter is a comma ,
, which produces a CSV file.
To create a TSV file, which uses a tab character \t
as the delimiter, you can simply set sep='\t'
.
df.to_csv('data/dst/to_csv_out.tsv', sep='\t')
source: pandas_to_csv.py
name age state pointAlice 24 NY 64Bob 42 CA 92Charlie 18 CA 70
source: to_csv_out.tsv
Write mode (write/overwrite or append): mode
The mode
argument allows you to specify the write mode, similar to the built-in open()
function.
- Read, write, and create files in Python (with and open())
By default, mode
is set to 'w'
. If the specified path does not exist, it creates a new file; if the path already exists, it overwrites the existing file.
To prevent overwriting an existing file, use mode='x'
. If the specified path does not exist, a new file is created; if it does, an error is returned, and the file is not overwritten.
# df.to_csv('data/dst/to_csv_out.csv', mode='x')# FileExistsError: [Errno 17] File exists: 'data/dst/to_csv_out.csv'
source: pandas_to_csv.py
For appending data, use mode='a'
. This will append the DataFrame
to the end of the existing file as new rows. Note that the header (columns
) will also be appended unless header=False
is specified.
df.to_csv('data/dst/to_csv_out_a.csv')df.to_csv('data/dst/to_csv_out_a.csv', mode='a', header=False)
source: pandas_to_csv.py
name,age,state,pointAlice,24,NY,64Bob,42,CA,92Charlie,18,CA,70Alice,24,NY,64Bob,42,CA,92Charlie,18,CA,70
source: to_csv_out_a.csv
mode='a'
is used to append rows. If you want to add columns, you should read the target file, add the columns, and then overwrite the original file. You can use mode='w'
for overwriting the file, but this argument can be omitted because 'w'
is the default mode.
df.to_csv('data/dst/to_csv_out_a_new_column.csv')df_new = pd.read_csv('data/dst/to_csv_out_a_new_column.csv', index_col=0)print(df_new)# age state point# name # Alice 24 NY 64# Bob 42 CA 92# Charlie 18 CA 70df_new['new_col'] = 'new data'print(df_new)# age state point new_col# name # Alice 24 NY 64 new data# Bob 42 CA 92 new data# Charlie 18 CA 70 new datadf_new.to_csv('data/dst/to_csv_out_a_new_column.csv')
source: pandas_to_csv.py
name,age,state,point,new_colAlice,24,NY,64,new dataBob,42,CA,92,new dataCharlie,18,CA,70,new data
source: to_csv_out_a_new_column.csv
Handle missing values NaN
: na_rep
Consider the following DataFrame
containing missing values NaN
.
df_nan = df.copy()df_nan.iat[0, 1] = float('nan')df_nan.iat[1, 2] = float('nan')print(df_nan)# age state point# name # Alice 24 NaN 64.0# Bob 42 CA NaN# Charlie 18 CA 70.0
source: pandas_to_csv.py
By default, missing values are represented as empty strings in the output.
df_nan.to_csv('data/dst/to_csv_out_nan.csv')
source: pandas_to_csv.py
name,age,state,pointAlice,24,,64.0Bob,42,CA,Charlie,18,CA,70.0
source: to_csv_out_nan.csv
You can replace missing values with a specific string using the na_rep
argument.
df_nan.to_csv('data/dst/to_csv_out_nan_rep.csv', na_rep='NaN')
source: pandas_to_csv.py
name,age,state,pointAlice,24,NaN,64.0Bob,42,CA,NaNCharlie,18,CA,70.0
source: to_csv_out_nan_rep.csv
Refer to the following articles for information on handling missing values in DataFrame
.
- pandas: Remove NaN (missing values) with dropna()
- pandas: Replace NaN (missing values) with fillna()
Format for floating-point numbers (float
): float_format
Consider the following DataFrame
.
df = pd.DataFrame({'col1': [0.123456789, 1000000000.0], 'col2': [123456789.0, 0.0], 'col3': [123456789, 0]})print(df)# col1 col2 col3# 0 1.234568e-01 123456789.0 123456789# 1 1.000000e+09 0.0 0print(df.dtypes)# col1 float64# col2 float64# col3 int64# dtype: object
source: pandas_to_csv_float.py
When you use print()
, the output appears in scientific notation. However, this is due to the display settings and does not indicate that the actual value has been rounded off.
print(df.iat[0, 0])# 0.123456789print(df.iat[1, 0])# 1000000000.0
source: pandas_to_csv_float.py
By default, when using to_csv()
, the value is saved as is.
df.to_csv('data/dst/to_csv_out_float_default.csv')
source: pandas_to_csv_float.py
,col1,col2,col30,0.123456789,123456789.0,1234567891,1000000000.0,0.0,0
source: to_csv_out_float_default.csv
With the float_format
argument, you can control the formatting of floating-point numbers (float
) when saving. You can either provide a format string in style used by printf (with a %
), or use a callable object like format()
.
Here is an example using the printf-style, where the number of digits after the decimal point is set to three.
print('%.3f' % 0.123456789)# 0.123print('%.3f' % 123456789)# 123456789.000df.to_csv('data/dst/to_csv_out_float_format_3f.csv', float_format='%.3f')
source: pandas_to_csv_float.py
,col1,col2,col30,0.123,123456789.000,1234567891,1000000000.000,0.000,0
source: to_csv_out_float_format_3f.csv
Here is an example using a callable object. This outputs in scientific notation with three digits after the decimal point.
- Format strings and numbers with format() in Python
print('{:.3e}'.format(0.123456789))# 1.235e-01print('{:.3e}'.format(123456789))# 1.235e+08df.to_csv('data/dst/to_csv_out_float_format_3e.csv', float_format='{:.3e}'.format)
source: pandas_to_csv_float.py
,col1,col2,col30,1.235e-01,1.235e+08,1234567891,1.000e+09,0.000e+00,0
source: to_csv_out_float_format_3e.csv
Please note that specifying the number of digits may lead to loss of information beyond the specified number of digits during saving.
Write with custom formats
The float_format
argument, as the name suggests, only applies to columns with the float
data type. Columns with the integer (int
) type remain as they are. Moreover, you can't specify different formats for each column.
If you want to specify a format for int
columns, or use a different format for each float
column, you should convert the data in the original DataFrame
to strings in the desired format before saving it.
df = pd.DataFrame({'col1': [0.123456789, 1000000000.0], 'col2': [123456789.0, 0.0], 'col3': [123456789, 0]})df['col1'] = df['col1'].map('{:.3f}'.format)df['col2'] = df['col2'].map('{:.3e}'.format)df['col3'] = df['col3'].map('{:#010x}'.format)print(df)# col1 col2 col3# 0 0.123 1.235e+08 0x075bcd15# 1 1000000000.000 0.000e+00 0x00000000df.to_csv('data/dst/to_csv_out_float_format_str.csv')
source: pandas_to_csv_float.py
,col1,col2,col30,0.123,1.235e+08,0x075bcd151,1000000000.000,0.000e+00,0x00000000
source: to_csv_out_float_format_str.csv
In the example above, note that when saving integers in hexadecimal form, pandas.read_csv()
will interpret them as strings. If you want them to be treated as numerical data, you'll need to convert them after loading.
df = pd.read_csv('data/dst/to_csv_out_float_format_str.csv', index_col=0)print(df)# col1 col2 col3# 0 1.230000e-01 123500000.0 0x075bcd15# 1 1.000000e+09 0.0 0x00000000print(df.dtypes)# col1 float64# col2 float64# col3 object# dtype: objectdf['col3'] = df['col3'].map(lambda x: int(x, 16))print(df)# col1 col2 col3# 0 1.230000e-01 123500000.0 123456789# 1 1.000000e+09 0.0 0print(df.dtypes)# col1 float64# col2 float64# col3 int64# dtype: object
source: pandas_to_csv_float.py