pandas: Write DataFrame to CSV with to_csv()

You can write data from pandas.DataFrame and pandas.Series to CSV files using the to_csv() method. This method also allows appending to an existing CSV file. By altering the delimiter, the data can be saved as a TSV (Tab-separated values) file.

Not all arguments are covered in this article. For a comprehensive understanding of all arguments, please refer to the official documentation linked above.

Contents

Write to CSV file with to_csv()
Write only specific columns: columns
Write column/row names: header, index
Encoding: encoding
Delimiter: sep
Write mode (write/overwrite or append): mode
Handle missing values NaN: na_rep
Format for floating-point numbers (float): float_format
Write with custom formats

The pandas.read_csv() function allows you to read CSV files and load them into DataFrame objects.

pandas: Read CSV into DataFrame with read_csv()

The sample code in this article uses pandas version 2.0.3. Consider the following DataFrame as an example.

sample_pandas_normal.csv

import pandas as pdprint(pd.__version__)# 2.0.3df = pd.read_csv('data/src/sample_pandas_normal.csv', index_col=0).head(3)print(df)# age state point# name # Alice 24 NY 64# Bob 42 CA 92# Charlie 18 CA 70

source: pandas_to_csv.py

The following examples use DataFrame but are equally applicable to Series.

Write to CSV file with `to_csv()`

to_csv() is provided as a method for both pandas.DataFrame and pandas.Series.

By specifying the file path as the first argument, the data can be written to that path in CSV format.

df.to_csv('data/dst/to_csv_out.csv')

source: pandas_to_csv.py

name,age,state,pointAlice,24,NY,64Bob,42,CA,92Charlie,18,CA,70

source: to_csv_out.csv

The path can be either absolute or relative. For how to check and change the current directory, see the following article.

Get and change the current working directory in Python

Write only specific columns: `columns`

To write only specific columns, specify a list of column names to the columns argument.

df.to_csv('data/dst/to_csv_out_columns.csv', columns=['age', 'point'])

source: pandas_to_csv.py

name,age,pointAlice,24,64Bob,42,92Charlie,18,70

source: to_csv_out_columns.csv

By default, columns is set to None, and all columns are included in the output.

Write column/row names: `header`, `index`

You can control whether to write column names (columns) and row names (index) by setting the header and index arguments to True or False.

df.to_csv('data/dst/to_csv_out_header_index.csv', header=False, index=False)

source: pandas_to_csv.py

Encoding: `encoding`

The encoding argument allows you to define the encoding of the output CSV file. By default, this is utf-8. You can change the encoding as needed, for example, to encoding='shift_jis' or encoding='cp932'.

Delimiter: `sep`

The sep argument allows you to set the delimiter. The default delimiter is a comma ,, which produces a CSV file.

To create a TSV file, which uses a tab character \t as the delimiter, you can simply set sep='\t'.

df.to_csv('data/dst/to_csv_out.tsv', sep='\t')

source: pandas_to_csv.py

name age state pointAlice 24 NY 64Bob 42 CA 92Charlie 18 CA 70

source: to_csv_out.tsv

Write mode (write/overwrite or append): `mode`

The mode argument allows you to specify the write mode, similar to the built-in open() function.

Read, write, and create files in Python (with and open())

By default, mode is set to 'w'. If the specified path does not exist, it creates a new file; if the path already exists, it overwrites the existing file.

To prevent overwriting an existing file, use mode='x'. If the specified path does not exist, a new file is created; if it does, an error is returned, and the file is not overwritten.

# df.to_csv('data/dst/to_csv_out.csv', mode='x')# FileExistsError: [Errno 17] File exists: 'data/dst/to_csv_out.csv'

source: pandas_to_csv.py

For appending data, use mode='a'. This will append the DataFrame to the end of the existing file as new rows. Note that the header (columns) will also be appended unless header=False is specified.

df.to_csv('data/dst/to_csv_out_a.csv')df.to_csv('data/dst/to_csv_out_a.csv', mode='a', header=False)

source: pandas_to_csv.py

name,age,state,pointAlice,24,NY,64Bob,42,CA,92Charlie,18,CA,70Alice,24,NY,64Bob,42,CA,92Charlie,18,CA,70

source: to_csv_out_a.csv

mode='a' is used to append rows. If you want to add columns, you should read the target file, add the columns, and then overwrite the original file. You can use mode='w' for overwriting the file, but this argument can be omitted because 'w' is the default mode.

df.to_csv('data/dst/to_csv_out_a_new_column.csv')df_new = pd.read_csv('data/dst/to_csv_out_a_new_column.csv', index_col=0)print(df_new)# age state point# name # Alice 24 NY 64# Bob 42 CA 92# Charlie 18 CA 70df_new['new_col'] = 'new data'print(df_new)# age state point new_col# name # Alice 24 NY 64 new data# Bob 42 CA 92 new data# Charlie 18 CA 70 new datadf_new.to_csv('data/dst/to_csv_out_a_new_column.csv')

source: pandas_to_csv.py

name,age,state,point,new_colAlice,24,NY,64,new dataBob,42,CA,92,new dataCharlie,18,CA,70,new data

source: to_csv_out_a_new_column.csv

Handle missing values `NaN`: `na_rep`

Consider the following DataFrame containing missing values NaN.

df_nan = df.copy()df_nan.iat[0, 1] = float('nan')df_nan.iat[1, 2] = float('nan')print(df_nan)# age state point# name # Alice 24 NaN 64.0# Bob 42 CA NaN# Charlie 18 CA 70.0

source: pandas_to_csv.py

By default, missing values are represented as empty strings in the output.

df_nan.to_csv('data/dst/to_csv_out_nan.csv')

source: pandas_to_csv.py

name,age,state,pointAlice,24,,64.0Bob,42,CA,Charlie,18,CA,70.0

source: to_csv_out_nan.csv

You can replace missing values with a specific string using the na_rep argument.

df_nan.to_csv('data/dst/to_csv_out_nan_rep.csv', na_rep='NaN')

source: pandas_to_csv.py

name,age,state,pointAlice,24,NaN,64.0Bob,42,CA,NaNCharlie,18,CA,70.0

source: to_csv_out_nan_rep.csv

Refer to the following articles for information on handling missing values in DataFrame.

pandas: Remove NaN (missing values) with dropna()
pandas: Replace NaN (missing values) with fillna()

Format for floating-point numbers (`float`): `float_format`

Consider the following DataFrame.

df = pd.DataFrame({'col1': [0.123456789, 1000000000.0], 'col2': [123456789.0, 0.0], 'col3': [123456789, 0]})print(df)# col1 col2 col3# 0 1.234568e-01 123456789.0 123456789# 1 1.000000e+09 0.0 0print(df.dtypes)# col1 float64# col2 float64# col3 int64# dtype: object

source: pandas_to_csv_float.py

When you use print(), the output appears in scientific notation. However, this is due to the display settings and does not indicate that the actual value has been rounded off.

print(df.iat[0, 0])# 0.123456789print(df.iat[1, 0])# 1000000000.0

source: pandas_to_csv_float.py

By default, when using to_csv(), the value is saved as is.

df.to_csv('data/dst/to_csv_out_float_default.csv')

source: pandas_to_csv_float.py

,col1,col2,col30,0.123456789,123456789.0,1234567891,1000000000.0,0.0,0

source: to_csv_out_float_default.csv

With the float_format argument, you can control the formatting of floating-point numbers (float) when saving. You can either provide a format string in style used by printf (with a %), or use a callable object like format().

Here is an example using the printf-style, where the number of digits after the decimal point is set to three.

Built-in Types - printf-style String Formatting — Python 3.11.4 documentation

print('%.3f' % 0.123456789)# 0.123print('%.3f' % 123456789)# 123456789.000df.to_csv('data/dst/to_csv_out_float_format_3f.csv', float_format='%.3f')

source: pandas_to_csv_float.py

,col1,col2,col30,0.123,123456789.000,1234567891,1000000000.000,0.000,0

source: to_csv_out_float_format_3f.csv

Here is an example using a callable object. This outputs in scientific notation with three digits after the decimal point.

Format strings and numbers with format() in Python

print('{:.3e}'.format(0.123456789))# 1.235e-01print('{:.3e}'.format(123456789))# 1.235e+08df.to_csv('data/dst/to_csv_out_float_format_3e.csv', float_format='{:.3e}'.format)

source: pandas_to_csv_float.py

,col1,col2,col30,1.235e-01,1.235e+08,1234567891,1.000e+09,0.000e+00,0

source: to_csv_out_float_format_3e.csv

Please note that specifying the number of digits may lead to loss of information beyond the specified number of digits during saving.

Write with custom formats

The float_format argument, as the name suggests, only applies to columns with the float data type. Columns with the integer (int) type remain as they are. Moreover, you can't specify different formats for each column.

If you want to specify a format for int columns, or use a different format for each float column, you should convert the data in the original DataFrame to strings in the desired format before saving it.

df = pd.DataFrame({'col1': [0.123456789, 1000000000.0], 'col2': [123456789.0, 0.0], 'col3': [123456789, 0]})df['col1'] = df['col1'].map('{:.3f}'.format)df['col2'] = df['col2'].map('{:.3e}'.format)df['col3'] = df['col3'].map('{:#010x}'.format)print(df)# col1 col2 col3# 0 0.123 1.235e+08 0x075bcd15# 1 1000000000.000 0.000e+00 0x00000000df.to_csv('data/dst/to_csv_out_float_format_str.csv')

source: pandas_to_csv_float.py

,col1,col2,col30,0.123,1.235e+08,0x075bcd151,1000000000.000,0.000e+00,0x00000000

source: to_csv_out_float_format_str.csv

In the example above, note that when saving integers in hexadecimal form, pandas.read_csv() will interpret them as strings. If you want them to be treated as numerical data, you'll need to convert them after loading.

df = pd.read_csv('data/dst/to_csv_out_float_format_str.csv', index_col=0)print(df)# col1 col2 col3# 0 1.230000e-01 123500000.0 0x075bcd15# 1 1.000000e+09 0.0 0x00000000print(df.dtypes)# col1 float64# col2 float64# col3 object# dtype: objectdf['col3'] = df['col3'].map(lambda x: int(x, 16))print(df)# col1 col2 col3# 0 1.230000e-01 123500000.0 123456789# 1 1.000000e+09 0.0 0print(df.dtypes)# col1 float64# col2 float64# col3 int64# dtype: object

source: pandas_to_csv_float.py

pandas: Write DataFrame to CSV with to_csv() | note.nkmk.me (2024)

FAQs

How to use to_csv in Pandas? ›

Pandas DataFrame to_csv() function exports the DataFrame to CSV format. If a file argument is provided, the output will be the CSV file. Otherwise, the return value is a CSV format like string. sep: Specify a custom delimiter for the CSV output, the default is a comma.

How to write Pandas DataFrame to tsv file? ›

Write the DataFrame to a TSV file: Use the to_csv() function from Pandas to write the DataFrame to the specified file path. Set the sep parameter to '\t' to indicate that the columns should be separated by tabs. Verify the output: Check if the TSV file has been successfully created and contains the desired data.

Learn More Now ›

What is the best way to write a Pandas data frame to a CSV file? ›

to_csv() Method. The . to_csv() method is a built-in function in Pandas that allows you to save a Pandas DataFrame as a CSV file. This method exports the DataFrame into a comma-separated values (CSV) file, which is a simple and widely used format for storing tabular data.

Get More Info Here ›

How do I convert a DataFrame data to a CSV file in Python? ›

The Pandas to_csv() function is used to convert the DataFrame into CSV data. To write the CSV data into a file, we can simply pass a file object to the function. Otherwise, the CSV data is returned in a string format.

View Details ›

What does to_csv mean? ›

to_csv() is a function in pandas that we use to write object to CSV file.

Get More Info Here ›

What task does the following command df to_csv a CSV perform? ›

The method `df. to_csv("file. csv")` is used in Python with the Pandas library to export a DataFrame (`df`) to a CSV (Comma-Separated Values) file.

See Details ›

What is the difference between CSV and TSV? ›

TSV stands for Tab Separated Value. TSV file is a flat file, which uses the Tab character to delimit data and reports one time-series per line. CSV stands for Comma Separated Value. CSV file is a flat file, which uses the comma (,) character to delimit data and reports one observation per line.

Get More Info Here ›

How to convert TSV to csv using Python? ›

How to Convert TSV to CSV via Python

Install 'Aspose. Cells for Python via Java'.
Add a library reference (import the library) to your Python project.
Load TSV file with an instance of Workbook.
Convert TSV to CSV by calling Workbook. save method.
Get the conversion result of TSV to CSV.

Read On ›

How to put csv data into Pandas DataFrame? ›

Import a CSV file using the read_csv() function from the pandas library. Set a column index while reading your data into memory. Specify the columns in your data that you want the read_csv() function to return. Read data from a URL with the pandas.

View Details ›

How to append DataFrame to_csv mode? ›

For appending data, use mode='a' . This will append the DataFrame to the end of the existing file as new rows. Note that the header ( columns ) will also be appended unless header=False is specified. mode='a' is used to append rows.

Show Me More ›

Does df to_csv overwrite? ›

If the file already exists, it will be overwritten. If no path is given, then the Frame will be serialized into a string, and that string will be returned.

Read On ›

What is the delimiter of Pandas to_csv? ›

You can provide any string of length 1 as a delimiter for the output to the to_csv() function using the parameter 'sep'. The default delimiter is comma ','. In the below example, the symbol backslash i.e. '/' is used as a delimiter.

Find Out More ›

How to write pandas DataFrame to CSV without index? ›

To save a DataFrame in Pandas without both header and index, we just need to set both index and header to false. We'll also use the to_csv method to save these changes in the saved CSV file.

pandas: Write DataFrame to CSV with to_csv() | note.nkmk.me (2024)

Write to CSV file with to_csv()

Write only specific columns: columns

Write column/row names: header, index

Encoding: encoding

Delimiter: sep

Write mode (write/overwrite or append): mode

Handle missing values NaN: na_rep

Format for floating-point numbers (float): float_format

Write with custom formats

FAQs

How to use to_csv in Pandas? ›

Does df to_csv overwrite? ›

Write to CSV file with `to_csv()`

Write only specific columns: `columns`

Write column/row names: `header`, `index`

Encoding: `encoding`

Delimiter: `sep`

Write mode (write/overwrite or append): `mode`

Handle missing values `NaN`: `na_rep`

Format for floating-point numbers (`float`): `float_format`