pandas: Write DataFrame to CSV with to_csv() | note.nkmk.me (2024)

You can write data from pandas.DataFrame and pandas.Series to CSV files using the to_csv() method. This method also allows appending to an existing CSV file. By altering the delimiter, the data can be saved as a TSV (Tab-separated values) file.

Not all arguments are covered in this article. For a complete list of arguments, refer to the official pandas documentation for DataFrame.to_csv().

Contents

  • Write to CSV file with to_csv()
  • Write only specific columns: columns
  • Write column/row names: header, index
  • Encoding: encoding
  • Delimiter: sep
  • Write mode (write/overwrite or append): mode
  • Handle missing values NaN: na_rep
  • Format for floating-point numbers (float): float_format
  • Write with custom formats

The pandas.read_csv() function allows you to read CSV files and load them into DataFrame objects.

  • pandas: Read CSV into DataFrame with read_csv()

The sample code in this article uses pandas version 2.0.3. Consider the following DataFrame as an example.

import pandas as pd

print(pd.__version__)
# 2.0.3

df = pd.read_csv('data/src/sample_pandas_normal.csv', index_col=0).head(3)
print(df)
#          age state  point
# name
# Alice     24    NY     64
# Bob       42    CA     92
# Charlie   18    CA     70

The following examples use DataFrame but are equally applicable to Series.
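As a quick illustration of the Series case, here is a minimal sketch. The Series `s` and its values are made up for this example and only mirror the `point` column of the sample data. No path is passed, so to_csv() returns the CSV text as a string.

```python
import pandas as pd

# Hypothetical Series mirroring the 'point' column of the sample data.
s = pd.Series(
    [64, 92, 70],
    index=pd.Index(['Alice', 'Bob', 'Charlie'], name='name'),
    name='point',
)

# With no path, to_csv() returns the CSV text instead of writing a file.
series_csv = s.to_csv()
print(series_csv)
# name,point
# Alice,64
# Bob,92
# Charlie,70
```

The Series name is used as the column header, and the index is written as the first column, just as with a DataFrame.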

Write to CSV file with to_csv()

to_csv() is provided as a method for both pandas.DataFrame and pandas.Series.

By specifying the file path as the first argument, the data can be written to that path in CSV format.

df.to_csv('data/dst/to_csv_out.csv')
name,age,state,point
Alice,24,NY,64
Bob,42,CA,92
Charlie,18,CA,70

The path can be either absolute or relative. For how to check and change the current directory, see the following article.

  • Get and change the current working directory in Python
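If you only want to inspect the result without creating a file, you can omit the path. The following sketch assumes a small DataFrame built inline (the data mirrors the sample above but is constructed here so the snippet is self-contained). When no path is given, to_csv() returns the CSV text as a string.

```python
import pandas as pd

# Inline copy of the sample data so the snippet runs on its own.
df = pd.DataFrame(
    {'age': [24, 42, 18], 'state': ['NY', 'CA', 'CA'], 'point': [64, 92, 70]},
    index=pd.Index(['Alice', 'Bob', 'Charlie'], name='name'),
)

# No path: the CSV text is returned as a string.
csv_text = df.to_csv()
print(csv_text)
# name,age,state,point
# Alice,24,NY,64
# Bob,42,CA,92
# Charlie,18,CA,70
```

This is handy for checking the effect of an argument before writing to disk.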

Write only specific columns: columns

To write only specific columns, specify a list of column names to the columns argument.

df.to_csv('data/dst/to_csv_out_columns.csv', columns=['age', 'point'])
name,age,point
Alice,24,64
Bob,42,92
Charlie,18,70

By default, columns is set to None, and all columns are included in the output.
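The order of the list passed to columns also determines the column order in the output. A minimal sketch, using an inline copy of the sample data and returning the CSV as a string rather than writing a file:

```python
import pandas as pd

# Inline copy of the sample data so the snippet runs on its own.
df = pd.DataFrame(
    {'age': [24, 42, 18], 'state': ['NY', 'CA', 'CA'], 'point': [64, 92, 70]},
    index=pd.Index(['Alice', 'Bob', 'Charlie'], name='name'),
)

# Columns are written in the order given, not in the DataFrame's order.
subset_csv = df.to_csv(columns=['point', 'age'])
print(subset_csv)
# name,point,age
# Alice,64,24
# Bob,92,42
# Charlie,70,18
```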

Write column/row names: header, index

You can control whether to write column names (columns) and row names (index) by setting the header and index arguments to True or False.

With header=False and index=False, only the data values are written.

24,NY,64
42,CA,92
18,CA,70

By default, both are set to True, and both columns (header) and index are included in the output, as demonstrated in the previous examples.
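The two arguments are independent of each other. The sketch below, which uses an inline copy of the sample data and returns strings instead of writing files, compares dropping only the index with dropping both the index and the header.

```python
import pandas as pd

# Inline copy of the sample data so the snippet runs on its own.
df = pd.DataFrame(
    {'age': [24, 42, 18], 'state': ['NY', 'CA', 'CA'], 'point': [64, 92, 70]},
    index=pd.Index(['Alice', 'Bob', 'Charlie'], name='name'),
)

# Keep the header row but drop the index column.
no_index_csv = df.to_csv(index=False)
print(no_index_csv)
# age,state,point
# 24,NY,64
# 42,CA,92
# 18,CA,70

# Drop both the header row and the index column.
values_only_csv = df.to_csv(header=False, index=False)
print(values_only_csv)
# 24,NY,64
# 42,CA,92
# 18,CA,70
```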

Encoding: encoding

The encoding argument allows you to define the encoding of the output CSV file. By default, this is utf-8. You can change the encoding as needed, for example, to encoding='shift_jis' or encoding='cp932'.
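To see the effect of encoding, the following sketch writes a small DataFrame containing Japanese text to a temporary file with encoding='cp932' and reads it back. The DataFrame `df_ja`, the column name, and the file name are made up for this example.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical data containing non-ASCII characters.
df_ja = pd.DataFrame({'city': ['東京', '大阪']})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / 'encoded.csv'
    df_ja.to_csv(path, index=False, encoding='cp932')
    raw = path.read_bytes()
    # The same encoding must be given when reading the file back.
    restored = pd.read_csv(path, encoding='cp932')

# The bytes on disk are cp932, so decoding them as UTF-8 fails.
try:
    raw.decode('utf-8')
    is_valid_utf8 = True
except UnicodeDecodeError:
    is_valid_utf8 = False

print(is_valid_utf8)
# False
print(restored['city'].tolist())
# ['東京', '大阪']
```

Whatever encoding you choose when writing, the reader of the file needs to use the same one.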

Delimiter: sep

The sep argument allows you to set the delimiter. The default delimiter is a comma ,, which produces a CSV file.

To create a TSV file, which uses a tab character \t as the delimiter, you can simply set sep='\t'.

df.to_csv('data/dst/to_csv_out.tsv', sep='\t')
name	age	state	point
Alice	24	NY	64
Bob	42	CA	92
Charlie	18	CA	70
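Because tabs are hard to tell apart from spaces on screen, it can help to inspect the raw lines. A minimal sketch with an inline copy of the sample data, returning the TSV text as a string:

```python
import pandas as pd

# Inline copy of the sample data so the snippet runs on its own.
df = pd.DataFrame(
    {'age': [24, 42, 18], 'state': ['NY', 'CA', 'CA'], 'point': [64, 92, 70]},
    index=pd.Index(['Alice', 'Bob', 'Charlie'], name='name'),
)

tsv_text = df.to_csv(sep='\t')

# repr() makes the tab characters visible as \t.
for line in tsv_text.splitlines():
    print(repr(line))
# 'name\tage\tstate\tpoint'
# 'Alice\t24\tNY\t64'
# 'Bob\t42\tCA\t92'
# 'Charlie\t18\tCA\t70'
```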

Write mode (write/overwrite or append): mode

The mode argument allows you to specify the write mode, similar to the built-in open() function.

  • Read, write, and create files in Python (with and open())

By default, mode is set to 'w'. If the specified path does not exist, it creates a new file; if the path already exists, it overwrites the existing file.

To prevent overwriting an existing file, use mode='x'. If the specified path does not exist, a new file is created; if it does, a FileExistsError is raised and the file is not overwritten.

# df.to_csv('data/dst/to_csv_out.csv', mode='x')
# FileExistsError: [Errno 17] File exists: 'data/dst/to_csv_out.csv'
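If you want to handle the existing-file case instead of letting the program stop, you can catch the exception. The sketch below uses a temporary directory and a made-up file name so that it does not touch real data.

```python
import os
import tempfile

import pandas as pd

# Inline copy of the sample data so the snippet runs on its own.
df = pd.DataFrame(
    {'age': [24, 42, 18], 'state': ['NY', 'CA', 'CA'], 'point': [64, 92, 70]},
    index=pd.Index(['Alice', 'Bob', 'Charlie'], name='name'),
)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'out.csv')  # hypothetical file name

    # First call: the file does not exist yet, so it is created.
    df.to_csv(path, mode='x')

    # Second call: the file exists, so FileExistsError is raised.
    try:
        df.to_csv(path, mode='x')
        second_write_succeeded = True
    except FileExistsError:
        second_write_succeeded = False

print(second_write_succeeded)
# False
```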

For appending data, use mode='a'. This will append the DataFrame to the end of the existing file as new rows. Note that the header (columns) will also be appended unless header=False is specified.

df.to_csv('data/dst/to_csv_out_a.csv')
df.to_csv('data/dst/to_csv_out_a.csv', mode='a', header=False)

name,age,state,point
Alice,24,NY,64
Bob,42,CA,92
Charlie,18,CA,70
Alice,24,NY,64
Bob,42,CA,92
Charlie,18,CA,70
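When the same code may run against either a new or an existing file, one common pattern is to write the header only if the file does not exist yet. The helper `append_csv` below is a hypothetical name introduced for this sketch, not part of pandas, and the file is created in a temporary directory.

```python
import os
import tempfile

import pandas as pd


def append_csv(frame, path):
    """Append frame to path, writing the header only if the file is new."""
    write_header = not os.path.exists(path)
    frame.to_csv(path, mode='a', header=write_header)


# Inline copy of the sample data so the snippet runs on its own.
df = pd.DataFrame(
    {'age': [24, 42, 18], 'state': ['NY', 'CA', 'CA'], 'point': [64, 92, 70]},
    index=pd.Index(['Alice', 'Bob', 'Charlie'], name='name'),
)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'log.csv')  # hypothetical file name
    append_csv(df, path)  # creates the file and writes the header
    append_csv(df, path)  # appends rows only
    with open(path, encoding='utf-8') as f:
        lines = f.read().splitlines()

print(lines[0])
# name,age,state,point
print(len(lines))
# 7
```

The header appears once, followed by the three rows from each call.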

mode='a' is used to append rows. If you want to add columns, you should read the target file, add the columns, and then overwrite the original file. You can use mode='w' for overwriting the file, but this argument can be omitted because 'w' is the default mode.

df.to_csv('data/dst/to_csv_out_a_new_column.csv')

df_new = pd.read_csv('data/dst/to_csv_out_a_new_column.csv', index_col=0)
print(df_new)
#          age state  point
# name
# Alice     24    NY     64
# Bob       42    CA     92
# Charlie   18    CA     70

df_new['new_col'] = 'new data'
print(df_new)
#          age state  point   new_col
# name
# Alice     24    NY     64  new data
# Bob       42    CA     92  new data
# Charlie   18    CA     70  new data

df_new.to_csv('data/dst/to_csv_out_a_new_column.csv')

name,age,state,point,new_col
Alice,24,NY,64,new data
Bob,42,CA,92,new data
Charlie,18,CA,70,new data

Handle missing values NaN: na_rep

Consider the following DataFrame containing missing values NaN.

df_nan = df.copy()
df_nan.iat[0, 1] = float('nan')
df_nan.iat[1, 2] = float('nan')
print(df_nan)
#          age state  point
# name
# Alice     24   NaN   64.0
# Bob       42    CA    NaN
# Charlie   18    CA   70.0

By default, missing values are represented as empty strings in the output.

df_nan.to_csv('data/dst/to_csv_out_nan.csv')
name,age,state,point
Alice,24,,64.0
Bob,42,CA,
Charlie,18,CA,70.0

You can replace missing values with a specific string using the na_rep argument.

df_nan.to_csv('data/dst/to_csv_out_nan_rep.csv', na_rep='NaN')
name,age,state,point
Alice,24,NaN,64.0
Bob,42,CA,NaN
Charlie,18,CA,70.0
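Whether the replacement string is read back as a missing value depends on the reader. The sketch below builds a DataFrame like the one above inline and round-trips it through an in-memory buffer. It relies on 'NaN' being one of the strings that read_csv() treats as missing by default; an arbitrary marker such as '-' would instead come back as an ordinary string unless it is passed to the na_values argument of read_csv().

```python
import io

import pandas as pd

# Inline data mirroring df_nan above so the snippet runs on its own.
df_nan = pd.DataFrame(
    {
        'age': [24, 42, 18],
        'state': [float('nan'), 'CA', 'CA'],
        'point': [64.0, float('nan'), 70.0],
    },
    index=pd.Index(['Alice', 'Bob', 'Charlie'], name='name'),
)

nan_csv = df_nan.to_csv(na_rep='NaN')

# 'NaN' is recognized as missing by read_csv() with its default settings.
restored = pd.read_csv(io.StringIO(nan_csv), index_col=0)
missing_count = int(restored.isna().sum().sum())
print(missing_count)
# 2
```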

Refer to the following articles for information on handling missing values in DataFrame.

  • pandas: Remove NaN (missing values) with dropna()
  • pandas: Replace NaN (missing values) with fillna()

Format for floating-point numbers (float): float_format

Consider the following DataFrame.

df = pd.DataFrame({'col1': [0.123456789, 1000000000.0],
                   'col2': [123456789.0, 0.0],
                   'col3': [123456789, 0]})
print(df)
#            col1         col2       col3
# 0  1.234568e-01  123456789.0  123456789
# 1  1.000000e+09          0.0          0

print(df.dtypes)
# col1    float64
# col2    float64
# col3      int64
# dtype: object

When you use print(), col1 is shown in scientific notation. This is only a display setting; the stored values are not rounded.

print(df.iat[0, 0])
# 0.123456789

print(df.iat[1, 0])
# 1000000000.0

By default, when using to_csv(), the value is saved as is.

df.to_csv('data/dst/to_csv_out_float_default.csv')
,col1,col2,col3
0,0.123456789,123456789.0,123456789
1,1000000000.0,0.0,0

With the float_format argument, you can control the formatting of floating-point numbers (float) when saving. You can provide either a printf-style format string (with a %) or a callable, such as the format() method of a format string.

Here is an example using the printf-style, where the number of digits after the decimal point is set to three.

print('%.3f' % 0.123456789)
# 0.123

print('%.3f' % 123456789)
# 123456789.000

df.to_csv('data/dst/to_csv_out_float_format_3f.csv', float_format='%.3f')

,col1,col2,col3
0,0.123,123456789.000,123456789
1,1000000000.000,0.000,0

Here is an example using a callable object. This outputs in scientific notation with three digits after the decimal point.

  • Format strings and numbers with format() in Python
print('{:.3e}'.format(0.123456789))
# 1.235e-01

print('{:.3e}'.format(123456789))
# 1.235e+08

df.to_csv('data/dst/to_csv_out_float_format_3e.csv', float_format='{:.3e}'.format)

,col1,col2,col3
0,1.235e-01,1.235e+08,123456789
1,1.000e+09,0.000e+00,0

Please note that specifying the number of digits may lead to loss of information beyond the specified number of digits during saving.
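The loss of information can be checked with a round trip. In this sketch, a single-column DataFrame (`df_float`, made up for the example) is written to a string with three decimal places and read back from an in-memory buffer.

```python
import io

import pandas as pd

# Hypothetical single-column frame using values from the example above.
df_float = pd.DataFrame({'col1': [0.123456789, 1000000000.0]})

rounded_csv = df_float.to_csv(float_format='%.3f')
print(rounded_csv)
# ,col1
# 0,0.123
# 1,1000000000.000

# Reading the text back cannot recover the digits that were dropped.
restored = pd.read_csv(io.StringIO(rounded_csv), index_col=0)
values_match = restored.iat[0, 0] == df_float.iat[0, 0]
print(values_match)
# False
```

If the file is meant to be read back for further computation, it is usually safer to leave float_format unset and apply formatting only for presentation.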

Write with custom formats

The float_format argument, as the name suggests, only applies to columns with the float data type. Columns with the integer (int) type remain as they are. Moreover, you can't specify different formats for each column.

If you want to specify a format for int columns, or use a different format for each float column, you should convert the data in the original DataFrame to strings in the desired format before saving it.

df = pd.DataFrame({'col1': [0.123456789, 1000000000.0],
                   'col2': [123456789.0, 0.0],
                   'col3': [123456789, 0]})

df['col1'] = df['col1'].map('{:.3f}'.format)
df['col2'] = df['col2'].map('{:.3e}'.format)
df['col3'] = df['col3'].map('{:#010x}'.format)
print(df)
#              col1       col2        col3
# 0           0.123  1.235e+08  0x075bcd15
# 1  1000000000.000  0.000e+00  0x00000000

df.to_csv('data/dst/to_csv_out_float_format_str.csv')

,col1,col2,col3
0,0.123,1.235e+08,0x075bcd15
1,1000000000.000,0.000e+00,0x00000000
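The example above overwrites the columns of df with strings. If you need to keep the numeric data for later use, one option is to format a copy instead. The sketch below is one possible arrangement, with the per-column formats collected in a dictionary named `formats` (a name chosen for this example) and the result returned as a string.

```python
import pandas as pd

df = pd.DataFrame({'col1': [0.123456789, 1000000000.0],
                   'col2': [123456789.0, 0.0],
                   'col3': [123456789, 0]})

# One formatter per column; the names and formats follow the example above.
formats = {
    'col1': '{:.3f}'.format,
    'col2': '{:.3e}'.format,
    'col3': '{:#010x}'.format,
}

# Format a copy so that the original DataFrame keeps its numeric dtypes.
df_out = df.copy()
for column, formatter in formats.items():
    df_out[column] = df_out[column].map(formatter)

formatted_csv = df_out.to_csv()
print(formatted_csv)
# ,col1,col2,col3
# 0,0.123,1.235e+08,0x075bcd15
# 1,1000000000.000,0.000e+00,0x00000000

print(df.dtypes)
# col1    float64
# col2    float64
# col3      int64
# dtype: object
```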

In the example above, note that when saving integers in hexadecimal form, pandas.read_csv() will interpret them as strings. If you want them to be treated as numerical data, you'll need to convert them after loading.

df = pd.read_csv('data/dst/to_csv_out_float_format_str.csv', index_col=0)
print(df)
#            col1         col2        col3
# 0  1.230000e-01  123500000.0  0x075bcd15
# 1  1.000000e+09          0.0  0x00000000

print(df.dtypes)
# col1    float64
# col2    float64
# col3     object
# dtype: object

df['col3'] = df['col3'].map(lambda x: int(x, 16))
print(df)
#            col1         col2       col3
# 0  1.230000e-01  123500000.0  123456789
# 1  1.000000e+09          0.0          0

print(df.dtypes)
# col1    float64
# col2    float64
# col3      int64
# dtype: object
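As an alternative to converting after loading, the conversion can be done while reading by using the converters argument of read_csv(). This sketch reads from an in-memory string (`hex_csv`, a stand-in for the file written above) so that it runs on its own.

```python
import io

import pandas as pd

# Stand-in for the hexadecimal column of the file written above.
hex_csv = ',col3\n0,0x075bcd15\n1,0x00000000\n'

# converters maps a column name to a function applied to each raw string.
df_hex = pd.read_csv(
    io.StringIO(hex_csv),
    index_col=0,
    converters={'col3': lambda x: int(x, 16)},
)
print(df_hex['col3'].tolist())
# [123456789, 0]
```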