Pandas Read CSV in Python

Last Updated : 21 Nov, 2024

CSV (Comma-Separated Values) files store tabular data as plain text. Pandas can load this data into a DataFrame, which is a powerful structure for data manipulation and analysis. To access data from a CSV file, we use the read_csv() function from Pandas, which returns the data as a DataFrame. Here's a quick example to get you started.

Suppose you have a file named people.csv. First, import the Pandas library, then use it to load the data into a DataFrame as follows:

Python

import pandas as pd

# reading csv file 
df = pd.read_csv("people.csv")
df

Output:

[Figure: Pandas Read CSV in Python]

`read_csv()` function – Syntax & Parameters

read_csv() function in Pandas is used to read data from CSV files into a Pandas DataFrame. A DataFrame is a powerful data structure that allows you to manipulate and analyze tabular data efficiently. CSV files are plain-text files where each row represents a record, and columns are separated by commas (or other delimiters).

Here is the Pandas read CSV syntax with its parameters.

Syntax: pd.read_csv(filepath_or_buffer, sep=',', header='infer', index_col=None, usecols=None, engine=None, skiprows=None, nrows=None)

Parameters:

filepath_or_buffer: Location of the CSV file. It accepts a string path, a URL, or a file-like object.
sep: The separator between fields. The default is ','.
header: Row number (int) or list of row numbers to use as the column names; the data starts after that row. With the default 'infer', the first row is used. With header=None, no row is treated as a header and the columns are labelled 0, 1, 2, and so on.
usecols: Retrieves only the selected columns from the CSV file.
nrows: Number of rows to read from the file.
index_col: Column(s) to use as the row labels. If None, Pandas assigns a default integer index (0, 1, 2, ...).
engine: Parser engine to use, for example 'c' or 'python'.
skiprows: Lines to skip while reading, given as a number of lines from the start of the file or as a list of 0-based line numbers.
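
To make the header parameter concrete, here is a minimal sketch. It assumes a small in-memory stand-in for people.csv (built with io.StringIO from rows shown in this article), so it runs without any file on disk; the variable names are illustrative only.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-in for people.csv, using rows shown in this article.
csv_text = """First Name,Last Name,Sex,Email,Date of birth,Job Title
Shelby,Terrell,Male,elijah57@example.net,1945-10-26,Games developer
Phillip,Summers,Female,bethany14@example.com,1910-03-24,Phytotherapist
Kristine,Travis,Male,bthompson@example.com,1992-07-02,Homeopath
"""

# header=0 uses the first line as the column names.
with_header = pd.read_csv(io.StringIO(csv_text), header=0)

# header=None treats every line as data and labels the columns 0, 1, 2, ...
no_header = pd.read_csv(io.StringIO(csv_text), header=None)

print(with_header.columns.tolist())
print(no_header.columns.tolist())
print(no_header.shape)  # one extra row, because the header line is now data
```

With header=None the original header line becomes the first data row, so it is usually paired with files that really have no header.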

Features in Pandas `read_csv`

1. Read specific columns using read_csv

The usecols parameter lets you load only specific columns from a CSV file. This reduces memory usage and processing time by importing only the required data.

Python

df = pd.read_csv("people.csv", usecols=["First Name", "Email"])
print(df)

Output:

  First Name                       Email
0     Shelby        elijah57@example.net
1    Phillip       bethany14@example.com
2   Kristine       bthompson@example.com
3    Yesenia   kaitlinkaiser@example.com
4       Lori  buchananmanuel@example.net
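
Besides column names, usecols also accepts integer positions or a function that is called with each column name. The sketch below illustrates both forms on a hypothetical in-memory copy of a few people.csv rows; treat it as an illustration rather than the only way to select columns.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-in for people.csv, using rows shown in this article.
csv_text = """First Name,Last Name,Sex,Email,Date of birth,Job Title
Shelby,Terrell,Male,elijah57@example.net,1945-10-26,Games developer
Phillip,Summers,Female,bethany14@example.com,1910-03-24,Phytotherapist
"""

# Select columns by 0-based position: 0 is "First Name", 3 is "Email".
by_position = pd.read_csv(io.StringIO(csv_text), usecols=[0, 3])

# Select columns with a callable: keep every column whose name ends in "Name".
by_callable = pd.read_csv(
    io.StringIO(csv_text),
    usecols=lambda name: name.endswith("Name"),
)

print(by_position.columns.tolist())
print(by_callable.columns.tolist())
```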

2. Setting an Index Column (`index_col`)

The index_col parameter sets one or more columns as the DataFrame index, making the specified column(s) act as row labels for easier data referencing.

Python

df = pd.read_csv("people.csv", index_col="First Name")
print(df)

Output:

[Figure: DataFrame with the "First Name" column set as the index]
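
Once a column is the index, rows can be looked up by label with .loc. The following is a small sketch on a hypothetical in-memory copy of a few people.csv rows; it assumes the values in "First Name" are unique, which may not hold for a real dataset.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-in for people.csv, using rows shown in this article.
csv_text = """First Name,Last Name,Sex,Email,Date of birth,Job Title
Shelby,Terrell,Male,elijah57@example.net,1945-10-26,Games developer
Phillip,Summers,Female,bethany14@example.com,1910-03-24,Phytotherapist
Kristine,Travis,Male,bthompson@example.com,1992-07-02,Homeopath
"""

df = pd.read_csv(io.StringIO(csv_text), index_col="First Name")

# "First Name" is now the row label, not a regular column.
print(df.index.tolist())

# Look up a single row by its label and read one field from it.
phillip = df.loc["Phillip"]
print(phillip["Email"])
```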

3. Handling Missing Values Using read_csv

The na_values parameter specifies additional strings (e.g., "N/A", "Unknown") that should be read as NaN, enabling consistent handling of missing or incomplete data during analysis.

Python

df = pd.read_csv("people.csv", na_values=["N/A", "Unknown"])

We won't get any NaN values here, as there are no missing values in our dataset.
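
To see na_values take effect, the sketch below uses a hypothetical, deliberately modified sample in which some fields contain "N/A" or "Unknown". These values are invented for illustration and are not part of people.csv.

```python
import io

import pandas as pd

# Hypothetical sample with placeholder strings inserted for illustration.
csv_text = """First Name,Sex,Job Title
Shelby,Male,Games developer
Phillip,Unknown,N/A
Kristine,Male,Unknown
"""

# Without na_values, "Unknown" is kept as an ordinary string.
default_df = pd.read_csv(io.StringIO(csv_text))

# With na_values, both "N/A" and "Unknown" are read as NaN.
clean_df = pd.read_csv(io.StringIO(csv_text), na_values=["N/A", "Unknown"])

print(default_df["Sex"].tolist())
print(clean_df.isna().sum())
```

Counting missing values per column with isna().sum() is a quick way to check that the placeholders were recognised.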

4. Reading CSV Files with Different Delimiters

In this example, we create a CSV file whose fields are separated by a mix of special characters to see how the sep parameter works.

Python

import pandas as pd

# Sample data stored in a multi-line string
data = """totalbill_tip, sex:smoker, day_time, size
16.99, 1.01:Female|No, Sun, Dinner, 2
10.34, 1.66, Male, No|Sun:Dinner, 3
21.01:3.5_Male, No:Sun, Dinner, 3
23.68, 3.31, Male|No, Sun_Dinner, 2
24.59:3.61, Female_No, Sun, Dinner, 4
25.29, 4.71|Male, No:Sun, Dinner, 4"""

# Save the data to a CSV file
with open("sample.csv", "w") as file:
    file.write(data)
print(data)

Output:

totalbill_tip, sex:smoker, day_time, size
16.99, 1.01:Female|No, Sun, Dinner, 2
10.34, 1.66, Male, No|Sun:Dinner, 3
21.01:3.5_Male, No:Sun, Dinner, 3
23.68, 3.31, Male|No, Sun_Dinner, 2
24.59:3.61, Female_No, Sun, Dinner, 4
25.29, 4.71|Male, No:Sun, Dinner, 4

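The sample data is stored in a multi-line string for demonstration purposes.

Separator (sep): The sep='[:, |_]' argument is a regular-expression character class, so Pandas splits on any single one of the characters :, comma, space, | or _. Because each character is matched on its own, a comma followed by a space produces an empty field, which is why the output below contains Unnamed columns and NaN values. The first data row also has one more field than the header, so Pandas treats the first field as the index.
Engine: The engine='python' argument is used because the default C engine does not support regular expressions for delimiters.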

Python

# Load the CSV file using pandas with multiple delimiters
df = pd.read_csv('sample.csv',
                 sep='[:, |_]',  # Define the delimiters
                 engine='python')  # Use Python engine for regex separators
df

Output:

	totalbill	tip	Unnamed: 2	sex	smoker	Unnamed: 5	day	time	Unnamed: 8	size
16.99	NaN	1.01	Female	No	NaN	Sun	NaN	Dinner	NaN	2.0
10.34	NaN	1.66	NaN	Male	NaN	No	Sun	Dinner	NaN	3.0
21.01	3.50	Male	NaN	No	Sun	NaN	Dinner	NaN	3.0	NaN
23.68	NaN	3.31	NaN	Male	No	NaN	Sun	Dinner	NaN	2.0
24.59	3.61	NaN	Female	No	NaN	Sun	NaN	Dinner	NaN	4.0
25.29	NaN	4.71	Male	NaN	No	Sun	NaN	Dinner	NaN	4.0
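
If the empty Unnamed columns in the output above are unwanted, one option is a regular expression that also consumes the whitespace around each delimiter. The sketch below assumes a shortened, hypothetical variant of the sample in which every row has the same seven fields; the pattern is one possible choice, not the only one.

```python
import io

import pandas as pd

# Hypothetical, regularised variant of the sample: every row has 7 fields.
data = """totalbill_tip, sex:smoker, day_time, size
16.99, 1.01:Female|No, Sun, Dinner, 2
23.68, 3.31, Male|No, Sun_Dinner, 2
"""

# Match one delimiter character (: , | _) plus any surrounding whitespace,
# so ", " counts as a single separator instead of two.
df = pd.read_csv(
    io.StringIO(data),
    sep=r"\s*[:,|_]\s*",
    engine="python",  # regex separators need the Python engine
)

print(df.columns.tolist())
print(df)
```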

5. Using nrows in read_csv()

The nrows parameter limits the number of rows read from a file, enabling quick previews or partial data loading for large datasets. Here, we read only the first 3 rows using the nrows parameter.

Python

df = pd.read_csv('people.csv', nrows=3)
df

Output:

	First Name	Last Name	Sex	Email	Date of birth	Job Title
0	Shelby	Terrell	Male	elijah57@example.net	1945-10-26	Games developer
1	Phillip	Summers	Female	bethany14@example.com	1910-03-24	Phytotherapist
2	Kristine	Travis	Male	bthompson@example.com	1992-07-02	Homeopath
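
As a related sketch, nrows reads only the first rows, while the separate chunksize parameter of read_csv iterates over the whole file in pieces. The example uses a hypothetical in-memory sample with two of the people.csv columns; chunksize is not covered elsewhere in this article and is shown only for comparison.

```python
import io

import pandas as pd

# Hypothetical in-memory sample using names shown in this article.
csv_text = """First Name,Last Name
Shelby,Terrell
Phillip,Summers
Kristine,Travis
Yesenia,Martinez
Lori,Todd
"""

# nrows: read just the first two data rows for a quick preview.
preview = pd.read_csv(io.StringIO(csv_text), nrows=2)
print(preview)

# chunksize: iterate over the file two rows at a time instead of loading it all.
chunk_sizes = [len(chunk) for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2)]
print(chunk_sizes)
```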

6. Using skiprows in read_csv()

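The skiprows parameter skips rows while reading the file. Passing an integer skips that many lines from the start of the file, which is useful for ignoring metadata or extra headers that are not part of the dataset. Passing a list, as in this example, skips the lines at those 0-based positions, where line 0 is the header row.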

Python

# Read the full dataset for comparison
df = pd.read_csv("people.csv")
print("Previous Dataset: ")
print(df)

# Skip file lines 4 and 5 (0-based; line 0 is the header)
df = pd.read_csv("people.csv", skiprows=[4, 5])
print("Dataset After skipping rows: ")
print(df)

Output:

Previous Dataset:
  First Name Last Name     Sex                       Email Date of birth           Job Title 
0     Shelby   Terrell    Male        elijah57@example.net    1945-10-26     Games developer
1    Phillip   Summers  Female       bethany14@example.com    1910-03-24      Phytotherapist  
2   Kristine    Travis    Male       bthompson@example.com    1992-07-02           Homeopath  
3    Yesenia  Martinez    Male   kaitlinkaiser@example.com    2017-08-03   Market researcher
4       Lori      Todd    Male  buchananmanuel@example.net    1938-12-01  Veterinary surgeon 
5       Erin       Day    Male         tconner@example.org    2015-10-28  Management officer  
6  Katherine      Buck  Female     conniecowan@example.com    1989-01-22             Analyst
7    Ricardo    Hinton    Male     wyattbishop@example.com    1924-03-26      Hydrogeologist

Dataset After skipping rows:

[Figure: the DataFrame after skipping rows]
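
The three forms skiprows accepts (an integer, a list, and a callable) can be compared side by side. The sketch below assumes a hypothetical in-memory sample built from names shown in this article; the metadata line in the second example is invented for illustration.

```python
import io

import pandas as pd

# Hypothetical in-memory sample using names shown in this article.
csv_text = """First Name,Last Name
Shelby,Terrell
Phillip,Summers
Kristine,Travis
Yesenia,Martinez
Lori,Todd
Erin,Day
Katherine,Buck
Ricardo,Hinton
"""

# List: skip file lines 4 and 5 (0-based, header is line 0): Yesenia and Lori.
by_list = pd.read_csv(io.StringIO(csv_text), skiprows=[4, 5])

# Integer: skip one leading metadata line (invented here) before the header.
with_metadata = "Exported from a hypothetical HR system\n" + csv_text
by_int = pd.read_csv(io.StringIO(with_metadata), skiprows=1)

# Callable: skip every even-numbered line except the header (line 0).
by_callable = pd.read_csv(
    io.StringIO(csv_text),
    skiprows=lambda i: i > 0 and i % 2 == 0,
)

print(by_list["First Name"].tolist())
print(by_int.shape)
print(by_callable["First Name"].tolist())
```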

7. Parsing Dates (`parse_dates`)

The parse_dates parameter converts date columns into datetime objects, simplifying operations like filtering, sorting, or time-based analysis.

Python

df = pd.read_csv("people.csv", parse_dates=["Date of birth"])
print(df.info())

Output:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 6 columns):
 #   Column         Non-Null Count  Dtype         
---  ------         --------------  -----         
 0   First Name     5 non-null      object        
 1   Last Name      5 non-null      object        
 2   Sex            5 non-null      object        
 3   Email          5 non-null      object        
 4   Date of birth  5 non-null      datetime64[ns]
 5   Job Title      5 non-null      object        
dtypes: datetime64[ns](1), object(5)
memory usage: 368.0+ bytes
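
Once a column holds datetime values, the .dt accessor exposes parts such as the year. Here is a minimal sketch on a hypothetical in-memory copy of a few people.csv rows; the 1940 cut-off is an arbitrary value chosen for illustration.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-in for people.csv, using rows shown in this article.
csv_text = """First Name,Last Name,Date of birth
Shelby,Terrell,1945-10-26
Phillip,Summers,1910-03-24
Kristine,Travis,1992-07-02
"""

df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date of birth"])

# Extract the year from each parsed date.
birth_years = df["Date of birth"].dt.year
print(birth_years.tolist())

# Filter on the parsed dates: keep people born after 1940 (arbitrary cut-off).
born_after_1940 = df[birth_years > 1940]
print(born_after_1940["First Name"].tolist())

# Sorting works chronologically rather than alphabetically.
oldest_first = df.sort_values("Date of birth")
print(oldest_first["First Name"].tolist())
```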

Loading CSV Data from a URL

Pandas allows you to directly read a CSV file hosted on the internet using the file’s URL. This can be incredibly useful when working with datasets shared on websites, cloud storage, or public repositories like GitHub.

Python

url = "https://media.geeksforgeeks.org/wp-content/uploads/20241121154629307916/people_data.csv"
df = pd.read_csv(url)
df

Output:


	First Name	Last Name	Sex	Email	Date of birth	Job Title
0	Shelby	Terrell	Male	elijah57@example.net	1945-10-26	Games developer
1	Phillip	Summers	Female	bethany14@example.com	1910-03-24	Phytotherapist
2	Kristine	Travis	Male	bthompson@example.com	1992-07-02	Homeopath
3	Yesenia	Martinez	Male	kaitlinkaiser@example.com	2017-08-03	Market researcher
4	Lori	Todd	Male	buchananmanuel@example.net	1938-12-01	Veterinary surgeon

Kartikaybhutani
