Pandas Read CSV in Python
Last Updated: 21 Nov, 2024
CSV (comma-separated values) files store tabular data as plain text. The Pandas read_csv() function loads such a file into a DataFrame, a tabular structure for data manipulation and analysis. Here's a quick example to get you started.
Suppose you have a file named people.csv.
First, import the Pandas library, then load the file into a DataFrame as follows:
Python
import pandas as pd
# reading csv file
df = pd.read_csv("people.csv")
df
Output:
[Image: Pandas Read CSV in Python]
read_csv() function – Syntax & Parameters
The read_csv() function in Pandas reads data from CSV files into a DataFrame. CSV files are plain-text files where each row represents a record and columns are separated by commas (or other delimiters).
Here is the Pandas read CSV syntax with its parameters.
Syntax: pd.read_csv(filepath_or_buffer, sep=',', header='infer', index_col=None, usecols=None, engine=None, skiprows=None, nrows=None)
Parameters:
- filepath_or_buffer: Location of the CSV file. It accepts a string path, a URL, or a file-like object.
- sep: The separator (delimiter) between fields; the default is ','.
- header: Row number(s) to use as the column names, given as an int or a list of ints. The default, 'infer', uses the first line as the header when no names are passed. With header=None, no line is treated as a header and the columns are labeled 0, 1, 2, and so on.
- usecols: Retrieves only the selected columns from the CSV file.
- nrows: Number of rows to read from the file.
- index_col: Column(s) to use as the row labels. If None (the default), Pandas assigns a default integer index (0, 1, 2, ...).
- skiprows: Line numbers to skip (0-indexed, as a list) or the number of lines to skip at the start of the file (as an int).
- engine: Parser engine to use, such as 'c' or 'python'.
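As a minimal sketch of how the header parameter affects column labels, the snippet below reads the same text twice from an in-memory buffer. The sample text and the names raw, with_header, and no_header are made up for illustration; io.StringIO stands in for a file on disk.

```python
import io
import pandas as pd

# Hypothetical sample data; not the people.csv file used in this article
raw = "name,age\nAsha,31\nBen,27\n"

# Default header='infer': the first line supplies the column names
with_header = pd.read_csv(io.StringIO(raw))

# header=None: every line is treated as data and columns are labeled 0, 1, ...
no_header = pd.read_csv(io.StringIO(raw), header=None)

print(list(with_header.columns))
print(list(no_header.columns))
print(len(with_header), len(no_header))
```

With header=None the original header line becomes the first data row, so the second DataFrame has one more row than the first.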
Features in Pandas read_csv
1. Read specific columns using read_csv
The usecols parameter loads only specific columns from a CSV file. This reduces memory usage and processing time by importing only the required data.
Python
df = pd.read_csv("people.csv", usecols=["First Name", "Email"])
print(df)
Output:
First Name Email
0 Shelby elijah57@example.net
1 Phillip bethany14@example.com
2 Kristine bthompson@example.com
3 Yesenia kaitlinkaiser@example.com
4 Lori buchananmanuel@example.net
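Besides a list of column names, usecols also accepts integer positions or a callable that is evaluated against each column name. The sketch below illustrates both forms on a small made-up dataset; the sample text and the variable names by_position and by_rule are assumptions for illustration only.

```python
import io
import pandas as pd

# Hypothetical sample data with the same style of headers as people.csv
raw = (
    "First Name,Last Name,Email\n"
    "Asha,Rao,asha@example.com\n"
    "Ben,Lee,ben@example.com\n"
)

# Select columns by 0-based position: the first and third columns
by_position = pd.read_csv(io.StringIO(raw), usecols=[0, 2])

# Select columns with a callable: keep names that end with "Name"
by_rule = pd.read_csv(io.StringIO(raw), usecols=lambda col: col.endswith("Name"))

print(list(by_position.columns))
print(list(by_rule.columns))
```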
2. Setting an Index Column (index_col)
The index_col parameter sets one or more columns as the DataFrame index, making the specified column(s) act as row labels for easier data referencing.
Python
df = pd.read_csv("people.csv", index_col="First Name")
print(df)
Output:
[Image: Read CSV in Python]
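To show what "row labels" means in practice, here is a small sketch that sets an index column and then looks up a row by its label with .loc. The data is a made-up stand-in for people.csv, read from an in-memory buffer.

```python
import io
import pandas as pd

# Hypothetical sample data; io.StringIO stands in for a file on disk
raw = (
    "First Name,Last Name,Email\n"
    "Asha,Rao,asha@example.com\n"
    "Ben,Lee,ben@example.com\n"
)

df = pd.read_csv(io.StringIO(raw), index_col="First Name")

# The index column is no longer a regular column
print(list(df.columns))

# Rows can now be selected by label instead of by integer position
email = df.loc["Ben", "Email"]
print(email)
```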
3. Handling Missing Values Using read_csv
The na_values parameter replaces specified strings (e.g., "N/A", "Unknown") with NaN, enabling consistent handling of missing or incomplete data during analysis.
Python
df = pd.read_csv("people.csv", na_values=["N/A", "Unknown"])
We won't get any NaN values here, as there are no missing values in our dataset.
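Since people.csv has no missing values, the effect of na_values is easier to see on a small made-up dataset. In the sketch below, "N/A" is already one of the strings Pandas treats as missing by default, while "Unknown" becomes NaN only when it is listed in na_values. The sample text and variable names are assumptions for illustration.

```python
import io
import pandas as pd

# Hypothetical sample data containing two different missing-value markers
raw = "name,city\nAsha,Pune\nBen,Unknown\nCara,N/A\n"

# Default behaviour: only "N/A" is recognised as missing
default = pd.read_csv(io.StringIO(raw))

# With na_values, "Unknown" is treated as missing as well
custom = pd.read_csv(io.StringIO(raw), na_values=["Unknown"])

default_missing = int(default["city"].isna().sum())
custom_missing = int(custom["city"].isna().sum())
print(default_missing, custom_missing)
```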
4. Reading CSV Files with Different Delimiters
In this example, we create a CSV file that mixes several delimiter characters to see how the sep parameter works.
Python
import pandas as pd
# Sample data stored in a multi-line string
data = """totalbill_tip, sex:smoker, day_time, size
16.99, 1.01:Female|No, Sun, Dinner, 2
10.34, 1.66, Male, No|Sun:Dinner, 3
21.01:3.5_Male, No:Sun, Dinner, 3
23.68, 3.31, Male|No, Sun_Dinner, 2
24.59:3.61, Female_No, Sun, Dinner, 4
25.29, 4.71|Male, No:Sun, Dinner, 4"""
# Save the data to a CSV file
with open("sample.csv", "w") as file:
    file.write(data)
print(data)
Output:
totalbill_tip, sex:smoker, day_time, size
16.99, 1.01:Female|No, Sun, Dinner, 2
10.34, 1.66, Male, No|Sun:Dinner, 3
21.01:3.5_Male, No:Sun, Dinner, 3
23.68, 3.31, Male|No, Sun_Dinner, 2
24.59:3.61, Female_No, Sun, Dinner, 4
25.29, 4.71|Male, No:Sun, Dinner, 4
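The sample data is stored in a multi-line string for demonstration purposes.
- Separator (sep): The sep='[:, |_]' argument lets Pandas split on several delimiters (the colon, comma, space, pipe and underscore) using a regular expression. Because a space is part of the character class, each ", " sequence in the file yields an empty field, which is why the output below contains Unnamed columns and NaN values.
- Engine: The engine='python' argument is used because the default C engine does not support regular expressions for delimiters.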
Python
# Load the CSV file using pandas with multiple delimiters
df = pd.read_csv('sample.csv',
                 sep='[:, |_]',    # Define the delimiters
                 engine='python')  # Use Python engine for regex separators
df
Output:
totalbill tip Unnamed: 2 sex smoker Unnamed: 5 day time Unnamed: 8 size
16.99 NaN 1.01 Female No NaN Sun NaN Dinner NaN 2.0
10.34 NaN 1.66 NaN Male NaN No Sun Dinner NaN 3.0
21.01 3.50 Male NaN No Sun NaN Dinner NaN 3.0 NaN
23.68 NaN 3.31 NaN Male No NaN Sun Dinner NaN 2.0
24.59 3.61 NaN Female No NaN Sun NaN Dinner NaN 4.0
25.29 NaN 4.71 Male NaN No Sun NaN Dinner NaN 4.0
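The empty Unnamed columns above come from the space being treated as a delimiter of its own. One possible alternative, sketched below, is a pattern that matches a single delimiter character together with any surrounding whitespace, so that ", " counts as one separator. The pattern r"\s*[:,|_]\s*" is an assumption chosen for this sample data, not part of the original example, and the data is read from an in-memory buffer instead of sample.csv.

```python
import io
import pandas as pd

# Same sample data as above, kept in memory for a self-contained example
data = """totalbill_tip, sex:smoker, day_time, size
16.99, 1.01:Female|No, Sun, Dinner, 2
10.34, 1.66, Male, No|Sun:Dinner, 3
21.01:3.5_Male, No:Sun, Dinner, 3
23.68, 3.31, Male|No, Sun_Dinner, 2
24.59:3.61, Female_No, Sun, Dinner, 4
25.29, 4.71|Male, No:Sun, Dinner, 4"""

# One delimiter character plus optional whitespace on either side
df = pd.read_csv(io.StringIO(data),
                 sep=r"\s*[:,|_]\s*",
                 engine="python")  # regex separators need the Python engine

print(df.shape)
print(list(df.columns))
```

With this pattern every line splits into the same seven fields, so the columns line up with the header.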
5. Using nrows in read_csv()
The nrows parameter limits the number of rows read from a file, enabling quick previews or partial data loading for large datasets. Here, we read only the first 3 rows using the nrows parameter.
Python
df = pd.read_csv('people.csv', nrows=3)
df
Output:
First Name Last Name Sex Email Date of birth Job Title
0 Shelby Terrell Male elijah57@example.net 1945-10-26 Games developer
1 Phillip Summers Female bethany14@example.com 1910-03-24 Phytotherapist
2 Kristine Travis Male bthompson@example.com 1992-07-02 Homeopath
6. Using skiprows in read_csv()
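The skiprows parameter skips rows that are not part of the dataset, such as metadata or extra headers. An integer skips that many lines from the start of the file, while a list skips specific 0-indexed line numbers. Here, skiprows=[4, 5] skips lines 4 and 5 of the file, counting the header as line 0.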
Python
df = pd.read_csv("people.csv")
print("Previous Dataset: ")
print(df)
# using skiprows
df = pd.read_csv("people.csv", skiprows=[4, 5])
print("Dataset After skipping rows: ")
print(df)
Output:
Previous Dataset:
First Name Last Name Sex Email Date of birth Job Title
0 Shelby Terrell Male elijah57@example.net 1945-10-26 Games developer
1 Phillip Summers Female bethany14@example.com 1910-03-24 Phytotherapist
2 Kristine Travis Male bthompson@example.com 1992-07-02 Homeopath
3 Yesenia Martinez Male kaitlinkaiser@example.com 2017-08-03 Market researcher
4 Lori Todd Male buchananmanuel@example.net 1938-12-01 Veterinary surgeon
5 Erin Day Male tconner@example.org 2015-10-28 Management officer
6 Katherine Buck Female conniecowan@example.com 1989-01-22 Analyst
7 Ricardo Hinton Male wyattbishop@example.com 1924-03-26 Hydrogeologist
Dataset After skipping rows:
[Image: Pandas Read CSV]
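The difference between passing an integer and passing a list to skiprows can be sketched on a small made-up file that starts with two metadata lines. The sample text and the names df_top and df_list are assumptions for illustration.

```python
import io
import pandas as pd

# Hypothetical file: two metadata lines, then a header and three records
raw = (
    "# exported 2024-01-01\n"   # line 0
    "# source: demo\n"          # line 1
    "name,age\n"                # line 2 (header)
    "Asha,31\n"                 # line 3
    "Ben,27\n"                  # line 4
    "Cara,45\n"                 # line 5
)

# An integer skips that many lines from the top of the file
df_top = pd.read_csv(io.StringIO(raw), skiprows=2)

# A list skips specific 0-indexed line numbers: both metadata lines and line 4
df_list = pd.read_csv(io.StringIO(raw), skiprows=[0, 1, 4])

print(df_top["name"].tolist())
print(df_list["name"].tolist())
```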
7. Parsing Dates (parse_dates)
The parse_dates parameter converts date columns into datetime objects, simplifying operations like filtering, sorting, or time-based analysis.
Python
df = pd.read_csv("people.csv", parse_dates=["Date of birth"])
print(df.info())
Output:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 6 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 First Name 5 non-null object
1 Last Name 5 non-null object
2 Sex 5 non-null object
3 Email 5 non-null object
4 Date of birth 5 non-null datetime64[ns]
5 Job Title 5 non-null object
dtypes: datetime64[ns](1), object(5)
memory usage: 368.0+ bytes
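Once a column has a datetime dtype, date-specific operations become available. The sketch below, using a small made-up dataset in place of people.csv, extracts the year with the .dt accessor and sorts chronologically; the variable names years and oldest are assumptions for illustration.

```python
import io
import pandas as pd

# Hypothetical sample data with ISO-formatted dates
raw = "name,Date of birth\nAsha,1992-07-02\nBen,1945-10-26\n"

df = pd.read_csv(io.StringIO(raw), parse_dates=["Date of birth"])

# The .dt accessor exposes date components of a datetime column
years = df["Date of birth"].dt.year.tolist()

# Sorting is chronological, not alphabetical
oldest = df.sort_values("Date of birth").iloc[0]["name"]

print(years)
print(oldest)
```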
Loading CSV Data from a URL
Pandas allows you to directly read a CSV file hosted on the internet using the file’s URL. This can be incredibly useful when working with datasets shared on websites, cloud storage, or public repositories like GitHub.
Python
url = "https://media.geeksforgeeks.org/wp-content/uploads/20241121154629307916/people_data.csv"
df = pd.read_csv(url)
df
Output:
First Name Last Name Sex Email Date of birth Job Title
0 Shelby Terrell Male elijah57@example.net 1945-10-26 Games developer
1 Phillip Summers Female bethany14@example.com 1910-03-24 Phytotherapist
2 Kristine Travis Male bthompson@example.com 1992-07-02 Homeopath
3 Yesenia Martinez Male kaitlinkaiser@example.com 2017-08-03 Market researcher
4 Lori Todd Male buchananmanuel@example.net 1938-12-01 Veterinary surgeon