Data Structures in the Python Pandas Package



A data structure is a way of collecting, organizing, and storing data so that it can be accessed and modified efficiently. Choosing a suitable data structure also helps organize the items (values) efficiently in terms of memory.

The Python pandas package handles data effectively because it provides two powerful data structures: Series and DataFrame.

A Series is a one-dimensional labeled array capable of holding any data type, such as integers, strings, or floating-point numbers. Every value in a Series is associated with a label (its index). Labels may be integers or names.

Example

import pandas as pd
data = pd.Series([1,2,3,4,5])
print(data)

Explanation

The import statement loads the pandas package under the alias pd. The Series is then created by passing a simple Python list with 5 elements to the pd.Series constructor.

Output

0   1
1   2
2   3
3   4
4   5
dtype: int64

The output block above shows the Series. A pandas Series is a 1-dimensional object that stores homogeneous data (all values share a single dtype, here int64), and each value is identified by a label. Because no index was specified, pandas assigned the default integer labels 0, 1, 2, 3, 4.
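Labels do not have to be the default integers. The following is a minimal sketch, assuming made-up values and names (marks, alice, bob, carol are hypothetical and not part of the example above), of a Series with name labels and of looking up values by label or by position.

```python
import pandas as pd

# Hypothetical data: the values and labels are chosen only for illustration.
marks = pd.Series([85, 92, 78], index=["alice", "bob", "carol"])

print(marks["bob"])          # look up a value by its label
print(marks.iloc[0])         # look up a value by its integer position
print(marks.index.tolist())  # list the labels of the Series
```

Here the index argument supplies the labels explicitly, while iloc still allows access by position.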

The other data structure in pandas is the DataFrame, a 2-dimensional labeled data structure that represents data in rows and columns. Each column may have a different data type. The overall structure of a DataFrame resembles a spreadsheet or an SQL table. As in a Series, the rows of a DataFrame are identified by labels.

Example

import pandas as pd
df = pd.DataFrame([[2,3,4,5],[6,7,8,9]], columns=['a','b','c','d'])
print(df)

Explanation

In this example, we created a simple pandas DataFrame from a list of lists, where each inner list becomes one row. The column labels are defined manually as a, b, c, d.

Output

    a   b   c   d
0   2   3   4   5
1   6   7   8   9

The output block above shows the resulting DataFrame. It has 2 rows and 4 columns, where 0, 1 are the row labels and a, b, c, d are the column labels.
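To illustrate that each column of a DataFrame may have its own data type, here is a small sketch that builds a DataFrame from a dictionary of columns. The column names and values (name, price, stock) are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical data: one text column, one float column, one integer column.
df = pd.DataFrame({
    "name": ["pen", "book"],
    "price": [1.5, 12.0],
    "stock": [100, 25],
})

print(df.dtypes)            # shows the data type of each column
print(df.shape)             # (number of rows, number of columns)
print(df.loc[1, "price"])   # value at row label 1, column label "price"
```

The dtypes attribute reports one type per column, and loc selects a single value by its row and column labels.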

Older versions of pandas also provided a 3-dimensional data structure called Panel. Panel was deprecated in pandas 0.20.0 and removed in version 0.25.0. In newer versions of pandas, 3-dimensional data is typically represented as a DataFrame with a MultiIndex.
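As a rough sketch of how 3-dimensional data can be held in a MultiIndex DataFrame, the example below stacks two small 2 x 2 tables under an outer index level. The level names (item, row) and the values are assumptions made for illustration only.

```python
import pandas as pd

# Outer level "item" selects one 2-D table; inner level "row" selects a row in it.
index = pd.MultiIndex.from_product(
    [["item1", "item2"], [0, 1]], names=["item", "row"]
)

# Hypothetical values: four rows in total, two per item.
df3d = pd.DataFrame(
    [[1, 2], [3, 4], [5, 6], [7, 8]],
    index=index,
    columns=["a", "b"],
)

print(df3d)
print(df3d.loc["item2"])   # the 2-D slice that belongs to "item2"
```

Selecting one outer label with loc returns an ordinary 2-dimensional DataFrame, which plays the role of one slice along the third dimension.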

Updated on: 2021-11-17T07:10:36+05:30
