
Check If Time Series Data is Stationary with Python
A time series is a collection of data points recorded at regular intervals of time. It is used to study trends, patterns, and the relationships between variables over a defined period. Common examples of time series are stock prices, weather patterns and economic indicators.
Time series analysis applies statistical and mathematical techniques to this data. Its main aim is to identify patterns and trends in past data in order to forecast future values.
A time series is said to be stationary if its statistical properties, such as the mean and variance, do not change over time. Many forecasting methods assume stationarity, so it is necessary to check whether the data is stationary. There are different ways to check this; let's see them one by one.
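To make the definition concrete, here is a minimal sketch that compares the mean and standard deviation of the first and second halves of two made-up series. The series, the slope of 0.05 and the helper name half_stats are illustrative assumptions, not part of the dataset used later in this article.

```python
import random
import statistics

def half_stats(series):
    """Return ((mean, std) of the first half, (mean, std) of the second half)."""
    mid = len(series) // 2
    first, second = series[:mid], series[mid:]
    return ((statistics.mean(first), statistics.stdev(first)),
            (statistics.mean(second), statistics.stdev(second)))

random.seed(0)  # fixed seed so the run is repeatable
n = 1000

# Stationary: independent noise around a constant level.
noise = [random.gauss(0, 1) for _ in range(n)]

# Non-stationary: the same noise on top of an upward trend (slope chosen arbitrarily).
trended = [0.05 * t + e for t, e in enumerate(noise)]

for name, series in [("noise", noise), ("trended", trended)]:
    (m1, s1), (m2, s2) = half_stats(series)
    print(f"{name:8s} first half: mean={m1:.2f} std={s1:.2f} | "
          f"second half: mean={m2:.2f} std={s2:.2f}")
```

For the noise series the two halves should have similar statistics, while the mean of the trended series shifts between halves. Such a split is only a rough visual check; the tests in this article are the more formal way to decide.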
Augmented Dickey-Fuller (ADF)
The Augmented Dickey-Fuller (ADF) test is a statistical test that checks for the presence of a unit root in the time series data. A series with a unit root is non-stationary. The null hypothesis of the test is that a unit root is present. The test returns the test statistic and the p-value as output.
In the output, a p-value below 0.05 means the null hypothesis is rejected, which indicates that the time series is stationary. A larger p-value means a unit root cannot be ruled out, so the series is treated as non-stationary. Python provides the adfuller() function in the statsmodels package to run this test.
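The idea of a unit root can be sketched with the basic (non-augmented) Dickey-Fuller regression, which regresses the change in the series on its previous level and looks at the t-statistic of the slope. The code below is a simplified illustration written in plain Python, not what adfuller() does internally: it omits the lagged difference terms and lag selection, the helper name dickey_fuller_stat is hypothetical, and the value -2.86 is the approximate large-sample 5% critical value for the constant-only case.

```python
import math
import random

def dickey_fuller_stat(y):
    """t-statistic of rho in the regression  y[t] - y[t-1] = c + rho * y[t-1] + e[t]."""
    x = y[:-1]                                    # lagged level y[t-1]
    dy = [b - a for a, b in zip(y[:-1], y[1:])]   # first difference
    n = len(x)
    x_mean = sum(x) / n
    dy_mean = sum(dy) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (di - dy_mean) for xi, di in zip(x, dy))
    rho = sxy / sxx                               # OLS slope
    c = dy_mean - rho * x_mean                    # OLS intercept
    rss = sum((di - c - rho * xi) ** 2 for xi, di in zip(x, dy))
    se_rho = math.sqrt(rss / (n - 2) / sxx)       # standard error of the slope
    return rho / se_rho

random.seed(1)  # fixed seed so the run is repeatable
steps = [random.gauss(0, 1) for _ in range(500)]

noise = steps                 # stationary: no unit root
walk, level = [], 0.0         # random walk: has a unit root
for s in steps:
    level += s
    walk.append(level)

CRITICAL_5PCT = -2.86         # approximate 5% critical value (assumption, see text)
for name, series in [("white noise", noise), ("random walk", walk)]:
    stat = dickey_fuller_stat(series)
    verdict = "reject unit root" if stat < CRITICAL_5PCT else "cannot reject unit root"
    print(f"{name}: statistic = {stat:.2f} -> {verdict}")
```

A strongly negative statistic is evidence against a unit root. The statistic does not follow the usual Student t distribution under the null hypothesis, which is why the test uses its own critical values.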
Example
In this example we compute the ADF statistic and p-value of the Augmented Dickey-Fuller test using the adfuller() function of the statsmodels package.
from statsmodels.tsa.stattools import adfuller
import pandas as pd

# Load the series, parsing the 'date' column and using it as the index
data = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/a10.csv',
                   parse_dates=['date'], index_col='date')
t_data = data.loc[:, 'value'].values

# Run the ADF test; the first two items of the result are the statistic and the p-value
result = adfuller(t_data)
print("The result of adfuller function:", result)
print('ADF Statistic:', result[0])
print('p-value:', result[1])
Output
Following is the output produced after executing the program above:
The result of adfuller function: (3.145185689306744, 1.0, 15, 188, {'1%': -3.465620397124192, '5%': -2.8770397560752436, '10%': -2.5750324547306476}, 549.6705685364172)
ADF Statistic: 3.145185689306744
p-value: 1.0
The p-value of 1.0 is well above 0.05, so the null hypothesis of a unit root cannot be rejected and the series is treated as non-stationary.
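The tuple printed above holds, in order, the test statistic, the p-value, the number of lags used, the number of observations used, the critical values and the best information criterion value. As a small sketch, the helper below unpacks that tuple and states a conclusion. The name describe_adf and the 0.05 threshold are illustrative assumptions, and the helper is not part of statsmodels.

```python
# Tuple copied from the output printed by the example above.
result = (3.145185689306744, 1.0, 15, 188,
          {'1%': -3.465620397124192, '5%': -2.8770397560752436,
           '10%': -2.5750324547306476},
          549.6705685364172)

def describe_adf(result, alpha=0.05):
    """Summarize an adfuller() result tuple (hypothetical helper, not in statsmodels)."""
    statistic, p_value, used_lags, n_obs, critical_values, _icbest = result
    reject_unit_root = p_value < alpha
    return {
        "statistic": statistic,
        "p_value": p_value,
        "used_lags": used_lags,
        "n_obs": n_obs,
        # A statistic below the critical value is evidence against a unit root.
        "below_5pct_critical": statistic < critical_values["5%"],
        "conclusion": ("stationary" if reject_unit_root
                       else "non-stationary (unit root not rejected)"),
    }

for key, value in describe_adf(result).items():
    print(f"{key}: {value}")
```

Here the statistic lies above every critical value and the p-value exceeds 0.05, so both readings lead to the same conclusion.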
KPSS Test
Another test for checking stationarity is the KPSS test, named after Kwiatkowski, Phillips, Schmidt and Shin. Unlike the ADF test, its null hypothesis is that the series is stationary, so a p-value below 0.05 indicates non-stationary data. The statsmodels package provides the kpss() function to run this test on time series data.
Example
Below is an example that applies the KPSS test to the time series data.
from statsmodels.tsa.stattools import kpss
import pandas as pd

# Load the series, parsing the 'date' column and using it as the index
data = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/a10.csv',
                   parse_dates=['date'], index_col='date')
t_data = data.loc[:, 'value'].values

# Run the KPSS test on the values of the series
result = kpss(t_data)
print("The result of kpss function:", result)
print('KPSS Statistic:', result[0])
print('p-value:', result[1])
Output
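The following is the output of the kpss() function of the statsmodels package.
The result of kpss function: (2.0131256386303322, 0.01, 9, {'10%': 0.347, '5%': 0.463, '2.5%': 0.574, '1%': 0.739})
KPSS Statistic: 2.0131256386303322
p-value: 0.01
The p-value of 0.01 is below 0.05, so the null hypothesis of stationarity is rejected and the series is treated as non-stationary.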
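Because the two tests have opposite null hypotheses, their p-values are read in opposite directions. The sketch below combines the two p-values printed in this article into a single verdict. The function name combine_tests and the 0.05 threshold are illustrative assumptions, and the labels for the two disagreement cases follow a common reading rather than a strict rule.

```python
def combine_tests(adf_p, kpss_p, alpha=0.05):
    """Combine ADF and KPSS p-values into one verdict (hypothetical helper)."""
    adf_stationary = adf_p < alpha      # ADF: rejecting the null means no unit root
    kpss_stationary = kpss_p >= alpha   # KPSS: not rejecting the null means stationary
    if adf_stationary and kpss_stationary:
        return "both tests point to a stationary series"
    if not adf_stationary and not kpss_stationary:
        return "both tests point to a non-stationary series"
    if kpss_stationary:
        return ("tests disagree: ADF non-stationary, KPSS stationary "
                "(often read as trend-stationary)")
    return ("tests disagree: ADF stationary, KPSS non-stationary "
            "(often read as difference-stationary)")

# p-values taken from the two outputs shown in this article
print(combine_tests(adf_p=1.0, kpss_p=0.01))
```

With the values from this article, both tests point to a non-stationary series.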
Rolling statistics
Another way to check the stationarity of time series data is to plot the moving average and moving standard deviation of the series and check whether they remain constant. If they vary over time in the plot, the time series is non-stationary.
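As a minimal sketch of what "moving" means here, the code below computes the mean and sample standard deviation over each window of consecutive values in plain Python. The series, the window size of 3 and the helper name rolling_stats are made up for illustration.

```python
import statistics

def rolling_stats(values, window):
    """Return (means, stds) computed over each full window of consecutive values."""
    means, stds = [], []
    for end in range(window, len(values) + 1):
        chunk = values[end - window:end]
        means.append(statistics.mean(chunk))
        stds.append(statistics.stdev(chunk))  # sample standard deviation
    return means, stds

values = [1, 2, 3, 4, 5, 10, 20, 40]  # made-up series that grows faster over time
means, stds = rolling_stats(values, window=3)
for m, s in zip(means, stds):
    print(f"mean={m:.2f} std={s:.2f}")
```

Both statistics stay flat for the first few windows and then rise, which is the kind of drift that signals non-stationarity. In pandas, Series.rolling(window).mean() and Series.rolling(window).std() compute the same quantities, with NaN in the first window - 1 positions.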
Example
The following example checks how the data varies by plotting the moving average and moving standard deviation with the plot() function of the matplotlib library.
import pandas as pd
import matplotlib.pyplot as plt

# Load the series, parsing the 'date' column and using it as the index
data = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/a10.csv',
                   parse_dates=['date'], index_col='date')
t_data = data.loc[:, 'value']  # keep it as a Series so rolling() is available

# Rolling statistics over 12 observations (the window size is an assumed choice)
moving_avg = t_data.rolling(window=12).mean()
moving_std = t_data.rolling(window=12).std()

plt.plot(t_data, color='green', label='Original')
plt.plot(moving_avg, color='red', label='moving average')
plt.plot(moving_std, color='black', label='moving Standard deviation')
plt.legend(loc='best')
plt.title('Moving Average & Moving Standard Deviation')
plt.show()
Output
The program displays a plot of the original series together with its moving average and moving standard deviation.
