RFM Analysis Using Python
Last Updated: 06 May, 2025
In business analytics, one of the simplest ways to understand and categorize customers is RFM analysis. RFM stands for Recency, Frequency and Monetary value, three measures of customer behaviour:
- Recency: How recently did the customer make a purchase? Customers who bought recently tend to be more engaged.
- Frequency: How often do they buy from you? Customers who buy often tend to be more loyal.
- Monetary: How much does the customer spend? High spenders are usually more valuable.
We use it to group our customers into different categories like Top Customers, High-Value Customers and Lost Customers. This helps us to focus on customers who matter most so we can create better marketing strategies and improve customer satisfaction.
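Before working with the real dataset, the three measures can be illustrated on a few made-up transactions. The sketch below is only an illustration: the `toy` DataFrame and its values are hypothetical, the column names mirror those used later in this article, and Recency is measured in days from the latest purchase in the data.

```python
import pandas as pd

# Hypothetical toy transactions; column names mirror the dataset used below.
toy = pd.DataFrame({
    "CustomerID": [1, 1, 2, 3, 3, 3],
    "PurchaseDate": pd.to_datetime([
        "2023-01-05", "2023-03-01", "2023-02-10",
        "2023-01-20", "2023-02-15", "2023-03-10",
    ]),
    "TransactionAmount": [50.0, 30.0, 200.0, 20.0, 25.0, 15.0],
})

# Reference date: the latest purchase anywhere in the data.
snapshot = toy["PurchaseDate"].max()

toy_rfm = toy.groupby("CustomerID").agg(
    Recency=("PurchaseDate", lambda s: (snapshot - s.max()).days),  # days since last purchase
    Frequency=("PurchaseDate", "count"),                            # number of purchases
    Monetary=("TransactionAmount", "sum"),                          # total amount spent
).reset_index()

print(toy_rfm)
```

Customer 3 bought most recently and most often but spent the least in total, while customer 2 made a single large purchase. RFM analysis is meant to surface exactly this kind of difference.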
Python Implementation for RFM Analysis
1. Importing Required Libraries
We will import necessary libraries like numpy, pandas, matplotlib and datetime.
Python
import pandas as pd
import datetime as dt
import numpy as np
import matplotlib.pyplot as plt
2. Reading Data
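The code below assumes the dataset has been saved locally as dataset.csv and contains at least the columns CustomerID, PurchaseDate and TransactionAmount.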
Python
df = pd.read_csv('dataset.csv')
df.head()
Output:

Reading our Dataset
3. Convert PurchaseDate to Datetime
We convert the PurchaseDate column from a string to a datetime object to make it easier to work with dates and perform date calculations.
Python
df['PurchaseDate'] = pd.to_datetime(df['PurchaseDate'])
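If the date column might contain malformed entries, the conversion can be made more forgiving. The following is a minimal sketch on a hypothetical Series (not the article's dataset) using the `errors="coerce"` option of `pd.to_datetime`, which turns unparseable values into `NaT` instead of raising an error.

```python
import pandas as pd

# Hypothetical raw values; the second entry is deliberately invalid.
raw = pd.Series(["2023-04-11", "not a date", "2023-05-02"])

# errors="coerce" replaces values that cannot be parsed with NaT.
parsed = pd.to_datetime(raw, errors="coerce")

# Count how many rows failed to parse before deciding how to handle them.
n_bad = parsed.isna().sum()
print(n_bad)
```

Rows with `NaT` can then be inspected or dropped before computing Recency. Whether dropping them is appropriate depends on the data.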
4. Calculate Recency
We calculate Recency, i.e., how recently a customer made a purchase. We group the data by CustomerID and find each customer's last purchase date, then count the days between that date and the most recent purchase date in the whole dataset.
Python
df_recency = df.groupby(by='CustomerID', as_index=False)['PurchaseDate'].max()
df_recency.columns = ['CustomerID', 'LastPurchaseDate']
recent_date = df_recency['LastPurchaseDate'].max()
df_recency['Recency'] = (recent_date - df_recency['LastPurchaseDate']).dt.days
df_recency.head()
Output:

CustomerID with their Recency
5. Calculate Frequency
Next, we calculate Frequency, i.e., how often a customer makes a purchase. We drop fully duplicated rows so that the same record is not counted twice, then group by CustomerID and count the number of purchases each customer has made.
Python
frequency_df = df.drop_duplicates().groupby(by=['CustomerID'], as_index=False)['PurchaseDate'].count()
frequency_df.columns = ['CustomerID', 'Frequency']
frequency_df.head()
Output:

CustomerID with Frequency
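Note that drop_duplicates() only removes rows that are identical in every column. If the data holds one row per line item, counting distinct order identifiers may be closer to the idea of "number of purchases". The sketch below assumes a hypothetical `OrderID` column and made-up data; check whether your dataset has such a column before using this approach.

```python
import pandas as pd

# Hypothetical data: order 101 has two line items (products A and B).
orders = pd.DataFrame({
    "CustomerID": [1, 1, 1, 2],
    "OrderID": [101, 101, 102, 201],
    "Product": ["A", "B", "C", "A"],
    "PurchaseDate": pd.to_datetime(
        ["2023-04-01", "2023-04-01", "2023-04-09", "2023-04-03"]),
})

# Row-based count: the two line items of order 101 are both counted.
row_count = orders.drop_duplicates().groupby("CustomerID")["PurchaseDate"].count()

# Order-based count: each distinct OrderID is counted once per customer.
order_count = orders.groupby("CustomerID")["OrderID"].nunique()
freq = order_count.reset_index(name="Frequency")

print(row_count.tolist(), order_count.tolist())
```

For customer 1 the row-based count is 3 while the order-based count is 2, so the choice affects the Frequency value whenever orders span several rows.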
6. Calculate Monetary Value
Here, we calculate the Monetary value, i.e., how much a customer spends. We sum the TransactionAmount for each customer to get the total amount spent.
Python
df['Total'] = df['TransactionAmount']  # amount of each individual transaction
monetary_df = df.groupby(by='CustomerID', as_index=False)['Total'].sum()
monetary_df.columns = ['CustomerID', 'Monetary']
monetary_df.head()
Output:

CustomerID with Monetary Value
7. Merge Recency, Frequency and Monetary Data
In this step, we merge the recency, frequency and monetary data for each customer into a single DataFrame. This will give us a comprehensive view of the customer’s behavior.
Python
rf_df = df_recency.merge(frequency_df, on='CustomerID')
rfm_df = rf_df.merge(monetary_df, on='CustomerID').drop(columns='LastPurchaseDate')
rfm_df.head()
8. Rank Customers Based on Recency, Frequency and Monetary
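We rank customers on Recency, Frequency and Monetary. A lower Recency is better, so it is ranked in descending order: the customer with the most recent purchase gets the highest rank. Higher Frequency and Monetary values are better, so they are ranked in ascending order. The rank() function assigns a rank to each customer and, by default, gives tied customers the average of their ranks.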
Python
rfm_df['R_rank'] = rfm_df['Recency'].rank(ascending=False)
rfm_df['F_rank'] = rfm_df['Frequency'].rank(ascending=True)
rfm_df['M_rank'] = rfm_df['Monetary'].rank(ascending=True)
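To see how rank() behaves with ties, here is a small sketch on a hypothetical Recency Series. It compares the default `method="average"` with `method="dense"`, a built-in alternative that is shown only for comparison and is not what the article's code uses.

```python
import pandas as pd

# Hypothetical recency values in days; two customers are tied at 5 days.
recency = pd.Series([0, 5, 5, 30], name="Recency")

# Descending rank: the largest recency gets rank 1, the smallest the highest rank.
r_rank_avg = recency.rank(ascending=False)                    # ties share the average rank
r_rank_dense = recency.rank(ascending=False, method="dense")  # ties share a rank, no gaps

print(r_rank_avg.tolist())    # [4.0, 2.5, 2.5, 1.0]
print(r_rank_dense.tolist())  # [3.0, 2.0, 2.0, 1.0]
```

With the default method, the customer who bought most recently (0 days) ends up with the highest rank, 4.0, and the two tied customers each receive 2.5.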
9. Normalize the Ranks
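We normalize the ranks by dividing each rank by the maximum rank and multiplying by 100, so the best customer on each dimension scores 100 and the lowest score is slightly above 0. Putting all three ranks on the same scale makes them comparable and lets us combine them into a single RFM score.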
Python
rfm_df['R_rank_norm'] = (rfm_df['R_rank'] / rfm_df['R_rank'].max()) * 100
rfm_df['F_rank_norm'] = (rfm_df['F_rank'] / rfm_df['F_rank'].max()) * 100
rfm_df['M_rank_norm'] = (rfm_df['M_rank'] / rfm_df['M_rank'].max()) * 100
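The effect of this normalization can be checked on a tiny hypothetical Series of ranks. The sketch also shows min-max scaling, a different technique that maps the lowest rank to exactly 0. It is included only for contrast and is not used in this article.

```python
import pandas as pd

# Hypothetical ranks for four customers.
ranks = pd.Series([1.0, 2.0, 3.0, 4.0])

# Normalization used in this article: divide by the maximum rank.
norm = ranks / ranks.max() * 100
print(norm.tolist())  # [25.0, 50.0, 75.0, 100.0]

# Alternative (not used here): min-max scaling, so the lowest rank maps to 0.
norm_minmax = (ranks - ranks.min()) / (ranks.max() - ranks.min()) * 100
print(norm_minmax.round(2).tolist())
```

With max-based scaling, the lowest-ranked customer still receives a positive value (25 in this example), which slightly raises every customer's final score compared with min-max scaling.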
10. Drop Individual Ranks
Since we no longer need the individual ranks (R_rank, F_rank, M_rank) we drop them from the DataFrame to clean up the data.
Python
rfm_df.drop(columns=['R_rank', 'F_rank', 'M_rank'], inplace=True)
rfm_df.head()
Output:

Dataset After Dropping Individual Ranks
11. Calculate RFM Score
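We calculate the RFM score as a weighted sum of the normalized ranks, with weights of 0.15 for Recency, 0.28 for Frequency and 0.57 for Monetary. The weights sum to 1 and should reflect the business goals; here Monetary is given the highest weight. Multiplying the result by 0.05 rescales the score from a 0-100 range to a 0-5 range.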
Python
rfm_df['RFM_Score'] = 0.15 * rfm_df['R_rank_norm'] + 0.28 * rfm_df['F_rank_norm'] + 0.57 * rfm_df['M_rank_norm']
rfm_df['RFM_Score'] *= 0.05
rfm_df = rfm_df.round(2)
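The arithmetic can be verified for a single customer. The helper below, `rfm_score`, is a hypothetical function written for illustration only. It applies the same weights and the same 0.05 scaling factor as the code above to three normalized ranks.

```python
def rfm_score(r_norm, f_norm, m_norm):
    """Weighted RFM score on a 0-5 scale from normalized ranks (0-100)."""
    weighted = 0.15 * r_norm + 0.28 * f_norm + 0.57 * m_norm  # still on a 0-100 scale
    return round(weighted * 0.05, 2)                           # rescale to 0-5

best = rfm_score(100, 100, 100)   # best possible customer on all three measures
example = rfm_score(80, 50, 100)  # 0.05 * (12 + 14 + 57) = 0.05 * 83

print(best, example)
```

A customer with the top rank on all three measures scores 5.0, which confirms that 5 is the upper bound of the scale used for segmentation.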
12. Display RFM Score and CustomerID
Here, we display the CustomerID and RFM_Score for the first few customers to see the results.
Python
rfm_df[['CustomerID', 'RFM_Score']].head(7)
Output:

RFM_Score for first 7 Customers
13. Segment Customers Based on RFM Score
We classify customers into different segments based on their RFM scores. This helps to categorize them into groups like Top Customers, High Value Customers, etc.
Python
rfm_df["Customer_segment"] = np.where(
    rfm_df['RFM_Score'] > 4.5, "Top Customers",
    np.where(rfm_df['RFM_Score'] > 4, "High value Customer",
    np.where(rfm_df['RFM_Score'] > 3, "Medium Value Customer",
    np.where(rfm_df['RFM_Score'] > 1.6, 'Low Value Customers', 'Lost Customers'))))
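Nested np.where calls become hard to read as the number of segments grows. One possible alternative, sketched here on hypothetical scores, is `pd.cut` with the same thresholds and the same labels. Because `pd.cut` uses right-closed intervals by default, a score of exactly 4.5 falls in the (4, 4.5] bin, which matches the strict `>` comparisons above.

```python
import numpy as np
import pandas as pd

# Hypothetical RFM scores covering every segment, including boundary values.
scores = pd.Series([0.9, 1.6, 2.5, 3.7, 4.5, 4.8])

# Same thresholds as the np.where version; each interval is (lower, upper].
bins = [-np.inf, 1.6, 3, 4, 4.5, np.inf]
labels = ["Lost Customers", "Low Value Customers", "Medium Value Customer",
          "High value Customer", "Top Customers"]

segments = pd.cut(scores, bins=bins, labels=labels)
print(segments.astype(str).tolist())
```

The result is a categorical Series, so the segments keep their order from lowest to highest, which can be convenient for sorting and plotting.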
14. Display Customer Segments
In this step, we show the first 20 rows with CustomerID, RFM_Score and Customer_segment to see how customers have been grouped.
Python
rfm_df[['CustomerID', 'RFM_Score', 'Customer_segment']].head(20)
Output:

Displaying Customer Segments for 20 Customers
15. Visualize Customer Segments with a Pie Chart
Finally, we create a pie chart to visualize the distribution of customers across different segments. This helps in understanding how many customers belong to each segment.
Python
segment_counts = rfm_df['Customer_segment'].value_counts()
plt.pie(segment_counts,
        labels=segment_counts.index,
        autopct='%.0f%%')
plt.show()
Output:

Customer Segmentation Visualization in a Pie Chart
With this simple technique, a business can gain insight into customer behaviour and plan its marketing accordingly.