RFM Analysis Using Python

Last Updated : 06 May, 2025

In business analytics, RFM analysis is one of the simplest ways to understand and categorize customers. RFM stands for Recency, Frequency and Monetary value, three measures of customer behaviour:

Recency: How recently did the customer make a purchase? Customers who bought recently tend to be more engaged.
Frequency: How often do they buy from you? Customers who buy often tend to be more loyal.
Monetary: How much does the customer spend? High spenders are usually more valuable.

We use these three measures to group customers into segments such as Top Customers, High Value Customers and Lost Customers. This helps us focus on the customers who matter most, so we can create better marketing strategies and improve customer satisfaction.

Python Implementation for RFM Analysis

1. Importing Required Libraries

We will import the necessary libraries: numpy, pandas, matplotlib and datetime.

Python

import pandas as pd
import datetime as dt
import numpy as np
import matplotlib.pyplot as plt

2. Reading Data

We load the dataset from a CSV file. The steps below expect it to contain the columns CustomerID, PurchaseDate and TransactionAmount.

Python

df = pd.read_csv('dataset.csv')
df.head()

Output:

Reading our Dataset
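
If the CSV file is not at hand, a small stand-in DataFrame with the same column names is enough to follow the later steps. The sketch below is only an illustration: the name sample_df and every value in it are invented, and only the three column names come from this article.

```python
import pandas as pd

# Hypothetical sample data: the column names mirror the ones used in this
# article, but the values are made up purely for illustration.
sample_df = pd.DataFrame({
    'CustomerID': [101, 101, 102, 103, 103, 103],
    'PurchaseDate': ['2023-04-01', '2023-05-15', '2023-03-20',
                     '2023-05-01', '2023-05-20', '2023-06-01'],
    'TransactionAmount': [250.0, 120.5, 980.0, 60.0, 75.25, 40.0],
})

# Check that the columns the later steps rely on are present.
required_columns = {'CustomerID', 'PurchaseDate', 'TransactionAmount'}
missing = required_columns - set(sample_df.columns)
print(missing)          # an empty set when nothing is missing
print(sample_df.shape)
```

The same column check can be run on a real dataset right after loading it, before any of the RFM calculations.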

3. Convert PurchaseDate to Datetime

We convert the PurchaseDate column from a string to a datetime object to make it easier to work with dates and perform date calculations.

Python

df['PurchaseDate'] = pd.to_datetime(df['PurchaseDate'])
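
Real data sometimes contains date strings that cannot be parsed. As a minimal sketch, assuming we would rather flag such rows than stop with an error, pd.to_datetime accepts errors='coerce', which turns unparseable values into NaT. The raw_dates series here is invented for the example.

```python
import pandas as pd

# Invented input: two valid dates and one value that is not a date.
raw_dates = pd.Series(['2023-04-01', '2023-05-15', 'not a date'])

# errors='coerce' converts values that cannot be parsed into NaT
# instead of raising an exception.
parsed = pd.to_datetime(raw_dates, errors='coerce')

# Count the rows that failed to parse so they can be inspected or dropped.
n_invalid = int(parsed.isna().sum())
print(parsed)
print(n_invalid)
```

Rows with NaT should be reviewed before the recency step, since a missing date cannot contribute to a customer's last purchase date.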

4. Calculate Recency

We calculate Recency, i.e. how recently a customer made a purchase. We group the data by CustomerID, find the last purchase date for each customer, then count the days between that date and the most recent purchase date in the whole dataset.

Python

df_recency = df.groupby(by='CustomerID', as_index=False)['PurchaseDate'].max()
df_recency.columns = ['CustomerID', 'LastPurchaseDate']
recent_date = df_recency['LastPurchaseDate'].max()
df_recency['Recency'] = (recent_date - df_recency['LastPurchaseDate']).dt.days
df_recency.head()

Output:

CustomerID with their Recency
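
To make the recency calculation concrete, here is a small worked example on invented data. The names toy, last_purchase and reference_date are illustrative only. The reference date is the latest purchase in the data, as in the step above.

```python
import pandas as pd

# Invented purchases for two customers.
toy = pd.DataFrame({
    'CustomerID': [1, 1, 2],
    'PurchaseDate': pd.to_datetime(['2023-01-10', '2023-03-01', '2023-02-01']),
})

# Last purchase date per customer.
last_purchase = toy.groupby('CustomerID')['PurchaseDate'].max()

# Reference date: the most recent purchase anywhere in the data.
reference_date = last_purchase.max()

# Days between each customer's last purchase and the reference date.
recency_days = (reference_date - last_purchase).dt.days
print(recency_days.to_dict())   # {1: 0, 2: 28}
```

Customer 1 made the latest purchase in the data, so their recency is 0. Using a fixed snapshot date instead, such as the day the report is run, would shift every recency value by the same number of days and leave the ranking unchanged.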

5. Calculate Frequency

Next we calculate Frequency, i.e. how often a customer makes a purchase. We first drop rows that are exact duplicates so that a repeated record is not counted twice, then group by CustomerID and count the purchases each customer has made.

Python

frequency_df = df.drop_duplicates().groupby(by=['CustomerID'], as_index=False)['PurchaseDate'].count()
frequency_df.columns = ['CustomerID', 'Frequency']
frequency_df.head()

Output:

CustomerID with Frequency
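
The effect of drop_duplicates() is easiest to see on a tiny invented example. With no arguments it removes only rows that match in every column, which the sketch below illustrates. The toy data is an assumption made for the example.

```python
import pandas as pd

# Invented data: the first two rows are identical in every column.
toy = pd.DataFrame({
    'CustomerID': [1, 1, 1, 2],
    'PurchaseDate': pd.to_datetime(['2023-01-10', '2023-01-10',
                                    '2023-03-01', '2023-02-01']),
    'TransactionAmount': [50.0, 50.0, 20.0, 75.0],
})

# Exact duplicate rows collapse to one before counting.
deduped = toy.drop_duplicates()
frequency = deduped.groupby('CustomerID')['PurchaseDate'].count()
print(frequency.to_dict())   # {1: 2, 2: 1}
```

One caveat: two genuinely separate purchases with the same customer, date and amount would also collapse into one. If the dataset has a unique order identifier, counting distinct identifiers is a safer measure of frequency.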

6. Calculate Monetary Value

Here, we calculate the Monetary value, i.e. how much a customer spends. We sum the TransactionAmount for each customer to get the total amount spent.

Python

df['Total'] = df['TransactionAmount']  # Amount of each individual transaction
monetary_df = df.groupby(by='CustomerID', as_index=False)['Total'].sum()
monetary_df.columns = ['CustomerID', 'Monetary']
monetary_df.head()

Output:

CustomerID with Monetary Value

7. Merge Recency, Frequency and Monetary Data

In this step, we merge the recency, frequency and monetary data for each customer into a single DataFrame. This will give us a comprehensive view of the customer’s behavior.

Python

rf_df = df_recency.merge(frequency_df, on='CustomerID')
rfm_df = rf_df.merge(monetary_df, on='CustomerID').drop(columns='LastPurchaseDate')
rfm_df.head()
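
As an alternative sketch, the three measures can also be computed in a single groupby using named aggregation, which avoids the two merges. This is not the method used above, only an equivalent-looking shortcut. The toy data and the name rfm_toy are invented, and unlike the frequency step above this version does not drop duplicate rows first.

```python
import pandas as pd

# Invented purchases for two customers.
toy = pd.DataFrame({
    'CustomerID': [1, 1, 2],
    'PurchaseDate': pd.to_datetime(['2023-01-10', '2023-03-01', '2023-02-01']),
    'TransactionAmount': [50.0, 20.0, 75.0],
})

reference_date = toy['PurchaseDate'].max()

# One pass over the data: last purchase, purchase count and total spend.
rfm_toy = (
    toy.groupby('CustomerID')
       .agg(LastPurchaseDate=('PurchaseDate', 'max'),
            Frequency=('PurchaseDate', 'count'),
            Monetary=('TransactionAmount', 'sum'))
       .reset_index()
)

# Recency in days, measured from the latest purchase in the data.
rfm_toy['Recency'] = (reference_date - rfm_toy['LastPurchaseDate']).dt.days
rfm_toy = rfm_toy[['CustomerID', 'Recency', 'Frequency', 'Monetary']]
print(rfm_toy)
```

If duplicate rows are a concern, calling drop_duplicates() on the data before the groupby keeps the frequency count in line with the step-by-step version.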

8. Rank Customers Based on Recency, Frequency and Monetary

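We rank customers on Recency, Frequency and Monetary. A lower recency is better, while higher frequency and monetary values are better, so Recency is ranked in descending order and the other two in ascending order. This way a higher rank always indicates a better customer. The rank() function assigns each customer a rank, and tied values share the average of their ranks by default.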

Python

rfm_df['R_rank'] = rfm_df['Recency'].rank(ascending=False)
rfm_df['F_rank'] = rfm_df['Frequency'].rank(ascending=True)
rfm_df['M_rank'] = rfm_df['Monetary'].rank(ascending=True)
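
A short example on invented recency values shows how the ranking direction and ties behave. It assumes the default tie handling of rank(), which averages the ranks of tied values.

```python
import pandas as pd

# Invented recency values in days; two customers are tied at 30.
recency = pd.Series([5, 30, 30, 90], name='Recency')

# ascending=False gives the largest value rank 1, so the smallest
# (most recent) recency ends up with the highest rank.
r_rank = recency.rank(ascending=False)
print(r_rank.tolist())   # [4.0, 2.5, 2.5, 1.0]
```

The customer who bought 5 days ago gets the top rank of 4, and the two tied customers share rank 2.5, the average of ranks 2 and 3.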

9. Normalize the Ranks

We normalize the ranks to a 0-100 scale by dividing each rank by the largest rank in its column and multiplying by 100. This puts the three measures on a common scale so they can be combined into the final RFM score.

Python

rfm_df['R_rank_norm'] = (rfm_df['R_rank'] / rfm_df['R_rank'].max()) * 100
rfm_df['F_rank_norm'] = (rfm_df['F_rank'] / rfm_df['F_rank'].max()) * 100
rfm_df['M_rank_norm'] = (rfm_df['M_rank'] / rfm_df['M_rank'].max()) * 100
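
The normalization can be checked by hand on a few invented ranks. This sketch uses the same formula as the step above on a standalone series.

```python
import pandas as pd

# Invented ranks for four customers.
ranks = pd.Series([1.0, 2.0, 3.0, 4.0])

# Divide by the largest rank and scale to 100.
normalized = ranks / ranks.max() * 100
print(normalized.tolist())   # [25.0, 50.0, 75.0, 100.0]
```

The best-ranked customer always maps to 100. The lowest value is 100 divided by the largest rank rather than 0, so the scale approaches 0 only when there are many customers.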

10. Drop Individual Ranks

Since we no longer need the individual ranks (R_rank, F_rank, M_rank) we drop them from the DataFrame to clean up the data.

Python

rfm_df.drop(columns=['R_rank', 'F_rank', 'M_rank'], inplace=True)
rfm_df.head()

Output:

Dataset After Dropping Individual Ranks

11. Calculate RFM Score

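We calculate the RFM score as a weighted sum of the normalized Recency, Frequency and Monetary ranks. The weights depend on business goals; here Monetary is given the highest weight. The weights (0.15, 0.28 and 0.57) sum to 1, so the weighted sum lies between 0 and 100, and multiplying it by 0.05 rescales the score to a 0-5 range.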

Python

rfm_df['RFM_Score'] = 0.15 * rfm_df['R_rank_norm'] + 0.28 * rfm_df['F_rank_norm'] + 0.57 * rfm_df['M_rank_norm']
rfm_df['RFM_Score'] *= 0.05
rfm_df = rfm_df.round(2)
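
The arithmetic behind the score can be verified on a single customer. In the sketch below, rfm_score is a hypothetical helper written for this example, not part of any library. The weights and the 0.05 factor are the ones used above, and the input values are invented.

```python
# Weights taken from the step above.
weights = {'R': 0.15, 'F': 0.28, 'M': 0.57}

def rfm_score(r_norm, f_norm, m_norm):
    """Hypothetical helper: weighted sum of normalized ranks, rescaled to 0-5."""
    weighted = (weights['R'] * r_norm
                + weights['F'] * f_norm
                + weights['M'] * m_norm)
    return weighted * 0.05

# The weights add up to 1, so the best possible score is 5.
total_weight = sum(weights.values())
best_score = rfm_score(100, 100, 100)

# A customer who is top for recency and spend but mid-table for frequency:
# 0.15*100 + 0.28*50 + 0.57*100 = 86, and 86 * 0.05 = 4.3.
example_score = rfm_score(100, 50, 100)

print(round(total_weight, 2), round(best_score, 2), round(example_score, 2))
```

Because Monetary carries more than half of the weight, a drop in the monetary rank lowers the score far more than the same drop in recency.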

12. Display RFM Score and CustomerID

Here, we display the CustomerID and RFM_Score for the first few customers to see the results.

Python

rfm_df[['CustomerID', 'RFM_Score']].head(7)

Output:

RFM_Score for first 7 Customers

13. Segment Customers Based on RFM Score

We classify customers into segments based on their RFM scores. Scores above 4.5 are Top Customers, above 4 High Value, above 3 Medium Value and above 1.6 Low Value; everything else is labelled Lost Customers.

Python

rfm_df["Customer_segment"] = np.where(rfm_df['RFM_Score'] > 4.5, "Top Customers",
                                       np.where(rfm_df['RFM_Score'] > 4, "High value Customer",
                                                np.where(rfm_df['RFM_Score'] > 3, "Medium Value Customer",
                                                         np.where(rfm_df['RFM_Score'] > 1.6, 'Low Value Customers', 'Lost Customers'))))
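
The same thresholds can also be expressed with pandas.cut, which some readers find easier to scan than nested conditions. This is an alternative sketch rather than the method used above. The scores are invented, and the label strings are copied from the code above so that both versions produce the same segment names.

```python
import numpy as np
import pandas as pd

# Invented scores, including values that sit exactly on a threshold.
scores = pd.Series([0.9, 1.6, 2.5, 3.0, 3.7, 4.5, 4.8], name='RFM_Score')

# Right-closed bins: (-inf, 1.6], (1.6, 3], (3, 4], (4, 4.5], (4.5, inf).
# A score equal to a threshold falls in the lower segment, which matches
# the strict '>' comparisons used above.
bin_edges = [-np.inf, 1.6, 3, 4, 4.5, np.inf]
segment_labels = ['Lost Customers', 'Low Value Customers',
                  'Medium Value Customer', 'High value Customer',
                  'Top Customers']

segments = pd.cut(scores, bins=bin_edges, labels=segment_labels)
print(segments.astype(str).tolist())
```

pd.cut returns an ordered categorical, so the segments sort from worst to best without extra work.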

14. Display Customer Segments

In this step, we show the first 20 rows with CustomerID, RFM_Score and Customer_segment to see how customers have been grouped.

Python

rfm_df[['CustomerID', 'RFM_Score', 'Customer_segment']].head(20)

Output:

Displaying Customer Segments for 20 Customers

15. Visualize Customer Segments with a Pie Chart

Finally, we create a pie chart to visualize the distribution of customers across the segments. Each slice is labelled with the percentage of customers that belong to that segment.

Python

plt.pie(rfm_df.Customer_segment.value_counts(),
        labels=rfm_df.Customer_segment.value_counts().index,
        autopct='%.0f%%')
plt.show()

Output:

Customer Segmentation Visualization in a Pie Chart
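
The numbers behind the chart can also be printed as a table, which helps when exact counts matter. The sketch below uses an invented series of segment labels. With the real data, the Customer_segment column would take its place.

```python
import pandas as pd

# Invented segment labels for four customers.
segments = pd.Series(['Top Customers', 'Lost Customers', 'Lost Customers',
                      'Low Value Customers'], name='Customer_segment')

# Absolute counts and percentage share per segment.
counts = segments.value_counts()
shares = (segments.value_counts(normalize=True) * 100).round(1)

summary = pd.DataFrame({'Customers': counts, 'Share (%)': shares})
print(summary)
```

The Share (%) column corresponds to the percentages drawn on the pie chart by autopct.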

With this simple technique, a business can gain insight into customer behaviour and plan accordingly.

anuragnayak
