
RFM Analysis Using Python

Last Updated : 06 May, 2025

In business analytics, one of the easiest ways to understand and categorize customers is RFM analysis. RFM stands for Recency, Frequency and Monetary value, three simple measures of customer behaviour:

  • Recency: How recently did the customer make a purchase? The more recent, the more engaged they are.
  • Frequency: How often do they buy from you? Customers who buy often are more loyal.
  • Monetary: How much does the customer spend? High spenders are usually more valuable.

We use it to group our customers into different categories like Top Customers, High-Value Customers and Lost Customers. This helps us to focus on customers who matter most so we can create better marketing strategies and improve customer satisfaction.
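To make the three measures concrete before the full walkthrough, here is a minimal sketch on a tiny, made-up table of transactions. The data is hypothetical; only the column names (CustomerID, PurchaseDate, TransactionAmount) match the dataset used in the rest of this article.

```python
import pandas as pd

# Hypothetical toy transactions (not taken from the article's dataset)
toy = pd.DataFrame({
    'CustomerID': [1, 1, 2, 3, 3, 3],
    'PurchaseDate': pd.to_datetime([
        '2023-01-05', '2023-03-01', '2023-02-10',
        '2023-01-20', '2023-02-15', '2023-03-10']),
    'TransactionAmount': [50.0, 20.0, 200.0, 10.0, 15.0, 30.0],
})

# Reference date: the latest purchase anywhere in the data
snapshot = toy['PurchaseDate'].max()

toy_rfm = toy.groupby('CustomerID').agg(
    LastPurchase=('PurchaseDate', 'max'),   # most recent purchase per customer
    Frequency=('PurchaseDate', 'count'),    # number of purchases
    Monetary=('TransactionAmount', 'sum'),  # total amount spent
).reset_index()

# Recency: days between the customer's last purchase and the reference date
toy_rfm['Recency'] = (snapshot - toy_rfm['LastPurchase']).dt.days
print(toy_rfm[['CustomerID', 'Recency', 'Frequency', 'Monetary']])
```

Customer 3 bought most recently and most often but spent the least, while customer 2 bought only once but spent the most. This is why the three measures are scored separately before being combined.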

Python Implementation for RFM Analysis

1. Importing Required Libraries

We import the libraries we need: pandas, numpy, matplotlib and datetime.

Python
import pandas as pd
import datetime as dt
import numpy as np
import matplotlib.pyplot as plt

2. Reading Data

You can download the dataset from here. The code below uses its CustomerID, PurchaseDate and TransactionAmount columns.

Python
df = pd.read_csv('dataset.csv')
df.head()

Output:

[Image: Reading our Dataset]
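Before going further, it can help to confirm that the file contains the columns the later steps rely on. The sketch below reads a small in-memory CSV instead of dataset.csv so that it runs on its own; the two rows are made-up values, not rows from the real dataset.

```python
import io
import pandas as pd

# Hypothetical stand-in for dataset.csv (values are invented for illustration)
csv_text = """CustomerID,PurchaseDate,TransactionAmount
1001,2023-04-11,943.31
1002,2023-04-12,463.70
"""
sample = pd.read_csv(io.StringIO(csv_text))

# Columns used by the recency, frequency and monetary steps
required = {'CustomerID', 'PurchaseDate', 'TransactionAmount'}
missing = required - set(sample.columns)
if missing:
    raise ValueError(f"Dataset is missing columns: {sorted(missing)}")

print(sample.dtypes)
```

Note that PurchaseDate is read as plain text at this point, which is why the next step converts it.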

3. Convert PurchaseDate to Datetime

We convert the PurchaseDate column from a string to a datetime object to make it easier to work with dates and perform date calculations.

Python
df['PurchaseDate'] = pd.to_datetime(df['PurchaseDate'])
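If some date strings cannot be parsed, pd.to_datetime raises an error by default. One possible way to handle this, shown here on made-up values, is to pass errors='coerce' so that unparseable entries become NaT and can be counted or dropped. Whether the real dataset has such rows is not known; this is only a sketch.

```python
import pandas as pd

# Hypothetical column containing one malformed date
raw = pd.DataFrame({'PurchaseDate': ['2023-04-11', 'not a date', '2023-05-02']})

# errors='coerce' turns unparseable strings into NaT instead of raising
parsed = pd.to_datetime(raw['PurchaseDate'], errors='coerce')

n_bad = int(parsed.isna().sum())
print(f"{n_bad} row(s) could not be parsed")

# Keep only rows with a valid date
clean = raw.loc[parsed.notna()].assign(PurchaseDate=parsed[parsed.notna()])
```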

4. Calculate Recency

We calculate Recency, i.e. how recently a customer made a purchase. We group the data by CustomerID and find each customer's last purchase date, then count the days between that date and the most recent purchase date in the whole dataset. The customer who bought most recently therefore has a Recency of 0.

Python
df_recency = df.groupby(by='CustomerID', as_index=False)['PurchaseDate'].max()
df_recency.columns = ['CustomerID', 'LastPurchaseDate']
recent_date = df_recency['LastPurchaseDate'].max()
df_recency['Recency'] = (recent_date - df_recency['LastPurchaseDate']).dt.days
df_recency.head()

Output:

[Image: CustomerID with their Recency]
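The reference date does not have to be the latest purchase in the data. As a sketch, assuming the analysis is run on a fixed date (the date and the three customers below are hypothetical), Recency can be measured against that date instead:

```python
import pandas as pd

# Hypothetical last-purchase dates for three customers
last_purchase = pd.DataFrame({
    'CustomerID': [101, 102, 103],
    'LastPurchaseDate': pd.to_datetime(['2023-06-10', '2023-05-01', '2023-06-01']),
})

# Assumed analysis date; replace with the date that suits your reporting
snapshot_date = pd.Timestamp('2023-06-30')

# Vectorized subtraction yields timedeltas; .dt.days extracts whole days
last_purchase['Recency'] = (snapshot_date - last_purchase['LastPurchaseDate']).dt.days
print(last_purchase)
```

With a fixed reference date, Recency values from reports run at different times stay comparable. The ranking of customers is the same either way, since every value shifts by the same number of days.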

5. Calculate Frequency

Next we calculate Frequency, i.e. how often a customer makes a purchase. We drop duplicate rows so that each purchase is counted only once, then group by CustomerID and count the purchases each customer has made.

Python
frequency_df = df.drop_duplicates().groupby(by=['CustomerID'], as_index=False)['PurchaseDate'].count()
frequency_df.columns = ['CustomerID', 'Frequency']
frequency_df.head()

Output:

[Image: CustomerID with Frequency]
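It is worth being clear about what drop_duplicates() removes: only rows that are identical in every column. The made-up example below shows that two purchases by the same customer on the same day still count separately when any other column differs.

```python
import pandas as pd

# Hypothetical transactions: rows 0 and 1 are exact duplicates,
# row 2 is a different purchase on the same day
tx = pd.DataFrame({
    'CustomerID': [1, 1, 1, 2],
    'PurchaseDate': pd.to_datetime(
        ['2023-04-11', '2023-04-11', '2023-04-11', '2023-04-12']),
    'TransactionAmount': [100.0, 100.0, 40.0, 75.0],
})

freq = (tx.drop_duplicates()  # removes only the fully identical row
          .groupby('CustomerID', as_index=False)['PurchaseDate']
          .count())
freq.columns = ['CustomerID', 'Frequency']
print(freq)  # customer 1 -> 2 purchases, customer 2 -> 1 purchase
```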

6. Calculate Monetary Value

Here, we calculate the Monetary value, i.e. how much a customer spends. We sum the TransactionAmount for each customer to get the total amount spent.

Python
df['Total'] = df['TransactionAmount']  # Amount of each individual transaction
monetary_df = df.groupby(by='CustomerID', as_index=False)['Total'].sum()
monetary_df.columns = ['CustomerID', 'Monetary']
monetary_df.head()

Output:

[Image: CustomerID with Monetary Value]

7. Merge Recency, Frequency and Monetary Data

In this step, we merge the recency, frequency and monetary data for each customer into a single DataFrame. This will give us a comprehensive view of the customer’s behavior.

Python
rf_df = df_recency.merge(frequency_df, on='CustomerID')
rfm_df = rf_df.merge(monetary_df, on='CustomerID').drop(columns='LastPurchaseDate')
rfm_df.head()
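Each of the three tables should contain exactly one row per customer. As an optional safeguard, merge() accepts a validate argument that raises an error if this is not the case. The sketch below uses small made-up tables rather than the DataFrames built above.

```python
import pandas as pd

# Hypothetical per-customer tables with one row per CustomerID
rec = pd.DataFrame({'CustomerID': [1, 2, 3], 'Recency': [9, 28, 0]})
frq = pd.DataFrame({'CustomerID': [1, 2, 3], 'Frequency': [2, 1, 3]})
mon = pd.DataFrame({'CustomerID': [1, 2, 3], 'Monetary': [70.0, 200.0, 55.0]})

# validate='one_to_one' raises pandas.errors.MergeError if a key repeats
merged = (rec.merge(frq, on='CustomerID', validate='one_to_one')
             .merge(mon, on='CustomerID', validate='one_to_one'))

# The default inner join keeps only customers present in every table
print(merged)
```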

8. Rank Customers Based on Recency, Frequency and Monetary

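We rank customers on Recency, Frequency and Monetary using the rank() function. A lower Recency is better, so Recency is ranked with ascending=False: the most recent buyer receives the highest rank. Frequency and Monetary are ranked in ascending order, so larger values also receive higher ranks. In all three columns a higher rank is therefore better.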

Python
rfm_df['R_rank'] = rfm_df['Recency'].rank(ascending=False)
rfm_df['F_rank'] = rfm_df['Frequency'].rank(ascending=True)
rfm_df['M_rank'] = rfm_df['Monetary'].rank(ascending=True)
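A small example makes the direction of the ranks, and the treatment of ties, easier to see. The values below are made up. By default rank() gives tied values the average of the ranks they would occupy.

```python
import pandas as pd

# Hypothetical Recency values in days (smaller = more recent = better)
recency = pd.Series([0, 9, 28, 9])
r_rank = recency.rank(ascending=False)
print(r_rank.tolist())  # [4.0, 2.5, 1.0, 2.5]

# Hypothetical Frequency values (larger = better)
frequency = pd.Series([3, 1, 2, 1])
f_rank = frequency.rank(ascending=True)
print(f_rank.tolist())  # [4.0, 1.5, 3.0, 1.5]
```

The customer with a Recency of 0 days gets the top rank of 4, and the two customers tied at 9 days share rank 2.5.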

9. Normalize the Ranks

We normalize each rank by dividing it by the column's maximum rank and multiplying by 100. The normalized values then run from just above 0 up to 100, so the three dimensions are on the same scale and can be combined into a single RFM score.

Python
rfm_df['R_rank_norm'] = (rfm_df['R_rank'] / rfm_df['R_rank'].max()) * 100
rfm_df['F_rank_norm'] = (rfm_df['F_rank'] / rfm_df['F_rank'].max()) * 100
rfm_df['M_rank_norm'] = (rfm_df['M_rank'] / rfm_df['M_rank'].max()) * 100
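As a quick worked example with four hypothetical customers, dividing by the maximum rank maps the best customer to 100 and the worst to 100 divided by the number of customers, not to 0:

```python
import pandas as pd

# Hypothetical ranks for four customers with no ties
ranks = pd.Series([1.0, 2.0, 3.0, 4.0])

# Same formula as in the article: rank / max rank * 100
norm = ranks / ranks.max() * 100
print(norm.tolist())  # [25.0, 50.0, 75.0, 100.0]
```

With many customers the lowest value gets close to 0, which is why the scale is usually described as 0-100.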

10. Drop Individual Ranks

Since we no longer need the individual ranks (R_rank, F_rank, M_rank) we drop them from the DataFrame to clean up the data.

Python
rfm_df.drop(columns=['R_rank', 'F_rank', 'M_rank'], inplace=True)
rfm_df.head()

Output:

[Image: Dataset After Dropping Individual Ranks]

11. Calculate RFM Score

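We calculate the RFM score as a weighted sum of the normalized ranks, with weights of 0.15 for Recency, 0.28 for Frequency and 0.57 for Monetary. The weights should reflect the business goals; here Monetary is given the highest weight. Because the weights sum to 1, the weighted sum stays on the 0-100 scale, and multiplying it by 0.05 rescales the score to a range of 0-5.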

Python
rfm_df['RFM_Score'] = 0.15 * rfm_df['R_rank_norm'] + 0.28 * rfm_df['F_rank_norm'] + 0.57 * rfm_df['M_rank_norm']
rfm_df['RFM_Score'] *= 0.05
rfm_df = rfm_df.round(2)
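To see how the arithmetic works for a single customer, the sketch below wraps the same formula in a small helper. The function name rfm_score and the input values are hypothetical; the weights and the 0.05 factor are the ones used above.

```python
def rfm_score(r_norm, f_norm, m_norm):
    """Weighted RFM score on a 0-5 scale from normalized ranks (0-100)."""
    weighted = 0.15 * r_norm + 0.28 * f_norm + 0.57 * m_norm  # still 0-100
    return round(weighted * 0.05, 2)                          # rescale to 0-5

# A customer who is best on every dimension reaches the maximum of 5
print(rfm_score(100, 100, 100))

# Recent buyer, average frequency, low spend:
# 0.15*100 + 0.28*50 + 0.57*25 = 15 + 14 + 14.25 = 43.25, then 43.25 * 0.05
print(rfm_score(100, 50, 25))  # 2.16
```

The second customer is the most recent buyer yet scores only 2.16, because Monetary carries more than half of the total weight.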

12. Display RFM Score and CustomerID

Here, we display the CustomerID and RFM_Score for the first few customers to see the results.

Python
rfm_df[['CustomerID', 'RFM_Score']].head(7)

Output:

[Image: RFM_Score for first 7 Customers]

13. Segment Customers Based on RFM Score

We classify customers into segments based on their RFM scores: a score above 4.5 is a Top Customer, above 4 a High Value Customer, above 3 a Medium Value Customer, above 1.6 a Low Value Customer, and 1.6 or below a Lost Customer.

Python
rfm_df["Customer_segment"] = np.where(rfm_df['RFM_Score'] > 4.5, "Top Customers",
                                       np.where(rfm_df['RFM_Score'] > 4, "High value Customer",
                                                np.where(rfm_df['RFM_Score'] > 3, "Medium Value Customer",
                                                         np.where(rfm_df['RFM_Score'] > 1.6, 'Low Value Customers', 'Lost Customers'))))
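Nested np.where calls become hard to read as the number of segments grows. One alternative, offered here as a sketch and not as the article's method, is pd.cut with the same thresholds. The scores below are made up and chosen to sit on and around the boundaries.

```python
import numpy as np
import pandas as pd

# Hypothetical scores, including values exactly on the thresholds
scores = pd.Series([1.6, 1.61, 3.0, 3.5, 4.0, 4.2, 4.5, 4.6])

# Right-closed bins: (-inf, 1.6], (1.6, 3], (3, 4], (4, 4.5], (4.5, inf)
bins = [-np.inf, 1.6, 3, 4, 4.5, np.inf]
labels = ['Lost Customers', 'Low Value Customers', 'Medium Value Customer',
          'High value Customer', 'Top Customers']

segments = pd.cut(scores, bins=bins, labels=labels).astype(str)
print(pd.DataFrame({'RFM_Score': scores, 'Customer_segment': segments}))
```

Because the bins are closed on the right, a score of exactly 4.5 falls in the High value group and only scores strictly above 4.5 are Top Customers, which matches the `>` comparisons in the nested version.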

14. Display Customer Segments

In this step, we show the first 20 rows with CustomerID, RFM_Score and Customer_segment to see how customers have been grouped.

Python
rfm_df[['CustomerID', 'RFM_Score', 'Customer_segment']].head(20)

Output:

[Image: Displaying Customer Segments for 20 Customers]

15. Visualize Customer Segments with a Pie Chart

Finally, we create a pie chart to visualize the distribution of customers across different segments. This helps in understanding how many customers belong to each segment.

Python
plt.pie(rfm_df.Customer_segment.value_counts(),
        labels=rfm_df.Customer_segment.value_counts().index,
        autopct='%.0f%%')
plt.show()

Output:

[Image: Customer Segmentation Visualization in a Pie Chart]
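The percentages on the pie chart are simply each segment's share of all customers. If exact numbers are needed alongside the chart, they can be tabulated as in this sketch, which uses a handful of made-up segment labels:

```python
import pandas as pd

# Hypothetical segment assignments for four customers
segments = pd.Series(
    ['Top Customers', 'Lost Customers',
     'Low Value Customers', 'Low Value Customers'],
    name='Customer_segment')

counts = segments.value_counts()                               # customers per segment
share = (segments.value_counts(normalize=True) * 100).round(0)  # percent of total

summary = pd.DataFrame({'Customers': counts, 'Share_%': share})
print(summary)
```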

With this simple technique, a business can gain insight into customer behaviour and plan accordingly.

You can download the ipynb file for the above implementation here.


