The most successful companies today are the ones that know their customers so well that they can anticipate their needs. Data analysts play a key role in unlocking these in-depth insights and in segmenting customers to serve them better.
This repository contains an in-depth analysis of online retail customers based in the United Kingdom, using three metrics:
- Recency
- Frequency
- Monetary Value
The data used for this project is available at the following link: https://archive.ics.uci.edu/ml/datasets/online+retail
Recency(R): How recently a customer has made a purchase
Frequency(F): How often a customer makes a purchase
Monetary Value(M): How much money a customer spends on purchases
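
The three definitions above can be sketched in pandas as follows. This is a minimal illustration, not the notebook's exact code: the helper name `compute_rfm`, the `TotalSum` column, the choice to count distinct invoices as Frequency, and the toy transactions are all assumptions made for the example. Only the column names `InvoiceNo`, `InvoiceDate`, `Quantity`, `UnitPrice` and `CustomerID` come from the dataset.

```python
import pandas as pd


def compute_rfm(df: pd.DataFrame, snapshot_date: pd.Timestamp) -> pd.DataFrame:
    """Aggregate transaction lines into one R/F/M row per customer."""
    df = df.copy()
    # Revenue per transaction line; "TotalSum" is an illustrative column name
    df["TotalSum"] = df["Quantity"] * df["UnitPrice"]
    return df.groupby("CustomerID").agg(
        # Days between the snapshot date and the customer's latest purchase
        Recency=("InvoiceDate", lambda d: (snapshot_date - d.max()).days),
        # Number of distinct invoices (assumption: one invoice = one purchase)
        Frequency=("InvoiceNo", "nunique"),
        # Total amount spent
        MonetaryValue=("TotalSum", "sum"),
    )


# Toy transactions, made up for illustration only
transactions = pd.DataFrame({
    "InvoiceNo": ["A1", "A1", "A2", "B1"],
    "CustomerID": [1, 1, 1, 2],
    "InvoiceDate": pd.to_datetime(
        ["2011-12-01", "2011-12-01", "2011-12-08", "2011-11-01"]
    ),
    "Quantity": [2, 1, 3, 10],
    "UnitPrice": [2.5, 4.0, 1.0, 0.5],
})

# Use the day after the last recorded purchase as "today"
snapshot = transactions["InvoiceDate"].max() + pd.Timedelta(days=1)
rfm = compute_rfm(transactions, snapshot)
print(rfm)
```

Taking the snapshot date as one day after the last invoice keeps Recency at 1 or more for every customer, which is convenient if the values are log-transformed later.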
Dataset Attributes
- InvoiceNo
- StockCode
- Description
- Quantity
- InvoiceDate
- UnitPrice
- CustomerID
- Country
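
A possible data wrangling pass over these attributes is sketched below. The specific filters (dropping rows without a CustomerID, keeping United Kingdom customers, removing non-positive quantities and prices) are assumptions about what "correcting" the data may involve, and `clean_transactions` is a hypothetical helper name. The sample rows are made up so the snippet runs without downloading the dataset.

```python
import pandas as pd


def clean_transactions(raw: pd.DataFrame,
                       country: str = "United Kingdom") -> pd.DataFrame:
    """Return a filtered copy of the raw transactions (illustrative rules)."""
    # Rows without a customer cannot be assigned to a segment
    df = raw.dropna(subset=["CustomerID"])
    # Restrict the analysis to a single country
    df = df[df["Country"] == country]
    # Assumption: non-positive quantities or prices are returns/corrections
    df = df[(df["Quantity"] > 0) & (df["UnitPrice"] > 0)].copy()
    # Parse dates and store customer IDs as integers
    df["InvoiceDate"] = pd.to_datetime(df["InvoiceDate"])
    df["CustomerID"] = df["CustomerID"].astype(int)
    return df


# Made-up sample rows using the dataset's column names
raw = pd.DataFrame({
    "InvoiceNo": ["536365", "536366", "C536379", "536370"],
    "StockCode": ["85123A", "22633", "D", "22728"],
    "Description": ["HEART T-LIGHT HOLDER", "HAND WARMER", "Discount", "ALARM CLOCK"],
    "Quantity": [6, 2, -1, 24],
    "InvoiceDate": ["2010-12-01 08:26", "2010-12-01 08:28",
                    "2010-12-01 09:41", "2010-12-01 08:45"],
    "UnitPrice": [2.55, 1.85, 27.5, 3.75],
    "CustomerID": [17850.0, None, 14527.0, 12583.0],
    "Country": ["United Kingdom", "United Kingdom", "United Kingdom", "France"],
})

cleaned = clean_transactions(raw)
print(cleaned[["InvoiceNo", "CustomerID", "Quantity", "Country"]])
```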
Sequence of steps followed to create and analyze the customer segments
- Download the data from the UCI website and load it into your Jupyter notebook.
- Understand and Explore the Data.
- Perform Data Wrangling if the data needs to be corrected.
- Create and calculate cohort metrics based on CustomerID and Quantity.
- Calculate RFM metrics.
- Build RFM Segment and RFM Score based on RFM Metrics.
- Explore distribution of Recency, Frequency and Monetary Value.
- Pre-process the data using the steps below, or use StandardScaler from the scikit-learn library.
  - Unskew the data with a log transformation
  - Standardize to the same average values
  - Scale to the same standard deviation
  - Store as a separate array to be used for clustering
- Visualize normalized data
- Explore the data and decide on the number of clusters. Any of the following methods can be used to identify the number of clusters; I used the first one.
  - Visual methods: elbow criterion
    - Plot the number of clusters against the within-cluster sum of squared errors (SSE), that is, the sum of squared distances from every data point to its cluster center, and identify the "elbow" in the plot.
    - The elbow is the point representing an "optimal" number of clusters.
  - Mathematical methods: silhouette coefficient
  - Experimentation and interpretation
- Lastly, Profile and Interpret Segments using Snake Plot and calculate the relative importance of Segment Attributes.
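
The pre-processing, cluster-count and profiling steps above can be sketched with scikit-learn as follows. This is a rough outline under stated assumptions, not the notebook's code: the synthetic RFM table, the helper names `preprocess` and `evaluate_k`, the range of cluster counts, the random seeds, and the final choice of three clusters are all illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def preprocess(rfm: pd.DataFrame) -> np.ndarray:
    """Log-transform to reduce skew, then center and scale each column."""
    logged = np.log(rfm)  # requires strictly positive values
    # StandardScaler gives every column mean 0 and standard deviation 1
    return StandardScaler().fit_transform(logged)


def evaluate_k(X: np.ndarray, k_values, seed: int = 1):
    """Fit k-means for each k and collect SSE and silhouette coefficient."""
    sse, silhouette = {}, {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
        sse[k] = model.inertia_  # within-cluster sum of squared errors
        silhouette[k] = silhouette_score(X, model.labels_)
    return sse, silhouette


# Synthetic, right-skewed RFM values standing in for the real table
rng = np.random.default_rng(0)
rfm = pd.DataFrame({
    "Recency": rng.lognormal(3, 1, 300),
    "Frequency": rng.lognormal(1, 0.8, 300),
    "MonetaryValue": rng.lognormal(5, 1.2, 300),
})

X = preprocess(rfm)
sse, silhouette = evaluate_k(X, range(2, 7))
for k in sse:
    print(f"k={k}  SSE={sse[k]:.1f}  silhouette={silhouette[k]:.3f}")

# Profile the segments for an assumed k=3: relative importance is the
# cluster average divided by the population average, minus 1
labels = KMeans(n_clusters=3, n_init=10, random_state=1).fit_predict(X)
relative_importance = rfm.groupby(labels).mean() / rfm.mean() - 1
print(relative_importance.round(2))
```

Plotting the `sse` values against `k` gives the elbow chart, and the `silhouette` values offer a numerical cross-check. Relative importance values far from 0 indicate attributes that distinguish a segment from the overall population.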
DataCamp Course https://www.datacamp.com/courses/customer-segmentation-in-python
Author: Khushal Singh Rajawat