# An Exploration of Clustering Algorithms for Customer Segmentation in the UK Retail Market

## Abstract

## 1. Introduction

## 2. Related Work

## 3. Methodology

#### 3.1. RFM

#### 3.2. Principal Component Analysis

#### 3.3. Clustering Techniques

#### 3.3.1. K-Means Algorithm

- $N$: number of data points.
- $K$: number of clusters.
- ${x}_{i}$: the $i$-th data point.
- ${\mu}_{k}$: centroid of the $k$-th cluster.
- ${r}_{ik}$: indicator variable equal to 1 if data point ${x}_{i}$ belongs to cluster $k$ and 0 otherwise.
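With this notation, the K-Means objective is usually written as the within-cluster sum of squared distances, $J=\sum_{i=1}^{N}\sum_{k=1}^{K}{r}_{ik}{\Vert {x}_{i}-{\mu}_{k}\Vert}^{2}$. The sketch below evaluates that quantity and performs one assignment and update step in plain Python. The function names and the toy data are illustrative assumptions, not the implementation used in this study.

```python
from math import dist


def assign(points, centroids):
    """For each point, return the index k of its nearest centroid (r_ik = 1)."""
    return [min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
            for p in points]


def update(points, labels, centroids):
    """Recompute each centroid mu_k as the mean of the points assigned to it."""
    new_centroids = []
    for k, old in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == k]
        if not members:
            # An empty cluster keeps its previous centroid.
            new_centroids.append(old)
            continue
        new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
    return new_centroids


def objective(points, labels, centroids):
    """J = sum_i sum_k r_ik * ||x_i - mu_k||^2."""
    return sum(dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))


# Toy data (hypothetical): two well-separated groups in 2-D.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centroids = [(0.0, 0.0), (10.0, 10.0)]

labels = assign(points, centroids)             # assignment step
centroids = update(points, labels, centroids)  # update step
```

Repeating the two steps until the labels stop changing gives the usual Lloyd-style iteration; each step cannot increase $J$.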

#### 3.3.2. Gaussian Mixture Model (GMM) Algorithm

- ${\pi}_{k}$: mixing weights, i.e., the proportion of data points in each cluster.
- ${\mu}_{k}$: mean vectors for each distribution.
- ${\Sigma}_{k}$: covariance matrices, which represent the shape and orientation of each distribution.
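To make the role of these parameters concrete, the sketch below evaluates a mixture density and the per-component membership probabilities in one dimension. It is a simplification under stated assumptions: in 1-D a scalar variance stands in for the matrix in the third bullet, and the function names and parameter values are hypothetical.

```python
from math import exp, pi, sqrt


def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return exp(-(x - mean) ** 2 / (2 * var)) / sqrt(2 * pi * var)


def mixture_density(x, weights, means, variances):
    """p(x) = sum_k pi_k * N(x | mu_k, var_k)."""
    return sum(w * normal_pdf(x, m, v)
               for w, m, v in zip(weights, means, variances))


def responsibilities(x, weights, means, variances):
    """Probability that x was generated by each component (the E-step quantity)."""
    joint = [w * normal_pdf(x, m, v)
             for w, m, v in zip(weights, means, variances)]
    total = sum(joint)
    return [j / total for j in joint]


# Hypothetical two-component mixture.
weights = [0.5, 0.5]
means = [0.0, 4.0]
variances = [1.0, 1.0]

midpoint = responsibilities(2.0, weights, means, variances)   # equidistant point
near_first = responsibilities(0.0, weights, means, variances)
```

Unlike the hard 0/1 assignment of K-Means, each point receives a soft membership that sums to 1 across components.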

#### 3.3.3. DBSCAN Algorithm

#### 3.3.4. BIRCH Algorithm

- $N$: number of data points in the cluster.
- $LSUM$: sum of data points in the cluster.
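The quantities above are components of the clustering feature (CF) that BIRCH stores for each subcluster. The sketch below shows why this summary is convenient: CFs can be merged by simple addition, and the centroid follows from $LSUM/N$. The `ssum` field (sum of squared norms) is an assumption taken from the standard BIRCH formulation rather than from the list above, and the class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusteringFeature:
    n: int        # N: number of data points summarised
    lsum: tuple   # LSUM: per-dimension sum of the data points
    ssum: float   # assumed extra component: sum of squared norms of the points

    @classmethod
    def from_point(cls, point):
        """Build the CF of a single data point."""
        return cls(1, tuple(point), sum(c * c for c in point))

    def merge(self, other):
        """CFs are additive: merging two subclusters adds their components."""
        return ClusteringFeature(
            self.n + other.n,
            tuple(a + b for a, b in zip(self.lsum, other.lsum)),
            self.ssum + other.ssum,
        )

    def centroid(self):
        """Centroid of the subcluster, LSUM / N."""
        return tuple(s / self.n for s in self.lsum)

    def radius(self):
        """Root-mean-square distance of the points from the centroid."""
        mean_sq = self.ssum / self.n
        centroid_sq = sum(c * c for c in self.centroid())
        return max(mean_sq - centroid_sq, 0.0) ** 0.5


# Hypothetical points merged into one subcluster.
cf = ClusteringFeature.from_point((1.0, 2.0)).merge(
    ClusteringFeature.from_point((3.0, 4.0)))
```

Because only these sums are kept, a subcluster can absorb new points without revisiting the raw data.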

#### 3.3.5. Agglomerative Algorithm

#### 3.4. Dataset

- InvoiceNo: Invoice number. A nominal, six-digit integral number uniquely assigned to each transaction. If this code starts with the letter ‘c’, it indicates a cancellation.
- StockCode: Product (item) code. A nominal, five-digit integral number uniquely assigned to each distinct product.
- Description: Product (item) name. Nominal.
- Quantity: The quantities of each product (item) per transaction. Numeric.
- InvoiceDate: Invoice date and time. Numeric, the day and time at which each transaction was generated.
- UnitPrice: Unit price. Numeric. Product price per unit in sterling.
- CustomerID: Customer number. A nominal, five-digit integral number uniquely assigned to each customer.
- Country: Country name. Nominal. The name of the country where each customer resides.
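Two of these fields are typically combined early in the analysis: InvoiceNo to detect cancellations, and Quantity with UnitPrice to obtain the value of a transaction line. The following minimal sketch illustrates both. The rows are hypothetical values that only mimic the schema, and treating the leading letter case-insensitively is an assumption.

```python
def is_cancellation(invoice_no):
    """Return True when the invoice code marks a cancellation (leading 'c')."""
    # Assumption: the check is case-insensitive, so 'C100002' also matches.
    return str(invoice_no).strip().lower().startswith("c")


def line_total(row):
    """Value of one transaction line in sterling: Quantity x UnitPrice."""
    return row["Quantity"] * row["UnitPrice"]


# Hypothetical rows that only mimic the schema described above.
rows = [
    {"InvoiceNo": "100001", "StockCode": "10001", "Quantity": 4,
     "UnitPrice": 2.5, "CustomerID": "10000", "Country": "United Kingdom"},
    {"InvoiceNo": "C100002", "StockCode": "10001", "Quantity": -4,
     "UnitPrice": 2.5, "CustomerID": "10000", "Country": "United Kingdom"},
]

# Keep only non-cancelled lines and compute their value.
sales = [r for r in rows if not is_cancellation(r["InvoiceNo"])]
totals = [line_total(r) for r in sales]
```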

#### 3.5. Experimental Set Up

#### 3.6. Data Pre-Processing

#### 3.7. Model Evaluation

- $s\left(i\right)$ is the Silhouette Score for data point $i$.
- $a\left(i\right)$ is the average distance between $i$ and the other data points in the same cluster.
- $b\left(i\right)$ is the smallest average distance between $i$ and the data points of any other cluster.
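These quantities combine in the standard silhouette definition, $s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}$. The sketch below computes it for a single point in plain Python. The function name, the Euclidean distance, the convention of returning 0 for a singleton cluster, and the toy data are assumptions made for illustration.

```python
from math import dist


def silhouette(i, points, labels):
    """Silhouette score s(i) for point i; requires at least two clusters."""
    own = labels[i]

    # a(i): mean distance to the other members of i's own cluster.
    same = [dist(points[i], p)
            for j, p in enumerate(points) if labels[j] == own and j != i]
    if not same:
        return 0.0  # common convention for a cluster with a single member
    a = sum(same) / len(same)

    # b(i): smallest mean distance to the members of any other cluster.
    b = min(
        sum(dist(points[i], p) for p, lab in zip(points, labels) if lab == other)
        / labels.count(other)
        for other in set(labels) if other != own
    )

    denom = max(a, b)
    return 0.0 if denom == 0 else (b - a) / denom


# Toy data (hypothetical): two tight groups on a line.
points = [(0.0,), (1.0,), (10.0,), (11.0,)]
labels = [0, 0, 1, 1]

s0 = silhouette(0, points, labels)
mean_score = sum(silhouette(i, points, labels) for i in range(len(points))) / len(points)
```

Values close to 1 indicate that a point sits well inside its own cluster, values near 0 indicate overlap, and negative values suggest a likely misassignment.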

## 4. Results

#### 4.1. RFM Results

- $RecencyWeight$ reflects how important recency is in the analysis.
- $FrequencyWeight$ reflects how important frequency is in the analysis.
- $MonetaryWeight$ reflects how important monetary value is in the analysis.
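The three weights suggest that the RFM score is a weighted combination of the recency, frequency, and monetary components. The sketch below shows one such combination. It assumes that each component has already been normalised to a common scale on which higher is better, that the combination is a weighted sum divided by the total weight, and that the weight values are placeholders; none of these choices is taken from the study.

```python
def rfm_score(recency, frequency, monetary,
              recency_weight, frequency_weight, monetary_weight):
    """Weighted average of the three (already normalised) RFM components."""
    total = recency_weight + frequency_weight + monetary_weight
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = (recency_weight * recency
                + frequency_weight * frequency
                + monetary_weight * monetary)
    # Dividing by the total keeps the score on the same scale as the inputs.
    return weighted / total


# Placeholder component values on a 0-5 scale and placeholder weights.
score = rfm_score(5.0, 3.0, 4.0,
                  recency_weight=0.2, frequency_weight=0.3, monetary_weight=0.5)
```

Raising one weight relative to the others shifts the score towards that component, which is how the analyst expresses its importance.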

#### 4.2. Clustering Results

#### 4.2.1. K-Means Clustering

#### 4.2.2. Gaussian Mixture Model

#### 4.2.3. DBSCAN Clustering Algorithm

#### 4.2.4. BIRCH Algorithm

#### 4.2.5. Agglomerative Clustering

#### 4.3. Model Performance Evaluation

## 5. Conclusions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Conflicts of Interest

| Invoice No | Invoice Counts | Invoice Price (GBP) |
|---|---|---|
| 536365 | 7 | 139.122 |
| 536366 | 2 | 22.20 |
| 536367 | 12 | 278.73 |
| 536368 | 4 | 70.05 |
| 536369 | 1 | 17.85 |

| Customer ID | Recency | Frequency | Monetary (GBP) |
|---|---|---|---|
| 12346 | 325 | 1 | 77,183.60 |
| 12347 | 1 | 182 | 4310.00 |
| 12348 | 74 | 31 | 1797.24 |
| 12349 | 18 | 73 | 1757.55 |
| 12350 | 309 | 17 | 334.40 |
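
A table of this shape can be derived from transaction lines by grouping on the customer. The sketch below is one way to do so in plain Python. It assumes that recency is the number of days between a reference (snapshot) date and the customer's latest transaction, that frequency counts transaction lines, and that monetary is the summed line value. The customers, dates, and amounts are hypothetical.

```python
from collections import defaultdict
from datetime import date


def rfm_table(transactions, snapshot):
    """Map each customer to (recency_days, frequency, monetary).

    transactions: iterable of (customer_id, invoice_date, amount) tuples.
    snapshot: reference date from which recency is measured.
    """
    last_seen = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for customer, when, amount in transactions:
        # Track the most recent purchase date per customer.
        last_seen[customer] = max(when, last_seen.get(customer, when))
        frequency[customer] += 1
        monetary[customer] += amount
    return {c: ((snapshot - last_seen[c]).days, frequency[c], monetary[c])
            for c in last_seen}


# Hypothetical transaction lines.
transactions = [
    ("A", date(2011, 12, 1), 10.0),
    ("A", date(2011, 12, 8), 5.5),
    ("B", date(2011, 11, 9), 20.0),
]
table = rfm_table(transactions, snapshot=date(2011, 12, 9))
```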

| Customer ID | RFM_Score |
|---|---|
| 12346 | 0.06 |
| 12347 | 4.48 |
| 12348 | 2.09 |
| 12349 | 3.41 |
| 12350 | 1.10 |

| Customer ID | RFM_Score | Customer_Segment |
|---|---|---|
| 12346 | 0.06 | Lost customer |
| 12347 | 4.48 | High-value customer |
| 12348 | 2.09 | Low-value customer |
| 12349 | 3.41 | Medium-value customer |
| 12350 | 1.10 | Lost customer |
| 12352 | 3.46 | Medium-value customer |
| 12353 | 0.03 | Lost customer |
| 12354 | 2.67 | Low-value customer |
| 12355 | 0.93 | Lost customer |
| 12356 | 3.11 | Medium-value customer |
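
The mapping from RFM_Score to Customer_Segment in the table above can be expressed as a simple threshold rule. In the sketch below, the cut-off values (4.0, 3.0, and 1.6) are assumptions chosen only because they reproduce the ten rows shown; they are not thresholds reported in this text, and the function name is hypothetical.

```python
def customer_segment(score):
    """Assign a segment label from an RFM score using assumed cut-offs."""
    if score > 4.0:
        return "High-value customer"
    if score > 3.0:
        return "Medium-value customer"
    if score > 1.6:
        return "Low-value customer"
    return "Lost customer"


# Scores taken from the table above, keyed by Customer ID.
scores = {
    12346: 0.06, 12347: 4.48, 12348: 2.09, 12349: 3.41, 12350: 1.10,
    12352: 3.46, 12353: 0.03, 12354: 2.67, 12355: 0.93, 12356: 3.11,
}
segments = {customer: customer_segment(s) for customer, s in scores.items()}
```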

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

John, J.M.; Shobayo, O.; Ogunleye, B.
An Exploration of Clustering Algorithms for Customer Segmentation in the UK Retail Market. *Analytics* **2023**, *2*, 809-823.
https://doi.org/10.3390/analytics2040042
