Alternative Support Threshold Computation for Market Basket Analysis
Abstract
1. Introduction
- Data scientists, experts in algorithms and programming, but lacking domain-specific knowledge related to the MBA application;
- Managers, experts in their own application domain, but (possibly) lacking specific competencies in designing and interpreting association rule mining algorithms.
1.1. Definitions
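- The support of an itemset $B$, denoted $\mathrm{supp}(B)$, is the number of occurrences of the itemset $B$ in the transaction dataset $\mathcal{D}$; $P(B) = \mathrm{supp}(B)/m$ measures the empirical probability of the occurrence of $B$ in $\mathcal{D}$, where $m$ denotes the total number of transactions in $\mathcal{D}$. Considering that any rule $r: A \Rightarrow C$ is univocally associated with the itemset $A \cup C$, its support and empirical probability of occurrence, respectively denoted as $\mathrm{supp}(r)$ and $P(r)$, are equal to $\mathrm{supp}(A \cup C)$ and $P(A \cup C)$.
- The confidence of $r$ is the ratio between $\mathrm{supp}(r)$ and the number of occurrences of $A$ in $\mathcal{D}$: $\mathrm{conf}(r) = \mathrm{supp}(r) / \mathrm{supp}(A)$. It is a measure of the reliability of $r$ and represents how often the consequent occurs if the antecedent is verified.
- The lift of $r$ is the ratio between the empirical probability that $r$ occurs in $\mathcal{D}$ and its expected value in case $A$ and $C$ were independent. Since the probability of occurrence of an itemset in $\mathcal{D}$ is expressed as the ratio between its support and the total number of transactions $m$, then: $\mathrm{lift}(r) = \dfrac{P(r)}{P(A)\,P(C)} = \dfrac{m \cdot \mathrm{supp}(r)}{\mathrm{supp}(A)\,\mathrm{supp}(C)}$.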
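The three measures defined above can be illustrated with a minimal Python sketch. The function names and the toy transactions below are assumptions made for illustration only; they are not taken from the paper or from any specific library.

```python
from typing import FrozenSet, Iterable, List

Transaction = FrozenSet[str]


def support(itemset: Iterable[str], transactions: List[Transaction]) -> int:
    """Number of transactions that contain every item of the itemset."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t)


def probability(itemset: Iterable[str], transactions: List[Transaction]) -> float:
    """Empirical probability: support divided by the number of transactions m."""
    return support(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions) -> float:
    """Support of the rule A -> C divided by the support of A."""
    rule_items = frozenset(antecedent) | frozenset(consequent)
    return support(rule_items, transactions) / support(antecedent, transactions)


def lift(antecedent, consequent, transactions) -> float:
    """Observed probability of the rule over its value under independence."""
    rule_items = frozenset(antecedent) | frozenset(consequent)
    expected = probability(antecedent, transactions) * probability(consequent, transactions)
    return probability(rule_items, transactions) / expected


# Hypothetical toy dataset with m = 4 transactions.
toy_transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]

rule_support = support({"bread", "milk"}, toy_transactions)
rule_confidence = confidence({"bread"}, {"milk"}, toy_transactions)
rule_lift = lift({"bread"}, {"milk"}, toy_transactions)
```

For the rule bread -> milk in this toy dataset, the support is 2, the confidence is 2/3, and the lift is 8/9; a lift below 1 indicates that the two items co-occur slightly less often than they would if they were independent.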
1.2. Related Work
- Candidate generation: the output of this step is the candidate set $C_k$. $C_1$ is composed of all the items in $\mathcal{D}$; $C_k$ with $k > 1$ is composed of all the itemsets of cardinality $k$ such that, if any item is removed from them, the resulting itemset belongs to $F_{k-1}$.
- Pruning: the support of each itemset in $C_k$ is computed. If it is at least equal to the threshold $t$, the itemset is included in the frequent set $F_k$.
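The two steps above can be sketched as a level-wise loop. This is a minimal, unoptimized illustration under stated assumptions: the function name `apriori_frequent_itemsets`, the brute-force candidate enumeration, and the toy data are hypothetical and do not reproduce the implementation evaluated in the paper.

```python
from itertools import combinations


def apriori_frequent_itemsets(transactions, t):
    """Return {itemset: support} for every itemset whose support is >= t."""
    transactions = [frozenset(tx) for tx in transactions]
    # Candidates of cardinality 1: every item that appears in the data.
    candidates = {frozenset([item]) for tx in transactions for item in tx}
    frequent = {}
    k = 1
    while candidates:
        # Pruning: count each candidate and keep those reaching the threshold t.
        counts = {c: sum(1 for tx in transactions if c <= tx) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= t}
        frequent.update(level)
        # Candidate generation for cardinality k + 1: keep an itemset only if
        # removing any single item yields a frequent itemset of the previous level.
        k += 1
        items = sorted({item for c in level for item in c})
        candidates = {
            frozenset(combo)
            for combo in combinations(items, k)
            if all(frozenset(sub) in level for sub in combinations(combo, k - 1))
        }
    return frequent


# Hypothetical toy dataset and threshold.
toy_baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
frequent_itemsets = apriori_frequent_itemsets(toy_baskets, t=2)
```

With `t=2`, the three single items and the pairs {bread, milk} and {bread, butter} are retained, while {milk, butter} is pruned (support 1), so no candidate of cardinality 3 is generated. Enumerating all combinations of the surviving items is simple to read but costly; practical implementations join frequent itemsets that share a prefix instead.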
2. Materials and Methods
3. Results
- The groceries dataset [28], comprising 9835 transactions and 169 different categories of items;
- The Mondrian foodmart sales dataset (https://github.com/kijiproject/kiji-modeling/tree/master/kiji-modeling-examples/src/main/datasets/foodmart, accessed on 15 January 2020) including 7824 transactions (which are assumed to be identified by their customer id) and 1559 different items;
- The retail dataset [6], comprising 88,162 transactions, with 16,470 different items;
- The online retail dataset [29], comprising 25,900 transactions, with 4194 different item descriptions;
- Real, anonymized data gathered by one of the largest Italian supermarket chains (only aggregate results are discussed, not detailed data); this dataset is composed of 552,626 transactions, involving 51,892 different items. The transactions refer to thousands of different customers, with an average basket size of items (and a standard deviation in the basket size of items). At least one of the 100 most frequent items is included in of the transactions.
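Aggregate descriptors of the kind quoted for the datasets above (number of transactions, number of distinct items, basket-size mean and standard deviation, share of transactions containing at least one of the most frequent items) can be computed as in the following sketch. The function name `basket_statistics`, the use of the population standard deviation, and the toy data are assumptions; the source does not state how its figures were obtained.

```python
from collections import Counter
from statistics import mean, pstdev


def basket_statistics(transactions, top_n=100):
    """Aggregate descriptors of a transaction dataset."""
    baskets = [set(tx) for tx in transactions]
    sizes = [len(b) for b in baskets]
    # Each item is counted at most once per transaction.
    item_counts = Counter(item for b in baskets for item in b)
    top_items = {item for item, _ in item_counts.most_common(top_n)}
    # Transactions containing at least one of the top_n most frequent items.
    covered = sum(1 for b in baskets if b & top_items)
    return {
        "n_transactions": len(baskets),
        "n_items": len(item_counts),
        "mean_basket_size": mean(sizes),
        # Assumption: population standard deviation.
        "std_basket_size": pstdev(sizes),
        "top_n_coverage": covered / len(baskets),
    }


# Hypothetical toy dataset; top_n=1 selects only the single most frequent item.
toy_dataset = [["a", "b"], ["a"], ["c", "d", "e"], ["a", "b"]]
stats = basket_statistics(toy_dataset, top_n=1)
```

In the toy dataset, item `a` is the most frequent and appears in 3 of the 4 transactions, so `top_n_coverage` is 0.75 and the mean basket size is 2.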
- A new test is included in group 1, setting / as minimum item support: rules will be generated;
- A new test is included in group 2, taking items into account: rules will be generated.
4. Discussion
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Kumar, P.; Manisha, K.; Nivetha, M. Market Basket Analysis for Retail Sales Optimization. In Proceedings of the 2024 Second International Conference on Emerging Trends in Information Technology and Engineering (ICETITE), Vellore, India, 22–23 February 2024; IEEE: New York, NY, USA, 2024; pp. 1–7.
- Agrawal, R.; Imieliński, T.; Swami, A. Mining association rules between sets of items in large databases. In Proceedings of the ACM SIGMOD Record, ACM, Washington, DC, USA, 26–28 May 1993; Volume 22, pp. 207–216.
- Agrawal, R.; Srikant, R. Fast algorithms for mining association rules. In Proceedings of the 20th International Conference on Very Large Data Bases, VLDB, Santiago de Chile, Chile, 12–15 September 1994; Volume 1215, pp. 487–499.
- Han, J.; Pei, J.; Yin, Y. Mining frequent patterns without candidate generation. In Proceedings of the ACM SIGMOD Record, ACM, Dallas, TX, USA, 16–18 May 2000; Volume 29, pp. 1–12.
- Zaki, M.J.; Parthasarathy, S.; Ogihara, M.; Li, W. Parallel algorithms for discovery of association rules. Data Min. Knowl. Discov. 1997, 1, 343–373.
- Brijs, T. Retail market basket data set. In Proceedings of the Workshop on Frequent Itemset Mining Implementations (FIMI’03), Melbourne, FL, USA, 19 November 2003.
- Omol, E.J.; Onyango, D.A.; Mburu, L.W.; Abuonji, P.A. Apriori algorithm and market basket analysis to uncover consumer buying patterns: Case of a Kenyan supermarket. Buana Inf. Technol. Comput. Sci. (BIT CS) 2024, 5, 51–63.
- Wahidi, N.; Ismailova, R. A market basket analysis of seven retail branches in Kyrgyzstan using an Apriori algorithm. Int. J. Bus. Intell. Data Min. 2025, 26, 236–255.
- Alavi, F.; Hashemi, S. DFP-SEPSF: A dynamic frequent pattern tree to mine strong emerging patterns in streamwise features. Eng. Appl. Artif. Intell. 2015, 37, 54–70.
- Alcan, D.; Ozdemir, K.; Ozkan, B.; Mucan, A.Y.; Ozcan, T. A comparative analysis of apriori and fp-growth algorithms for market basket analysis using multi-level association rule mining. In Global Joint Conference On Industrial Engineering And Its Application Areas; Springer: Cham, Switzerland, 2022; pp. 128–137.
- Park, J.S.; Yu, P.S.; Chen, M.S. Mining association rules with adjustable accuracy. In Proceedings of the Sixth International Conference on Information and Knowledge Management, ACM, Las Vegas, NV, USA, 10–14 November 1997; pp. 151–160.
- Tseng, M.C.; Lin, W.Y. Efficient mining of generalized association rules with non-uniform minimum support. Data Knowl. Eng. 2007, 62, 41–64.
- Vo, B.; Le, B. Fast algorithm for mining generalized association rules. Int. J. Database Theory Appl. 2009, 2, 1–12.
- Baralis, E.; Cagliero, L.; Cerquitelli, T.; D’Elia, V.; Garza, P. Support driven opportunistic aggregation for generalized itemset extraction. In Proceedings of the Intelligent Systems (IS), 2010 5th IEEE International Conference, London, UK, 7–9 July 2010; IEEE: New York, NY, USA, 2010; pp. 102–107.
- Hu, Y.H.; Chen, Y.L. Mining association rules with multiple minimum supports: A new mining algorithm and a support tuning mechanism. Decis. Support Syst. 2006, 42, 1–24.
- Kuo, R.; Shih, C. Association rule mining through the ant colony system for National Health Insurance Research Database in Taiwan. Comput. Math. Appl. 2007, 54, 1303–1318.
- Dorigo, M.; Birattari, M. Ant colony optimization. In Encyclopedia of Machine Learning; Springer: Cham, Switzerland, 2010; pp. 36–39.
- Kuo, R.J.; Chao, C.M.; Chiu, Y. Application of particle swarm optimization to association rule mining. Appl. Soft Comput. 2011, 11, 326–336.
- Kennedy, J. Particle swarm optimization. In Encyclopedia of Machine Learning; Springer: Cham, Switzerland, 2010; pp. 760–766.
- Yun, H.; Ha, D.; Hwang, B.; Ho Ryu, K. Mining association rules on significant rare data using relative support. J. Syst. Softw. 2003, 67, 181–191.
- Salam, A.; Khayal, M.S.H. Mining top-k frequent patterns without minimum support threshold. Knowl. Inf. Syst. 2012, 30, 57–86.
- Sadhasivam, K.S.; Angamuthu, T. Mining rare itemset with automated support thresholds. J. Comput. Sci. 2011, 7, 394.
- Kirsch, A.; Mitzenmacher, M.; Pietracaprina, A.; Pucci, G.; Upfal, E.; Vandin, F. An efficient rigorous approach for identifying statistically significant frequent itemsets. J. ACM (JACM) 2012, 59, 12.
- Tong, Y.; Chen, L.; Yu, P.S. Ufimt: An uncertain frequent itemset mining toolbox. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, Beijing, China, 12–16 August 2012; pp. 1508–1511.
- Vats, S.; Sharma, V.; Bajaj, M.; Singh, S.; Sagar, B. Advanced frequent itemset mining algorithm (AFIM). In Uncertainty in Computational Intelligence-Based Decision Making; Elsevier: Amsterdam, The Netherlands, 2025; pp. 187–201.
- Sadeequllah, M.; Rauf, A.; Rehman, S.U.; Alnazzawi, N. Probabilistic support prediction: Fast frequent itemset mining in dense data. IEEE Access 2024, 12, 39330–39350.
- Feller, W. An Introduction to Probability Theory and Its Applications; John Wiley & Sons: Hoboken, NJ, USA, 2008; Volume 2.
- Hahsler, M.; Hornik, K.; Reutterer, T. Implications of probabilistic data modeling for mining association rules. In From Data and Information Analysis to Knowledge Engineering; Springer: Cham, Switzerland, 2006; pp. 598–605.
- Chen, D.; Sain, S.L.; Guo, K. Data mining for the online retail industry: A case study of RFM model-based customer segmentation using data mining. J. Database Mark. Cust. Strategy Manag. 2012, 19, 197–208.
Dataset | | | |
---|---|---|---|
Groceries | 1 | 1 | 1 |
Groceries | 2 | 9.38 | 3.83 |
Groceries | 3 | 19.27 | 9.38 |
Groceries | 4 | 33.44 | 17.37 |
Groceries | 5 | 39.64 | 26.75 |
Foodmart | 1 | 1 | 1 |
Foodmart | 2 | 2189.70 | 3.87 |
Foodmart | 3 | 2194.65 | 8.98 |
Foodmart | 4 | 2194.65 | 16.52 |
Foodmart | 5 | 2194.65 | 26.17 |
Retail | 1 | 1 | 1 |
Retail | 2 | 4.93 | 3.63 |
Retail | 3 | 26.38 | 8.35 |
Retail | 4 | 75.87 | 13.69 |
Retail | 5 | 155.25 | 21.39 |
Online Retail | 1 | 1 | 1 |
Online Retail | 2 | 148.00 | 3.91 |
Online Retail | 3 | 812.53 | 9.16 |
Online Retail | 4 | 2218.58 | 16.80 |
Online Retail | 5 | 3994.31 | 26.43 |
Supermarket Chain Data | 1 | 1 | 1 |
Supermarket Chain Data | 2 | 25.51 | 4.41 |
Supermarket Chain Data | 3 | 186.42 | 10.46 |
Supermarket Chain Data | 4 | 601.87 | 19.89 |
Supermarket Chain Data | 5 | 1344.78 | 31.77 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Verda, D.; Muselli, M. Alternative Support Threshold Computation for Market Basket Analysis. AppliedMath 2025, 5, 71. https://doi.org/10.3390/appliedmath5020071