Recursive Queried Frequent Patterns Algorithm: Determining Frequent Pattern Sets from Database
Abstract
1. Introduction
Pattern Mining—A Road Map
- Item: An item is a single article in the dataset or the content of a data cell of the dataset.
- Transaction: A transaction is a set of items, usually a dataset row.
- K-Item Set: A k-itemset is an itemset of size k, i.e., a set containing k items.
- Support: The support of an itemset is the number of transactions containing it; for a rule X ⇒ Y, it is the number of transactions containing X ∪ Y.
- Frequent Set: This is the set of items satisfying the minimum support threshold value.
- Confidence: The confidence of a rule X ⇒ Y is defined as confidence(X ⇒ Y) = support(X ∪ Y) / support(X), i.e., the fraction of transactions containing X that also contain Y.
- The proposal of RQFP, a novel SQL-based frequent pattern mining algorithm that utilizes recursive queries over a Mining Table to avoid candidate generation and reduce memory overhead.
- The implementation of RQFP as a visual web-based tool for step-by-step algorithmic understanding, especially in educational and instructional contexts.
- A comparative performance analysis with Apriori and FP-Growth using an academic dataset and literature-based benchmarks, with a roadmap for direct benchmarking against vertical- and closed-itemset algorithms.
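The support and confidence definitions above can be made concrete with a short Python sketch. The toy transactions below are our own illustration (they reuse the subject codes GE, CH, BO, ZO that appear in the academic dataset, but not its actual contents):

```python
# Toy transaction database (illustrative only).
transactions = [
    {"GE", "CH", "BO"},
    {"GE", "CH", "ZO"},
    {"GE", "PH"},
    {"CH", "BO"},
]

def support(itemset, db):
    """Number of transactions containing every item of `itemset`."""
    return sum(1 for t in db if itemset <= t)

def confidence(x, y, db):
    """Confidence of the rule X => Y: support(X ∪ Y) / support(X)."""
    return support(x | y, db) / support(x, db)

print(support({"GE", "CH"}, transactions))       # support of {GE, CH}
print(confidence({"GE"}, {"CH"}, transactions))  # confidence of GE => CH
```

Here {GE, CH} occurs in two of the four transactions, and two of the three transactions containing GE also contain CH.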
2. Related Work
2.1. Apriori-Based Algorithms
2.2. Partition-Based Algorithms
2.3. Depth-First Search (DFS) and Hybrid Algorithms
2.4. Incremental Update with Apriori-Based Algorithms
2.5. SQL-Based Algorithms
2.6. Pattern-Growth Algorithms
2.7. Frequent Pattern Mining Challenges
- There are numerous real-world situations where it is impossible to enumerate all potential subsets of a given pattern length.
- Many patterns meeting a low minimum support criterion are generated due to repeated pattern mining on vast amounts of data.
- The cost of generating candidate itemsets is high when the candidate sets are large. For example, finding a single frequent pattern of size 100, {a1, a2, …, a100}, requires generating 2^100 − 1 ≈ 10^30 candidates.
- Repeatedly scanning the database and checking a large number of candidates becomes very costly when mining long patterns.
- Multimedia, geographical, temporal, and other data types may be stored in the database; no single system can mine all of these data types.
- Multiple LAN and wide-area network (WAN) data sources are accessible, and they may be structured, semi-structured, or unstructured. Mining knowledge from such heterogeneous sources makes data mining more difficult.
2.8. Apriori Algorithm
- A decreased frequency of scanning a transaction database;
- A reduced number of candidate itemsets;
- More convenient and efficient support counting for the candidates.
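For reference, the candidate-generation-and-test cycle that these refinements target can be sketched as a minimal level-wise Apriori. This is our own Python sketch, not any cited implementation:

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Minimal level-wise Apriori: join frequent (k-1)-itemsets into size-k
    candidates, prune by downward closure, then count support with a DB scan."""
    def count(itemset):
        return sum(1 for t in transactions if itemset <= t)

    items = {i for t in transactions for i in t}
    current = {frozenset([i]) for i in items if count(frozenset([i])) >= min_support}
    result = {}
    k = 1
    while current:
        result.update({fs: count(fs) for fs in current})
        k += 1
        # Join step: union pairs of frequent (k-1)-itemsets into size-k candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step (downward closure): every (k-1)-subset must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        current = {c for c in candidates if count(c) >= min_support}
    return result

db = [{"GE", "CH", "BO", "ZO"}, {"GE", "CH", "BO", "ZO"}, {"GE", "PH", "MA", "CH"}]
fp = apriori(db, 2)
print(sorted(tuple(sorted(s)) for s in fp))
```

Note how each level requires both a candidate-generation pass and a fresh scan of the transactions: exactly the costs the refinements above aim to reduce.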
2.9. Frequent Pattern Growth
2.10. Theoretical Analysis
- Time Complexity
- n is the number of transactions.
- m is the average number of items per transaction.
- |I| is the number of frequent items.
- Mining Table Construction: This takes O(n⋅m) time for a single scan to collect support counts and reorder transactions based on the list L.
- Recursive Query Processing: In the worst case (where all items are frequent and appear in all combinations), the number of recursive calls could reach O(2^|I|), akin to Apriori. However, in practice, early pruning based on the support threshold reduces this significantly.
- Space Complexity
- Memory Storage: The algorithm avoids loading the entire dataset into memory. The Mining Table is disk-resident and indexed, which means main memory is used only for temporary recursive call states and result aggregation.
- Worst-case space usage for the recursive state and caching is O(d), where d is the maximum depth of recursion, and thus remains practically bounded.
- Compared to FP-Growth’s in-memory FP-tree, O(n⋅m), and Apriori’s candidate-generation space, O(2^|I|), RQFP offers significant savings, particularly on systems with limited memory and efficient SQL execution engines.
3. Research Methodology
3.1. Formal Definition of RQFP Algorithm
- Mining Table Construction: This involves generating a frequency-based list of items (List L) and organizing the database transactions accordingly. Let I = {a1, a2, …, am} be a set of items stored in a transaction database DB = [T1, T2, …, Tn], where each transaction Ti (i ∈ [1…n]) is stored in fields F1, F2, …, Fm and contains a set of items from I.
- Recursive Pattern Mining: Using SQL recursion, the algorithm queries item combinations directly from the Mining Table, reducing memory consumption and pre-processing overhead.
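To make the recursive-query idea concrete, the following sketch uses a single SQL-99 WITH RECURSIVE query to enumerate itemset supports. The schema (a normalized (tid, pos, item) table in sqlite3) and data are our own illustration, not the paper's SQL Server implementation, and this naive query performs no per-level support pruning, which RQFP adds via its Mining Table recursion:

```python
import sqlite3

# Transactions already projected onto frequent items in List-L order.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (tid INTEGER, pos INTEGER, item TEXT)")
rows = []
for tid, tx in enumerate([["GE", "CH", "BO", "ZO"],
                          ["GE", "CH", "BO", "ZO"],
                          ["GE", "CH"]], 1):
    rows += [(tid, pos, item) for pos, item in enumerate(tx, 1)]
con.executemany("INSERT INTO t VALUES (?,?,?)", rows)

# Each chain is an increasing-position item subset within one transaction,
# so grouping identical chains across transactions yields their support.
query = """
WITH RECURSIVE chains(tid, pos, chain) AS (
    SELECT tid, pos, item FROM t
    UNION ALL
    SELECT t.tid, t.pos, chains.chain || ',' || t.item
    FROM chains JOIN t ON t.tid = chains.tid AND t.pos > chains.pos
)
SELECT chain, COUNT(*) AS sup
FROM chains GROUP BY chain HAVING COUNT(*) >= 2
ORDER BY chain
"""
for chain, sup in con.execute(query):
    print(chain, sup)
```

Every frequent pattern over the toy data is produced by one declarative query, with no application-side candidate generation.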
3.2. Mining Table Design
- The Mining Table stores the frequent items of every transaction in some order, avoiding repeatedly scanning DB.
- Since only frequent items will play a role in the frequent pattern mining, it is necessary to perform one scan of DB to identify the set of frequent items and store them in list L.
- According to list L, if two transactions share some common frequent items, the shared items can be merged by running a query on that field and accumulating the count for that item.
- If multiple transactions share an identical frequent itemset, they can be merged into one with its frequency registered in a separate field Fm+1, also known as Fsup.
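The construction steps above might be sketched as follows. The code is our illustration (Python with sqlite3; the column names F1…F4 and Fsup follow the paper's convention, but the concrete schema is assumed):

```python
import sqlite3
from collections import Counter

db = [["GE", "CH", "BO", "ZO"], ["GE", "CH", "BO", "ZO"], ["GE", "PH", "MA", "CH"]]
min_support = 2

# One scan of DB: support counts, then frequency-ordered list L of frequent items.
counts = Counter(i for t in db for i in t)
L = [i for i, c in counts.most_common() if c >= min_support]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE MiningTable (F1, F2, F3, F4, Fsup INTEGER)")

# Keep only frequent items, reorder each transaction by L, and merge identical
# projected transactions into one row whose Fsup records the merged frequency.
merged = Counter()
for t in db:
    merged[tuple(i for i in L if i in t)] += 1
for ordered, freq in merged.items():
    padded = list(ordered) + [None] * (4 - len(ordered))
    con.execute("INSERT INTO MiningTable VALUES (?,?,?,?,?)", (*padded, freq))

for row in con.execute("SELECT * FROM MiningTable"):
    print(row)
```

On this toy data, the two identical transactions collapse into a single row with Fsup = 2, while the third transaction is reduced to its frequent items (GE, CH).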
3.3. Algorithmic Workflow
Algorithm 1 Mining Table Creation
Input: A transaction database DB.
Output: List L, Mining Table.
Method: The Mining Table is constructed in the following steps.
*1 This step is optional.
3.4. Illustrative Example
Algorithm 2 Recursive Queried Frequent Patterns Algorithm
Input: Mining Table created by Algorithm 1, Minimum support threshold St.
Output: The complete set of frequent patterns.
Method:
Var FP = null;
Var Result [item] [support] = “SELECT DISTINCT [F1], SUM (Fm+1 or F1) FROM MiningTable WHERE [F1] = Li GROUP BY [F1]”; *1
While (Result [item] ≠ NULL)
{
  Mining_procedure (Result, F1, St)
}
Mining_procedure (I [item] [support], Fi, St)
{
  If (I [Support] ≥ St)
  {
    FP = FP + I [item] [Support];
  }
  Else If ((last field Fm) || (I [Support] < St))
  {
    Return Combinations (FP); *2
  }
  Else
  {
    Result [item] [support] = “SELECT DISTINCT [Fi+1], SUM (Fm+1 or Fi+1) FROM MiningTable WHERE [Fi] = Ii GROUP BY [Fi+1]”;
    While (Result [item] ≠ NULL)
    {
      Mining_procedure (Result, Fi+1, St)
    }
  }
}
*1 If step 4 of Algorithm 1 is executed, use SUM (Fm+1); otherwise use SUM (F1).
*2 Combinations takes the FP set (frequent-pattern set) as input and returns combinations of the items traversed so far. Algorithm 3 defines the process of updating the Mining Table.
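A runnable reading of Algorithm 2's recursion is sketched below in Python, over an in-memory list of (ordered items, Fsup) rows rather than SQL result sets. The grouping step plays the role of the SELECT DISTINCT … GROUP BY query, the support test is taken as ≥ St, and Combinations() emits the sub-patterns of each frequent chain as in footnote *2; names and structure are our own:

```python
from itertools import combinations

def all_combinations(chain):
    """Every non-empty sub-pattern of a traversed chain (footnote *2)."""
    return [tuple(c) for k in range(1, len(chain) + 1)
            for c in combinations(chain, k)]

def mining_procedure(rows, st, chain, found):
    # SELECT DISTINCT next field, SUM(Fsup) ... GROUP BY next field:
    # group rows by their first remaining item and collect the tails.
    groups = {}
    for items, freq in rows:
        if items:
            groups.setdefault(items[0], []).append((items[1:], freq))
    extended = False
    for head, tails in groups.items():
        support = sum(f for _, f in tails)
        if support >= st:               # frequent: extend the chain, recurse
            extended = True
            mining_procedure(tails, st, chain + [head], found)
    if not extended and chain:
        # Last field reached or no frequent extension: register the chain's
        # combinations as frequent patterns.
        found.update(all_combinations(chain))

# Mining Table rows from the earlier toy example: (ordered items, Fsup).
mining_table = [(("GE", "CH", "BO", "ZO"), 2), (("GE", "CH"), 1)]
fp = set()
mining_procedure(mining_table, 2, [], fp)
print(sorted(fp))
```

On the toy Mining Table this yields the same 15 frequent patterns that a level-wise Apriori run produces at support 2, without generating candidate sets.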
Algorithm 3 Updating Mining Table
Input: Mining Table created by Algorithm 1, Itemset Is or Tn+1.
Output: Updated Mining Table, updated list L.
Method:
Update (Mining_Table, Is)
{
  INSERT Is INTO Mining_Table (fields F1…Fm);
  Loop (each item I in Is)
  {
    Update I’s support value in list L;
  }
  If (L changed)
  {
    Sort L;
    Swap the changed items;
  }
}
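Algorithm 3's update step could be sketched as follows. This is a hypothetical Python helper over the in-memory row representation used earlier; a production version would also re-project the stored rows whenever the order of L changes (the "swap" step):

```python
from collections import Counter

def update_mining_table(mining_rows, supports, new_tx, min_support):
    """mining_rows: list of (ordered_items, Fsup) pairs; supports: Counter of
    item supports behind list L. Appends new_tx and returns the re-sorted L."""
    supports.update(new_tx)                        # update item support values
    L = [i for i, c in supports.most_common() if c >= min_support]  # sort L
    ordered = tuple(i for i in L if i in new_tx)   # project tx onto L's order
    mining_rows.append((ordered, 1))               # insert as a new row
    return L

rows = [(("GE", "CH", "BO", "ZO"), 2), (("GE", "CH"), 1)]
supports = Counter({"GE": 3, "CH": 3, "BO": 2, "ZO": 2, "PH": 1, "MA": 1})
L = update_mining_table(rows, supports, {"GE", "PH", "MA"}, 2)
print(L, rows[-1])
```

After the update, PH and MA cross the support threshold and join L, and the new transaction is stored projected onto L's order.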
3.5. Theoretical Foundation and Complexity Analysis
3.5.1. Formal Definitions
- DB = {T1, T2, …, Tn} and is a transactional database, where each transaction Ti is a subset of a universal itemset I = {i1, i2, …, im}.
- List L is a frequency-ordered list of frequent items (i.e., items satisfying the minimum support threshold σ).
- A Mining Table is a disk-resident structure that stores transactions using only frequent items, sorted in a descending order based on List L.
- F0…Fk represent positional fields in the Mining Table corresponding to item positions within a transaction.
- FP denotes the set of frequent item combinations generated recursively.
- σ is the user-defined minimum support threshold.
- d is the maximum recursion depth (equal to the maximum transaction length among frequent items).
3.5.2. Correctness and Downward Closure
3.5.3. Time Complexity
- n is the number of transactions in the database.
- m is the average number of items per transaction.
- |I| is the number of frequent items after pruning with σ.
- Scan DB once to compute support counts → O(n × m).
- Construct List L and reorder transactions accordingly → O(n × log m).
- In the worst case (where all items are frequent), the number of recursive branches is exponential: O(2^|I|). However, early pruning drastically limits actual expansion.
- Each recursive step performs an indexed SQL SELECT DISTINCT … GROUP BY operation on the Mining Table, bounded by O(n).
3.5.4. Space Complexity
- The main memory is used only for recursive call states and result aggregation.
- The Mining Table remains disk-resident and can be indexed for efficient access.
- Recursive call stack: O(d), where d is the depth of recursion.
- Disk storage: O(n × m) for the Mining Table (shared with transactional data).
3.5.5. Advantages over Existing Approaches
4. Experiments, Results, and Discussions
4.1. Dataset Description
4.2. System Configuration and Software Environment
4.3. Execution and Results
4.4. Comparative Analysis
5. Conclusions
6. Limitations and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Sai Prasad, D.S.P.K. A Theoretical Review on Data Mining and Machine Learning Techniques for Data Analysis. Int. J. Adv. Sci. Technol. 2020, 29, 1220–1226. [Google Scholar]
- Chakraborty, M.; Biswas, S.K.; Purkayastha, B. Data Mining Using Neural Networks in the form of Classification Rules: A Review. In Proceedings of the 2020 4th International Conference on Computational Intelligence and Networks (CINE), Kolkata, India, 27–29 February 2020; pp. 1–6. [Google Scholar]
- de Sousa, L.R.; de Carvalho, V.O.; Penteado, B.E.; Affonso, F.J. A Systematic Mapping on the Use of Data Mining for the Face-to-Face School Dropout Problem. In Proceedings of the 13th International Conference on Computer Supported Education (CSEDU 2021), Online Streaming, 23–25 April 2021; Volume 1, pp. 36–47. [Google Scholar]
- Dridi, A.; Gaber, M.M.; Azad, R.M.A.; Bhogal, J. Scholarly data mining: A systematic review of its applications. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2021, 11, e1395. [Google Scholar] [CrossRef]
- Li, L.; Ding, P.; Chen, H.; Wu, X. Frequent Pattern Mining in Big Social Graphs. IEEE Trans. Emerg. Top. Comput. Intell. 2022, 6, 638–648. [Google Scholar] [CrossRef]
- Agrawal, R.; Imieliński, T.; Swami, A. Mining association rules between sets of items in large databases. In Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, Washington, DC, USA, 25–28 May 1993; pp. 207–216. [Google Scholar]
- Almoqbily, R.S.; Rauf, A.; Quradaa, F.H. A Survey of Correlated High Utility Pattern Mining. IEEE Access 2021, 9, 42786–42800. [Google Scholar] [CrossRef]
- Jazayeri, A.; Yang, C.C. Frequent Pattern Mining in Continuous-time Temporal Networks. arXiv 2021, arXiv:2105.06399. [Google Scholar] [CrossRef] [PubMed]
- Djenouri, Y.; Lin, J.C.-W.; Nørvåg, K.; Ramampiaro, H.; Yu, P.S. Exploring decomposition for solving pattern mining problems. ACM Trans. Manag. Inf. Syst. (TMIS) 2021, 12, 1–36. [Google Scholar] [CrossRef]
- Han, J.; Kamber, M.; Pei, J. Data mining trends and research frontiers. In Data Mining; University of Illinois at Urbana–Champaign: Urbana, IL, USA, 2012; pp. 585–631. [Google Scholar]
- Chi, Y.; Wang, H.; Yu, P.S.; Muntz, R.R. Moment: Maintaining closed frequent itemsets over a stream sliding window. In Proceedings of the Fourth IEEE International Conference on Data Mining (ICDM’04), Brighton, UK, 1–4 November 2004; pp. 59–66. [Google Scholar]
- Fard, M.J.S.; Namin, P.A. Review of Apriori based Frequent Itemset Mining Solutions on Big Data. In Proceedings of the 2020 6th International Conference on Web Research (ICWR), Tehran, Iran, 22–23 April 2020; pp. 157–164. [Google Scholar]
- Liu, C.; Li, X. Mining Method Based on Semantic Trajectory Frequent Pattern. In Advanced Information Networking and Applications, Proceedings of the International Conference on Advanced Information Networking and Applications, Toronto, ON, Canada, 12–14 May 2021; Springer: Cham, Switzerland, 2021; pp. 146–159. [Google Scholar]
- Sornalakshmi, M.; Balamurali, S.; Venkatesulu, M.; Krishnan, M.N.; Ramasamy, L.K.; Kadry, S.; Lim, S. An efficient apriori algorithm for frequent pattern mining using mapreduce in healthcare data. Bull. Electr. Eng. Inform. 2021, 10, 390–403. [Google Scholar] [CrossRef]
- Han, J.; Fu, Y. Discovery of multiple-level association rules from large databases. In Proceedings of the 21st VLDB Conference, Zurich, Switzerland, 11–15 September 1995; Volume 95, pp. 420–431. [Google Scholar]
- Han, J.; Pei, J.; Dong, G.; Wang, K. Efficient computation of iceberg cubes with complex measures. In Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data, Santa Barbara, CA, USA, 21–24 May 2001; pp. 1–12. [Google Scholar]
- Clustering, P.-B. Grouping of Questions From a Question Bank Using Partition-Based Clustering. In Developing a Keyword Extractor and Document Classifier: Emerging Research and Opportunities; Engineering Science Reference: Hershey, PA, USA, 2021. [Google Scholar]
- Han, J.; Pei, J.; Yin, Y. Mining frequent patterns without candidate generation. ACM Sigmod Rec. 2000, 29, 1–12. [Google Scholar] [CrossRef]
- Chormunge, S.; Mehta, R. Comparison Analysis of Extracting Frequent Itemsets Algorithms Using MapReduce. In Intelligent Data Communication Technologies and Internet of Things, Proceedings of the ICICI 2020, Coimbatore, India, 27–28 August 2020; Springer: Singapore, 2021; pp. 199–210. [Google Scholar]
- Agrawal, R.; Shafer, J.C. Parallel mining of association rules. IEEE Trans. Knowl. Data Eng. 1996, 8, 962–969. [Google Scholar] [CrossRef]
- Han, J.; Dong, G.; Yin, Y. Efficient mining of partial periodic patterns in time series database. In Proceedings of the 15th International Conference on Data Engineering (Cat. No. 99CB36337), Sydney, NSW, Australia, 23–26 March 1999; pp. 106–115. [Google Scholar]
- Thurachon, W.; Kreesuradej, W. Incremental Association Rule Mining With a Fast Incremental Updating Frequent Pattern Growth Algorithm. IEEE Access 2021, 9, 55726–55741. [Google Scholar] [CrossRef]
- Cong, S.; Han, J.; Padua, D. Parallel mining of closed sequential patterns. In Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining, Chicago, IL, USA, 21–24 August 2005; pp. 562–567. [Google Scholar]
- Sun, J.; Xun, Y.; Zhang, J.; Li, J. Incremental frequent itemsets mining with FCFP tree. IEEE Access 2019, 7, 136511–136524. [Google Scholar] [CrossRef]
- Zaki, M.J. Efficient enumeration of frequent sequences. In Proceedings of the Seventh International Conference on Information and Knowledge Management, Bethesda, MD, USA, 3–7 November 1998; pp. 68–75. [Google Scholar]
- Bernal, J.N.; Rodriguez, J.P.; Portella, J. DBMS and Oracle Datamining. Preprints 2021. [Google Scholar] [CrossRef]
- Teng, Y. A Critical Review of SQL-Based Mining Relational Database. Int. J. Comput. Commun. Eng. 2021, 10, 68–74. [Google Scholar] [CrossRef]
- Nasyuha, A.H.; Jama, J.; Abdullah, R.; Syahra, Y.; Azhar, Z.; Hutagalung, J.; Hasugian, B.S. Frequent pattern growth algorithm for maximizing display items. Telkomnika 2021, 19, 390–396. [Google Scholar] [CrossRef]
- Afrati, F.; Gionis, A.; Mannila, H. Approximating a collection of frequent sets. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Seattle, WA, USA, 22–25 August 2004; pp. 12–19. [Google Scholar]
- Agarwal, R.C.; Aggarwal, C.C.; Prasad, V.V.V. A tree projection algorithm for generation of frequent item sets. J. Parallel Distrib. Comput. 2001, 61, 350–371. [Google Scholar] [CrossRef]
- Aggarwal, C.C.; Yu, P.S. A new framework for itemset generation. In Proceedings of the Seventeenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, Seattle, WA, USA, 1–4 June 1998; pp. 18–24. [Google Scholar]
- Beil, F.; Ester, M.; Xu, X. Frequent term-based text clustering. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Edmonton, AB, Canada, 23–26 July 2002; pp. 436–442. [Google Scholar]
- Agrawal, R.; Srikant, R. Mining sequential patterns. In Proceedings of the Eleventh International Conference on Data Engineering, Taipei, Taiwan, 6–10 March 1995; pp. 3–14. [Google Scholar]
Feature | Apriori | FP-Growth | RQFP |
---|---|---|---|
Candidate Generation | Yes | No | No |
Memory Requirement | High | Medium | Low (disk-resident) |
Number of DB Scans | Multiple | 2 | 1–2 |
Tree Construction | No | Yes | No |
SQL Native Execution | No | No | Yes (Standard SQL-99) |
Sr. No. | Unique Number | SUB1 | SUB2 | SUB3 | SUB4 |
---|---|---|---|---|---|
1 | 2021-P-01 | GE | CH | BO | ZO |
2 | 2021-P-02 | GE | CH | BO | ZO |
3 | 2021-P-03 | GE | CH | BO | ZO |
4 | 2021-P-04 | GE | CH | BO | ZO |
⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
431 | 2021-P-431 | GE | PH | MA | CH |
432 | 2021-P-432 | GE | PH | MA | CH |
433 | 2021-P-433 | GE | PH | MA | CH |
434 | 2021-P-434 | GE | PH | MA | CH |
Algorithm → | Apriori Algorithm | FP-Growth | RQFP Algorithm |
---|---|---|---|
Parameters ↓ | |||
Technique | Apriori Property | FP-Tree Generation | Disk-Resident Table |
Search Strategy | Breadth-First Search | Divide and Conquer | Divide and Conquer |
Space Utilization | O(2^d), where d is the horizontal width of DB | O(n^2), where n is the number of items in the database | O(n), where n is the number of items in the database |
Number of Scans | 1 per candidate-generation pass | 2 | 2 |
Execution Time | 00:00:06.7341631 | 00:00:03.7641633 | 00:00:03.6337641 |
Dataset | Algorithm | Support (%) | Patterns Found | Exec. Time (s) | Source/Environment |
---|---|---|---|---|---|
Academic (434) | RQFP | 75 | 12 | 3.63 | Current Study (SQL Server 2012, C#, 4GB RAM) |
Academic (434) | Apriori | 75 | 12 | 6.73 | Current Study |
Academic (434) | FP-Growth | 75 | 12 | 3.76 | Current Study |
Retail (Dunnhumby) | Apriori | 0.5 | 3205 | 18.23 | [18] |
Retail | FP-Growth | 0.5 | 3205 | 10.87 | [18] |
Chess | ECLAT | 60 | 1689 | 8.2 | [25] |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Khan, I.A.; Chen, H.-Y.; Sharma, S.; Sharma, C. Recursive Queried Frequent Patterns Algorithm: Determining Frequent Pattern Sets from Database. Information 2025, 16, 746. https://doi.org/10.3390/info16090746