An Efficient Algorithm for Mining Stable Periodic High-Utility Sequential Patterns
Abstract
1. Introduction
- SPHUSPM introduces a new stability measure and makes fuller use of time period information to mine more useful patterns, which gives the HUSPM research field a new research strategy. Combining several measures also makes the mined patterns more interesting and better aligned with user requirements. In practical applications, the algorithm considers the maximum profit and the time period information together, giving decision makers a more accurate and efficient basis for decisions.
- We design a new data structure, the PUL-list, and a maximum lability pruning strategy for HUSPM (MLPS) to make mining more efficient. Experiments show that both methods improve the efficiency of the algorithm.
- We run experiments on six different datasets. The results show that the algorithm mines the desired SPHUSPs and performs well in both execution time and memory usage.
2. Related Work
2.1. High-Utility Sequential Pattern Mining
2.2. Periodic High-Utility Sequential Pattern Mining
2.3. Stable Periodic Frequent Pattern Mining
- In light of the above, the previously proposed work has the following limitations.
- In the PHUSPM algorithm, it is difficult to set the maximum periodicity threshold (maxPer) accurately. Some patterns show only a few periodic fluctuations. These fluctuations have little impact on decisions, so the patterns are still useful. If maxPer is set too small, these interesting patterns are ignored. If maxPer is set too large, the mined patterns have unstable periods.
- Because the SPFPM algorithm is designed specifically for FPM, it cannot be directly applied to HUSPM. In short, it does not take into account the order between itemsets or the utility values of the items.
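The difference between a hard period threshold and a stability measure can be sketched in a few lines. The sketch assumes the cumulative lability definition used in stable periodic-frequent pattern mining, where each period's excess over maxPer accumulates and is clamped at zero. Whether SPHUSPM uses exactly this form is not shown in this outline, and the function names `periods` and `max_lability` are hypothetical.

```python
def periods(sids, db_size):
    """Gaps between consecutive occurrences of a pattern.

    Assumption: the occurrence list is padded with 0 at the start and
    db_size at the end, as is common in periodic pattern mining.
    """
    points = [0] + sorted(sids) + [db_size]
    return [later - earlier for earlier, later in zip(points, points[1:])]


def max_lability(pers, max_per):
    """Largest cumulative excess of the periods over max_per.

    Assumed recurrence: la_i = max(0, la_{i-1} + per_i - max_per).
    """
    lability, worst = 0, 0
    for per in pers:
        lability = max(0, lability + per - max_per)
        worst = max(worst, lability)
    return worst


# A pattern found in sequences 1, 2, 4 and 5 of a 6-sequence database.
example_periods = periods([1, 2, 4, 5], db_size=6)
example_lability = max_lability(example_periods, max_per=1)
```

Here the periods are 1, 1, 2, 1, 1. A strict filter with maxPer = 1 would reject the pattern because of the single period of 2, while its maximum lability is only 1, so a lability threshold of 1 or more would keep it.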
3. Preliminaries and Problem Definitions
4. Proposed Algorithms
4.1. The Data Structure
4.1.1. Lexicographic Sequence Tree and Concatenations
4.1.2. The Period-Utility-Linked List Structure
4.2. Pruning Strategy
4.2.1. The Downward Closure Property of Upper Bound
4.2.2. Pruning Strategies
- (1) If an item i is an I-concatenation candidate item of t and its upper-bound utility value is less than the minimum utility threshold, i should be removed from the set of candidate items for I-concatenation with t.
- (2) If an item i is an S-concatenation candidate item of t and its upper-bound utility value is less than the minimum utility threshold, i should be removed from the set of candidate items for S-concatenation with t.
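Both rules have the same shape and can be sketched as a single filter. This is a minimal illustration, not the authors' implementation: the function name `prune_candidates`, the dictionary of upper-bound values, and the numbers in the example are all hypothetical, and the specific upper bound used by the algorithm is left abstract.

```python
def prune_candidates(candidates, upper_bound, min_util):
    """Keep only candidate items whose upper bound reaches min_util.

    candidates  -- set of candidate items for I- or S-concatenation with t
    upper_bound -- dict mapping each candidate item to its upper-bound
                   utility value (how it is computed is not modelled here)
    min_util    -- the minimum utility threshold
    """
    return {item for item in candidates if upper_bound[item] >= min_util}


# Illustrative values only; they are not taken from the paper.
i_candidates = {"b", "c", "e"}
i_upper_bound = {"b": 40, "c": 12, "e": 25}
kept = prune_candidates(i_candidates, i_upper_bound, min_util=20)
```

Because the bound is an overestimate of the utility of any extension, removing an item whose bound is below the threshold cannot discard a high-utility pattern. The same call applies unchanged to the S-concatenation candidate set.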
4.3. The SPHUSPM Algorithm
Algorithm 1: SPHUSPM
Input: D, a quantitative sequential database; a utility table containing the unit profit of each item; minutil, the minimum utility threshold; maxPer, the maximum periodicity threshold; maxLa, the maximum lability threshold.
Output: The complete set of SPHUSPs.
Algorithm 2: Judge-SPHUSPs
Input:
4.4. Total Computational Complexity
5. Experimental Evaluation
5.1. Datasets
5.2. Execution Time
5.3. Pattern Count
5.4. Memory Usage
5.5. Effectiveness of Pruning Strategies
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
PFPM | Periodic Frequent Pattern Mining |
PPFPM | Partial Periodic Frequent Pattern Mining |
SPFPM | Stable Periodic Frequent Pattern Mining |
HUIM | High-Utility Itemset Mining |
HUSPM | High-Utility Sequential Pattern Mining |
HUSPs | High-Utility Sequential Patterns |
PHUSPM | Periodic High-Utility Sequential Pattern Mining |
PHUSPs | Periodic High-Utility Sequential Patterns |
SPHUSPM | Stable Periodic High-Utility Sequential Pattern Mining |
SPHUSPs | Stable Periodic High-Utility Sequential Patterns |
minutil | minimum utility threshold |
maxPer | maximum periodicity threshold |
minimum periodicity threshold | |
average periodicity threshold | |
maxLa | maximum lability threshold |
UL-list | utility-linked-list |
PUL-list | period-utility-linked-list |
LS-tree | Lexicographic Sequence Tree |
SWU | Sequence Weighted Utilization |
PEU | Prefix Extension Utility |
RSU | Reduced Sequence Utility |
SEU | Sequence Extended Utility |
MPP | Maximum Periodic Pruning |
MLP | Maximum Lability Pruning |
MLPS | Maximum Lability Pruning in sequential pattern mining |
IPS | Irrelevant Item Pruning Strategy |
LAS | Look Ahead Strategy |
i | item |
X | q-itemset |
t | sequence |
s | q-sequence |
a quantitative sequence database | |
the identifier of sequence | |
the quantity of a q-item i in a q-sequence s | |
the unit profit or importance (external utility) of | |
≺ | the lexicographical order |
the utility of a q-item in a q-sequence s | |
the utility of a q-itemset X in a q-sequence s | |
the utility of a q-sequence s | |
sequence t matches q-sequence s | |
the sequence utility of a sequence t in a q-sequence s | |
the utility of t in a q-sequence database S | |
the maximum utility of a sequence t in a q-sequence s | |
the maximum utility of a sequence t in a q-sequence database S | |
the extension of a sequence t in a q-sequence s | |
the set of extension items of a sequence t in a quantitative sequential database D | |
the remaining utility of a sequence t in a q-sequence s | |
the set of q-sequences containing the sequence t | |
the period of two consecutive q-sequence and | |
periods of the sequence t | |
the lability of the sequence t | |
the concatenation of t with |
SID | Q-Sequence |
---|---|
S1 | <[(a,1)(b,1)(e,3)], [(c,3)(d,2)(g,3)], [(b,2)(e,1)], [(d,3)]> |
S2 | <[(a,3)(b,1)(c,3)(f,2)], [(a,5)(c,2)(g,5)], [(b,3)(d,2)(e,2)]> |
S3 | <[(b,1)(c,1)(e,2)(g,5)], [(a,3)(b,2)(e,4)(f,2)], [(b,2)(c,1)(e,2)]> |
S4 | <[(b,2)(c,3)], [(a,5)(e,1)], [(b,4)(d,3)(e,5)]> |
S5 | <[(a,4)(c,3)], [(a,2)(b,5)(c,2)(d,4)(e,3)]> |
S6 | <[(f,4)], [(a,5)(b,3)], [(a,3)(d,4)]> |
Item | a | b | c | d | e | f | g |
---|---|---|---|---|---|---|---|
Profit | 1 | 3 | 4 | 2 | 1 | 6 | 2 |
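The example database and profit table above are enough to work through the basic utility definitions. The sketch below assumes the usual HUSPM definitions: the utility of a q-item is its quantity times its unit profit, the utility of a q-itemset is the sum over its q-items, and the utility of a q-sequence is the sum over its q-itemsets. The function and variable names are illustrative, not taken from the paper.

```python
# Unit profit (external utility) of each item, from the profit table.
PROFIT = {"a": 1, "b": 3, "c": 4, "d": 2, "e": 1, "f": 6, "g": 2}

# The six q-sequences, each a list of q-itemsets of (item, quantity) pairs.
DATABASE = {
    1: [[("a", 1), ("b", 1), ("e", 3)], [("c", 3), ("d", 2), ("g", 3)],
        [("b", 2), ("e", 1)], [("d", 3)]],
    2: [[("a", 3), ("b", 1), ("c", 3), ("f", 2)],
        [("a", 5), ("c", 2), ("g", 5)], [("b", 3), ("d", 2), ("e", 2)]],
    3: [[("b", 1), ("c", 1), ("e", 2), ("g", 5)],
        [("a", 3), ("b", 2), ("e", 4), ("f", 2)],
        [("b", 2), ("c", 1), ("e", 2)]],
    4: [[("b", 2), ("c", 3)], [("a", 5), ("e", 1)],
        [("b", 4), ("d", 3), ("e", 5)]],
    5: [[("a", 4), ("c", 3)],
        [("a", 2), ("b", 5), ("c", 2), ("d", 4), ("e", 3)]],
    6: [[("f", 4)], [("a", 5), ("b", 3)], [("a", 3), ("d", 4)]],
}


def itemset_utility(itemset):
    """Sum of quantity * unit profit over the q-items of one q-itemset."""
    return sum(quantity * PROFIT[item] for item, quantity in itemset)


def sequence_utility(q_sequence):
    """Sum of the utilities of all q-itemsets in one q-sequence."""
    return sum(itemset_utility(itemset) for itemset in q_sequence)


def database_utility(database):
    """Sum of the utilities of all q-sequences in the database."""
    return sum(sequence_utility(seq) for seq in database.values())
```

For the first q-sequence the four q-itemsets contribute 7, 22, 7 and 6, so `sequence_utility(DATABASE[1])` is 42. A minimum utility threshold given as a percentage is typically taken relative to `database_utility(DATABASE)`.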
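Sequence ID | 1 | 1 | 1 | 1 | 2 | 2 | 2 | 3 | 3 | 3 | 4 | 4 | 4 | 5 | 5 | 6 | 6 | 6 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Transaction ID | 1 | 2 | 3 | 4 | 1 | 2 | 3 | 1 | 2 | 3 | 1 | 2 | 3 | 1 | 2 | 1 | 2 | 3 |
a | a |  |  |  | a | a |  |  | a |  |  | a |  | a | a |  | a | a |
b | b |  | b |  | b |  | b | b | b | b | b |  | b |  | b |  | b |  |
c |  | c |  |  | c | c |  | c |  | c | c |  |  | c | c |  |  |  |
d |  | d |  | d |  |  | d |  |  |  |  |  | d |  | d |  |  | d |
e | e |  | e |  |  |  | e | e | e | e |  | e | e |  | e |  |  |  |
f |  |  |  |  | f |  |  |  | f |  |  |  |  |  |  | f |  |  |
g |  | g |  |  |  | g |  | g |  |  |  |  |  |  |  |  |  |  |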
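The occurrence table above can be derived mechanically from the example database. The following sketch builds, for every item, the list of (sequence ID, transaction ID) positions where it appears. It only illustrates the table; it is not the index structure used by the algorithm, the name `build_occurrence_index` is hypothetical, and quantities are omitted because only presence matters here.

```python
from collections import defaultdict

# Items of each itemset in the six example q-sequences (quantities dropped).
DATABASE_ITEMS = {
    1: ["abe", "cdg", "be", "d"],
    2: ["abcf", "acg", "bde"],
    3: ["bceg", "abef", "bce"],
    4: ["bc", "ae", "bde"],
    5: ["ac", "abcde"],
    6: ["f", "ab", "ad"],
}


def build_occurrence_index(database):
    """Map each item to the ordered list of (sid, tid) pairs containing it."""
    index = defaultdict(list)
    for sid, itemsets in sorted(database.items()):
        for tid, itemset in enumerate(itemsets, start=1):
            for item in itemset:
                index[item].append((sid, tid))
    return dict(index)


occurrences = build_occurrence_index(DATABASE_ITEMS)
```

Each row of the table corresponds to one entry of `occurrences`; for example, item f appears at (2, 1), (3, 2) and (6, 1).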
UP Information of s | [], [], [], [] |
Header Table of s | |
UP Information of s | [], [], [], [] |
Header Table of s | |
UP Information of s | [], [], [] |
Header Table of s | |
UP Information of s | [], [] |
Header Table of s | |
UP Information of s | [] |
Header Table of s | |
UP Information of s | [] |
Header Table of s | |
The Periodic Information of |
Number of sequences | |
Number of distinct items | |
Average number of itemsets per sequence | |
Average number of items per itemset | |
Maximum number of items per sequence |
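Dataset | Number of sequences | Number of distinct items | Average number of itemsets per sequence (C) | Average number of items per itemset (T) | Maximum number of items per sequence |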
---|---|---|---|---|---|
Sign | 730 | 267 | 52.0 | 1 | 94 |
Bible | 36,369 | 13,905 | 21.6 | 1 | 100 |
Kosarak10k | 10,000 | 10,094 | 8.14 | 1 | 608 |
Leviathan | 5834 | 9025 | 33.8 | 1 | 100 |
yoochoose-buys | 234,300 | 16,004 | 1.13 | 1.97 | 21 |
MSNBC | 31,790 | 423,776 | 13.33 | 1 | 86 |
Sign | | | | Bible | | | |
---|---|---|---|---|---|---|---|
maxPer | maxLa | minutil | Max memory | maxPer | maxLa | minutil | Max memory |
1% | 5% | 1.2% | 306.42 | 0.5% | 0.5% | 0.5% | 1106.55 |
1% | 5% | 1.7% | 306.32 | 0.5% | 0.5% | 1% | 1112.40 |
1% | 10% | 1.2% | 307.49 | 0.5% | 1% | 0.5% | 1118.63 |
1% | 10% | 1.7% | 306.68 | 0.5% | 1% | 1% | 1125.90 |
2% | 5% | 1.2% | 307.60 | 1% | 0.5% | 0.5% | 1113.62 |
2% | 5% | 1.7% | 306.81 | 1% | 0.5% | 1% | 1133.95 |
2% | 10% | 1.2% | 310.12 | 1% | 1% | 0.5% | 1140.22 |
2% | 10% | 1.7% | 307.75 | 1% | 1% | 1% | 1134.50 |
Kosarak10k | | | | Leviathan | | | |
maxPer | maxLa | minutil | Max memory | maxPer | maxLa | minutil | Max memory |
0.5% | 0.5% | 1.69% | 238.70 | 0.5% | 0.5% | 1% | 666.37 |
0.5% | 0.5% | 1.74% | 236.87 | 0.5% | 0.5% | 1.25% | 648.24 |
0.5% | 1% | 1.69% | 248.66 | 0.5% | 1% | 1% | 674.32 |
0.5% | 1% | 1.74% | 242.95 | 0.5% | 1% | 1.25% | 662.90 |
1% | 0.5% | 1.69% | 248.30 | 1% | 0.5% | 1% | 676.01 |
1% | 0.5% | 1.74% | 247.64 | 1% | 0.5% | 1.25% | 666.23 |
1% | 1% | 1.69% | 250.92 | 1% | 1% | 1% | 681.99 |
1% | 1% | 1.74% | 248.94 | 1% | 1% | 1.25% | 678.90 |
yoochoose-buys | | | | MSNBC | | | |
maxPer | maxLa | minutil | Max memory | maxPer | maxLa | minutil | Max memory |
25% | 25% | 0.024% | 549.85 | 0.5% | 0.5% | 1% | 636.57 |
25% | 25% | 0.034% | 536.48 | 0.5% | 0.5% | 2% | 620.43 |
25% | 30% | 0.024% | 579.85 | 0.5% | 1% | 1% | 640.75 |
25% | 30% | 0.034% | 560.78 | 0.5% | 1% | 2% | 626.33 |
30% | 25% | 0.024% | 582.66 | 1% | 0.5% | 1% | 641.33 |
30% | 25% | 0.034% | 567.38 | 1% | 0.5% | 2% | 634.43 |
30% | 30% | 0.024% | 586.02 | 1% | 1% | 1% | 651.61 |
30% | 30% | 0.034% | 561.23 | 1% | 1% | 2% | 637.15 |
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Xie, S.; Zhao, L. An Efficient Algorithm for Mining Stable Periodic High-Utility Sequential Patterns. Symmetry 2022, 14, 2032. https://doi.org/10.3390/sym14102032