Uncovering Insights for New Car Recommendations with Sequence Pattern Mining on Mobile Applications

Liu, Hsiu-Wen; Wu, Jei-Zheng; Wang, Ying-Hsuan

doi:10.3390/app13116386

by Hsiu-Wen Liu, Jei-Zheng Wu * and Ying-Hsuan Wang

Department of Business Administration, School of Business, Soochow University, Taipei 100, Taiwan

* Author to whom correspondence should be addressed.

Appl. Sci. 2023, 13(11), 6386; https://doi.org/10.3390/app13116386

Submission received: 10 February 2023 / Revised: 21 March 2023 / Accepted: 20 May 2023 / Published: 23 May 2023

(This article belongs to the Section Computing and Artificial Intelligence)

Abstract

This study employs sequential pattern mining to analyze browsing behaviors and aid mobile app service providers in effectively promoting and recommending new products. We collected browsing history data from 66,004 users of a mobile app for new car information in Taiwan, totaling 1,263,614 records over roughly two months. By utilizing sequential pattern mining, we identified frequent browsing sequences on the app that can indicate subsequent product interests and suggest new items to potential customers. The proposed method can improve the user experience for mobile app users and facilitate the development of the potential market for advertising. The study highlights the effectiveness of sequential pattern mining in recommending new products to car app users, benefiting small app vendors, improving user experience, and informing product development decisions in the automobile industry. Furthermore, the findings emphasize the importance of considering the sequential relationships between events or items in pattern mining, particularly in mobile app development. In conclusion, the proposed approach offers a cost-effective solution for small app vendors to recommend new products and improve the overall user experience, providing valuable insights for the automobile industry.

Keywords:

browsing behavior; sequential pattern analysis; new product recommendation

1. Introduction

Mobile applications have become an essential platform for businesses to communicate with users, gather valuable data on user behavior, and recommend new products. The user interface of mobile apps is a crucial aspect that can significantly impact user engagement and satisfaction [1]. An effective and well-designed product recommendation function within a mobile app can enhance the customer experience, increase customer loyalty, and provide valuable insights for new product development, ultimately leading to increased revenue for the business.

However, traditional product recommendation techniques, such as association rules or k-Nearest Neighbors [2], have limitations in capturing the sequence of events or objects, leading to significant patterns being overlooked [3,4]. This limitation is particularly significant in the case of new or technological products, as individuals are increasingly interested in the latest models and features. Sequential pattern mining has been suggested as a promising alternative to address this limitation. However, while several studies have improved the efficiency at which sequence rules can be generated [5,6,7,8,9,10,11,12], few researchers have made their software code public [4], and only a few have been carried out on mobile app data, posing a challenge for small app vendors with limited financial resources who depend on ads to provide free services to their users.

Therefore, this study proposes a cost-effective approach to new products and advertisement recommendations for small app vendors using a car info app as a case study. The study aims to address three research questions:

RQ1. How can sequential pattern mining effectively recommend new products to users of a car info app, especially for small app vendors with limited financial resources?
RQ2. What are the browsing behaviors of app users interested in car information, and how can these behaviors be analyzed to provide more intelligent and precise marketing recommendations?
RQ3. How can automobile companies use insights from analyzing user behavior and preferences to make sound decisions about creating new products?

This study contributes to browsing data analysis and mobile app marketing by providing insights into sequential pattern mining for improving product recommendations and understanding user behavior in the context of a car information app. The study’s findings can help small app vendors provide effective recommendations for new products or advertisements and enhance their marketing strategies. Moreover, the insights can benefit automobile companies in tracking consumer trends about car model preferences to make sound decisions regarding creating new products.

The following sections include a literature review on sequential pattern mining and a description of the data collection and methodology. The study also explains the metrics used to evaluate the discovered sequence rules. Next, the results and discussion sections present the study’s findings, including insights, implications for new product recommendations, limitations, and future research. Finally, the conclusion section summarizes the study’s results.

2. Literature Review

Pattern mining is a crucial technique for discovering meaningful and useful database patterns. It was first introduced in the 1990s with the development of the Apriori algorithm [13]. However, traditional pattern mining may not always uncover important patterns in the data due to neglecting the sequential relationships between events or items [4]. Therefore, sequential pattern mining was proposed to address this issue.

2.1. Sequential Pattern Mining

Sequential pattern mining is a powerful data mining technique used to identify interesting subsequences in a sequence database while considering the order of events between different items recorded in a transaction database. This technique enables the identification of meaningful associations, summarizing customers’ buying patterns, and predicting future behavior by analyzing the links between multiple purchases and their sequences. Since its introduction, the field of sequential pattern mining has been highly dynamic, with various studies being published on the subject, including customized modifications of the technique for specific requirements [4,6,8,10,12,14].

In sequential pattern mining, the support measure is used to assess the level of interest of a subsequence. The support of a sequence s, sup(s), is the number of sequences in the database in which s appears. A sequence s is regarded as a sequential pattern if and only if sup(s) is no less than minsup, the minimum support threshold specified by the user [11]. Researchers have proposed various algorithms to discover sequential patterns, such as AprioriAll, GSP, SPADE/cSPADE, PrefixSpan, SPAM, CM-Spam, and CM-Spade. All of these algorithms discover the same set of sequential patterns when run with the same minimum support on the same database; they differ in how efficiently they discover the patterns.
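The support definition can be illustrated with a minimal Python sketch. The toy database and the function names (`contains`, `support_count`, `is_sequential_pattern`) are hypothetical and are not taken from the study; each sequence is simplified to an ordered list of single items.

```python
from typing import Hashable, List, Sequence


def contains(sequence: Sequence[Hashable], pattern: Sequence[Hashable]) -> bool:
    """Return True if `pattern` occurs in `sequence` in order (gaps allowed)."""
    remaining = iter(sequence)
    # `item in remaining` consumes the iterator, which enforces the order.
    return all(item in remaining for item in pattern)


def support_count(database: List[Sequence[Hashable]], pattern: Sequence[Hashable]) -> int:
    """sup(s): the number of sequences in the database that contain the pattern."""
    return sum(contains(seq, pattern) for seq in database)


def is_sequential_pattern(database, pattern, minsup_count: int) -> bool:
    """A pattern is frequent when its support reaches the user-specified threshold."""
    return support_count(database, pattern) >= minsup_count


# Hypothetical toy database: one browsing sequence per user.
toy_db = [
    ["A", "B", "C"],
    ["A", "C"],
    ["B", "A", "C"],
    ["C", "A"],
]

print(support_count(toy_db, ["A", "C"]))          # 3: the last user saw C before A
print(is_sequential_pattern(toy_db, ["A", "C"], 3))
```

Because order matters, the sequence C, A does not support the pattern A followed by C, which is exactly what distinguishes this count from an itemset count.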

Agrawal and Srikant [14] developed the AprioriAll algorithm, one of the most well-known strategies for sequence pattern mining. The Apriori algorithm analyzes association rules of the form “if item A is bought, then item B will be bought simultaneously.” Sequence pattern recognition, by contrast, adds the element of temporal order, interpreted as “if item A is bought, then item B will be purchased after a certain duration.” The algorithm first identifies the frequent sequences of length k in the transaction database. It then pairs those frequent sequences to generate candidate sequences, using a hash tree as the primary storage structure. In this bottom-up, layer-by-layer search, longer candidate sequences are built from shorter ones. The support of each generated sequence is then evaluated to eliminate those that do not satisfy the minimum support. Once all the sequences have been examined, the maximal sequences can be identified, and these form the sequential patterns. A downside of the AprioriAll algorithm is that it generates a large number of candidate sequences, and evaluating and pruning them is time-consuming.

Srikant and Agrawal [11] proposed the GSP (Generalized Sequential Pattern) algorithm, which is similar to the AprioriAll algorithm, but it does not need to identify all the recurrent item sets first. Instead, the GSP algorithm carries out multiple passes over the sequence database, using a breadth-first (level-wise) search strategy to generate candidate sequences. During each pass, the number of potential sequences is reduced by pruning infrequent sequences to improve the algorithm’s efficiency.
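A multiple-pass search with candidate generation and pruning can be sketched as a simple loop. This is a simplified illustration in the spirit of GSP, not the published algorithm: it assumes single-item events, omits the hash tree and the time constraints, and all names and data are hypothetical.

```python
def contains(sequence, pattern):
    """True if `pattern` occurs in `sequence` in order (gaps allowed)."""
    remaining = iter(sequence)
    return all(item in remaining for item in pattern)


def support_count(database, pattern):
    return sum(contains(seq, pattern) for seq in database)


def levelwise_mine(database, minsup_count):
    """Return {pattern: support count}, growing patterns one item per pass."""
    items = sorted({item for seq in database for item in seq})
    # Pass 1: frequent single items.
    level = [(i,) for i in items if support_count(database, (i,)) >= minsup_count]
    frequent = {}
    while level:
        for pattern in level:
            frequent[pattern] = support_count(database, pattern)
        # Join step: a and b combine when a without its first item
        # equals b without its last item.
        candidates = {a + (b[-1],) for a in level for b in level if a[1:] == b[:-1]}
        # Prune step: keep only candidates that reach the support threshold.
        level = sorted(c for c in candidates if support_count(database, c) >= minsup_count)
    return frequent


toy_db = [["A", "B", "C"], ["A", "C"], ["B", "A", "C"], ["C", "A"]]
print(levelwise_mine(toy_db, minsup_count=3))
```

Each iteration of the `while` loop corresponds to one pass over the database, and infrequent candidates never seed longer ones.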

Zaki [12] proposed the SPADE (Sequential Pattern Discovery using Equivalence Classes) algorithm, which uses a vertical data representation to illustrate the sequence database. The vertical data employed by SPADE allow for efficient support counting based on equivalence classes, reducing the number of candidate sequences that need to be generated and verified during the mining process, making SPADE more efficient than other algorithms, such as GSP and PrefixSpan, on certain datasets [15].
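The vertical representation can be pictured with a small Python sketch. It is an illustrative simplification rather than Zaki's implementation: it assumes single-item events, stores each id-list as a set of (sequence id, position) pairs, and omits the equivalence-class decomposition. All names and data are hypothetical.

```python
from collections import defaultdict


def to_vertical(database):
    """Map each item to its id-list: the (sequence id, position) pairs where it occurs."""
    idlists = defaultdict(set)
    for sid, seq in enumerate(database):
        for pos, item in enumerate(seq):
            idlists[item].add((sid, pos))
    return idlists


def temporal_join(prefix_idlist, item_idlist):
    """Id-list of 'prefix, then item later in the same sequence'."""
    return {
        (sid_b, pos_b)
        for (sid_a, pos_a) in prefix_idlist
        for (sid_b, pos_b) in item_idlist
        if sid_a == sid_b and pos_b > pos_a
    }


def support_from_idlist(idlist):
    """Support is the number of distinct sequence ids in the id-list."""
    return len({sid for sid, _ in idlist})


toy_db = [["A", "B", "C"], ["A", "C"], ["B", "A", "C"], ["C", "A"]]
vertical = to_vertical(toy_db)
a_then_c = temporal_join(vertical["A"], vertical["C"])
print(support_from_idlist(a_then_c))  # 3
```

Once the id-lists are built, support is obtained by joining lists instead of rescanning the original sequences.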

Pei et al. [9,10] introduced the Pattern-Growth approach as an efficient and scalable technique for mining sequential patterns, an extension of the FP-growth algorithm for mining frequent patterns. They also proposed the PrefixSpan (Prefix-projected Sequential Pattern Mining) algorithm, which uses a depth-first search strategy and prefix-projection technique to reduce the search cost of the database. In addition, PrefixSpan only requires two examinations of the transaction database, making it more efficient than other state-of-the-art algorithms for mining frequent sequential patterns. As a result, it is widely used for mining sequential patterns from large-scale datasets.
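The prefix-projection idea can be sketched in a few lines of Python. This is a minimal illustration for sequences of single items, not the authors' or Pei et al.'s implementation; it represents each projected database as a list of (sequence id, start offset) pointers rather than copying suffixes, and the names and data are hypothetical.

```python
from collections import defaultdict


def prefixspan(database, minsup_count):
    """Return {pattern tuple: support count} by depth-first pattern growth."""
    results = {}

    def mine(prefix, projected):
        # For each item, record its first position after the prefix in every
        # projected sequence; the number of sequences is the item's support.
        first_pos = defaultdict(dict)
        for sid, start in projected:
            seq = database[sid]
            for pos in range(start, len(seq)):
                first_pos[seq[pos]].setdefault(sid, pos)
        for item in sorted(first_pos):
            occurrences = first_pos[item]
            if len(occurrences) < minsup_count:
                continue  # infrequent extension: this branch is not explored
            pattern = prefix + (item,)
            results[pattern] = len(occurrences)
            # Project: keep only what follows the first occurrence of `item`.
            mine(pattern, [(sid, pos + 1) for sid, pos in occurrences.items()])

    mine((), [(sid, 0) for sid in range(len(database))])
    return results


toy_db = [["A", "B", "C"], ["A", "C"], ["B", "A", "C"], ["C", "A"]]
print(prefixspan(toy_db, minsup_count=3))
```

No candidate sequences are generated here; each recursive call only counts items that actually occur after the current prefix.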

SPAM, CM-Spam, and CM-Spade have been proposed as alternatives for mining frequent sequential patterns. SPAM uses a vertical data format to represent the transaction database, resulting in decreased memory usage and computational time [16]. However, SPAM cannot efficiently mine closed and maximal sequential patterns—a subset of frequent sequential patterns. Modified algorithms such as CM-Spade and CM-Spam address this limitation.

CM-Spam and CM-Spade are two modified algorithms designed for the efficient mining of closed and maximal sequential patterns [16]. CM-Spam uses a pattern-growth approach to generate candidate patterns and mine frequent sequential patterns by recursively appending new items to existing patterns. It also uses a vertical data format to represent the transaction database, which reduces memory usage and computational time.

Similarly, CM-Spade uses a vertical data format and a depth-first search strategy to generate candidate sequences, which reduces memory usage and computational time. Both algorithms also use a pruning technique to reduce the number of candidate sequences generated and checked during mining. These algorithms are particularly useful for mining large-scale sequence databases with dense or long sequences.

2.2. Recent Extensions of Sequential Pattern Mining

In recent years, researchers have explored combining multiple methods to improve the effectiveness and efficiency of sequence pattern mining. One example is the SPAP method (Sequential rule, Periodic pattern, Association rule, and Preference), which incorporates association rules, sequential analysis, and periodic patterns to capture complex customer purchase behavior and improve personalized recommendations [5]. Another recent study applied sequential pattern mining to investigate airport ground handling operations [7]. The authors also developed SeRViz (Sequence Rule Visualization) methods to visualize the sequence rules and support the exploratory analysis of airport operations. The SeRViz method combines matrix-like visual representations and sequential rule mining. These recent extensions of sequence pattern mining highlight its versatility and potential for addressing complex real-world problems in various domains. Table 1 summarizes the different sequence mining algorithms and their characteristics. It includes algorithms such as AprioriAll, GSP, SPADE, PrefixSpan, SPAM, CM-Spam, CM-Spade, SPAP, and SeRViz, along with their key features.

2.3. Algorithm Selection for Sequential Pattern Mining

The efficiency of sequential pattern mining algorithms has improved greatly with advances in computer technology, and many options are now available. However, given a specific sequence database and support threshold, there is exactly one correct set of sequential patterns, so the choice of algorithm affects running time and memory use rather than the result.

Small application vendors may find it advantageous to use publicly available algorithms to produce the desired outcomes and detect sequence rules for multiple items. To this end, we used the cSPADE [12] algorithm, implemented with the arulesSequences package (v0.2-28) in R, in our study. The cSPADE algorithm is a modified version of the SPADE algorithm that incorporates additional constraints to guide the algorithm and achieve improved results. For example, maxlen defines the maximum number of elements (events) allowed in a sequence. Furthermore, the cSPADE algorithm provides flexibility in parameter configuration, allowing for the efficient allocation of computer resources and reducing computational time. This advantage is particularly useful for large datasets. In addition, users can customize the algorithm to extract only relevant patterns, improving result quality. Therefore, the cSPADE algorithm with arulesSequences is a valuable tool for sequential pattern mining researchers and practitioners.

3. Method

3.1. Data

The data for this study were obtained from a Taiwanese automobile database app, which provided clickstream data. The study included only customers who viewed more than three items to analyze the sequential product relationships. The data were collected for seventy days, from 21 January 2015 to 31 March 2015, and included 66,004 members, 43 automobile companies, and 438 vehicles, resulting in a dataset comprising 1,263,614 car clickstream records. The app was established in December 2013, and its features included various specifications, high-resolution photos and videos, and up to 20 monthly test drive reviews to assist with car selection. The app owner aimed to provide an excellent user experience through a user-friendly and easy-to-navigate interface. To ensure the accuracy and currency of the information, the app owner worked with leading car news outlets, websites, and magazines to provide more than 100 pieces of car news and industry insights each month. Additionally, the company provided a digital platform for motorists to exchange information related to vehicles and repairs. All user activities were recorded in a cloud-based database, making it a valuable source for large-scale clickstream data analysis.
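The preparation of such clickstream records for sequence mining can be sketched as follows. The record layout (member id, timestamp, car id), the timestamps, and the function name are hypothetical, since the raw schema is not described; the sketch also assumes that "more than three items" means at least four distinct cars.

```python
from collections import defaultdict


def build_sequences(records, min_distinct_items=4):
    """Group clicks by member, order them by time, and keep active members.

    records: iterable of (member_id, timestamp, car_id) tuples, where the
    timestamps sort chronologically (for example, ISO 8601 strings).
    """
    by_member = defaultdict(list)
    for member_id, timestamp, car_id in records:
        by_member[member_id].append((timestamp, car_id))

    sequences = {}
    for member_id, clicks in by_member.items():
        clicks.sort()  # chronological order within each member
        cars = [car_id for _, car_id in clicks]
        # Assumed filter: keep members who viewed more than three distinct cars.
        if len(set(cars)) >= min_distinct_items:
            sequences[member_id] = cars
    return sequences


# Hypothetical records; only the car numbers echo identifiers used in the text.
records = [
    ("u1", "2015-01-21T10:00", 492),
    ("u1", "2015-01-21T10:05", 582),
    ("u1", "2015-01-21T09:55", 111),
    ("u1", "2015-01-22T08:00", 466),
    ("u2", "2015-02-01T12:00", 492),
    ("u2", "2015-02-01T12:01", 582),
    ("u2", "2015-02-01T12:02", 492),
]
print(build_sequences(records))
```

Member `u2` is dropped because only two distinct cars were viewed, so that sequence carries little information about product-to-product order.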

3.2. Method

This research utilized the cSPADE algorithm to explore the sequence patterns in the clickstream data. This algorithm can reduce the amount of I/O needed for database searches and use efficient search techniques to reduce computational costs [4,12]. The methodology consisted of five steps:

Step 1. Collecting clickstream data from the automobile database app.

Step 2. Applying the cSPADE algorithm to explore sequence patterns in the data.

Step 3. Testing the algorithm with various support thresholds to determine the appropriate parameter settings.

Step 4. Extracting frequent sequential patterns and calculating Confidence and Lift measures.

Step 5. Analyzing and interpreting the results to gain insights into user behaviors and preferences related to car information.

3.3. Advantages of Method in Car Information App Analysis and Recommendations

In the context of analyzing a car information app, the methodology used in this study has several advantages. Firstly, it can provide valuable insights into user behavior and preferences, which app developers can use to enhance the user experience and improve customer satisfaction. This can be especially useful for identifying areas of the app that may need improvement or additional features to better meet the users’ needs. Secondly, the methodology offers a cost-effective approach to new products and advertisement recommendations for small app vendors. Using sequential pattern mining, small app vendors can make more informed decisions on which products or advertisements to promote to their users, leading to more efficient use of resources and potentially higher revenue. Thirdly, using sequential pattern mining can help capture the sequence of events or objects, which may reveal patterns that would have been otherwise overlooked using other analysis methods. This can lead to more accurate and meaningful user behavior and preferences insights. Lastly, the methodology provides potential areas for product design changes. By identifying frequent sequences and associations among user behaviors, app developers can better understand the needs and preferences of their users, ultimately improving the success rate of new product launches and helping automobile manufacturers keep up with changing trends in consumer preferences.

3.4. Measures

The sequence rules were assessed using three metrics: Support, Confidence, and Lift. Support (XY) or Support (X→Y) indicates the percentage of times that “X followed by Y” happens out of the total members. The left-hand side represents X, and the right-hand side represents Y. The minimum support count refers to the number of members corresponding to the support percentage. Various sequences can be created when considering the sequence pattern, such as Support (XYZ), which indicates the occurrence of “X followed by Y and then Z.” Equations (1)–(3) exemplify the three measures for the sequences X→Y→Z.

Support (X→Y→Z) = (number of members containing X→Y→Z)/(total number of members)    (1)

Confidence (X→Y→Z) = Support (X→Y→Z)/Support (X→Y)    (2)

Lift (X→Y→Z) = Support (X→Y→Z)/[Support (X→Y) × Support (Z)]    (3)

Confidence (XY) measures the association between two items and represents the percentage of times that {Y} is observed when {X} is present. This is a straightforward calculation when considering only one {X} and {Y}, but it becomes more complex when multiple steps (e.g., XYZ) are involved. Therefore, Confidence is typically not calculated if the rule includes more than two sequences. However, if the last item in the XYZ sequence is regarded as the right-hand side while the preceding items are considered the left-hand side, Confidence can be used to measure the rate at which the rule is true. For example, Confidence (XYZ) is calculated as the percentage of time that {Z} occurs if {X then Y} occurs.

Similarly, Lift (XYZ) measures the strength of the relationship between two sets (left- and right-hand side). It is calculated by dividing the support of the combination of {X then Y} and {Z} by the product of the support of {X then Y} and the support of {Z}. This metric indicates the likelihood of observing the simultaneous occurrence of {X then Y then Z} compared to the probability of the two events occurring independently.
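As a worked illustration of Equations (1)–(3), the following Python sketch computes the three measures on a hypothetical five-member database. The data and function names are illustrative assumptions, and events are simplified to single items.

```python
def contains(sequence, pattern):
    """True if `pattern` occurs in `sequence` in order (gaps allowed)."""
    remaining = iter(sequence)
    return all(item in remaining for item in pattern)


def support(database, pattern):
    """Equation (1): share of members whose sequence contains the pattern."""
    return sum(contains(seq, pattern) for seq in database) / len(database)


def confidence(database, lhs, rhs):
    """Equation (2): Support(lhs -> rhs) / Support(lhs); lhs must occur at least once."""
    return support(database, lhs + rhs) / support(database, lhs)


def lift(database, lhs, rhs):
    """Equation (3): Support(lhs -> rhs) / [Support(lhs) * Support(rhs)]."""
    return support(database, lhs + rhs) / (support(database, lhs) * support(database, rhs))


# Hypothetical browsing sequences for five members.
toy_db = [
    ["X", "Y", "Z"],
    ["X", "Y"],
    ["X", "Y", "Z"],
    ["Z"],
    ["Y", "X"],
]
lhs, rhs = ("X", "Y"), ("Z",)
print(support(toy_db, lhs + rhs))   # 2 of 5 members
print(confidence(toy_db, lhs, rhs)) # 2 of the 3 members with X -> Y
print(lift(toy_db, lhs, rhs))       # above 1: Z is more likely after X -> Y
```

A lift above 1 means that members who viewed X and then Y go on to view Z more often than Z's overall popularity alone would suggest.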

4. Result

Settings and Results

When applying sequential pattern mining (SPM) in a recommendation system, selecting proper thresholds to create effective sequence rules poses a great challenge. Typically, these rules are employed for commercial purposes to identify items that can be sold together as a bundle to maximize customer revenue. Therefore, the thresholds are usually set to a high level. However, due to the limited number of views for the new products, we attempted to keep the support level low to make it easier to recognize the sequence rules of the new products. Subsequently, the thresholds were adjusted through tests to determine the number of rules that should be proposed for most vehicles. Thus, this study examined eleven different minimum support thresholds, and the results are presented in Table 2.

In Table 2, the first trial of SPM with a minimum support of 4% generated 61 sequential patterns, while the second one with a minimum support of 3% created 173. As the minimum support was further decreased to 0.5% in the ninth test of SPM, it yielded 22,077 patterns. Therefore, when conducting SPM, it is also important to carefully consider the support count and use this information when selecting the threshold for the support level. Based on the results, the authors chose a minimum support threshold of 0.5%, or 330 individuals, which generated 22,077 sequential patterns (setting 9).
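The relation between a support percentage and its member count can be checked with a short calculation. The rounding convention is an assumption; rounding to the nearest integer reproduces the 330 individuals reported for the 0.5% threshold.

```python
def min_support_count(min_support, n_members):
    """Convert a relative support threshold into a member count (nearest integer)."""
    return round(min_support * n_members)


N_MEMBERS = 66_004  # members in the dataset

for threshold in (0.04, 0.03, 0.005):
    count = min_support_count(threshold, N_MEMBERS)
    print(f"minimum support {threshold:.1%} -> about {count} members")
```

Expressing the threshold as a count makes it easier to judge whether a rule rests on enough users to be worth acting on.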

A large number of patterns indicates that there are diverse patterns of vehicle clicks among users in the app. Therefore, analyzing and interpreting the frequent sequential patterns is important to extract meaningful insights for the app developer. Table 3 presents a random selection of the outcomes from the 22,077 rules (i.e., setting 9). The table displays the support, count, sequence of vehicle clicks, and the associated automobile models. For instance, the second rule, “{492},{582},” expresses that the user first viewed car number 492 and then car number 582. The support for this rule is 0.73%, the Confidence is 21.06%, and the Lift is 2.74. The sequence-related data provide valuable information that can be used to develop business strategies and guide management decisions.
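The reported figures for the rule “{492},{582}” can be related to one another through the definitions in Section 3.4. The sketch below assumes that, for a two-item rule X→Y, Confidence equals Support (X→Y)/Support (X) and Lift equals Confidence/Support (Y). The implied values are derived from rounded figures, so they are approximate and are not taken from the paper's tables.

```python
# Reported values for the rule {492} -> {582}.
support_rule = 0.0073      # 0.73%
confidence_rule = 0.2106   # 21.06%
lift_rule = 2.74
n_members = 66_004

# Confidence = Support(X -> Y) / Support(X)  =>  Support(X) = Support / Confidence
implied_support_lhs = support_rule / confidence_rule

# Lift = Confidence / Support(Y)  =>  Support(Y) = Confidence / Lift
implied_support_rhs = confidence_rule / lift_rule

# Approximate number of members behind the rule.
approx_member_count = support_rule * n_members

print(f"implied Support({{492}}) ~ {implied_support_lhs:.2%}")
print(f"implied Support({{582}}) ~ {implied_support_rhs:.2%}")
print(f"members showing the rule ~ {approx_member_count:.0f}")
```

Read this way, the lift of 2.74 says that members who viewed car 492 went on to view car 582 almost three times as often as members in general did.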

5. Discussion

5.1. Insights for New Product Recommendation

The analysis of sequence rules provides valuable insights into users’ browsing behaviors and can be used to make effective recommendations for new products. For example, the findings suggest that certain groups view specific car models in sequence, indicating that they have noticed some association between these cars. This knowledge can be leveraged to create effective advertisements for new cars. For instance, the analysis of sequence rules for the Hyundai Santa Fe (car number 111) and Mazda CX-9 (car number 466) shows that users who viewed specific car models before are likely to be interested in these vehicles. Table 4 and Table 5 provide sequence rules for these two vehicles, respectively.

Table 4 presents sequence rules related to the “Hyundai Santa Fe,” indicating that users who viewed specific cars such as the Hyundai ix35, Mazda CX-5, and Toyota RAV4 before will likely be drawn to the Hyundai Santa Fe. Similarly, Table 5 shows that the Mazda CX-9 can be linked to various sequences, suggesting that users who viewed specific cars such as the Mazda CX-5, Audi A3 Sedan, Honda Civic, and Toyota RAV4 could potentially be interested in the Mazda CX-9. These findings highlight the close link between certain car models and their browsing patterns.

Based on these insights, app developers can find car critics or YouTubers to compare similar car models and provide recommendations to car purchasers. Businesses introducing new vehicles can also analyze the sequence in which customers look at design elements that interest them and use this information to form their development and marketing strategies. For instance, if many users who were originally interested in Crossover SUVs start to look at larger SUVs, it may indicate that users are increasingly interested in larger SUVs to meet their leisure needs, such as camping (e.g., Mazda CX-5 to Mazda CX-9). Likewise, more users looking at electric cars may indicate that consumers are paying more attention to the need for environmental protection and green energy. By employing the method proposed in this research, app owners can use these sequential rules to provide relevant product recommendations to users, enhancing user experience and leading to more effective advertising results.
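One way an app owner might turn mined sequence rules into recommendations is sketched below. This is an illustrative assumption, not the procedure used in the study: the rule structure, the ranking by Lift and then Confidence, and the metric values are hypothetical placeholders, with only the car names taken from the discussion above.

```python
def is_subsequence(pattern, history):
    """True if `pattern` occurs in `history` in order (gaps allowed)."""
    remaining = iter(history)
    return all(item in remaining for item in pattern)


def recommend(history, rules, top_n=3):
    """Rank unseen cars whose rule antecedent matches the user's browsing order."""
    viewed = set(history)
    matches = [
        rule for rule in rules
        if rule["rhs"] not in viewed and is_subsequence(rule["lhs"], history)
    ]
    # Assumed ranking: strongest association first (Lift, then Confidence).
    matches.sort(key=lambda rule: (rule["lift"], rule["confidence"]), reverse=True)
    ranked = []
    for rule in matches:
        if rule["rhs"] not in ranked:
            ranked.append(rule["rhs"])
    return ranked[:top_n]


# Hypothetical rules; the metric values are placeholders, not results from the paper.
rules = [
    {"lhs": ("Mazda CX-5",), "rhs": "Mazda CX-9", "confidence": 0.10, "lift": 3.0},
    {"lhs": ("Hyundai ix35", "Toyota RAV4"), "rhs": "Hyundai Santa Fe",
     "confidence": 0.12, "lift": 2.5},
    {"lhs": ("Honda Civic",), "rhs": "Mazda CX-9", "confidence": 0.05, "lift": 1.5},
]
history = ["Hyundai ix35", "Mazda CX-5", "Toyota RAV4"]
print(recommend(history, rules))
```

Ranking by Lift favors cars that are specifically associated with the user's path rather than cars that are simply popular; a vendor could equally rank by Confidence or by support count.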

5.2. Theoretical Implications

The theoretical implications of this study concern the use of sequence pattern mining to gain insights into user behavior and preferences in the context of mobile applications. By applying this technique to a car info app, this study demonstrates the potential for extracting valuable information from clickstream data and using it to make informed decisions about product recommendations and advertising strategies. This highlights the importance of understanding the sequential structure of user behavior and the potential benefits of analyzing it in various domains. Moreover, this study provides a framework for applying sequence pattern mining in recommendation systems, which can be extended to other industries and domains. Finally, this study contributes to the growing body of research on using data mining techniques to improve user experience and provide insights for business decision-making.

5.3. Managerial Implications

The findings of this study have several practical implications for the automotive industry. By analyzing the sequence patterns in user behavior, automobile companies can gain insights into their customers’ preferences, browsing behavior, and purchase intentions. This information can inform the development of new products, marketing campaigns, and sales strategies. App developers can also use the findings to create more effective product recommendation systems tailored to their users’ preferences. By leveraging sequence pattern mining techniques, small app vendors with limited financial resources can improve their product offerings and increase customer engagement. Additionally, the insights gained from analyzing user behavior and preferences can inform decision-making regarding inventory management, product placement, and pricing. By understanding frequent sequential patterns of product views, companies can optimize their product offerings and improve the customer experience. Overall, the findings of this study have important implications for the automotive industry and provide a framework for improving customer engagement and driving sales.

5.4. Limitations and Future Research

While the findings of this study offer valuable insights, some limitations must be considered. Firstly, it should be noted that the dataset used in this study only pertains to browsing behavior related to car information. Therefore, additional testing and efforts to generalize the results to other types of mobile apps or products could be helpful in further understanding the effectiveness and limitations of the proposed method. Secondly, the study only identified frequent sequence patterns and did not consider the context or content of the web pages visited by app users. Thirdly, the analysis only focused on browsing history data and did not incorporate other data sources, such as demographic information and user feedback. Future research can address these limitations by incorporating additional data sources and extending them to other domains for a more comprehensive understanding of user behavior and preferences.

Additionally, future research can combine different research methods, such as sequence pattern mining and text mining [17,18,19], to construct consumer preference trends by integrating sequence rules, product attributes, and textual data. By integrating new methods [5,20,21,22,23] into the sequence pattern extraction process, the accuracy and effectiveness of product recommendations can be further improved.

With these areas in mind, sequential pattern mining can offer many possibilities for analyzing mobile usage trends, forecasting customer interests, and identifying prevalent trends and patterns in related systems. In addition, such research can provide valuable insights into user behavior and preferences, leading to more effective recommendation systems across various industries.

6. Conclusions

Mobile applications have become an essential platform for communication between businesses and users. This study focuses on the potential and value of sequence pattern mining in analyzing user browsing behavior and recommending new products to app users in the automobile industry. Our findings demonstrate that, by leveraging user behavior data through sequential pattern mining, the proposed method can provide insights that benefit small application vendors and automobile companies seeking to track consumer trends. Moreover, by providing users with timely new product recommendations, businesses can reduce the information overload users may experience and drive customer loyalty while gathering valuable data on user behavior.

In conclusion, utilizing sequence pattern mining in mobile app development can greatly enhance the user experience, provide valuable insights for businesses, and inform product development decisions in various industries. By analyzing user behavior and considering the sequential relationships between events or items, researchers can identify opportunities to use collective knowledge to recommend new products and ads, drive revenue, and spot market trends. The present study adds to the growing body of literature on the potential of sequence pattern mining and highlights the need to explore its innovative applications further. Furthermore, the proposed method provides new opportunities to leverage user behavior data and has the potential to contribute to the social benefits of multiple parties and facilitate communication between academia and industry.

Author Contributions

Conceptualization, H.-W.L. and J.-Z.W.; methodology, H.-W.L.; software, H.-W.L.; validation, H.-W.L. and J.-Z.W.; formal analysis, H.-W.L.; investigation, H.-W.L. and Y.-H.W.; resources, J.-Z.W., H.-W.L. and Y.-H.W.; data curation, H.-W.L. and J.-Z.W.; writing—original draft preparation, H.-W.L., J.-Z.W. and Y.-H.W.; writing—review and editing, H.-W.L. and J.-Z.W.; visualization, H.-W.L.; supervision, J.-Z.W. and H.-W.L.; project administration, H.-W.L., J.-Z.W. and Y.-H.W.; funding acquisition, J.-Z.W. and H.-W.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Science and Technology Council under grants MOST110-2628-E-031-001-MY2 and MOST 103-2622-H-031-002-CC3.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Given that our research data contains competitive information from the automotive industry, we are unable to share the raw data in order to avoid potential risks associated with releasing confidential business information.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Nie, L.; Said, K.S.; Ma, L.; Zheng, Y.; Zhao, Y. A systematic mapping study for graphical user interface testing on mobile apps. IET Softw. 2023, 1–19.
2. Park, D.H.; Kim, H.K.; Choi, I.Y.; Kim, J.K. A literature review and classification of recommender systems research. Expert Syst. Appl. 2012, 39, 10059–10072.
3. Demiriz, A. WebSPADE: A parallel sequence mining algorithm to analyze web log data. In Proceedings of the 2002 IEEE International Conference on Data Mining, Maebashi City, Japan, 9–12 December 2002; IEEE: Maebashi City, Japan, 2002; pp. 755–758.
4. Fournier-Viger, P.; Lin, J.C.W.; Kiran, R.U.; Koh, Y.S.; Thomas, R. A survey of sequential pattern mining. Data Sci. Pattern Recognit. 2017, 1, 54–77.
5. Chen, G.; Li, Z. A new method combining pattern prediction and preference prediction for next basket recommendation. Entropy 2021, 23, 1430.
6. Gouda, K.; Hassaan, M.; Zaki, M.J. Prism: An effective approach for frequent sequence mining via prime-block encoding. J. Comput. Syst. Sci. 2010, 76, 88–102.
7. Jalilvand, A.; Christino, L.; Paulovich, F.V. Serviz: A Visual Analytics System for the Analysis of Sequential Rules and Its Application to Airport Ground Handling Operations. 2023. Available online: https://doi.org/10.2139/ssrn.4341080 (accessed on 19 May 2023).
8. Mabroukeh, N.R.; Ezeife, C.I. A taxonomy of sequential pattern mining algorithms. ACM Comput. Surv. 2010, 43, 1–41.
9. Pei, J.; Han, J.; Pinto, H.; Chen, Q.; Dayal, U.; Hsu, M.C. PrefixSpan: Mining sequential patterns efficiently by prefix-projected pattern growth. In Proceedings of the 17th International Conference on Data Engineering, Heidelberg, Germany, 2–6 April 2001; pp. 215–224.
10. Pei, J.; Han, J.; Mortazavi-Asl, B.; Wang, J.; Pinto, H.; Chen, Q.; Dayal, U.; Hsu, M.-C. Mining sequential patterns by pattern-growth: The PrefixSpan approach. IEEE Trans. Knowl. Data Eng. 2004, 16, 1424–1440.
11. Srikant, R.; Agrawal, R. Mining sequential patterns: Generalizations and performance improvements. In Proceedings of the Advances in Database Technology—EDBT’96: 5th International Conference on Extending Database Technology, Avignon, France, 25–29 March 1996; Springer: Berlin/Heidelberg, Germany, 1996; Volume 5, pp. 1–17.
12. Zaki, M.J. SPADE: An efficient algorithm for mining frequent sequences. Mach. Learn. 2001, 42, 31–60.
13. Agrawal, R.; Srikant, R. Fast algorithms for mining association rules. In Proceedings of the 20th International Conference on Very Large Data Bases, Santiago, Chile, 12–15 September 1994; VLDB 94; Volume 1215, pp. 487–499. Available online: https://dl.acm.org/doi/10.5555/645920.672836 (accessed on 19 May 2023).
14. Agrawal, R.; Srikant, R. Mining sequential patterns. In Proceedings of the Eleventh International Conference on Data Engineering, Washington, DC, USA, 9–10 March 1995; IEEE: Taipei, Taiwan, 1995; pp. 3–14. Available online: https://doi.org/10.1109/ICDE.1995.380415 (accessed on 19 May 2023).
15. Fournier-Viger, P.; Gomariz, A.; Campos, M.; Thomas, R. Fast vertical mining of sequential patterns using co-occurrence information. In Proceedings of the Advances in Knowledge Discovery and Data Mining: 18th Pacific-Asia Conference, Tainan, Taiwan, 13–16 May 2014; PAKDD 2014. Springer: Tainan, Taiwan, 2014; pp. 40–52.
16. Ayres, J.; Flannick, J.; Gehrke, J.; Yiu, T. Sequential pattern mining using a bitmap representation. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’02, Edmonton, AB, Canada, 23–26 July 2002; Association for Computing Machinery: New York, NY, USA, 2002; pp. 429–435.
17. Ghani, R.; Probst, K.; Liu, Y.; Krema, M.; Fano, A. Text mining for product attribute extraction. ACM SIGKDD Explor. Newsl. 2006, 8, 41–48.
18. Kumar, S.; Kar, A.K.; Ilavarasan, P.V. Applications of text mining in services management: A systematic literature review. Int. J. Inf. Manag. Data Insights 2021, 1, 100008.
19. Srinivas, S.; Ramachandiran, S. Passenger intelligence as a competitive opportunity: Unsupervised text analytics for discovering airline-specific insights from online reviews. Ann. Oper. Res. 2023, 1–34.
20. Dargan, S.; Kumar, M.; Ayyagari, M.R.; Kumar, G. A survey of deep learning and its applications: A new paradigm to machine learning. Arch. Comput. Methods Eng. 2020, 27, 1071–1092.
21. Dong, S.; Wang, P.; Abbas, K. A survey on deep learning and its applications. Comput. Sci. Rev. 2021, 40, 100379.
22. Wei, J.; He, J.; Chen, K.; Zhou, Y.; Tang, Z. Collaborative filtering and deep learning based recommendation system for cold start items. Expert Syst. Appl. 2017, 69, 29–39.
23. Senturk, I.; Gursoy, N.K.; Oner, T.; Gursoy, A. A novel algorithmic construction for deductions of categorical polysyllogisms by Carroll’s diagrams. Inf. Sci. 2021, 578, 236–256.

Table 1. Key features of various sequence mining algorithms.

Algorithm	Description of Key Features
AprioriAll [14]	AprioriAll is one of the best-known strategies for sequence pattern mining; it uses a hash tree as the primary storage structure to generate candidate sequences.
GSP [11]	GSP makes multiple passes over the sequence database and generates candidate sequences level by level, using a breadth-first search strategy.
SPADE/cSPADE [12]	SPADE employs a vertical data representation of the sequence database, which allows efficient support counting based on equivalence classes. cSPADE extends SPADE by incorporating additional constraints.
PrefixSpan [10]	PrefixSpan utilizes a depth-first search strategy and prefix-projection technique to reduce the search cost of the database.
SPAM [16]	SPAM represents the sequence database in a vertical bitmap format, which reduces memory usage and computational time.
CM-Spam [15]	CM-Spam extends SPAM with co-occurrence information to prune candidate patterns; it mines frequent sequential patterns by recursively appending new items to existing patterns.
CM-Spade [15]	CM-Spade applies the same co-occurrence pruning to SPADE's vertical representation of the sequence database to mine frequent sequential patterns.
SPAP [5]	SPAP combines association rules, sequential analysis, and periodic patterns to capture complex customer purchase behavior.
SeRViz [7]	SeRViz was developed to visualize sequence rules and aid in the exploratory analysis of airport operations.
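
To make the prefix-projection idea in the PrefixSpan row concrete, the following is a minimal sketch for sequences of single items (such as one viewed car ID per browsing step). It is an illustration only and not necessarily the algorithm or implementation used in this study; the function name `prefixspan` and the toy sequences are hypothetical.

```python
from collections import defaultdict


def prefixspan(sequences, min_count):
    """Return {pattern_tuple: support_count} for all frequent sequential patterns.

    Each sequence is a list of single items; a pattern is supported by a
    sequence if its items occur in the same order (gaps are allowed).
    """
    results = {}

    def mine(prefix, projected):
        # projected: (sequence_index, start_position) pairs, i.e. the suffixes
        # that remain after the first occurrence of the current prefix.
        counts = defaultdict(int)
        for sid, start in projected:
            for item in set(sequences[sid][start:]):
                counts[item] += 1  # count each sequence at most once per item

        for item, count in sorted(counts.items()):
            if count < min_count:
                continue
            pattern = prefix + (item,)
            results[pattern] = count
            # Build the projected database for the extended prefix.
            new_projected = []
            for sid, start in projected:
                try:
                    pos = sequences[sid].index(item, start)
                except ValueError:
                    continue  # this suffix does not contain the item
                new_projected.append((sid, pos + 1))
            mine(pattern, new_projected)

    mine((), [(i, 0) for i in range(len(sequences))])
    return results


# Hypothetical toy browsing sequences (item IDs are illustrative only).
toy_sequences = [[3, 9, 4, 9], [3, 9], [9, 4], [3, 4, 9]]
patterns = prefixspan(toy_sequences, min_count=2)
for pattern, count in sorted(patterns.items()):
    print(pattern, count)
```

Because each recursive call scans only the projected suffixes, infrequent extensions are never generated as candidates, which is the search-cost reduction the table describes.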

Table 2. Results of SPM under 11 minimum-support settings.

Settings	Threshold of Min Support %	Min Support Count	Sequential Rules
1	Support ≥ 4.00	2640	61 Rules
2	Support ≥ 3.00	1980	173 Rules
3	Support ≥ 2.00	1320	691 Rules
4	Support ≥ 1.00	660	4449 Rules
5	Support ≥ 0.90	594	5696 Rules
6	Support ≥ 0.80	528	7460 Rules
7	Support ≥ 0.70	462	10,240 Rules
8	Support ≥ 0.60	396	14,578 Rules
9	Support ≥ 0.50	330	22,077 Rules
10	Support ≥ 0.40	264	37,345 Rules
11	Support ≥ 0.30	198	72,951 Rules

Note: The total number of members, N, is 66,004.
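
The minimum support counts in Table 2 are consistent with multiplying each percentage threshold by N = 66,004 and dropping the fractional part. The sketch below reproduces that column under this assumption; the rounding rule is inferred from the table rather than stated in the text, and the function name `min_support_count` is hypothetical.

```python
import math
from fractions import Fraction

N_USERS = 66_004  # total members, from the table note


def min_support_count(threshold_percent: str, n: int = N_USERS) -> int:
    """Convert a minimum-support percentage into a minimum user count.

    Fractions avoid floating-point surprises; rounding down is an assumption
    that happens to match every row of Table 2.
    """
    return math.floor(Fraction(threshold_percent) / 100 * n)


thresholds = ["4.00", "3.00", "2.00", "1.00", "0.90", "0.80",
              "0.70", "0.60", "0.50", "0.40", "0.30"]
for t in thresholds:
    print(f"Support >= {t}%  ->  count >= {min_support_count(t)}")
```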

Table 3. Results of analysis of an individual’s mobile browsing history using SPM.

ID	Sequential Rule	Support	Count	Confidence	Lift
1	{3},{9},{4},{9}	0.90%	591	51.30%	2.46
2	{492},{582}	0.73%	482	21.06%	2.74
3	{53},{432},{434}	0.85%	563	25.30%	2.11
4	{645},{18},{629}	0.65%	429	23.46%	2.46
5	{634},{594}	0.65%	432	21.82%	1.23
6	{645},{120},{82}	0.61%	403	26.20%	2.51
7	{65},{455},{432}	0.69%	457	30.09%	2.44
8	{432},{455},{455}	1.30%	855	41.20%	3.72
9	{108},{486},{106}	0.74%	486	46.78%	6.94
10	{645},{4},{9}	1.00%	658	43.29%	2.08
11	{645},{340},{82}	0.70%	462	39.66%	3.79
12	{106},{66}	1.45%	959	21.56%	1.51
13	{8},{22},{8}	0.79%	521	30.38%	2.72
14	{98},{53},{29}	0.84%	556	38.69%	3.09
15	{432},{432},{648},{432}	1.00%	659	64.73%	5.25
16	{98},{594},{401}	0.61%	403	21.32%	1.65
17	{120},{645}	1.83%	1206	29.07%	2.49
18	{342},{82}	4.03%	2658	29.25%	2.80
19	{4},{645},{323}	0.80%	529	28.53%	1.48
20	{496},{494},{491},{493}	0.98%	646	55.45%	17.15
21	{98},{3},{9}	1.96%	1293	47.50%	2.28
22	{8},{9},{3},{4}	0.73%	479	31.72%	2.86
23	{644},{401},{3}	0.53%	352	25.21%	1.15
24	{434},{448}	2.84%	1876	23.65%	1.94
25	{17},{18}	2.56%	1691	40.46%	6.39
26	{431},{53}	1.79%	1179	24.14%	1.60
27	{401},{323},{445}	0.52%	342	22.02%	1.64
28	{323},{629},{82}	1.21%	801	40.01%	3.83
29	{674},{432},{431}	0.52%	344	20.81%	2.81
30	{494},{9}	1.23%	813	27.89%	1.34

Note: The total number of members, N, is 66,004.
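
The columns of Table 3 are related by simple arithmetic. The sketch below recomputes support as Count / N and, assuming the standard definition lift = confidence / support(consequent), backs out the consequent support implied by each rule. The four rows used are the Table 3 rules ending in item {82}; the helper names are hypothetical, and the implied values are a consistency check rather than figures reported by the authors.

```python
N_USERS = 66_004  # total members, from the table note


def support(rule_count: int, n: int = N_USERS) -> float:
    """Share of users whose browsing history contains the whole rule."""
    return rule_count / n


def implied_consequent_support(confidence: float, lift: float) -> float:
    """Assumes lift = confidence / support(consequent)."""
    return confidence / lift


# (rule, count, confidence, lift) copied from Table 3, rows ending in {82}.
rows_ending_in_82 = [
    ("{645},{120},{82}", 403, 0.2620, 2.51),
    ("{645},{340},{82}", 462, 0.3966, 3.79),
    ("{342},{82}", 2658, 0.2925, 2.80),
    ("{323},{629},{82}", 801, 0.4001, 3.83),
]

for rule, count, conf, lift_value in rows_ending_in_82:
    print(rule,
          f"support={support(count):.2%}",
          f"implied support of {{82}}={implied_consequent_support(conf, lift_value):.2%}")
```

If the assumed definition holds, rules that share a consequent should imply roughly the same consequent support, differing only through rounding in the reported confidence and lift.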

Table 4. Sequential rules of car no. 111 (Hyundai Santa Fe).

ID	Sequential Rule	Support	Count	Confidence	Lift
1	{111},{111}	2.62%	1732	37.97%	5.50
2	{458},{111}	2.61%	1725	35.77%	5.18
3	{665},{111}	0.56%	372	24.88%	3.60
4	{66},{458},{111}	0.69%	453	39.49%	5.72
5	{66},{111},{111}	0.68%	451	41.30%	5.98
6	{458},{458},{111}	1.03%	678	37.17%	5.38
7	{448},{458},{111}	0.74%	490	39.64%	5.74
8	{434},{458},{111}	0.83%	549	36.53%	5.29
9	{36},{458},{111}	0.68%	447	39.35%	5.69
10	{458},{111},{111}	1.03%	681	39.48%	5.71
11	{111},{458},{111}	1.05%	695	44.61%	6.46
12	{458},{458},{458},{111}	0.52%	344	39.31%	5.69
13	{458},{111},{458},{111}	0.50%	332	46.30%	6.70
14	{111},{111},{458},{111}	0.53%	348	56.86%	8.23
15	{448},{434},{111}	0.57%	378	20.36%	2.95
16	{36},{448},{111}	0.51%	339	20.50%	2.97
17	{448},{111},{111}	0.68%	447	39.59%	5.73
18	{434},{111},{111}	0.69%	458	39.38%	5.70
19	{111},{111},{111}	1.30%	855	49.36%	7.14
20	{458},{111},{111},{111}	0.52%	344	50.51%	7.31
21	{111},{458},{111},{111}	0.55%	360	51.80%	7.50
22	{111},{111},{111},{111}	0.73%	485	56.73%	8.21

Note: 111 is Hyundai Santa Fe; 458 is Hyundai ix35; 665 is Hyundai Grand Starex; 66 is Mazda CX-5; 448 is Ford Kuga; 434 is Honda CR-V; 36 is Toyota RAV4.
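
One way rules such as those in Table 4 might drive a recommendation is to match each rule's antecedent against a user's browsing history and rank the matches by confidence. The sketch below is a hypothetical illustration, not the authors' system: the matching policy (antecedent as an ordered, possibly non-contiguous subsequence of the history) and the names `is_subsequence` and `recommend` are assumptions, and only a few Table 4 rows are included.

```python
def is_subsequence(pattern, history):
    """True if the items of pattern appear in history in the same order."""
    remaining = iter(history)
    return all(item in remaining for item in pattern)


# (antecedent, consequent, confidence in %) taken from selected Table 4 rows.
RULES = [
    ((458,), 111, 35.77),
    ((665,), 111, 24.88),
    ((66, 458), 111, 39.49),
    ((448, 458), 111, 39.64),
    ((111, 458), 111, 44.61),
]


def recommend(history, rules=RULES, top_k=1):
    """Return up to top_k (consequent, confidence, antecedent) tuples,
    ranked by confidence, for rules whose antecedent matches the history."""
    matches = [
        (consequent, confidence, antecedent)
        for antecedent, consequent, confidence in rules
        if is_subsequence(antecedent, history)
    ]
    matches.sort(key=lambda m: m[1], reverse=True)
    return matches[:top_k]


# A user who viewed the Mazda CX-5 (66) and then the Hyundai ix35 (458).
print(recommend([66, 458]))
```

Ranking by lift instead of confidence would favor rules whose consequent is less popular overall; which measure to rank by is a design choice this sketch leaves open.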

Table 5. Sequential rules of car no. 466 (Mazda CX-9).

ID	Sequential Rule	Support	Count	Confidence	Lift
1	{466},{466}	1.77%	1168	24.79%	3.47
2	{66},{466}	4.26%	2810	29.87%	4.18
3	{98},{66},{466}	0.60%	399	31.39%	4.40
4	{90},{66},{466}	0.83%	547	32.27%	4.52
5	{65},{66},{466}	1.11%	730	29.89%	4.19
6	{644},{66},{466}	0.69%	454	33.48%	4.69
7	{553},{66},{466}	0.52%	344	29.91%	4.19
8	{521},{66},{466}	0.60%	396	37.75%	5.29
9	{466},{66},{466}	1.04%	688	30.78%	4.31
10	{463},{66},{466}	0.72%	476	32.47%	4.55
11	{458},{66},{466}	0.66%	434	32.15%	4.50
12	{455},{66},{466}	0.63%	419	34.43%	4.82
13	{434},{66},{466}	0.99%	652	31.35%	4.39
14	{432},{66},{466}	0.65%	426	31.25%	4.38
15	{431},{66},{466}	0.51%	336	35.71%	5.00
16	{36},{66},{466}	0.51%	336	31.01%	4.34
17	{342},{66},{466}	0.51%	336	31.88%	4.47
18	{339},{66},{466}	0.51%	336	30.26%	4.24
19	{29},{66},{466}	0.51%	336	30.39%	4.26
20	{126},{66},{466}	0.51%	336	30.31%	4.25
21	{111},{66},{466}	0.59%	392	32.50%	4.55
22	{66},{466},{66},{466}	0.63%	416	31.16%	4.36
23	{466},{66},{66},{466}	0.53%	352	34.17%	4.79
24	{466},{466},{466}	0.55%	360	30.82%	4.32

Note: 466 is Mazda CX-9; 66 is Mazda CX-5; 98 is Audi A3 Sedan; 90 is Luxgen U6 TURBO; 65 is Mazda 6; 644 is Infiniti Q50; 553 is Subaru Impreza; 521 is Luxgen U7 Turbo; 463 is Mazda 5; 458 is Hyundai ix35; 455 is Hyundai Elantra 1.8 L; 434 is Honda CR-V; 432 is Honda Civic; 431 is Honda Accord; 36 is Toyota RAV4; 342 is BMW X6; 339 is BMW X1; 29 is Toyota Camry; 126 is Ford Fiesta; 111 is Hyundai Santa Fe.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

