A New Method Combining Pattern Prediction and Preference Prediction for Next Basket Recommendation

Market basket prediction, the basis of product recommendation systems, is the task of predicting what customers will buy in their next shopping basket based on an analysis of their historical shopping records. Although product recommendation systems have developed rapidly and perform well in practice, state-of-the-art algorithms still leave plenty of room for improvement. In this paper, we propose a new algorithm combining pattern prediction and preference prediction. In pattern prediction, sequential rules, periodic patterns and association rules are mined, and probability models are established from their statistical characteristics, e.g., the distribution of periods of a periodic pattern, to make a more precise prediction. Products with a higher probability have priority to be recommended. If the quantity of recommended products is insufficient, we make a preference prediction to select more products. Preference prediction is based on the frequency and tendency of products appearing in a customer's individual shopping records, where tendency is a new concept reflecting the evolution of the customer's shopping preferences. Experiments show that our algorithm outperforms the baseline methods and state-of-the-art methods on three of four real-world transaction sequence datasets.


Introduction
Data mining technology is an efficient tool for business. As early as 1993, Agrawal et al. [1] proposed association rule mining for transaction databases to discover the intrinsic connections between different products and the shopping habits of customers. This technology makes sales prediction easier: by predicting what customers will buy in their next shopping basket and then recommending those products to them, retailers can improve their services and promote sales. We call such a technique market basket prediction, which is the basis of product recommendation systems.
Since Agrawal et al. proposed association rule mining, both data mining and recommendation systems have been developing rapidly. On the one hand, sequential patterns [2], sequential rules [3], coverage patterns [4], temporal patterns [5], subgraph patterns [6] and periodic patterns [7] have been proposed. Data mining, as an increasingly sophisticated technology, has been used in many domains, such as time series analysis [8], medicine [9] and image processing [10]. On the other hand, recommendation systems have been implemented with several kinds of methods, including pattern-based models, collaborative filtering [11] and Markov chains [12]. Advances in data mining technology make pattern-based models promising: there are many efficient, ready-made algorithms for pattern mining [13], and they can easily be used to implement pattern-based recommendation systems.
A pattern reveals the relation between different products, which makes pattern-based models comprehensible. Among patterns, a sequential rule reveals the relation between the products in two consecutive transactions: a customer bought a product at some time and will buy another product at a future time. For example, if a customer buys a computer, he or she will possibly need a USB flash drive or a printer when working on the computer. The higher the confidence of a sequential rule, the higher the possibility. A number of sophisticated and efficient sequential rule mining algorithms have been proposed, including RuleGen [13], ERMiner [14] and RuleGrowth [3,15]. Because every product has a limited service life, when a product is used up, we buy it again; therefore, some products appear periodically in our market baskets. If we know the period of a periodic pattern, we can predict when it will appear again. Several periodic pattern mining algorithms have been proposed, including SPP [16], MPFPS [17] and LPPM [18]. An association rule reveals the relation between the products within one basket: if a customer buys a product, he or she will buy another product at the same time. Efficient algorithms such as TopKRules [19] and TNR [20] focus on association rule mining.
As described above, pattern-based models have the advantages of popularity and comprehensibility. However, existing pattern-based recommendation algorithms are insufficient to capture customers' shopping habits. For example, Ref. [21] focused on association rules only, and periodicity was neglected. Furthermore, the statistical characteristics inside a pattern, e.g., the distribution of periods of a periodic pattern, are also neglected. For example, under the current definition of periodic patterns [22], when a periodic pattern is used to make a prediction, we can only predict that the pattern will reoccur within a time interval, but not the probability that it reoccurs at an exact time.
In view of the disadvantages of existing methods, our method leverages not only association rules but also sequential rules and periodic patterns for prediction at the same time. We call this strategy pattern prediction. Furthermore, the frequency and tendency of a product are considered to reflect, respectively, the preference a customer has for the product and the evolution of that preference; predictions made from them we call preference prediction. Combining pattern prediction and preference prediction, we propose a new algorithm for market basket prediction, which we call SPAP (Sequential rule, Periodic pattern, Association rule, and Preference).
In this paper, we first present new definitions of periodic patterns and tendency. Generally, if a product is bought periodically by a customer, the period will be nearly equal to the service life of the product. However, the service lives of individual units of a product may differ from one another, leading to fluctuation in the period. What kind of pattern should count as periodic, so that the definition still reveals the periodicity of purchases that have no fixed period? Clearly, if the fluctuation of a pattern's periods is too large compared to the average period, the pattern is not periodic. Taking both the average period and the standard deviation into account, we use the coefficient of variation, the ratio of the standard deviation to the mean, to measure the periodicity of patterns in our definition. The concept of tendency is based on the following observation: if a product is bought more frequently in a customer's recent baskets than in early ones, the customer is increasingly inclined toward the product; conversely, if a product is bought more frequently in early baskets than in recent ones, the customer is increasingly estranged from it. We use the new concept of tendency to reflect this fact.
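As a concrete illustration, the coefficient-of-variation test described above can be sketched as follows. This is a minimal Python sketch, not the paper's implementation; the use of the population standard deviation and the handling of short occurrence lists are our assumptions, since the formal definition (Definition 5) appears later in the paper.

```python
from statistics import mean, pstdev

def is_periodic(occurrences, delta):
    """Decide periodicity from the sorted basket indexes at which a
    pattern occurs. Periods are the gaps between consecutive
    occurrences; the pattern counts as periodic when the coefficient
    of variation (std / mean) of those periods is at most delta."""
    periods = [b - a for a, b in zip(occurrences, occurrences[1:])]
    if len(periods) < 2:
        return False  # too few periods to measure fluctuation
    return pstdev(periods) / mean(periods) <= delta
```

For example, a pattern occurring at baskets 1, 4, 7, 10, 13 has periods of exactly 3 and a coefficient of variation of 0, so it is periodic under any reasonable threshold, while occurrences at 1, 2, 9, 10, 20 fluctuate too much.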
Second, we propose probability models for pattern prediction. A sequential rule reveals the relation of two patterns belonging to two consecutive transactions: the former pattern is called the antecedent, and the latter the consequent. When a sequential rule is used to predict the next basket, the time interval between antecedent and consequent is usually neglected; that is, the consequent will follow the antecedent with a given confidence, but we do not know for sure at what time it will occur. In this paper, we learn a statistical model for the time intervals of all sequential rules in the training data. In prediction, we use the statistical model to compute the probability of the occurrence of consequents at an exact timestamp. For periodic patterns, the statistical model is determined by the average period and the standard deviation. After training, we obtain all sequential rules, periodic patterns and association rules, along with their statistical characteristics. Consequently, we can calculate the probability of every product appearing in a customer's next basket. Products with a higher probability have priority to be recommended. If the quantity of recommended products is insufficient, we make a preference prediction to select more products.
Preference prediction is based on this observation: if a product is more frequently bought by a customer, then we draw the conclusion that the customer has a preference for this product, and this product will have priority to be recommended. If some products have the same frequency, then the product with a higher tendency will be selected first in such a case.
Our contributions in this paper are summarized as follows:
• We present a new definition for periodic patterns and the tendency of patterns.
• We propose probability models for pattern prediction to predict the next basket.
• We design a new algorithm combining pattern prediction and preference prediction for next basket recommendation.
• Empirically, we show that our algorithm outperforms the baseline methods and state-of-the-art methods on three of four real-world transaction sequence datasets under the evaluation metrics of F1-Score and Hit-Ratio.
The remainder of this paper is organized as follows: Section 2 reviews the existing approaches. Section 3 includes the preliminary. We introduce our prediction method in Section 4. The implementation of our algorithm is described in Section 5, and the experimental analysis is reported in Section 6. Finally, we draw a conclusion in Section 7.

Related Work
Implementation methods of recommendation systems can be categorized into sequential, general, pattern-based, and hybrid models. Sequential models [23,24], mostly relying on Markov chains, explore sequential transaction data by predicting the next purchase based on the last actions; a major advantage of such models is their ability to capture sequential behavior to provide good recommendations. General models [25], in contrast, do not consider sequential behavior but make recommendations based on customers' whole purchase history; the key idea is collaborative filtering. Pattern-based models base predictions on the frequent patterns extracted from the shopping records of all customers [26]. Finally, hybrid models combine the models mentioned above or other ideas, such as graph-based models [27] and recurrent neural network models [28,29]. Since there are so many works devoted to recommendation systems, it is impossible to list them all here, so we only briefly review pattern-based approaches in the next paragraph.
Fu et al. [30] first used association rules for recommendation systems, listing candidate items for a customer in order of their support. Wang et al. [31] proposed an association rule mining algorithm with maximal nonblank for recommendation. A weighted association rule mining algorithm based on the FP-tree and its application to personalized recommendation were given by Wang et al. [32]. Ding et al. [33,34] proposed a method for personalized recommendation that decreases the number of association rules by merging different rules. Li et al. [21,35] proposed the notion of strongest association rules (SARs) and developed a matrix-based algorithm for mining SAR sets. As a subset of the entire association rule set, the SAR set includes far fewer rules, in a form especially suitable for personalized recommendation, without information loss. Lazcorreta et al. [26] applied a modified version of the well-known Apriori data mining algorithm to personalized recommendation. Najafabadi et al. [36] applied users' implicit interaction records with items to efficiently process massive data by employing association rule mining; their method captures the multiple purchases per transaction in association rules, rather than just counting total purchases. Chen et al. [37] mined simple association rules with a single item in the consequent to avoid exponential pattern growth. The method proposed by Zhou et al. [38] implements genetic network programming and ant colony optimization to solve the sequential rule mining problem for commercial recommendations in time-related transaction sequence databases. Maske et al. [39] proposed a method for predicting customer behavior from purchased items using the Apriori association rule mining algorithm.
Among other methods, Cumby et al. [40] proposed a predictor that embraces a user-centric vision by reformulating basket prediction as a classification problem. They build a distinct classifier for every customer and perform predictions by relying only on that customer's personal data; unfortunately, this approach assumes the independence of items purchased together. Wang et al. [41] employed a two-layer structure to construct a hybrid representation over customers and items from the last transactions: the first layer represents the transactions by aggregating item vectors from the last transactions, while the second layer realizes the hybrid representation by aggregating the customer's vectors and the transaction representations. Guidotti et al. [42–44] defined a new pattern named the Temporal Annotated Recurring Sequence (TARS), which seeks to simultaneously and adaptively capture the co-occurrence, sequentiality, periodicity and recurrence of the items in a transaction sequence. Jain et al. [45] designed a business strategy prediction system for market basket analysis. Kraus et al. [46] proposed similarity matching based on subsequential dynamic time warping as a novel predictor of market baskets, leveraging the Wasserstein distance to measure the similarity among embedded purchase histories. Hu et al. [47] presented a k-nearest neighbors (kNN) based method to directly capture two useful patterns associated with personalized item frequency: the repeated purchase pattern and the collaborative purchase pattern. Faggioli et al. [48] proposed an efficient solution for next basket recommendation under a more general top-n recommendation framework, exploiting a set of collaborative filtering based techniques to capture customers' shopping patterns and intentions.

Preliminary
Retailers usually preserve their customers' shopping histories in a database that we call the transaction sequence database. A customer's shopping history contains many transactions. A transaction, also called a basket, usually contains an ID, a date, a product list and quantities. All transactions of a customer are sorted by date; this is called the transaction sequence, as Table 1 shows. A transaction sequence database contains all customers' transaction sequences, as Table 2 shows.
Let C = {c_1, ..., c_n} be a set of n customers and I = {i_1, ..., i_m} be a set of m items or products in the market. The transaction sequence of customer c is denoted as B_c, and B_c = ⟨b_1, ..., b_{r_c}⟩, where b_i ⊆ I, i ∈ {1, ..., r_c}, denotes a transaction or basket. The terms transaction, basket and itemset will be used interchangeably, since we are referring to an unordered set of items (or products). The size of sequence B_c is denoted as |B_c|, and |B_c| = r_c. b_{r_c+1} denotes the next basket that will be purchased by customer c. We use the index set {1, ..., r_c} of baskets in the transaction sequence rather than formal dates as timestamps to simplify the problem. The interval of two transactions is denoted as Gap(b_i, b_j) and defined as Gap(b_i, b_j) = j − i. The transaction sequence dataset D = {B_{c_1}, ..., B_{c_n}} consists of the transaction sequences of n customers.

Problem 1. Given a transaction sequence dataset, our aim is to predict the next basket for each customer according to their transaction sequence and then select k products to recommend to him or her. Formally, given dataset D, for every transaction sequence B_c ∈ D with B_c = ⟨b_1, ..., b_{r_c}⟩, predict b_{r_c+1}, which contains a set of candidate items for recommendation. Let b*_c denote the selected item set; then b*_c contains k items selected from b_{r_c+1} to recommend to customer c.

Table 2. Illustrative transaction sequence database containing seven customers. (Columns: Customer ID; Transaction Sequences.)
Given an itemset p, p ⊆ I, a frequency threshold θ, and a transaction sequence B, let Sup(p) denote the set of baskets in B that contain p. The absolute frequency of p is defined as Freq(p) = |Sup(p)|, and the relative frequency is Freq(p) = |Sup(p)|/|B|. If Freq(p) ≥ θ, then we call p a frequent itemset pattern, or itemset pattern for short. If p contains only a single item, that is, |p| = 1, we call it a single item pattern.
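The frequency definitions above can be sketched in a few lines. This is an illustrative Python sketch under our own representation choices (baskets as sets of item labels, 1-based basket indexes), not the paper's implementation.

```python
def support(pattern, baskets):
    """Sup(p): indexes of the baskets that contain every item of p."""
    return [idx for idx, basket in enumerate(baskets, start=1)
            if set(pattern) <= set(basket)]

def freq(pattern, baskets, relative=False):
    """Absolute frequency |Sup(p)|, or relative frequency |Sup(p)|/|B|."""
    f = len(support(pattern, baskets))
    return f / len(baskets) if relative else f
```

For a sequence of four baskets in which {a, b} appears twice, the absolute frequency is 2 and the relative frequency is 0.5.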
Similarly, given a sequence s of itemsets, let Sup(s) denote the set of transaction sequences in D that contain s. The absolute frequency of s is defined as Freq(s) = |Sup(s)|, and the relative frequency as Freq(s) = |Sup(s)|/|D|. Given a frequency threshold θ, if Freq(s) ≥ θ, then s is a frequent sequential pattern, or sequential pattern for short.
In this paper, the length of s is denoted as Len(s) and defined as Len(s) = ∑_{v∈{1,...,k}} |p_v|. If a sequence s contains only one itemset p, that is, s = ⟨p⟩ and |s| = 1, then it can be mapped to itemset p, and we write s = p. If Len(s) = 1, then s has only a single itemset, and this itemset contains only a single item; we call it a single item pattern.

Definition 4 (Sequential Rule). Given two sequential patterns s_1 and s_2 of a transaction sequence dataset and a confidence threshold η, if (1) s_1 ≠ ∅ and s_2 ≠ ∅, (2) s = ⟨s_1, s_2⟩ (s_1 concatenated with s_2) is a sequential pattern, and (3) Freq(s)/Freq(s_1) ≥ η, then we call s_1 → s_2 a sequential rule. Its frequency is Freq(s_1 → s_2) = Freq(s), and its confidence is Conf(s_1 → s_2) = Freq(s)/Freq(s_1). We call s_1 the antecedent and s_2 the consequent.

Property 1. Given a sequential rule s_1 → s_2 : Conf with s_2 = ⟨p_1, ..., p_k⟩, we have s_1 → p_1 : Conf_1, ..., s_1 → p_k : Conf_k, and Conf ≤ Conf_1, ..., Conf ≤ Conf_k.
Proof. The proof comes from the anti-monotonicity of sequential patterns' frequency.

Framework
Our method includes pattern prediction and preference prediction. In pattern prediction, first, all sequential rules, periodic patterns and association rules are found, together with their statistical characteristics. Then, probability models are built based on these statistical characteristics. Afterward, we use the probability models to calculate the probability of all products in the next basket for a customer. Products with a higher probability are selected preferentially for recommendation. If k products have been selected, we continue with the prediction for the next customer; otherwise, we make a preference prediction. In preference prediction, the product that is more frequent in the individual shopping records is selected first; if some products have the same frequency, the product with a higher tendency is selected. Once all k products are selected, we continue with the next customer. See Algorithm 1. We introduce the probability list pl to preserve the probability of all items in b_{r+1}, viz. pl := {(item_1 : value_1), (item_2 : value_2), ...}.
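The selection loop just described can be sketched as follows. Algorithm 1 itself is not reproduced in this excerpt, so this is only a minimal Python sketch of the control flow, with `pattern_probs` and `preference_rank` standing in for the outputs of the two prediction stages.

```python
def recommend(k, pattern_probs, preference_rank):
    """Sketch of the SPAP selection loop: take pattern-predicted items
    by descending probability, then fill up to k items from the
    preference ranking.

    pattern_probs   -- dict item -> probability (the list `pl`)
    preference_rank -- items ordered by frequency, ties by tendency
    """
    # 1. pattern prediction: highest-probability items first
    selected = sorted(pattern_probs, key=pattern_probs.get, reverse=True)[:k]
    # 2. preference prediction fallback if fewer than k were selected
    for item in preference_rank:
        if len(selected) >= k:
            break
        if item not in selected:
            selected.append(item)
    return selected
```

For example, with k = 3, two pattern-predicted items and a preference ranking, the third slot is filled from the preference list.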
Probability Model of Sequential Rule
Sequential rules reveal the relation between products in two consecutive transactions: a customer bought a product at some time and will buy another product at a future time. However, the sequential rule as defined in the previous section is of limited use for next basket prediction. For example, given a sequential rule E_1 → E_2 : Conf = 0.8, if event E_1 occurs, it will lead to event E_2 occurring with a confidence of 0.8; the confidence is treated as a probability here, that is, P(E_2 | E_1) = 0.8. However, we do not know for sure at what time event E_2 occurs, nor the probability of event E_2 occurring at an exact time after event E_1 occurred. To address this limitation, we build a probability model for the time intervals of a sequential rule. The time interval of a sequential rule, i.e., the time interval between event E_1 and event E_2, is a random variable, represented here by X = Gap(E_1, E_2). Generally, the larger the time interval, the lower the relevance of E_1 and E_2, and vice versa. We therefore suppose that the probability model approximately follows an exponential distribution with parameter mean(Gap(s_1, s_2)).

Principle of Sequential Rule Prediction
Given a transaction sequence B = ⟨b_1, ..., b_r⟩, a sequential rule s_1 → s_2 : Conf, and its probability distribution P{X = Gap(s_1, s_2)} of time intervals between s_1 and s_2, suppose the consequent contains only a single itemset, that is, |s_2| = 1 (if |s_2| > 1, we can break the rule down into several sequential rules whose consequents contain only a single itemset, according to Property 1). If s_1 ⊀ B, then P(s_1 ≺ B) = 0. Otherwise, P(s_1 ≺ ⟨b_1, ..., b_λ⟩) = 1, where λ = max{e | ∀E ∈ Embs(s_1), e ∈ E}. The sequential rule s_1 → s_2 : Conf means that s_2 will follow s_1 with probability Conf. Let A_u denote the event s_2 ⊆ b_{r+u}. If event A_1 occurs, then Gap(s_1, s_2) = r + 1 − λ; similarly, if event A_u occurs, then Gap(s_1, s_2) = r + u − λ. Therefore, the probability of the consequent occurring in the next basket is P(A_1) = Conf × P{X = r + 1 − λ}. Two remarks are in order. First, the time variable is continuous in general; however, since we use the index of baskets as the timestamp, the time variable is discretized, and the probability of an exact value of the time variable is the probability mass within a unit interval over this value. Second, if an item is contained in the consequents of several sequential rules, it will be predicted several times; in such a case, we update its probability to the maximal value. See Algorithm 2.
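Under the exponential assumption of the previous subsection, the discretized interval probability, the prediction for basket b_{r+u}, and the max-update rule might look as follows. This is a sketch under our own naming; Algorithm 2 itself is not reproduced in this excerpt, and the unit-interval discretization follows the first remark above.

```python
import math

def gap_probability(gap, mean_gap):
    """P{X = gap}: mass of an exponential distribution with the given
    mean inside the unit interval [gap - 1, gap]; time is discretized
    by basket index, so an 'exact' time is a unit interval."""
    rate = 1.0 / mean_gap
    return math.exp(-rate * (gap - 1)) - math.exp(-rate * gap)

def consequent_probability(conf, r, lam, mean_gap, u=1):
    """P(A_u) = Conf * P{X = r + u - lam} for the event s2 in b_{r+u}."""
    return conf * gap_probability(r + u - lam, mean_gap)

def update(pl, item, prob):
    """If an item is predicted by several rules, keep the maximum."""
    pl[item] = max(pl.get(item, 0.0), prob)
```

Note that `gap_probability` is decreasing in the gap, matching the intuition that a larger time interval means lower relevance between antecedent and consequent.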

Probability Model of Periodic Pattern
Periodic events intrinsically reoccur with a fixed period. In the real world, however, the period is influenced by different factors, leading to fluctuation; in this case, we only have an average value for the periods. The smaller the fluctuation, the better, so we compare the fluctuation with the average period: if the fluctuation is too large compared to the average period, we do not regard the event as periodic. The coefficient of variation, which is the ratio of the standard deviation to the mean, is suitable as a periodicity measure, since the standard deviation is a good measure of fluctuation. The service life of a product is no exception: if a product is bought periodically by a customer, the period will be nearly equal to the service life of the product; however, the service lives of individual units of a product may differ from one another, leading to fluctuation in the period.
According to Definition 5, we define periodic patterns based on the coefficient of variation. If a pattern has a high coefficient of variation, which means that the standard deviation is too large compared to the mean value, the pattern is not periodic; otherwise, we classify it as a periodic pattern. Given a periodic pattern, if it occurs at some time, it is most likely to reoccur after an average period of time: the probability is higher at times closer to one average period later, and lower farther away. We suppose that the probability model of periods follows a normal distribution with two parameters, mean(Per(p)) and std(Per(p)), as Figure 2 shows.
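The normal model with parameters mean(Per(p)) and std(Per(p)) can be discretized the same way as the sequential-rule model: the probability that the pattern reoccurs exactly t baskets after its last occurrence is the normal mass in the unit interval [t − 1, t]. A sketch under that assumption, using the error function for the normal CDF:

```python
import math

def normal_cdf(x, mu, sigma):
    """CDF of N(mu, sigma^2) via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def period_probability(elapsed, mean_period, std_period):
    """Probability that the pattern reoccurs exactly `elapsed` baskets
    after its last occurrence: the mass of N(mean, std^2) inside the
    unit interval [elapsed - 1, elapsed]."""
    return (normal_cdf(elapsed, mean_period, std_period)
            - normal_cdf(elapsed - 1, mean_period, std_period))
```

As expected, the probability peaks when the elapsed time is close to the average period and decays on both sides.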

Principle of Periodic Pattern Prediction
Given a transaction sequence B = ⟨b_1, ..., b_r⟩ and a periodic pattern p of B, the prediction of the probability P(p ⊆ b_{r+1}) is analogous to sequential rule prediction.

Association Rule Prediction
Given an association rule p_1 → p_2 : Conf, if itemset pattern p_1 ⊆ b, then itemset pattern p_2 ⊆ b with probability P(p_2 ⊆ b | p_1 ⊆ b) = Conf. After sequential rule prediction and periodic pattern prediction, we obtain a set of candidate products with their probabilities in b_{r+1}. We define the probability of p_1 ⊆ b_{r+1} as P(p_1 ⊆ b_{r+1}) = min{P(i) | ∀i ∈ p_1, (i : P(i)) ∈ pl}. Therefore, we obtain the probability of p_2 ⊆ b_{r+1} as P(p_2 ⊆ b_{r+1}) = P(p_1 ⊆ b_{r+1}) × Conf. See Algorithm 3.
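The association-rule stage above can be sketched as follows. Algorithm 3 is not reproduced in this excerpt, so this is only an illustrative Python sketch: the antecedent probability is the minimum over its items in the probability list, and consequents are merged with the same max-update rule used for sequential rules.

```python
def apply_association_rules(pl, rules):
    """Extend the probability list with association-rule consequents.

    pl    -- dict item -> probability from the earlier stages
    rules -- iterable of (antecedent_items, consequent_items, conf)
    """
    for antecedent, consequent, conf in rules:
        if not all(i in pl for i in antecedent):
            continue                              # antecedent not predicted
        p_ante = min(pl[i] for i in antecedent)   # P(p1 in b_{r+1})
        for item in consequent:
            # keep the maximum if the item was already predicted
            pl[item] = max(pl.get(item, 0.0), p_ante * conf)
    return pl
```

For instance, with P(bread) = 0.6, P(butter) = 0.4 and the rule {bread, butter} → {jam} : 0.5, jam receives probability 0.4 × 0.5 = 0.2.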
After pattern prediction, a product that has not been predicted has a default probability of zero and is not included in b_{r+1} for recommendation in the pattern prediction stage. We thus obtain a set of candidate products, along with their probabilities, in the next basket b_{r+1}; a product with a higher probability has priority to be recommended to the customer. If |b_{r+1}| < k, that is, the number of candidate items in b_{r+1} is less than k, we make a preference prediction to continue selecting products until we have all k products.

Preference Prediction
In pattern prediction, we selected a set of products to recommend to a customer. If the number of selected products is less than k, then we continue to select based on preference prediction.
In preference prediction, if a product is more frequently bought by a customer, then we conclude that the customer has a preference for this product. However, the preference will evolve over time. The purchase distribution of a product over a shopping history will indicate a change in preference. If a product is more frequently bought in the recent baskets than the earlier ones of a customer, then the customer is more and more inclined toward the product; otherwise, if a product is more frequently bought in earlier baskets than recent ones, then the customer tends to be increasingly estranged from the product. According to Definition 8, tendency reflects such a fact. The preference prediction is based on the frequency and tendency of a product in customers' shopping histories. We first select the products that are more frequent to recommend. If some products have the same frequency, then the product that has a higher tendency is prioritized.
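The frequency-then-tendency ranking can be sketched as follows. Definition 8 (tendency) is not reproduced in this excerpt, so the tendency used here, the mean normalized basket index of the item's occurrences (higher for recently bought items), is an illustrative assumption rather than the paper's formula.

```python
def preference_rank(baskets, exclude=()):
    """Rank a customer's items by frequency; break ties by tendency.

    Tendency here is illustrative (Definition 8 is not reproduced in
    this excerpt): the mean normalized index of the baskets containing
    the item, so recently favoured items score higher."""
    r = len(baskets)
    stats = {}  # item -> (frequency, sum of normalized positions)
    for idx, basket in enumerate(baskets, start=1):
        for item in basket:
            f, s = stats.get(item, (0, 0.0))
            stats[item] = (f + 1, s + idx / r)
    return sorted((i for i in stats if i not in exclude),
                  key=lambda i: (stats[i][0], stats[i][1] / stats[i][0]),
                  reverse=True)
```

The `exclude` argument lets the caller skip items already selected in pattern prediction, matching the fallback role of this stage.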

Relation of Two Prediction Strategies
In pattern prediction, we must set a value to the frequency threshold, confidence threshold and threshold for the coefficient of variation in pattern mining. These parameters determine the weight of pattern prediction and preference prediction.
The confidence threshold is used for sequential rule mining and association rule mining: the higher its value, the fewer sequential rules and association rules are found. The threshold for the coefficient of variation is used for periodic pattern mining: the lower its value, the fewer periodic patterns are found. The frequency threshold applies to all three types of patterns: the higher its value, the fewer patterns are found. Preference prediction has no parameter.
In general, if we set higher values for the frequency and confidence thresholds and a lower value for the coefficient-of-variation threshold, fewer products will be selected in pattern prediction, and preference prediction will have a higher weight on the selected result. Conversely, with lower frequency and confidence thresholds and a higher coefficient-of-variation threshold, pattern prediction selects more products and preference prediction has a lower weight.

Optimizations
In pattern mining, the number of results grows exponentially as patterns expand in size; efficiently mining large patterns from dense datasets is a severe problem for traditional pattern mining approaches. In our method, the growth of the number of patterns is not proportional to the improvement in prediction performance but leads to a heavier workload. Moreover, in the pattern prediction of our method, if an item is contained in several patterns, it will probably be predicted repeatedly, leading to redundant workload. The larger the number of patterns or the larger the size of the patterns, the more redundancy in the workload.
For the two reasons mentioned above, we use simple association rules, simple sequential rules and simple periodic patterns to implement pattern prediction. This strategy dramatically reduces the number of patterns and is easier to implement. As a result, our algorithm is incomplete. We use two ready-made algorithms introduced by Philippe Fournier-Viger et al. [13], namely, FPGrowth and RuleGen, to mine simple association rules and simple sequential rules, respectively. The implementation of simple periodic pattern mining will be discussed in the next subsection.

Data Structure
Inspired by Fabio Fumarola et al. [49], we propose a new data structure named the vertical bit-list to represent the dataset. Each item i ∈ I is assigned a vertical bit-list, which is made up of a bit vector and several integer arrays, as Figure 3 shows. Each bit vector has one bit per sequence in the dataset, i.e., its size is the number of sequences; the jth bit, j ∈ {1, ..., |D|}, corresponds to the jth sequence in dataset D. If the jth bit of a vertical bit-list is flipped, then some itemset in the jth sequence contains the item to which the vertical bit-list is assigned, and each flipped bit is associated with an integer array preserving the occurrence list of the item in the corresponding sequence. If the jth sequence contains item i, then the frequency and tendency of i in the jth sequence can be calculated from its integer array, and we can determine whether item i is a simple periodic pattern in the jth sequence from the integer array according to Definition 5, since simple periodic pattern mining is free from pattern growth.
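The vertical bit-list can be sketched as follows. This is an illustrative Python sketch, not the paper's Java implementation: a Python integer stands in for the bit vector, and a dictionary maps each flipped bit to the item's occurrence list.

```python
class VerticalBitList:
    """One vertical bit-list per item: a bit vector with one bit per
    sequence in the dataset, plus an integer array of occurrence
    (basket) indexes for every set ('flipped') bit."""

    def __init__(self, item, dataset):
        self.item = item
        self.bits = 0          # bit j set <=> sequence j contains item
        self.occurrences = {}  # sequence index -> basket indexes
        for j, sequence in enumerate(dataset):
            occ = [t for t, basket in enumerate(sequence, start=1)
                   if item in basket]
            if occ:
                self.bits |= 1 << j
                self.occurrences[j] = occ

    def contains(self, j):
        """Is the j-th bit of the bit vector set?"""
        return bool(self.bits >> j & 1)
```

From each occurrence list, the per-sequence frequency, tendency, and the periodicity test of Definition 5 can all be computed without pattern growth, which is the point of restricting to simple patterns.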

Complexity
In the stage of pattern mining, we dramatically reduce the complexity of our algorithm by mining only simple sequential rules, simple association rules and simple periodic patterns. We mine simple sequential rules and simple association rules in O(m²), where m is the number of distinct items in I, since both types of rules contain only two items. Simple periodic pattern mining and tendency calculation have a complexity of O(n × v), where n is the number of customers and v is the average count of products a customer has bought, that is, one scan over the dataset; the complexity of the prediction stage is the same.

Experiment
To evaluate the performance of our SPAP algorithm, experiments were conducted on four real-world datasets. First, we assessed the influence of the parameters on the weight of pattern prediction and preference prediction. Second, we compared our algorithm with the baseline methods and state-of-the-art methods under the evaluation metrics of F1-Score and Hit-Ratio. The experiments were conducted on a computer with an Intel Core i7-8550U 1.8 GHz processor and 8 GB of RAM, running Windows 10 (64-bit). SPAP is implemented in Java.

• Ta-Feng is a dataset of a physical market in China, covering food, stationery and furniture, with a total of 23,812 distinct items. It contains 817,741 transactions made by 32,266 customers over 4 months.
• The Dunnhumby (Dunnh for short) dataset contains household-level transactions over two years from a group of 2500 households who are frequent shoppers at a retailer.
• The X5-Retail-Hero (X5RH for short) dataset contains 183,258 transactions made by 9305 customers over 4 months, with a total of 27,766 distinct items.
• The T-Mall dataset records four months of transactions on an online e-commerce website. It contains 4298 transactions belonging to 884 users and 9531 distinct brands considered as items.
In preprocessing these datasets, we remove customers who have fewer than 10 baskets for Ta-Feng, Dunnh and T-Mall, and remove customers who have fewer than 21 baskets for X5RH. For simplicity, we adopt the index of the basket in sequence as the time unit rather than the real date. Table 3 shows the details of these datasets used in our experiment.
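The basket-count filter described above is straightforward; a minimal sketch, assuming customers are keyed by ID and sequences are lists of baskets:

```python
def filter_customers(dataset, min_baskets):
    """Keep only customers whose transaction sequence contains at
    least `min_baskets` baskets (10 for Ta-Feng, Dunnh and T-Mall,
    21 for X5RH in the preprocessing described above)."""
    return {c: seq for c, seq in dataset.items() if len(seq) >= min_baskets}
```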

Evaluation Metrics: F1-Score and Hit-Ratio
Following Guidotti et al. [44], we first sort the transactions by timestamp for each customer. Then, we split the dataset into a training set and a testing set: the testing set contains the latest transaction of every customer for model evaluation, and the training set contains the remainder of the transactions for model training. This is known as the leave-one-out strategy. The product set that customer c actually buys is denoted as b_c, and the product set recommended to customer c is denoted as b*_c. The metrics we use for evaluation, F1-Score and Hit-Ratio, are defined as

F1-Score = (1/n) ∑_{c∈C} 2 × Precision_c × Recall_c / (Precision_c + Recall_c), where Precision_c = |b_c ∩ b*_c| / |b*_c| and Recall_c = |b_c ∩ b*_c| / |b_c|,

Hit-Ratio = (1/n) ∑_{c∈C} I(|b_c ∩ b*_c| > 0),

where I(*) is an indicator function. The F1-Score is reported as the average over all customers. Furthermore, to evaluate the contribution of the two prediction strategies, we introduce a new measure: Weight. Let b^pa denote all the items selected in pattern prediction for all customers, and b^pr all the items selected in preference prediction. The number of all items selected for recommendation to all customers is ∑_{c∈C} |b*_c|, and we have |b^pa| + |b^pr| = ∑_{c∈C} |b*_c|. The weights of pattern prediction and preference prediction are denoted as Weight_pa and Weight_pr, respectively, and defined as

Weight_pa = |b^pa| / ∑_{c∈C} |b*_c|, Weight_pr = |b^pr| / ∑_{c∈C} |b*_c|;

due to the complementarity of Weight_pa and Weight_pr, we report Weight_pa only. A higher value of Weight_pa means that more items are selected in pattern prediction, and vice versa.
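The two evaluation metrics can be computed directly from the actual and recommended item sets. A sketch, assuming the standard per-customer F1 averaged over customers and the indicator-based Hit-Ratio described above:

```python
def f1_score(actual, recommended):
    """Per-customer F1 from precision and recall of the hit set."""
    hits = len(actual & recommended)
    if hits == 0:
        return 0.0
    precision = hits / len(recommended)   # |b_c ∩ b*_c| / |b*_c|
    recall = hits / len(actual)           # |b_c ∩ b*_c| / |b_c|
    return 2 * precision * recall / (precision + recall)

def evaluate(actual_by_c, recommended_by_c):
    """Average F1-Score over customers, and Hit-Ratio using the
    indicator I(|b_c ∩ b*_c| > 0)."""
    n = len(actual_by_c)
    f1 = sum(f1_score(actual_by_c[c], recommended_by_c[c])
             for c in actual_by_c) / n
    hit = sum(bool(actual_by_c[c] & recommended_by_c[c])
              for c in actual_by_c) / n
    return f1, hit
```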
In the remainder of this paper, we use Weight to refer to Weight_pa. Note that if the number of distinct products customer c has bought is less than k, then we cannot select a set of k or more products to recommend to him or her; that is, |b*_c| ≤ k, leading to ∑_{c∈C} |b*_c| ≤ k × n, where |C| = n.
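The two evaluation metrics can be sketched directly from their definitions (a minimal illustration, assuming baskets are represented as Python sets; function names are ours):

```python
def f1_score(actual, recommended):
    """Per-customer F1-Score: harmonic mean of the precision and recall of
    the recommended basket against the actually purchased basket."""
    hits = len(actual & recommended)
    if hits == 0:
        return 0.0
    precision = hits / len(recommended)
    recall = hits / len(actual)
    return 2 * precision * recall / (precision + recall)

def hit_ratio(actuals, recommendeds):
    """Fraction of customers whose recommended basket contains at least
    one actually purchased item."""
    return sum(1 for a, r in zip(actuals, recommendeds) if a & r) / len(actuals)
```

The reported F1-Score is then the average of f1_score over all customers, while hit_ratio is already an aggregate over customers.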

Influence of Parameters
Our algorithm is composed of two prediction strategies, and we obtain the best performance only with the right balance between pattern prediction and preference prediction. There are four parameters in pattern prediction: the relative frequency threshold θ_r and the confidence threshold η for sequential rule mining and association rule mining, and the coefficient-of-variation threshold δ and the absolute frequency threshold θ_p for periodic pattern mining. All of these parameters influence the weights of the two prediction strategies. In this subsection, we evaluate their influence. k is set to the average basket size of each dataset. The remaining parameters are preset to θ_r = 0.16, η = 0.2, δ = 0.2 and θ_p = 5 for Ta-Feng; θ_r = 0.35, η = 0.7, δ = 0.4 and θ_p = 12 for Dunnh; θ_r = 0.15, η = 0.4, δ = 0.8 and θ_p = 10 for X5RH; and θ_r = 0.1, η = 0.4, δ = 0.8 and θ_p = 5 for T-Mall.
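The evaluation protocol above, varying one threshold while the others stay at their presets, can be sketched as a simple sweep (the evaluate callback and parameter names are hypothetical stand-ins for the actual training-and-scoring routine):

```python
def sweep_theta_r(evaluate, theta_r_values, fixed):
    """Evaluate the model once per relative-frequency threshold value while
    the other thresholds keep their preset values; returns one row per
    theta_r so F1-Score, Hit-Ratio and Weight can be plotted against it."""
    results = []
    for theta_r in theta_r_values:
        params = dict(fixed, theta_r=theta_r)  # presets plus the swept value
        f1, hit, weight = evaluate(**params)
        results.append((theta_r, f1, hit, weight))
    return results
```

The same loop applies to η, δ and θ_p; only the swept keyword changes.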
First, we test the relative frequency threshold θ_r; the results are shown in Figure 4. When θ_r rises, the total number of distinct rules (sequential rules and association rules) and Weight decrease on all four datasets. Weight is always lower than 0.5, which means that preference prediction plays the dominant role. For the Ta-Feng dataset, when θ_r is less than 0.16, both F1-Score and Hit-Ratio remain unchanged and achieve their optimal values; the situation is the same on the Dunnh dataset when θ_r is greater than 0.75. Both F1-Score and Hit-Ratio are constant on the X5RH dataset when θ_r is greater than 0.63. For the T-Mall dataset, both F1-Score and Hit-Ratio achieve their optimal values when θ_r is near 0.11. Figure 5 shows the influence of the confidence threshold η. As η rises, the number of rules and Weight decrease on all four datasets. Both F1-Score and Hit-Ratio achieve their optimal values on the Ta-Feng dataset when η is less than 0.38, on the Dunnh dataset when η is greater than 0.9, and on the T-Mall dataset when η is equal to 0.5. When η is greater than 0.8, F1-Score or Hit-Ratio achieves its optimal value on the X5RH dataset.
The coefficient-of-variation threshold δ is set for periodic pattern mining. A higher value of δ leads to a larger number of periodic patterns, as Figure 6 shows, which in turn results in a higher Weight on all four datasets. When δ is less than 0.4, F1-Score or Hit-Ratio achieves its optimal value on the Ta-Feng dataset. Both F1-Score and Hit-Ratio remain unchanged and achieve their optimal values on the Dunnh dataset when δ is greater than 2.8, and the situation is the same on the X5RH and T-Mall datasets when δ is greater than 2.0.
Finally, we test the absolute frequency threshold θ_p for periodic pattern mining. As shown in Figure 7, when θ_p rises, the number of periodic patterns and Weight decrease on all four datasets. Both F1-Score and Hit-Ratio remain unchanged and achieve their optimal values on the Ta-Feng dataset when θ_p is greater than 11, and the situation is the same on the Dunnh dataset when θ_p is greater than 27. On X5RH, both F1-Score and Hit-Ratio achieve their optimal values when θ_p is equal to 7. For the T-Mall dataset, F1-Score and Hit-Ratio achieve their optimal values when θ_p is equal to 10 and 5, respectively.

Comparison with Baseline Methods and State-of-the-Art Methods
In this subsection, we compare our method with the baseline methods TOP, MC, CLF and NMF, and with the state-of-the-art methods HRM, TBP, TIFU and UPCF. Source code for MC, CLF, NMF, HRM and TBP is provided by [44] (https://github.com/GiulioRossetti/tbp-next-basket (accessed on 19 October 2021)), and all algorithms run with their recommended parameter values. Results are shown in Figures 8 and 9. TBP exceeds the time limit on the Dunnh dataset. Figure 8 shows the F1-Score results: our SPAP algorithm has the best F1-Score on the Ta-Feng dataset when k is less than 14, and is close to TOP on the Dunnh and X5RH datasets. Figure 9 shows the Hit-Ratio results: SPAP outperforms the other algorithms on the Ta-Feng dataset, and is close to TOP on the Dunnh and X5RH datasets.
We notice that all algorithms except TBP exhibit poor F1-Score performance when k is much higher or lower than the average basket size. The effect is most obvious on the T-Mall dataset, which has the smallest average basket size. This is because F1-Score depends on the size of the recommended basket b*_c. A higher value of k means that we select more products to recommend, leading to a larger b*_c and a lower precision(b_c, b*_c); conversely, a lower k leads to a lower recall(b_c, b*_c). However, as k increases, the possibility of a hit rises, so Hit-Ratio naturally increases. Only when k is close to the average basket size do we have a fair comparison.
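A small worked example illustrates this precision–recall trade-off with k (a toy illustration, not a result from the paper: a customer who actually buys 3 items, recommended baskets of size 1 and size 6):

```python
def precision_recall(actual, recommended):
    """Precision and recall of a recommended basket against the actual one."""
    hits = len(actual & recommended)
    return hits / len(recommended), hits / len(actual)

actual = {1, 2, 3}  # the customer actually buys 3 items
small = precision_recall(actual, {1})                  # small k: high precision, low recall
large = precision_recall(actual, {1, 2, 4, 5, 6, 7})   # large k: low precision, higher recall
```

Growing k from 1 to 6 raises recall (and the chance of a hit) while diluting precision, which is why F1-Score peaks near the average basket size but Hit-Ratio keeps rising.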
As described above, almost all algorithms exhibit their best performance when k is set to the average basket size. Table 4 lists the results in which k equals the average basket size: k = 6, k = 9, k = 5 and k = 3 for Ta-Feng, Dunnh, X5RH and T-Mall, respectively. TIFU achieves the best performance on the T-Mall dataset, but our SPAP algorithm outperforms all other algorithms on the Ta-Feng, Dunnh and X5RH datasets. Table 4. Comparison with the baseline methods and state-of-the-art methods when k is set to the average basket size. Bold marks the best result overall, and underline marks the best result among the competing algorithms. Patterns reflect customers' shopping habits. Because people visit physical stores more regularly than online stores, our pattern-based model SPAP achieves the best performance on the physical-store datasets Ta-Feng, Dunnh and X5RH. Table 5 lists the improvements of SPAP over TOP on the Ta-Feng, Dunnh and X5RH datasets, and over TIFU on the T-Mall dataset.

Running Time
All of these algorithms have running times ranging from several seconds to several hours; TOP is always the fastest [44]. In this subsection, we compare the running time (training time and prediction time) of our algorithm with that of TOP; both are implemented in Java. The results are shown in Table 6. MC, CLF, NMF, HRM, TBP, TIFU and UPCF are implemented in Python, so we do not report their running times here. We terminate the TBP process on the Dunnh dataset after it runs for over two days.

Conclusions
In this paper, we propose a pattern-based model for next basket prediction. The method combines pattern prediction and preference prediction. In pattern prediction, all sequential rules, periodic patterns and association rules are first mined together with their statistical characteristics. Probability models are then built on these statistical characteristics and used to calculate the probability of each product appearing in a customer's next basket. The products with the highest probabilities are selected for recommendation. If k products have been selected, we continue to the next customer; otherwise, we make a preference prediction. In preference prediction, products that are more frequent in the customer's individual shopping records are selected first; among products with the same frequency, the product with the higher tendency is selected, until all k products are chosen. Experiments show that our algorithm outperforms the baseline methods and state-of-the-art methods on three of four real-world transaction sequence datasets.
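The two-stage selection described above can be sketched as follows (a simplified illustration, assuming pattern prediction yields a per-product probability map and preference prediction uses per-product frequency and tendency scores; the function and its signature are ours, not the authors' implementation):

```python
def recommend(k, pattern_probs, freq, tendency):
    """Select up to k products for one customer: first the products with the
    highest pattern-based probability, then fill the remainder by preference,
    i.e. by frequency with tendency as the tie-breaker."""
    # stage 1: pattern prediction, highest-probability products first
    selected = [p for p, _ in sorted(pattern_probs.items(),
                                     key=lambda x: -x[1])][:k]
    # stage 2: preference prediction fills the remaining slots
    if len(selected) < k:
        rest = [p for p in freq if p not in selected]
        rest.sort(key=lambda p: (-freq[p], -tendency.get(p, 0.0)))
        selected += rest[:k - len(selected)]
    return selected
```

The tuple sort key encodes the rule from the conclusion: frequency decides first, and tendency breaks ties among equally frequent products.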
Author Contributions: G.C. implemented the experiments and wrote the first draft of the paper; Z.L. provided funding for the paper and revised it. All authors have read and agreed to the published version of the manuscript.