Article

Predicting Online Item-Choice Behavior: A Shape-Restricted Regression Approach

1 Product Development Management Office, Recruit Co., Ltd., Tokyo 100-6640, Japan
2 Faculty of Science and Engineering, Hosei University, Tokyo 184-8584, Japan
3 Institute of Systems and Information Engineering, University of Tsukuba, Tsukuba 305-8573, Japan
4 Erdos Inc., Yokohama 222-0033, Japan
* Author to whom correspondence should be addressed.
Algorithms 2023, 16(9), 415; https://doi.org/10.3390/a16090415
Submission received: 13 July 2023 / Revised: 10 August 2023 / Accepted: 22 August 2023 / Published: 29 August 2023
(This article belongs to the Section Algorithms for Multidisciplinary Applications)

Abstract

This paper examines the relationship between user pageview (PV) histories and their item-choice behavior on an e-commerce website. We focus on PV sequences, which represent time series of the number of PVs for each user–item pair. We propose a shape-restricted optimization model that accurately estimates item-choice probabilities for all possible PV sequences. This model imposes monotonicity constraints on item-choice probabilities by exploiting partial orders for PV sequences, according to the recency and frequency of a user’s previous PVs. To improve the computational efficiency of our optimization model, we devise efficient algorithms for eliminating all redundant constraints according to the transitivity of the partial orders. Experimental results using real-world clickstream data demonstrate that our method achieves higher prediction performance than that of a state-of-the-art optimization model and common machine learning methods.

1. Introduction

A growing number of companies are now operating e-commerce websites that allow users to browse and purchase a variety of items [1]. Within this context, there is great potential value in analyzing users’ item-choice behavior from clickstream data, which is a record of user pageview (PV) histories on an e-commerce website. By grasping users’ purchase intention as revealed by PV histories, we can lead users to target pages or design special sales promotions, providing companies with opportunities to build profitable relationships with users [2,3]. Companies can also use clickstream data to improve the quality of operational forecasting and inventory management [4]. Meanwhile, users often find it difficult to select an appropriate item from among the plethora of choices presented by e-commerce websites [5]. Analyzing item-choice behavior can improve the performance of recommendation systems that help users discover items of interest [6]. For these reasons, a number of prior studies have investigated clickstream data from various perspectives [7]. In this study, we focused on closely examining the relationship between PV histories and item-choice behavior on an e-commerce website.
It has been demonstrated that the recency and frequency of a user’s past purchases are critical indicators for purchase prediction [8,9] and sequential pattern mining [10]. Accordingly, Iwanaga et al. [11] developed a shape-restricted optimization model for estimating item-choice probabilities from the recency and frequency of a user’s previous PVs. Their method creates a two-dimensional probability table consisting of item-choice probabilities for all recency–frequency combinations in a user’s previous PVs. Nishimura et al. [12] employed latent-class modeling to integrate item heterogeneity into a two-dimensional probability table. These prior studies demonstrated experimentally that higher prediction performance was achieved with the two-dimensional probability table than with common machine learning methods, namely, logistic regression, kernel-based support vector machines, artificial neural networks, and random forests. Notably, however, reducing PV histories to two dimensions (recency and frequency) can markedly decrease the amount of information contained in PV histories reflecting item-choice behavior.
This study focused on PV sequences, which represent time series of the number of PVs for each user–item pair. In contrast to the two-dimensional probability table, PV sequences allow us to retain detailed information contained in the PV history. However, the huge number of possible PV sequences makes it extremely difficult to accurately estimate item-choice probabilities for all of them. To overcome this difficulty, we propose a shape-restricted optimization model that imposes monotonicity constraints on item-choice probabilities based on a partially ordered set (poset) for PV sequences. While this optimization model contains a huge number of constraints, redundant constraints can be eliminated according to the transitivity of the partial order. To accomplish this, we compute a transitive reduction [13] of a directed graph representing the poset. We demonstrate the effectiveness of our method through experiments using real-world clickstream data.
The main contributions of this paper are as follows:
  • We propose a shape-restricted optimization model for estimating item-choice probabilities from a user’s previous PV sequence. This PV sequence model exploits the monotonicity constraints to precisely estimate item-choice probabilities.
  • We derive two types of PV sequence posets according to the recency and frequency of a user’s previous PVs. Experimental results show that the monotonicity constraints based on these posets greatly enhance the prediction performance of our PV sequence model.
  • We devise constructive algorithms for transitive reduction specific to these posets. The time complexity of our algorithms is much smaller than that of general-purpose algorithms. Experimental results reveal that transitive reduction improves efficiency in terms of both the computation time and memory usage of our PV sequence model.
  • We verify experimentally that higher prediction performance is achieved with our method than with the two-dimensional probability table and common machine learning methods, namely, logistic regression, artificial neural networks, and random forests.
The remainder of this paper is organized as follows. Section 2 provides a brief review of related work. Section 3 describes the two-dimensional probability table [11], and Section 4 presents our PV sequence model. Section 5 describes our constructive algorithms for transitive reduction. Section 6 evaluates the effectiveness of our method based on experimental results. Section 7 concludes with a brief summary of this work and a discussion of future research directions.

2. Related Work

This section briefly surveys methods for predicting online user behavior and discusses some related work on shape-restricted regression.

2.1. Prediction of Online User Behavior

A number of prior studies have aimed at predicting users’ purchase behavior on e-commerce websites [14]. Mainstream research has applied stochastic or statistical models for predicting purchase sessions [9,15,16,17,18,19,20], but these approaches do not consider which items users choose.
Various machine learning methods have been used to predict online item-choice behavior, including logistic regression [21,22], association rule mining [23], support vector machines [22,24], ensemble learning methods [25,26,27,28,29], and artificial neural networks [30,31,32]. Tailored statistical models have also been proposed. For instance, Moe [33] devised a two-stage multinomial logit model that separates the decision-making process into item views and purchase decisions. Yeo et al. [34] proposed a joint framework consisting of user-level factor estimation and item-level factor aggregation based on the buyer decision process. Borges and Levene [35] used Markov chain models to estimate the probability of a user’s next link choice.
These prior studies effectively utilized clickstream data in various prediction methods and showed that the consideration of time-evolving user behavior is crucial for the precise prediction of online item-choice behavior. We therefore focused on user PV sequences to estimate item-choice probabilities on e-commerce websites.
On the other hand, various predictive features have been used for analyzing online user behavior, including user demographics, item characteristics, transaction timestamps, accessing devices, touchpoints, and user locations [14]. Indeed, Nishimura et al. [12] demonstrated that prediction performance can be improved by combining PV histories with item categories. To further improve prediction performance, we aim to exploit the detailed information contained in the PV history rather than develop a comprehensive model that integrates various predictive features. If successful, we can improve the prediction performance of comprehensive models through ensemble learning. Additionally, user PV histories are often easily accessible in practical situations, whereas detailed personal information can be unavailable due to privacy concerns.
Recently, deep learning methods have been actively studied for predicting online user behavior, especially in recommender systems; these include the multilayer perceptron, the autoencoder, convolutional/recurrent neural networks, and neural attention models [36]. In particular, graph neural networks [37,38], which operate on graph data, have been used effectively because most of the information about users’ item-choice behavior has a graph structure. Sophisticated methods based on graph neural networks have been proposed for purchase prediction [39,40,41,42], and these can be considered a top line for prediction performance. In contrast, our method offers a different advantage from such deep learning methods: it is a simple, interpretable, nonparametric model specialized for handling PV sequences based on the properties indicated by recency and frequency. Moreover, we evaluated the validity of our method by comparison with machine learning methods that have commonly been used in prior studies.

2.2. Shape-Restricted Regression

In many practical situations, prior information is known about the relationship between explanatory and response variables. For instance, utility functions can be assumed to be increasing and concave according to economic theory [43], and option pricing functions to be monotone and convex according to finance theory [44]. Shape-restricted regression fits a nonparametric function to a set of given observations under shape restrictions such as monotonicity, convexity, concavity, or unimodality [45,46,47,48].
Isotonic regression is the most common method for shape-restricted regression. In general, isotonic regression is the problem of estimating a real-valued monotone (non-decreasing or non-increasing) function with respect to a given partial order of observations [49]. Some regularization techniques [50,51] and estimation algorithms [49,52,53] have been proposed for isotonic regression.
One of the greatest advantages of shape-restricted regression is that it mitigates overfitting, thereby improving the prediction performance of regression models [54]. To utilize this advantage, Iwanaga et al. [11] devised a shape-restricted optimization model for estimating item-choice probabilities on e-commerce websites. Along similar lines, we propose a shape-restricted optimization model based on order relations of PV sequences to improve prediction performance.

3. Two-Dimensional Probability Table

This section briefly reviews the two-dimensional probability table proposed by Iwanaga et al. [11].

3.1. Empirical Probability Table

Table 1 gives an example of a PV history for six user–item pairs. For instance, user $u_1$ viewed the page for item $i_2$ once each on 1 and 3 April. We focus on user choices (e.g., revisit and purchase) on 4 April, which we call the base date. For instance, user $u_1$ chose item $i_4$ rather than item $i_2$ on the base date. We suppose for each user–item pair that recency and frequency are characterized by the last PV day and the total number of PVs, respectively. As shown in the table, the PV history can be summarized by the recency–frequency combination $(r, f) \in R \times F$, where $R$ and $F$ are index sets representing recency and frequency, respectively.
Let $n_{rf}$ be the number of user–item pairs having $(r, f) \in R \times F$, and let $q_{rf}$ be the number of choices occurring on the base date among user–item pairs having $(r, f) \in R \times F$. In the case of Table 1, the empirical probability table is calculated as
$\left( \hat{x}_{rf} := \dfrac{q_{rf}}{n_{rf}} \right)_{(r,f) \in R \times F} = \begin{pmatrix} 0/0 & 0/0 & 0/1 \\ 1/1 & 0/0 & 0/0 \\ 0/0 & 0/1 & 1/3 \end{pmatrix} = \begin{pmatrix} 0.00 & 0.00 & 0.00 \\ 1.00 & 0.00 & 0.00 \\ 0.00 & 0.00 & 0.33 \end{pmatrix}, \tag{1}$
where, for reasons of expediency, $\hat{x}_{rf} := 0$ for $(r, f) \in R \times F$ with $n_{rf} = 0$.
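For illustration, the empirical table in Equation (1) can be computed directly from the count arrays; the following is a minimal NumPy sketch (array names and layout are ours, not from the original implementation):

```python
import numpy as np

def empirical_table(n_counts, q_counts):
    """Empirical item-choice probabilities q[r, f] / n[r, f], with 0/0 := 0."""
    return np.where(n_counts > 0, q_counts / np.maximum(n_counts, 1), 0.0)

# Counts from Table 1 (rows indexed by recency r, columns by frequency f).
n = np.array([[0, 0, 1], [1, 0, 0], [0, 1, 3]])
q = np.array([[0, 0, 0], [1, 0, 0], [0, 0, 1]])
print(empirical_table(n, q))  # last row ends with 1/3 = 0.333...
```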

3.2. Two-Dimensional Monotonicity Model

It is reasonable to assume that the recency and frequency of user–item pairs are positively associated with user item-choice probabilities. To estimate item-choice probabilities $x_{rf}$ for all recency–frequency combinations $(r, f) \in R \times F$, the two-dimensional monotonicity model [11] minimizes the weighted sum of squared errors under monotonicity constraints with respect to recency and frequency:
$\underset{(x_{rf})}{\text{minimize}} \quad \sum_{(r,f) \in R \times F} n_{rf} (x_{rf} - \hat{x}_{rf})^2 \tag{2}$
$\text{subject to} \quad x_{rf} \le x_{r+1,f} \quad ((r, f) \in R \times F), \tag{3}$
$\qquad\qquad\quad\ x_{rf} \le x_{r,f+1} \quad ((r, f) \in R \times F), \tag{4}$
$\qquad\qquad\quad\ 0 \le x_{rf} \le 1 \quad ((r, f) \in R \times F). \tag{5}$
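Both this model and the PV sequence model in Section 4.4 are convex quadratic optimization problems, so they can be handed to an off-the-shelf QP solver; Section 6.1 reports results obtained with OSQP. The sketch below is ours, not the implementation used in the experiments: it assumes the cells of the probability table have been flattened to indices $0, \ldots, k-1$ and the monotonicity constraints (3) and (4) collected into a list of index pairs.

```python
import numpy as np
import scipy.sparse as sp
import osqp

def solve_monotone_qp(weights, x_hat, edges):
    """Weighted least squares under monotonicity constraints x_i <= x_j.

    weights, x_hat : flat arrays over the probability cells.
    edges          : index pairs (i, j), each encoding x_i <= x_j.
    """
    k = len(weights)
    P = sp.diags(2.0 * weights, format="csc")   # objective: sum_i w_i (x_i - xhat_i)^2
    q = -2.0 * weights * x_hat
    rows = sp.lil_matrix((len(edges), k))
    for row, (i, j) in enumerate(edges):        # x_i - x_j <= 0
        rows[row, i], rows[row, j] = 1.0, -1.0
    A = sp.vstack([rows.tocsc(), sp.identity(k, format="csc")], format="csc")
    l = np.concatenate([np.full(len(edges), -np.inf), np.zeros(k)])
    u = np.concatenate([np.zeros(len(edges)), np.ones(k)])
    solver = osqp.OSQP()
    solver.setup(P=P, q=q, A=A, l=l, u=u, verbose=False)
    return solver.solve().x
```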
Note, however, that PV histories are often indistinguishable according to recency and frequency. A typical example is the set of user–item pairs $(u_2, i_3)$, $(u_2, i_4)$, and $(u_3, i_2)$ in Table 1; although their PV histories are actually different, they have the same value $(r, f) = (3, 3)$ for recency–frequency combinations. As described in the next section, we exploit the PV sequence to distinguish between such PV histories.

4. PV Sequence Model

This section presents our shape-restricted optimization model for estimating item-choice probabilities from a user’s previous PV sequence.

4.1. PV Sequence

The PV sequence for each user–item pair represents a time series of the number of PVs, and is written as
$v := (v_1, v_2, \ldots, v_n),$
where $v_j$ is the number of PVs $j$ periods earlier for $j = 1, 2, \ldots, n$ (see Table 1). Note that sequence terms are arranged in reverse chronological order; thus, $v_j$ moves further into the past as the index $j$ increases.
Throughout the paper, we express sets of consecutive integers as
$[m_1, m_2] := \{m_1, m_1 + 1, \ldots, m_2\} \subset \mathbb{Z},$
where $[m_1, m_2] = \emptyset$ when $m_1 > m_2$. The set of possible PV sequences is defined as
$\Gamma := [0, m]^n = \{0, 1, \ldots, m\}^n,$
where m is the maximum number of PVs in each period, and n is the number of periods considered.
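For small $(n, m)$, the set $\Gamma$ can be enumerated explicitly; a short Python sketch:

```python
from itertools import product

n, m = 5, 6
gamma = list(product(range(m + 1), repeat=n))  # all PV sequences in [0, m]^n
print(len(gamma))  # (m + 1)**n = 16807
```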
Our objective is to estimate item-choice probabilities $x_v$ for all PV sequences $v \in \Gamma$. However, the huge number of PV sequences makes it extremely difficult to accurately estimate such probabilities. In the case of $(n, m) = (|R|, |F|) = (5, 6)$, for instance, the number of different PV sequences is $(m+1)^n = 16{,}807$, whereas the number of recency–frequency combinations is only $|R| \cdot |F| = 30$. To avoid this difficulty, we effectively utilize monotonicity constraints on item-choice probabilities, as in the optimization model (2)–(5). In the next section, we introduce three operations underlying the development of the monotonicity constraints.

4.2. Operations Based on Recency and Frequency

From the perspective of frequency, it is reasonable to assume that item-choice probability increases as the number of PVs in a particular period increases. To formulate this, we define the following operation:
Definition 1
($\mathrm{Up}$). On the domain
$D_{\mathrm{U}} := \{(v, s) \in \Gamma \times [1, n] \mid v_s \le m - 1\},$
the function $\mathrm{Up}: D_{\mathrm{U}} \to \Gamma$ is defined as
$((\ldots, v_s, \ldots), s) \mapsto (\ldots, v_s + 1, \ldots).$
For instance, we have $\mathrm{Up}((0, 1, 1), 1) = (1, 1, 1)$ and $\mathrm{Up}((1, 1, 1), 2) = (1, 2, 1)$. Since this operation increases PV frequencies, the monotonicity constraints $x_{(0,1,1)} \le x_{(1,1,1)} \le x_{(1,2,1)}$ should be satisfied by the item-choice probabilities.
From the perspective of recency, we assume that more-recent PVs have a larger effect on increasing item-choice probability. To formulate this, we consider the following operation for moving one PV from an old period to a new period:
Definition 2
($\mathrm{Move}$). On the domain
$D_{\mathrm{M}} := \{(v, s, t) \in \Gamma \times [1, n] \times [1, n] \mid v_s \le m - 1,\ v_t \ge 1,\ s < t\},$
the function $\mathrm{Move}: D_{\mathrm{M}} \to \Gamma$ is defined as
$((\ldots, v_s, \ldots, v_t, \ldots), s, t) \mapsto (\ldots, v_s + 1, \ldots, v_t - 1, \ldots).$
For instance, we have $\mathrm{Move}((1, 1, 1), 2, 3) = (1, 2, 0)$ and $\mathrm{Move}((1, 2, 0), 1, 2) = (2, 1, 0)$. Because this operation increases the number of recent PVs, the item-choice probabilities should satisfy the monotonicity constraints $x_{(1,1,1)} \le x_{(1,2,0)} \le x_{(2,1,0)}$.
The PV sequence $v = (1, 1, 1)$ represents a user’s continued interest in a certain item over three periods. In contrast, the PV sequence $v = (1, 2, 0)$ implies that a user’s interest decreased over the two most-recent periods. In this sense, the monotonicity constraint $x_{(1,1,1)} \le x_{(1,2,0)}$ may not be valid. Accordingly, we define the following alternative operation, which exchanges numbers of PVs to increase the number of recent PVs:
Definition 3
($\mathrm{Swap}$). On the domain
$D_{\mathrm{S}} := \{(v, s, t) \in \Gamma \times [1, n] \times [1, n] \mid v_s < v_t,\ s < t\},$
the function $\mathrm{Swap}: D_{\mathrm{S}} \to \Gamma$ is defined as
$((\ldots, v_s, \ldots, v_t, \ldots), s, t) \mapsto (\ldots, v_t, \ldots, v_s, \ldots).$
We thus have $\mathrm{Swap}((1, 0, 2), 2, 3) = (1, 2, 0)$ because $v_2 < v_3$, and $\mathrm{Swap}((1, 2, 0), 1, 2) = (2, 1, 0)$ because $v_1 < v_2$. Since this operation increases the number of recent PVs, the item-choice probabilities should satisfy the monotonicity constraints $x_{(1,0,2)} \le x_{(1,2,0)} \le x_{(2,1,0)}$. Note that the monotonicity constraint $x_{(1,1,1)} \le x_{(1,2,0)}$ is not implied by this operation.
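To make the three operations concrete, the following is a minimal Python sketch (our own helper functions, with 1-indexed period arguments as in Definitions 1–3); each function returns None outside the domains $D_{\mathrm{U}}$, $D_{\mathrm{M}}$, and $D_{\mathrm{S}}$:

```python
def up(v, s, m):
    """Up: add one PV in period s, provided v_s <= m - 1."""
    if v[s - 1] <= m - 1:
        return v[:s - 1] + (v[s - 1] + 1,) + v[s:]
    return None

def move(v, s, t, m):
    """Move: shift one PV from an older period t to a newer period s (s < t)."""
    if s < t and v[s - 1] <= m - 1 and v[t - 1] >= 1:
        w = list(v)
        w[s - 1] += 1
        w[t - 1] -= 1
        return tuple(w)
    return None

def swap(v, s, t):
    """Swap: exchange the PV counts of periods s and t, provided v_s < v_t."""
    if s < t and v[s - 1] < v[t - 1]:
        w = list(v)
        w[s - 1], w[t - 1] = w[t - 1], w[s - 1]
        return tuple(w)
    return None

# The examples from Definitions 1-3:
assert up((0, 1, 1), 1, m=2) == (1, 1, 1)
assert move((1, 1, 1), 2, 3, m=2) == (1, 2, 0)
assert swap((1, 0, 2), 2, 3) == (1, 2, 0)
```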

4.3. Partially Ordered Sets

Let $U \subseteq \Gamma$ be a subset of PV sequences. The image of each operation is then defined as
$\mathrm{Up}(U) = \{\mathrm{Up}(u, s) \mid u \in U,\ (u, s) \in D_{\mathrm{U}}\},$
$\mathrm{Move}(U) = \{\mathrm{Move}(u, s, t) \mid u \in U,\ (u, s, t) \in D_{\mathrm{M}}\},$
$\mathrm{Swap}(U) = \{\mathrm{Swap}(u, s, t) \mid u \in U,\ (u, s, t) \in D_{\mathrm{S}}\}.$
We define $\mathrm{UM}(U) := \mathrm{Up}(U) \cup \mathrm{Move}(U)$ for $U \subseteq \Gamma$. The following definition states that the binary relation $u \prec_{\mathrm{UM}} v$ holds when $u$ can be transformed into $v$ by the repeated application of Up and Move:
Definition 4
($\prec_{\mathrm{UM}}$). Suppose $u, v \in \Gamma$. We write $u \prec_{\mathrm{UM}} v$ if and only if there exists $k \ge 1$ such that
$v \in \mathrm{UM}^k(\{u\}) = \underbrace{\mathrm{UM} \circ \mathrm{UM} \circ \cdots \circ \mathrm{UM}}_{k\ \text{compositions}}(\{u\}).$
We also write $u \preceq_{\mathrm{UM}} v$ if $u \prec_{\mathrm{UM}} v$ or $u = v$.
Similarly, we define $\mathrm{US}(U) := \mathrm{Up}(U) \cup \mathrm{Swap}(U)$ for $U \subseteq \Gamma$. Then, the binary relation $u \prec_{\mathrm{US}} v$ holds when $u$ can be transformed into $v$ by the repeated application of Up and Swap.
Definition 5
($\prec_{\mathrm{US}}$). Suppose $u, v \in \Gamma$. We write $u \prec_{\mathrm{US}} v$ if and only if there exists $k \ge 1$ such that
$v \in \mathrm{US}^k(\{u\}) = \underbrace{\mathrm{US} \circ \mathrm{US} \circ \cdots \circ \mathrm{US}}_{k\ \text{compositions}}(\{u\}).$
We also write $u \preceq_{\mathrm{US}} v$ if $u \prec_{\mathrm{US}} v$ or $u = v$.
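For small $\Gamma$, the relations $\preceq_{\mathrm{UM}}$ and $\preceq_{\mathrm{US}}$ can be tested directly from Definitions 4 and 5 by breadth-first search over repeated applications of the operations. The following is a brute-force sketch (ours) reusing the up/move/swap helpers from Section 4.2; it is practical only for tiny $(n, m)$:

```python
from itertools import combinations

def successors(v, m, use_move):
    """All images of v under Up and either Move (for UM) or Swap (for US)."""
    n = len(v)
    out = set()
    for s in range(1, n + 1):
        w = up(v, s, m)
        if w is not None:
            out.add(w)
    for s, t in combinations(range(1, n + 1), 2):
        w = move(v, s, t, m) if use_move else swap(v, s, t)
        if w is not None:
            out.add(w)
    return out

def precedes(u, v, m, use_move=True):
    """Brute-force test of u <= v in (Gamma, UM) or (Gamma, US) by BFS."""
    frontier, seen = {u}, {u}
    while frontier:
        if v in seen:
            return True
        frontier = set().union(*(successors(w, m, use_move) for w in frontier)) - seen
        seen |= frontier
    return v in seen

assert precedes((0, 1, 1), (1, 2, 1), m=2)                   # Up twice
assert precedes((1, 0, 2), (2, 1, 0), m=2, use_move=False)   # Swap twice
```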
To prove properties of these binary relations, we can use the lexicographic order, which is a well-known linear order [55]:
Definition 6
($\prec_{\mathrm{lex}}$). Suppose $u, v \in \Gamma$. We write $u \prec_{\mathrm{lex}} v$ if and only if there exists $s \in [1, n]$ such that $u_s < v_s$ and $u_j = v_j$ for $j \in [1, s-1]$. We also write $u \preceq_{\mathrm{lex}} v$ if $u \prec_{\mathrm{lex}} v$ or $u = v$.
Each application of Up, Move, and Swap makes a PV sequence greater in the lexicographic order. Therefore, we can obtain the following lemma:
Lemma 1.
Suppose $u, v \in \Gamma$. If $u \prec_{\mathrm{UM}} v$ or $u \prec_{\mathrm{US}} v$, then $u \prec_{\mathrm{lex}} v$.
The following theorem states that a partial order of PV sequences is derived by operations Up and Move.
Theorem 1.
The pair $(\Gamma, \preceq_{\mathrm{UM}})$ is a poset.
Proof. 
From Definition 4, the relation $\preceq_{\mathrm{UM}}$ is reflexive and transitive. Suppose $u \preceq_{\mathrm{UM}} v$ and $v \preceq_{\mathrm{UM}} u$. It follows from Lemma 1 that $u \preceq_{\mathrm{lex}} v$ and $v \preceq_{\mathrm{lex}} u$. Since the relation $\preceq_{\mathrm{lex}}$ is antisymmetric, we have $u = v$, thus proving that the relation $\preceq_{\mathrm{UM}}$ is also antisymmetric.    □
We can similarly prove the following theorem for operations Up and Swap:
Theorem 2.
The pair $(\Gamma, \preceq_{\mathrm{US}})$ is a poset.

4.4. Shape-Restricted Optimization Model

Let $n_v$ be the number of user–item pairs that have the PV sequence $v \in \Gamma$, and let $q_v$ be the number of choices arising on the base date from user–item pairs having $v \in \Gamma$. Similarly to Equation (1), we can calculate empirical item-choice probabilities as
$\hat{x}_v := \dfrac{q_v}{n_v} \quad (v \in \Gamma). \tag{6}$
Our shape-restricted optimization model minimizes the weighted sum of squared errors subject to the monotonicity constraint:
$\underset{(x_v)}{\text{minimize}} \quad \sum_{v \in \Gamma} n_v (x_v - \hat{x}_v)^2 \tag{7}$
$\text{subject to} \quad x_u \le x_v \quad (u, v \in \Gamma \text{ with } u \prec v), \tag{8}$
$\qquad\qquad\quad\ 0 \le x_v \le 1 \quad (v \in \Gamma), \tag{9}$
where $u \prec v$ in Equation (8) is defined by one of the partial orders $\prec_{\mathrm{UM}}$ or $\prec_{\mathrm{US}}$.
The monotonicity constraint (8) enhances the estimation accuracy of item-choice probabilities. In addition, our shape-restricted optimization model can be used in a post-processing step to improve the prediction performance of other machine learning methods. Specifically, we first compute item-choice probabilities using a machine learning method and then substitute the computed values into $(\hat{x}_v)_{v \in \Gamma}$ to solve the optimization model (7)–(9). Consequently, we obtain item-choice probabilities corrected by the monotonicity constraint (8). Section 6.4 illustrates the usefulness of this approach.
However, since $|\Gamma| = (m+1)^n$, the number of constraints in Equation (8) is $O((m+1)^{2n})$, which can be extremely large. When $(n, m) = (5, 6)$, for instance, we have $(m+1)^{2n} = 282{,}475{,}249$. The next section describes how we mitigate this difficulty by removing redundant constraints in Equation (8).

5. Algorithms for Transitive Reduction

This section describes our constructive algorithms for transitive reduction to decrease the problem size in our shape-restricted optimization model.

5.1. Transitive Reduction

A poset $(\Gamma, \preceq)$ can be represented by a directed graph $(\Gamma, E)$, where $\Gamma$ and $E \subseteq \Gamma \times \Gamma$ are the sets of nodes and directed edges, respectively. Each directed edge $(u, v) \in E$ in this graph corresponds to the order relation $u \prec v$, so the number of directed edges coincides with the number of constraints in Equation (8).
Figure 1 and Figure 2 show directed graph representations of the posets $(\Gamma, \preceq_{\mathrm{UM}})$ and $(\Gamma, \preceq_{\mathrm{US}})$, respectively. Each edge in Figure 1a and Figure 2a corresponds to one of the operations Up, Move, or Swap. Edge $(u, v)$ is red if $v \in \mathrm{Up}(\{u\})$ and black if $v \in \mathrm{Move}(\{u\})$ or $v \in \mathrm{Swap}(\{u\})$. The directed graphs in Figure 1a and Figure 2a can be easily created.
Suppose there are three edges $(u, w), (w, v), (u, v) \in E$. In this case, edge $(u, v)$ is implied by the other two edges due to the transitivity of the partial order:
$u \prec w,\ w \prec v \ \Rightarrow\ u \prec v,$
or, equivalently,
$x_u \le x_w,\ x_w \le x_v \ \Rightarrow\ x_u \le x_v.$
As a result, edge $(u, v)$ is redundant and can be removed from the directed graph.
A transitive reduction, also known as a Hasse diagram, of a directed graph $(\Gamma, E)$ is its subgraph $(\Gamma, E^*)$ in which all redundant edges have been removed using the transitivity of the partial order [13]. Figure 1b and Figure 2b show transitive reductions of the directed graphs shown in Figure 1a and Figure 2a, respectively. By computing transitive reductions, the number of edges is reduced from 90 to 42 in Figure 1 and from 81 to 46 in Figure 2. This transitive reduction is known to be unique [13].
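For comparison with the constructive algorithms in Section 5.3, a transitive reduction can also be computed generically; for instance, NetworkX provides transitive_reduction for directed acyclic graphs (any directed graph representing a poset is acyclic). A toy sketch of the three-edge example above:

```python
import networkx as nx

G = nx.DiGraph([("u", "w"), ("w", "v"), ("u", "v")])  # (u, v) is redundant
H = nx.transitive_reduction(G)
print(sorted(H.edges()))  # [('u', 'w'), ('w', 'v')]
```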

5.2. General-Purpose Algorithms

The transitive reduction $(\Gamma, E^*)$ is characterized by the following lemma [55]:
Lemma 2.
Suppose $(u, v) \in \Gamma \times \Gamma$. Then, $(u, v) \in E^*$ holds if and only if both of the following conditions are fulfilled:
(C1)
$u \prec v$, and
(C2)
if $w \in \Gamma$ satisfies $u \preceq w \preceq v$, then $w \in \{u, v\}$.
The basic strategy in general-purpose algorithms for transitive reduction involves the following steps:
Step 1:
An exhaustive directed graph $(\Gamma, E)$ is generated from a given poset $(\Gamma, \preceq)$.
Step 2:
The transitive reduction $(\Gamma, E^*)$ is computed from the directed graph $(\Gamma, E)$ using Lemma 2.
Various algorithms have been proposed for speeding up the computation in Step 2. Recall that $|\Gamma| = (m+1)^n$ in our situation. Warshall’s algorithm [56] has time complexity $O((m+1)^{3n})$ for completing Step 2 [55]. By using a sophisticated algorithm for fast matrix multiplication, this time complexity can be reduced to $O((m+1)^{2.3729n})$ [57].
However, such general-purpose algorithms are clearly inefficient, especially when $n$ is very large, because Step 1 alone requires a huge number of computations. To resolve this difficulty, we devised specialized algorithms for directly constructing a transitive reduction.

5.3. Constructive Algorithms

Let $(\Gamma, E^*_{\mathrm{UM}})$ be a transitive reduction of the directed graph $(\Gamma, E_{\mathrm{UM}})$ representing the poset $(\Gamma, \preceq_{\mathrm{UM}})$. Then, the transitive reduction can be characterized by the following theorem:
Theorem 3.
Suppose $(u, v) \in \Gamma \times \Gamma$. Then, $(u, v) \in E^*_{\mathrm{UM}}$ holds if and only if one of the following conditions is fulfilled:
(UM1)
$v = \mathrm{Up}(u, n)$, or
(UM2)
there exists $s \in [1, n]$ such that $v = \mathrm{Move}(u, s, s+1)$.
Proof. 
See Appendix A.1.    □
Theorem 3 provides a constructive algorithm that directly computes the transitive reduction $(\Gamma, E^*_{\mathrm{UM}})$ without generating an exhaustive directed graph $(\Gamma, E)$. Our algorithm is based on the breadth-first search [58]. Specifically, we start with a node list $L = \{(0, 0, \ldots, 0)\} \subseteq \Gamma$. At each iteration of the algorithm, we choose $u \in L$, enumerate all $v \in \Gamma$ such that $(u, v) \in E^*_{\mathrm{UM}}$, and add these nodes to $L$.
Table 2 shows this enumeration process for $u = (0, 2, 1)$ with $(n, m) = (3, 2)$. The operations Up and Move generate $v \in \{(1, 2, 1), (0, 2, 2), (1, 1, 1), (1, 2, 0)\}$, which amounts to searching edges $(u, v)$ in Figure 1a. We next check whether each $v$ satisfies condition (UM1) or (UM2) in Theorem 3. As shown in Table 2, we choose $v \in \{(0, 2, 2), (1, 1, 1)\}$ and add them to list $L$; this amounts to enumerating edges $(u, v)$ in Figure 1b.
Appendix B.1 presents a pseudocode for our constructive algorithm (Algorithm A1). Recalling the time complexity analysis of the breadth-first search [58], one readily sees that the time complexity of Algorithm A1 is $O(n(m+1)^n)$, which is much smaller than the $O((m+1)^{2.3729n})$ achieved by the general-purpose algorithm [57], especially when $n$ is very large.
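A minimal Python sketch of this construction (ours), reusing the up/move helpers from Section 4.2 and deduplicating nodes with a set, which the appendix pseudocode manages through $L$ and $Q$ in the same spirit:

```python
from collections import deque

def transitive_reduction_um(n, m):
    """Construct (Gamma, E*_UM) directly by breadth-first search (Theorem 3)."""
    root = (0,) * n
    nodes, edges = {root}, []
    queue = deque([root])
    while queue:
        u = queue.popleft()
        candidates = [up(u, n, m)]                                  # (UM1)
        candidates += [move(u, s, s + 1, m) for s in range(1, n)]   # (UM2)
        for v in candidates:
            if v is None:
                continue
            edges.append((u, v))
            if v not in nodes:
                nodes.add(v)
                queue.append(v)
    return nodes, edges

nodes, edges = transitive_reduction_um(3, 2)
print(len(nodes), len(edges))  # 27 and 42, matching Figure 1b
```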
Next, we focus on the transitive reduction $(\Gamma, E^*_{\mathrm{US}})$ of the directed graph $(\Gamma, E_{\mathrm{US}})$ representing the poset $(\Gamma, \preceq_{\mathrm{US}})$. The transitive reduction can then be characterized by the following theorem:
Theorem 4.
Suppose $(u, v) \in \Gamma \times \Gamma$. Then, $(u, v) \in E^*_{\mathrm{US}}$ holds if and only if one of the following conditions is fulfilled:
(US1)
there exists $s \in [1, n]$ such that $v = \mathrm{Up}(u, s)$ and $u_j \notin \{u_s, u_s + 1\}$ for all $j \in [s+1, n]$, or
(US2)
there exists $(s, t) \in [1, n] \times [1, n]$ such that $v = \mathrm{Swap}(u, s, t)$ and $u_j \notin [u_s, u_t]$ for all $j \in [s+1, t-1]$.
Proof. 
See Appendix A.2.    □
Theorem 4 also gives a constructive algorithm for computing the transitive reduction $(\Gamma, E^*_{\mathrm{US}})$. Let us again consider $u = (0, 2, 1)$ as an example with $(n, m) = (3, 2)$. As shown in Table 3, the operations Up and Swap generate $v \in \{(1, 2, 1), (0, 2, 2), (2, 0, 1), (1, 2, 0)\}$, and we choose $v \in \{(0, 2, 2), (2, 0, 1), (1, 2, 0)\}$ (see also Figure 2a,b).
Appendix B.2 presents the pseudocode for our constructive algorithm (Algorithm A2). Its time complexity is estimated to be $O(n^2(m+1)^n)$, which is larger than that of Algorithm A1 but much smaller than that of the general-purpose algorithm [57], especially when $n$ is very large.
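The enumeration step of this algorithm, i.e., finding the reduced successors of a single node $u$, can be sketched in Python as follows (ours, reusing the up/swap helpers from Section 4.2); the printed result mirrors the example of Table 3:

```python
def reduced_successors_us(u, m):
    """Successors of u in (Gamma, E*_US), per conditions (US1) and (US2)."""
    n = len(u)
    out = []
    for s in range(1, n + 1):                                   # (US1)
        if u[s - 1] <= m - 1 and all(u[j] not in (u[s - 1], u[s - 1] + 1)
                                     for j in range(s, n)):
            out.append(up(u, s, m))
    for s in range(1, n + 1):                                   # (US2)
        for t in range(s + 1, n + 1):
            if u[s - 1] < u[t - 1] and all(not (u[s - 1] <= u[j] <= u[t - 1])
                                           for j in range(s, t - 1)):
                out.append(swap(u, s, t))
    return out

print(reduced_successors_us((0, 2, 1), m=2))
# [(0, 2, 2), (2, 0, 1), (1, 2, 0)], as enumerated in Table 3
```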

6. Experiments

The experimental results reported in this section evaluate the effectiveness of our method for estimating item-choice probabilities. We consider the task of predicting which items a particular user will view again from among those the user has viewed in the past. The performance evaluation methodology is explained in detail in Section 6.2.
We used real-world clickstream data collected from a Chinese e-commerce website, Tmall (https://tianchi.aliyun.com/dataset/, accessed on 21 August 2023). We used a dataset (https://www.dropbox.com/sh/dbzmtq4zhzbj5o9/AACldzQWbw-igKjcPTBI6ZPAa?dl=0, accessed on 21 August 2023) preprocessed by Ludewig and Jannach [59]. Each record corresponds to one PV and contains information such as user ID, item ID, and a timestamp. The dataset includes 28,316,459 unique user–item pairs formed from 422,282 users and 624,221 items.

6.1. Methods for Comparison

We compared the performance of the methods listed in Table 4. All computations were performed on an Apple MacBook Pro computer (Apple Inc., Cupertino, CA, USA) with an Intel Core i7-5557U CPU (3.10 GHz) (Intel Corporation, Santa Clara, CA, USA) and 16 GB of memory.
The optimization models (2)–(5) and (7)–(9) were solved using OSQP (https://osqp.org/docs/index.html, accessed on 21 August 2023) [60], a numerical optimization package for solving convex quadratic optimization problems. As in Table 1, daily PV sequences were calculated for each user–item pair, where $m$ is the maximum number of daily PVs and $n$ is the number of terms (past days) in the PV sequence. In this process, all PVs from more than $n$ days earlier were added to the number of PVs $n$ days earlier, and numbers of daily PVs exceeding $m$ were rounded down to $m$. Similarly, the recency–frequency combinations $(r, f) \in R \times F$ were calculated using daily PVs as in Table 1, where $(|R|, |F|) = (n, m)$.
Other machine learning methods (LR, ANN, and RF) were respectively implemented using the LogisticRegressionCV, MLPRegressor, and RandomForestRegressor functions in scikit-learn, a Python library of machine learning tools. We tuned the following hyperparameters through three-fold cross-validation according to the parameter settings in a benchmark study [61]: Activation functions, solvers, and learning rate schedules for ANN; and the number of trees, the minimum weighted fraction at a leaf node, and the number of features considered at each split for RF. We used default values for the other hyperparameters. These machine learning methods employed the PV sequence ( v 1 , v 2 , , v n ) as n input variables for computing item-choice probabilities. We standardized each input variable and performed undersampling to improve prediction performance.
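As a rough, self-contained sketch of this pipeline for RF (the hyperparameter grid, synthetic data, and undersampling details here are illustrative assumptions, not the exact experimental configuration):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import StandardScaler

# X: PV sequences (v_1, ..., v_n) as rows; y: binary item-choice labels.
rng = np.random.default_rng(0)
X = rng.integers(0, 7, size=(1000, 5)).astype(float)
y = (rng.random(1000) < 0.1).astype(int)

# Undersample the majority class to balance the training data.
pos, neg = np.where(y == 1)[0], np.where(y == 0)[0]
keep = np.concatenate([pos, rng.choice(neg, size=len(pos), replace=False)])

# Standardize inputs and tune RF by three-fold cross-validation.
scaler = StandardScaler().fit(X[keep])
grid = {"n_estimators": [100, 300],
        "min_weight_fraction_leaf": [0.0, 0.01],
        "max_features": ["sqrt", 1.0]}
rf = GridSearchCV(RandomForestRegressor(random_state=0), grid, cv=3)
rf.fit(scaler.transform(X[keep]), y[keep])

scores = rf.predict(scaler.transform(X))  # estimated item-choice scores
```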

6.2. Performance Evaluation Methodology

There are five pairs of training and validation sets of clickstream data in the preprocessed dataset [59]. As shown in Table 5, each training period is 90 days, and the next day is the validation period. The first four pairs of training and validation sets, which we call the training set, were used for model estimation, and the fifth pair was used for performance evaluation. To examine how sample size affects prediction performance, we prepared small-sample training sets by randomly choosing user–item pairs from the original training set. Here, the sampling rates are 0.1%, 1%, and 10%, and the original training set is referred to as the full sample. Note that the results were averaged over 10 trials for the sampled training sets.
We considered the top-$N$ selection task to evaluate prediction performance. Specifically, we focused on items that were viewed by a particular user during a training period. From among these items, we selected $I_{\mathrm{sel}}$, the set of top-$N$ items for the user according to estimated item-choice probabilities. The most-recently viewed items were selected when two or more items had the same choice probability. Let $I_{\mathrm{view}}$ be the set of items viewed by the user in the validation period. Then, the F1 score is defined as the harmonic mean of $\mathit{Recall} := |I_{\mathrm{sel}} \cap I_{\mathrm{view}}| / |I_{\mathrm{view}}|$ and $\mathit{Precision} := |I_{\mathrm{sel}} \cap I_{\mathrm{view}}| / |I_{\mathrm{sel}}|$:
$\mathrm{F1\ score} := \dfrac{2 \cdot \mathit{Recall} \cdot \mathit{Precision}}{\mathit{Recall} + \mathit{Precision}}.$
In the following sections, we examine F1 scores averaged over all users. The percentage of user–item pairs leading to item choices is only 0.16%.
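Per user, the top-$N$ evaluation reduces to a few set operations; a minimal sketch (tie-breaking by recency is omitted here for brevity):

```python
def f1_at_n(scores, viewed, n_top):
    """F1 score of the top-N selection for one user.

    scores : dict item -> estimated choice probability (training-period items).
    viewed : set of items the user viewed in the validation period.
    """
    selected = set(sorted(scores, key=scores.get, reverse=True)[:n_top])
    hits = len(selected & viewed)
    if hits == 0:
        return 0.0
    recall, precision = hits / len(viewed), hits / len(selected)
    return 2 * recall * precision / (recall + precision)

print(f1_at_n({"i1": 0.9, "i2": 0.4, "i3": 0.8}, {"i1", "i2"}, n_top=2))  # 0.5
```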

6.3. Effects of the Transitive Reduction

We generated constraints in Equation (8) based on the following three directed graphs:
Case 1 (Enumeration):
All edges $(u, v)$ satisfying $u \prec v$ were enumerated.
Case 2 (Operation):
Edges corresponding to operations Up, Move, and Swap were generated as in Figure 1a and Figure 2a.
Case 3 (Reduction):
Transitive reduction was computed using our algorithms as in Figure 1b and Figure 2b.
Table 6 shows the problem size of our PV sequence model (7)–(9) for some $(n, m)$ settings of the PV sequence. Here, the “#Vars” column shows the number of decision variables (i.e., $(m+1)^n$), and the subsequent columns show the number of constraints in Equation (8) for the three cases mentioned above.
In the enumeration case, the number of constraints grew rapidly as $n$ and $m$ increased. In contrast, the reduction case always had the smallest number of constraints among the three cases. When $(n, m) = (5, 6)$, for instance, transitive reduction decreased the number of constraints in the operation case from 195,510 to 63,798 (a ratio of approximately 32.6%) for SeqUM and from 144,060 to 85,272 (approximately 59.2%) for SeqUS.
The number of constraints was larger for SeqUM than for SeqUS in the enumeration and operation cases. In contrast, the number of constraints was often smaller for SeqUM than for SeqUS in the reduction case. Thus, the transitive reduction had a greater impact on SeqUM than on SeqUS in terms of the number of constraints.
Table 7 lists the computation times required for solving the optimization problem (7)–(9) for some ( n , m ) settings of the PV sequence. Here, “OM” indicates that computation was aborted due to a lack of memory. The enumeration case often caused out-of-memory errors because of the huge number of constraints (see Table 6), but the operation and reduction cases completed the computations for all ( n , m ) settings for the PV sequence. Moreover, the transitive reduction made computations faster. A notable example is SeqUM with ( n , m ) = ( 5 , 6 ) , for which the computation time in the reduction case (86.02 s) was only one-tenth of that in the operation case (906.76 s). These results demonstrate that transitive reduction improves efficiency in terms of both computation time and memory usage.
Table 8 shows the computational performance of our optimization model (7)–(9) for some $(n, m)$ settings of PV sequences. Here, for each $n \in \{3, 4, \ldots, 9\}$, the largest $m$ was chosen such that the computation finished within 30 min. Both SeqUM and SeqUS always delivered higher F1 scores than SeqEmp did. This indicates that our monotonicity constraint (8) works well for improving prediction performance. The F1 scores provided by SeqUM and SeqUS were very similar and were largest with $(n, m) = (7, 3)$. In light of these results, we use the settings $(n, m) \in \{(7, 3), (5, 6)\}$ in the following sections.

6.4. Prediction Performance of Our PV Sequence Model

Figure 3 shows F1 scores of the two-dimensional probability table and our PV sequence model using the sampled training sets, where the number of selected items is $N \in \{3, 5, 10\}$ and the setting of the PV sequence is $(n, m) \in \{(7, 3), (5, 6)\}$.
When the full-sample training set was used, SeqUM and SeqUS always delivered a better prediction performance than the other methods did. When the 1%- and 10%-sampled training sets were used, the prediction performance of SeqUS decreased slightly, whereas SeqUM still performed best among all the methods. When the 0.1%-sampled training set was used, 2dimMono always performed better than SeqUS did, and 2dimMono also had the best prediction performance in the case of $(n, m) = (5, 6)$. These results suggest that our PV sequence model performs very well, especially when the sample size is sufficiently large. The prediction performance of SeqEmp deteriorated rapidly as the sampling rate decreased, and this performance was always much worse than that of 2dimEmp. Meanwhile, SeqUM and SeqUS maintained high prediction performance even when the 0.1%-sampled training set was used. This suggests that the monotonicity constraint (8) in our PV sequence model is more effective than the monotonicity constraints (3) and (4) in the two-dimensional monotonicity model.
Figure 4 shows F1 scores for the machine learning methods (LR, ANN, and RF) and our PV sequence model (SeqUM) using the full-sample training set, where the number of selected items is $N \in \{3, 5, 10\}$ and the PV sequence setting is $(n, m) \in \{(7, 3), (5, 6)\}$. Note that in this figure, SeqUM(*) represents the optimization model (7)–(9), where the item-choice probabilities computed by each machine learning method were substituted into $(\hat{x}_v)_{v \in \Gamma}$ (see Section 4.4).
Prediction performance was better for SeqUM than for all the machine learning methods, except in the case of Figure 4f, where LR showed better prediction performance. Moreover, SeqUM(*) improved the prediction performance of the machine learning methods, particularly ANN and RF. This suggests that our monotonicity constraint (8) is also very helpful in correcting prediction values from other machine learning methods.

6.5. Analysis of Estimated Item-Choice Probabilities

Figure 5 shows item-choice probabilities estimated by our PV sequence model using the full-sample training set, where the PV sequence setting is $(n, m) = (5, 6)$. Here, we focus on PV sequences of the form $v = (v_1, v_2, v_3, 0, 0) \in \Gamma$ and depict estimates of item-choice probabilities on $(v_1, v_2) \in [0, m] \times [0, m]$ for each $v_3 \in \{0, 1, 2\}$. Note also that the number of associated user–item pairs decreased as the value of $v_3$ increased.
Because SeqEmp does not consider the monotonicity constraint (8), the item-choice probabilities estimated by SeqEmp have irregular shapes for $v_3 \in \{1, 2\}$. In contrast, the item-choice probabilities estimated with the monotonicity constraint (8) are relatively smooth. Because of the Up operation, item-choice probabilities estimated by SeqUM and SeqUS increase as $(v_1, v_2)$ moves from $(0, 0)$ to $(6, 6)$. Because of the Move operation, item-choice probabilities estimated by SeqUM also increase as $(v_1, v_2)$ moves from $(0, 6)$ to $(6, 0)$. Item-choice probabilities estimated by SeqUS are relatively high around $(v_1, v_2) = (3, 3)$. This highlights the difference in the monotonicity constraint (8) between the posets $(\Gamma, \preceq_{\mathrm{UM}})$ and $(\Gamma, \preceq_{\mathrm{US}})$.
Figure 6 shows item-choice probabilities estimated by our PV sequence model using the 10%-sampled training set, where the PV sequence setting is $(n, m) = (5, 6)$. Since the sample size was reduced in this case, the item-choice probabilities estimated by SeqEmp are highly unstable. In particular, item-choice probabilities were estimated to be zero for all $(v_1, v_2)$ with $v_1 \ge 3$ in Figure 6c, but this is unreasonable from the perspective of frequency. In contrast, SeqUM and SeqUS estimated item-choice probabilities that increase monotonically with respect to $(v_1, v_2)$.

7. Conclusions

We presented a shape-restricted optimization model for estimating item-choice probabilities on an e-commerce website. Our monotonicity constraints based on tailored order relations could better estimate item-choice probabilities for all possible PV sequences. To improve the computational efficiency of our optimization model, we devised constructive algorithms for transitive reduction that remove all redundant constraints from the optimization model.
We assessed the effectiveness of our method through experiments using real-world clickstream data. Experimental results demonstrated that transitive reduction enhanced the efficiency of our optimization model in terms of both computation time and memory usage. In addition, our method delivered a better prediction performance than did the two-dimensional monotonicity model [11] and common machine learning methods. Our method was also helpful in correcting prediction values computed by other machine learning methods.
This study made three main contributions. First, we derived two types of posets by exploiting the properties of recency and frequency of a user’s previous PVs. These posets allow us to place appropriate monotonicity constraints on item-choice probabilities. Next, we developed algorithms for the transitive reduction of our posets. These algorithms are more efficient than general-purpose algorithms in terms of time complexity for transitive reduction. Finally, our method further expanded the potential of shape-restricted regression for predicting user behavior on e-commerce websites.
Once the optimization model for estimating item-choice probabilities has been solved, the obtained results can easily be put into practical use on e-commerce websites. Accurate estimates of item-choice probabilities will be useful for customizing sales promotions according to the needs of a particular user. In addition, our method can estimate user preferences from clickstream data, thereby aiding the creation of high-quality user–item rating matrices for recommendation algorithms [6].
In future studies, we will develop new posets that further improve the prediction performance of our PV sequence model. Another direction of future research will be to incorporate user–item heterogeneity into our optimization model, as in the case of latent-class modeling with a two-dimensional probability table [12]. It is also important to compare the prediction performance of our method with that of topline graph neural networks [37,38].

Author Contributions

Conceptualization, Y.T. and J.I.; methodology, N.N. and N.S.; software, N.N. and J.I.; validation, N.N.; formal analysis, N.S.; investigation, Y.T.; resources, N.N.; data curation, N.N.; writing—original draft preparation, Y.T.; writing—review and editing, N.S. and N.N.; visualization, N.N.; supervision, J.I. and Y.T.; project administration, N.N.; funding acquisition, Y.T. All authors have read and agreed to the published version of the manuscript.

Funding

This work was partially supported by joint research between the University of Tsukuba and Toyota Motor Corporation.

Data Availability Statement

The data set we used is available online at https://www.dropbox.com/sh/dbzmtq4zhzbj5o9/AACldzQWbw-igKjcPTBI6ZPAa?dl=0 (accessed on 21 August 2023). This data set was created by Ludewig and Jannach as a preprocessed version of the clickstream data collected from a Chinese e-commerce website, Tmall, which is also available online at https://tianchi.aliyun.com/dataset/ (accessed on 21 August 2023).

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Proofs

Appendix A.1. Proof of Theorem 3

Appendix A.1.1. The “Only if” Part

Suppose $(u, v) \in E^*_{\mathrm{UM}}$. We then have $v \in \mathrm{UM}(\{u\})$ from Definition 4 and Lemma 2. We therefore consider the following two cases:
Case 1:
$v = \mathrm{Up}(u, s)$ for some $s \in [1, n]$
For the sake of contradiction, assume that $s \ne n$ (i.e., $s \le n - 1$). Then, there exists an index $j$ such that $s < j \le n$. If $u_j > 0$, we set $w = \mathrm{Move}(u, s, j)$ and then have $v = \mathrm{Up}(w, j)$. If $u_j = 0$, we set $w = \mathrm{Up}(u, j)$ and then have $v = \mathrm{Move}(w, s, j)$. This implies that $u \prec_{\mathrm{UM}} w \prec_{\mathrm{UM}} v$, which contradicts $(u, v) \in E^*_{\mathrm{UM}}$ due to condition (C2) of Lemma 2.
Case 2:
$v = \mathrm{Move}(u, s, t)$ for some $(s, t) \in [1, n] \times [1, n]$
For the sake of contradiction, assume that $t \ne s + 1$ (i.e., $t \ge s + 2$). Then, there exists an index $j$ such that $s < j < t$. If $u_j > 0$, we set $w = \mathrm{Move}(u, s, j)$ and then have $v = \mathrm{Move}(w, j, t)$. If $u_j = 0$, we set $w = \mathrm{Move}(u, j, t)$ and then have $v = \mathrm{Move}(w, s, j)$. This implies that $u \prec_{\mathrm{UM}} w \prec_{\mathrm{UM}} v$, which contradicts $(u, v) \in E^*_{\mathrm{UM}}$ due to condition (C2) of Lemma 2.

Appendix A.1.2. The “if” Part

We show that $(u, v) \in E^*_{\mathrm{UM}}$ in the following two cases:
Case 1:
Condition (UM1) Is Fulfilled
Condition (C1) of Lemma 2 is clearly satisfied. To verify condition (C2), we consider $w \in \Gamma$ such that $u \preceq_{\mathrm{UM}} w \preceq_{\mathrm{UM}} v$. From Lemma 1, we have $u \preceq_{\mathrm{lex}} w \preceq_{\mathrm{lex}} v$. Since $u$ is next to $v$ in the lexicographic order, we have $w \in \{u, v\}$.
Case 2:
Condition (UM2) Is Fulfilled
Condition (C1) of Lemma 2 is clearly satisfied. To verify condition (C2), we consider $w \in \Gamma$ such that $u \preceq_{\mathrm{UM}} w \preceq_{\mathrm{UM}} v$. From Lemma 1, we have $u \preceq_{\mathrm{lex}} w \preceq_{\mathrm{lex}} v$, which implies that $w_j = u_j$ for all $j \in [1, s-1]$. Therefore, we cannot apply any operations to $w_j$ for $j \in [1, s-1]$ in the process of transforming $u$ into $v$. To keep the value of $\sum_{j=1}^n w_j$ constant, we can apply only the Move operation. However, once the Move operation is applied to $w_j$ for $j \in [s+2, n]$, the resultant sequence cannot be converted into $v$. As a result, only $\mathrm{Move}(\cdot, s, s+1)$ can be performed, and therefore $w = u$ or $w = \mathrm{Move}(u, s, s+1) = v$.

Appendix A.2. Proof of Theorem 4

Appendix A.2.1. The “Only if” Part

Suppose that $(u, v) \in E^*_{\mathrm{US}}$. We then have $v \in \mathrm{US}(\{u\})$ from Definition 5 and Lemma 2. Thus, we consider the following two cases:
Case 1:
$v = \mathrm{Up}(u, s)$ for some $s \in [1, n]$
For the sake of contradiction, assume $u_j \in \{u_s, u_s + 1\}$ for some $j \in [s+1, n]$. If $u_j = u_s$, we set $w = \mathrm{Up}(u, j)$ and then have $v = \mathrm{Swap}(w, s, j)$. If $u_j = u_s + 1$, we set $w = \mathrm{Swap}(u, s, j)$ and then have $v = \mathrm{Up}(w, j)$. This implies that $u \prec_{\mathrm{US}} w \prec_{\mathrm{US}} v$, which contradicts $(u, v) \in E^*_{\mathrm{US}}$ due to condition (C2) of Lemma 2.
Case 2:
$v = \mathrm{Swap}(u, s, t)$ for some $(s, t) \in [1, n] \times [1, n]$
For the sake of contradiction, assume $u_j \in [u_s, u_t]$ for some $j \in [s+1, t-1]$. If $u_s < u_j < u_t$, we set $w_1 = \mathrm{Swap}(u, j, t)$ and $w_2 = \mathrm{Swap}(w_1, s, j)$ and then have $v = \mathrm{Swap}(w_2, j, t)$. If $u_j = u_s$, we set $w = \mathrm{Swap}(u, j, t)$ and then have $v = \mathrm{Swap}(w, s, j)$. If $u_j = u_t$, we set $w = \mathrm{Swap}(u, s, j)$ and then have $v = \mathrm{Swap}(w, j, t)$. Each of these results contradicts $(u, v) \in E^*_{\mathrm{US}}$ due to condition (C2) of Lemma 2.

Appendix A.2.2. The “if” Part

We show that $(u, v) \in E^*_{\mathrm{US}}$ in the following two cases:
Case 1:
Condition (US1) Is Fulfilled
Condition (C1) of Lemma 2 is clearly satisfied. To verify condition (C2), we consider $w \in \Gamma$ such that $u \preceq_{\mathrm{US}} w \preceq_{\mathrm{US}} v$. From Lemma 1, we have $u \preceq_{\mathrm{lex}} w \preceq_{\mathrm{lex}} v$, implying that $w_j = u_j$ for all $j \in [1, s-1]$. Therefore, we cannot apply any operations to $w_j$ for $j \in [1, s-1]$ in the process of transforming $u$ into $v$. We must apply the Up operation exactly once, because the value of $\sum_{j=1}^n w_j$ remains the same after the Swap operation. Condition (US1) guarantees that for all $j \in [s+1, n]$, $w_j$ does not coincide with $u_s + 1$ even if $\mathrm{Up}(\cdot, j)$ is applied. Therefore, $\mathrm{Swap}(\cdot, s, j)$ for $j \in [s+1, n]$ never leads to $w_s = u_s + 1$. As a result, $\mathrm{Up}(\cdot, s)$ must be performed. Other applicable Swap operations produce a sequence that cannot be converted into $v$. This means that $w = u$ or $w = \mathrm{Up}(u, s) = v$.
Case 2:
Condition (US2) Is Fulfilled
Condition (C1) of Lemma 2 is clearly satisfied. To verify condition (C2), we consider $w \in \Gamma$ such that $u \preceq_{\mathrm{US}} w \preceq_{\mathrm{US}} v$. From Lemma 1, we have $u \preceq_{\mathrm{lex}} w \preceq_{\mathrm{lex}} v$. This implies that $w_j = u_j$ for all $j \in [1, s-1]$, and that $w_s \in [u_s, u_t]$. Therefore, we cannot apply any operations to $w_j$ for $j \in [1, s-1]$ in the process of transforming $u$ into $v$. To keep the value of $\sum_{j=1}^n w_j$ constant, we can apply only the Swap operation. However, once the Swap operation is applied to $w_j$ for $j \in [t+1, n]$, the resultant sequence cannot be converted into $v$. We cannot adopt $w = \mathrm{Swap}(u, s, j)$ for $j \in [s+1, t-1]$ due to condition (US2). If we adopt $w = \mathrm{Swap}(u, j, t)$ for $j \in [s+1, t-1]$, we have $w_t \le u_s - 1$ due to condition (US2); thus, the application of $\mathrm{Swap}(\cdot, t, j)$ is unavoidable for $j \in [t+1, n]$. As a result, $\mathrm{Swap}(\cdot, s, t)$ must be performed. Other applicable Swap operations produce a sequence that cannot be converted into $v$. This means that $w = u$ or $w = \mathrm{Swap}(u, s, t) = v$.

Appendix B. Pseudocodes of Our Algorithms

Appendix B.1. Constructive Algorithm for $(\Gamma, E^*_{\mathrm{UM}})$

The nodes and directed edges of the graph $(\Gamma, E^*_{\mathrm{UM}})$ are enumerated in a breadth-first search and stored in two lists $L$ and $E$, respectively. We use APPEND$(L, v)$, which appends a node $v$ to the end of $L$, and similarly APPEND$(E, (u, v))$.
A queue $Q$ is used to store nodes of $L$ whose successors are under investigation (the “frontier” of $L$). The nodes in $Q$ are listed in ascending order of depth, where the depth of $v$ is the shortest-path length from $(0, 0, \ldots, 0)$ to $v$. We use DEQUEUE$(Q)$, which returns and deletes the first element in $Q$, and ENQUEUE$(Q, v)$, which appends $v$ to the end of $Q$.
Algorithm A1 summarizes our constructive algorithm for computing the transitive reduction $(\Gamma, E^*_{\mathrm{UM}})$. For a given node $u$ in line 6, we find all nodes $v$ satisfying condition (UM1) in lines 7–10 and those satisfying condition (UM2) in lines 11–15.
Algorithm A1 Constructive algorithm for $(\Gamma, E^*_{\mathrm{UM}})$
Input: a pair $(n, m)$ of positive integers
Output: the transitive reduction $(\Gamma, E^*_{\mathrm{UM}})$
 1: procedure
 2:   L ← list consisting of (0, 0, …, 0)            ▹ returns Γ
 3:   E ← empty list                                 ▹ returns E*_UM
 4:   Q ← queue consisting of (0, 0, …, 0)
 5:   while Q is not empty do
 6:     u ← DEQUEUE(Q)
 7:     if (u, n) ∈ D_U then                          ▹ for (UM1)
 8:       v ← Up(u, n)
 9:       APPEND(L, v), APPEND(E, (u, v))
10:       ENQUEUE(Q, v)
11:     for s ∈ [1, n − 1] do                         ▹ for (UM2)
12:       if (u, s, s + 1) ∈ D_M then
13:         v ← Move(u, s, s + 1)
14:         APPEND(L, v), APPEND(E, (u, v))
15:         ENQUEUE(Q, v)
By definition, the membership tests for $D_{\mathrm{U}}$ and $D_{\mathrm{M}}$ can be performed in $O(1)$ time. Recall that DEQUEUE, ENQUEUE, and APPEND can also be performed in $O(1)$ time. The for loop in lines 11–15 executes in $O(n)$ time. Therefore, recalling that $|\Gamma| = (m+1)^n$, we see that Algorithm A1 runs in $O(n(m+1)^n)$ time.

Appendix B.2. Constructive Algorithm for $(\Gamma, E^*_{\mathrm{US}})$

Algorithm A2 summarizes our constructive algorithm for computing the transitive reduction $(\Gamma, E^*_{\mathrm{US}})$. The difference from Algorithm A1 is the method for finding nodes $v$ satisfying condition (US1) or (US2). For a given node $u$ in line 6, we find all nodes $v$ satisfying condition (US1) in lines 7–16 and those satisfying condition (US2) in lines 17–26. The following describes the latter part.
Let $(u, v)$ be a directed edge added to $E$ in line 22, and let $(\bar{s}, \bar{t})$ be such that $v = \mathrm{Swap}(u, \bar{s}, \bar{t})$. From line 20, we have $u_{\bar{s}} < u_{\bar{t}} < b$. Note that for each $t$ in line 19, the value $b$ gives the smallest value of $u_j$ with $u_j > u_{\bar{s}}$ for $j \in [\bar{s}+1, t-1]$. Moreover, due to lines 25–26, $u_j \ne u_{\bar{s}}$ for $j \in [\bar{s}+1, \bar{t}-1]$. Combining these observations, we see that for $j \in [\bar{s}+1, \bar{t}-1]$,
$u_j < u_{\bar{s}} \quad \text{or} \quad u_j \ge b > u_{\bar{t}} \qquad (\text{meaning } u_j \notin [u_{\bar{s}}, u_{\bar{t}}]).$
Therefore, the pair $(u, v)$ satisfies condition (US2). It is easy to verify that this process finds all nodes $v$ satisfying condition (US2).
Since both of the double for loops in lines 7–16 and 17–26 execute in $O(n^2)$ time, Algorithm A2 runs in $O(n^2(m+1)^n)$ time.
Algorithm A2 Constructive algorithm for $(\Gamma, E^*_{\mathrm{US}})$
Input: a pair $(n, m)$ of positive integers
Output: the transitive reduction $(\Gamma, E^*_{\mathrm{US}})$
 1: procedure
 2:   L ← list consisting of (0, 0, …, 0)            ▹ returns Γ
 3:   E ← empty list                                 ▹ returns E*_US
 4:   Q ← queue consisting of (0, 0, …, 0)
 5:   while Q is not empty do
 6:     u ← DEQUEUE(Q)
 7:     for s ∈ [1, n] do                             ▹ for (US1)
 8:       if (u, s) ∈ D_U then
 9:         flag ← True
10:         for j ∈ [s + 1, n] do
11:           if u_j ∈ {u_s, u_s + 1} then
12:             flag ← False, break
13:         if flag = True then
14:           v ← Up(u, s)
15:           APPEND(L, v), APPEND(E, (u, v))
16:           ENQUEUE(Q, v)
17:     for s ∈ [1, n − 1] do                         ▹ for (US2)
18:       b ← m + 1
19:       for t ∈ [s + 1, n] do
20:         if (u, s, t) ∈ D_S and u_t < b then
21:           v ← Swap(u, s, t)
22:           APPEND(L, v), APPEND(E, (u, v))
23:           ENQUEUE(Q, v)
24:           b ← u_t
25:         else if u_t = u_s then
26:           break

References

  1. Turban, E.; Outland, J.; King, D.; Lee, J.K.; Liang, T.P.; Turban, D.C. Electronic Commerce 2018: A Managerial and Social Networks Perspective; Springer: Cham, Switzerland, 2017.
  2. Kannan, P.; Li, H. Digital marketing: A framework, review and research agenda. Int. J. Res. Mark. 2017, 34, 22–45.
  3. Ngai, E.W.; Xiu, L.; Chau, D.C. Application of data mining techniques in customer relationship management: A literature review and classification. Expert Syst. Appl. 2009, 36, 2592–2602.
  4. Huang, T.; Van Mieghem, J.A. Clickstream data and inventory management: Model and empirical analysis. Prod. Oper. Manag. 2014, 23, 333–347.
  5. Aggarwal, C.C. Recommender Systems; Springer: Cham, Switzerland, 2016; Volume 1.
  6. Iwanaga, J.; Nishimura, N.; Sukegawa, N.; Takano, Y. Improving collaborative filtering recommendations by estimating user preferences from clickstream data. Electron. Commer. Res. Appl. 2019, 37, 100877.
  7. Bucklin, R.E.; Sismeiro, C. Click here for Internet insight: Advances in clickstream data analysis in marketing. J. Interact. Mark. 2009, 23, 35–48.
  8. Fader, P.S.; Hardie, B.G.; Lee, K.L. RFM and CLV: Using iso-value curves for customer base analysis. J. Mark. Res. 2005, 42, 415–430.
  9. Van den Poel, D.; Buckinx, W. Predicting online-purchasing behaviour. Eur. J. Oper. Res. 2005, 166, 557–575.
  10. Chen, Y.L.; Kuo, M.H.; Wu, S.Y.; Tang, K. Discovering recency, frequency, and monetary (RFM) sequential patterns from customers' purchasing data. Electron. Commer. Res. Appl. 2009, 8, 241–251.
  11. Iwanaga, J.; Nishimura, N.; Sukegawa, N.; Takano, Y. Estimating product-choice probabilities from recency and frequency of page views. Knowl.-Based Syst. 2016, 99, 157–167.
  12. Nishimura, N.; Sukegawa, N.; Takano, Y.; Iwanaga, J. A latent-class model for estimating product-choice probabilities from clickstream data. Inf. Sci. 2018, 429, 406–420.
  13. Aho, A.V.; Garey, M.R.; Ullman, J.D. The transitive reduction of a directed graph. SIAM J. Comput. 1972, 1, 131–137.
  14. Cirqueira, D.; Hofer, M.; Nedbal, D.; Helfert, M.; Bezbradica, M. Customer purchase behavior prediction in e-commerce: A conceptual framework and research agenda. In Proceedings of the International Workshop on New Frontiers in Mining Complex Patterns, Würzburg, Germany, 16 September 2019; Springer: Cham, Switzerland, 2019; pp. 119–136.
  15. Baumann, A.; Haupt, J.; Gebert, F.; Lessmann, S. Changing perspectives: Using graph metrics to predict purchase probabilities. Expert Syst. Appl. 2018, 94, 137–148.
  16. Koehn, D.; Lessmann, S.; Schaal, M. Predicting online shopping behaviour from clickstream data using deep learning. Expert Syst. Appl. 2020, 150, 113342.
  17. Moe, W.W.; Fader, P.S. Dynamic conversion behavior at e-commerce sites. Manag. Sci. 2004, 50, 326–335.
  18. Montgomery, A.L.; Li, S.; Srinivasan, K.; Liechty, J.C. Modeling online browsing and path analysis using clickstream data. Mark. Sci. 2004, 23, 579–595.
  19. Park, C.H.; Park, Y.H. Investigating purchase conversion by uncovering online visit patterns. Mark. Sci. 2016, 35, 894–914.
  20. Sismeiro, C.; Bucklin, R.E. Modeling purchase behavior at an e-commerce web site: A task-completion approach. J. Mark. Res. 2004, 41, 306–323.
  21. Dong, Y.; Jiang, W. Brand purchase prediction based on time-evolving user behaviors in e-commerce. Concurr. Comput. Pract. Exp. 2019, 31, e4882.
  22. Zhang, Y.; Pennacchiotti, M. Predicting purchase behaviors from social media. In Proceedings of the International Conference on World Wide Web, Rio de Janeiro, Brazil, 13–17 May 2013; pp. 1521–1532.
  23. Pitman, A.; Zanker, M. Insights from applying sequential pattern mining to e-commerce click stream data. In Proceedings of the 2010 IEEE International Conference on Data Mining Workshops, Sydney, Australia, 13 December 2010; pp. 967–975.
  24. Qiu, J.; Lin, Z.; Li, Y. Predicting customer purchase behavior in the e-commerce context. Electron. Commer. Res. 2015, 15, 427–452.
  25. Li, Q.; Gu, M.; Zhou, K.; Sun, X. Multi-classes feature engineering with sliding window for purchase prediction in mobile commerce. In Proceedings of the 2015 IEEE International Conference on Data Mining Workshop (ICDMW), Atlantic City, NJ, USA, 14–17 November 2015; pp. 1048–1054.
  26. Li, D.; Zhao, G.; Wang, Z.; Ma, W.; Liu, Y. A method of purchase prediction based on user behavior log. In Proceedings of the 2015 IEEE International Conference on Data Mining Workshop (ICDMW), Atlantic City, NJ, USA, 14–17 November 2015; pp. 1031–1039.
  27. Romov, P.; Sokolov, E. RecSys challenge 2015: Ensemble learning with categorical features. In Proceedings of the 2015 International ACM Recommender Systems Challenge, Vienna, Austria, 16–20 September 2015; pp. 1–4.
  28. Yi, Z.; Wang, D.; Hu, K.; Li, Q. Purchase behavior prediction in m-commerce with an optimized sampling methods. In Proceedings of the 2015 IEEE International Conference on Data Mining Workshop (ICDMW), Atlantic City, NJ, USA, 14–17 November 2015; pp. 1085–1092.
  29. Zhao, Y.; Yao, L.; Zhang, Y. Purchase prediction using Tmall-specific features. Concurr. Comput. Pract. Exp. 2016, 28, 3879–3894.
  30. Jannach, D.; Ludewig, M.; Lerche, L. Session-based item recommendation in e-commerce: On short-term intents, reminders, trends and discounts. User Model. User-Adapt. Interact. 2017, 27, 351–392.
  31. Vieira, A. Predicting online user behaviour using deep learning algorithms. arXiv 2015, arXiv:1511.06247.
  32. Wu, Z.; Tan, B.H.; Duan, R.; Liu, Y.; Mong Goh, R.S. Neural modeling of buying behaviour for e-commerce from clicking patterns. In Proceedings of the 2015 International ACM Recommender Systems Challenge, Vienna, Austria, 16–20 September 2015; pp. 1–4.
  33. Moe, W.W. An empirical two-stage choice model with varying decision rules applied to internet clickstream data. J. Mark. Res. 2006, 43, 680–692.
  34. Yeo, J.; Kim, S.; Koh, E.; Hwang, S.W.; Lipka, N. Predicting online purchase conversion for retargeting. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, Cambridge, UK, 6–10 February 2017; pp. 591–600.
  35. Borges, J.; Levene, M. Evaluating variable-length Markov chain models for analysis of user web navigation sessions. IEEE Trans. Knowl. Data Eng. 2007, 19, 441–452.
  36. Zhang, S.; Yao, L.; Sun, A.; Tay, Y. Deep learning based recommender system: A survey and new perspectives. ACM Comput. Surv. 2019, 52, 1–38.
  37. Wu, S.; Sun, F.; Zhang, W.; Xie, X.; Cui, B. Graph neural networks in recommender systems: A survey. ACM Comput. Surv. 2022, 55, 1–37.
  38. Zhou, J.; Cui, G.; Hu, S.; Zhang, Z.; Yang, C.; Liu, Z.; Wang, L.; Li, C.; Sun, M. Graph neural networks: A review of methods and applications. AI Open 2020, 1, 57–81.
  39. Huang, C.; Wu, X.; Zhang, X.; Zhang, C.; Zhao, J.; Yin, D.; Chawla, N.V. Online purchase prediction via multi-scale modeling of behavior dynamics. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 4–8 August 2019; pp. 2613–2622.
  40. Li, Z.; Xie, H.; Xu, G.; Li, Q.; Leng, M.; Zhou, C. Towards purchase prediction: A transaction-based setting and a graph-based method leveraging price information. Pattern Recognit. 2021, 113, 107824. [Google Scholar] [CrossRef]
  41. Liu, Z.; Wang, X.; Li, Y.; Yao, L.; An, J.; Bai, L.; Lim, E.P. Face to purchase: Predicting consumer choices with structured facial and behavioral traits embedding. Knowl.-Based Syst. 2022, 235, 107665. [Google Scholar] [CrossRef]
  42. Sun, Y. E-commerce purchase prediction based on graph neural networks. In Proceedings of the 2022 International Conference on Information Technology, Communication Ecosystem and Management (ITCEM), Bangkok, Thailand, 19–21 December 2022; pp. 72–75. [Google Scholar]
  43. Matzkin, R.L. Semiparametric estimation of monotone and concave utility functions for polychotomous choice models. Econom. J. Econom. Soc. 1991, 59, 1315–1327. [Google Scholar] [CrossRef]
  44. Aıt-Sahalia, Y.; Duarte, J. Nonparametric option pricing under shape restrictions. J. Econom. 2003, 116, 9–47. [Google Scholar] [CrossRef]
  45. Chatterjee, S.; Guntuboyina, A.; Sen, B. On risk bounds in isotonic and other shape restricted regression problems. Ann. Stat. 2015, 43, 1774–1800. [Google Scholar] [CrossRef]
  46. Groeneboom, P.; Jongbloed, G. Nonparametric Estimation under Shape Constraints; Cambridge University Press: Cambridge, UK, 2014; Volume 38. [Google Scholar]
  47. Guntuboyina, A.; Sen, B. Nonparametric shape-restricted regression. Stat. Sci. 2018, 33, 568–594. [Google Scholar] [CrossRef]
  48. Wang, J.; Ghosh, S.K. Shape restricted nonparametric regression with Bernstein polynomials. Comput. Stat. Data Anal. 2012, 56, 2729–2741. [Google Scholar] [CrossRef]
  49. Pardalos, P.M.; Xue, G. Algorithms for a class of isotonic regression problems. Algorithmica 1999, 23, 211–222. [Google Scholar] [CrossRef]
  50. Gaines, B.R.; Kim, J.; Zhou, H. Algorithms for fitting the constrained lasso. J. Comput. Graph. Stat. 2018, 27, 861–871. [Google Scholar] [CrossRef] [PubMed]
  51. Tibshirani, R.J.; Hoefling, H.; Tibshirani, R. Nearly-isotonic regression. Technometrics 2011, 53, 54–61. [Google Scholar] [CrossRef]
  52. Han, Q.; Wang, T.; Chatterjee, S.; Samworth, R.J. Isotonic regression in general dimensions. Ann. Stat. 2019, 47, 2440–2471. [Google Scholar] [CrossRef]
  53. Stout, Q.F. Isotonic regression for multiple independent variables. Algorithmica 2015, 71, 450–470. [Google Scholar] [CrossRef]
  54. Altendorf, E.E.; Restificar, A.C.; Dietterich, T.G. Learning from sparse data by exploiting monotonicity constraints. In Proceedings of the Twenty-First Conference on Uncertainty in Artificial Intelligence, Edinburgh, UK, 26–29 July 2005; pp. 18–26. [Google Scholar]
  55. Schröder, B. Ordered Sets: An Introduction with Connections from Combinatorics to Topology; Birkhäuser: Basel, Switzerland, 2016. [Google Scholar]
  56. Warshall, S. A theorem on boolean matrices. J. ACM 1962, 9, 11–12. [Google Scholar] [CrossRef]
  57. Le Gall, F. Powers of tensors and fast matrix multiplication. In Proceedings of the 39th International Symposium on Symbolic and Algebraic Computation, Kobe, Japan, 23–25 July 2014; pp. 296–303. [Google Scholar]
  58. Cormen, T.H.; Leiserson, C.E.; Rivest, R.L.; Stein, C. Introduction to Algorithms; MIT Press: Cambridge, MA, USA, 2009. [Google Scholar]
  59. Ludewig, M.; Jannach, D. Evaluation of session-based recommendation algorithms. User Model. User-Adapt. Interact. 2018, 28, 331–390. [Google Scholar] [CrossRef]
  60. Stellato, B.; Banjac, G.; Goulart, P.; Bemporad, A.; Boyd, S. OSQP: An operator splitting solver for quadratic programs. Math. Program. Comput. 2020, 12, 637–672. [Google Scholar] [CrossRef]
  61. Orzechowski, P.; La Cava, W.; Moore, J.H. Where are we now? A large benchmark study of recent symbolic regression methods. In Proceedings of the Genetic and Evolutionary Computation Conference, Kyoto, Japan, 15–19 July 2018; pp. 1183–1190. [Google Scholar]
Figure 1. Directed graph representations of the poset (Γ, ⪯UM) with (n, m) = (3, 2).
Figure 2. Directed graph representations of the poset (Γ, ⪯US) with (n, m) = (3, 2).
Figure 3. Comparison of prediction performance with the two-dimensional probability table.
Figure 4. Comparison of prediction performance with machine learning methods.
Figure 5. Item-choice probabilities estimated from the full-sample training set with (n, m) = (5, 6).
Figure 6. Item-choice probabilities estimated from the 10%-sampled training set with (n, m) = (5, 6).
Table 1. Pageview history of six user–item pairs. The #PVs columns count pageviews per day; Choice records whether the item was chosen on 4 April.

| User | Item | #PVs 1 April | #PVs 2 April | #PVs 3 April | Choice 4 April | (r, f) | (v1, v2, v3) |
|------|------|--------------|--------------|--------------|----------------|--------|--------------|
| u1 | i2 | 1 | 0 | 1 | 0 | (3, 2) | (1, 0, 1) |
| u1 | i4 | 0 | 1 | 0 | 1 | (2, 1) | (0, 1, 0) |
| u2 | i1 | 3 | 0 | 0 | 0 | (1, 3) | (0, 0, 3) |
| u2 | i3 | 0 | 0 | 3 | 1 | (3, 3) | (3, 0, 0) |
| u2 | i4 | 1 | 1 | 1 | 0 | (3, 3) | (1, 1, 1) |
| u3 | i2 | 2 | 0 | 1 | 0 | (3, 3) | (1, 0, 2) |
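To make this encoding concrete, the following is a minimal sketch (our own illustration, not the authors' implementation) of how the PV sequence (v1, ..., vn) and the recency–frequency pair (r, f) can be derived from a user–item pair's daily PV counts. The function names and the r = 0 convention for pairs with no previous PV are assumptions of this sketch.

```python
# Minimal sketch of the encoding in Table 1. counts[t - 1] holds the number
# of PVs t days before the choice date, so counts = [1, 0, 1] encodes pair
# (u1, i2): one PV on 3 April, none on 2 April, one on 1 April, with the
# choice observed on 4 April.

def pv_sequence(counts):
    """PV sequence v = (v_1, ..., v_n); v_k = #PVs k days before the choice."""
    return tuple(counts)

def recency_frequency(counts):
    """Recency r and frequency f of previous PVs. We take r = n + 1 - t,
    where t is the smallest number of days back with a PV, and r = 0 when
    there is no previous PV (a convention of this sketch)."""
    n = len(counts)
    f = sum(counts)
    for t, c in enumerate(counts, start=1):  # t = days before the choice date
        if c > 0:
            return n + 1 - t, f
    return 0, f

# Reproduces the (r, f) and (v_1, v_2, v_3) columns of Table 1:
assert pv_sequence([1, 0, 1]) == (1, 0, 1)     # (u1, i2)
assert recency_frequency([1, 0, 1]) == (3, 2)  # (u1, i2)
assert recency_frequency([0, 1, 0]) == (2, 1)  # (u1, i4)
assert recency_frequency([0, 0, 3]) == (1, 3)  # (u2, i1)
```

The assertions mirror rows of Table 1: a PV on the day immediately before the choice date gives the maximum recency r = n, while a PV only on the earliest day gives r = 1.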
Table 2. Process of enumerating v ∈ Γ such that (u, v) ∈ E*_UM.

| u | Operation | v | (UM1)/(UM2) |
|---|-----------|---|-------------|
| (0, 2, 1) | Up(u, 1) | (1, 2, 1) | unsatisfied |
| | Up(u, 3) | (0, 2, 2) | satisfied |
| | Move(u, 1, 2) | (1, 1, 1) | satisfied |
| | Move(u, 1, 3) | (1, 2, 0) | unsatisfied |
Table 3. Process of enumerating v ∈ Γ such that (u, v) ∈ E*_US.

| u | Operation | v | (US1)/(US2) |
|---|-----------|---|-------------|
| (0, 2, 1) | Up(u, 1) | (1, 2, 1) | unsatisfied |
| | Up(u, 3) | (0, 2, 2) | satisfied |
| | Swap(u, 1, 2) | (2, 0, 1) | satisfied |
| | Swap(u, 1, 3) | (1, 2, 0) | satisfied |
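Tables 2 and 3 trace the operations that generate candidate pairs (u, v) from u = (0, 2, 1). As a companion, here is a hedged sketch of the three operations; the operation names Up, Move, and Swap come from the tables themselves, while the Python signatures, the 1-based index convention, and the validity checks are our own assumptions rather than the paper's code.

```python
# Minimal sketch of the Up / Move / Swap operations used in Tables 2 and 3.
# A PV sequence u = (u_1, ..., u_n) is a tuple of PV counts with u_1 the most
# recent term; indices i, j are 1-based to match the tables, and every count
# is capped at m (so the sequence stays inside the grid Γ).

def up(u, i, m):
    """Up(u, i): add one PV at position i; None if the cap m is exceeded."""
    v = list(u)
    v[i - 1] += 1
    return tuple(v) if v[i - 1] <= m else None

def move(u, i, j, m):
    """Move(u, i, j), i < j: shift one PV from position j to the more recent
    position i; None if the result leaves Γ."""
    v = list(u)
    v[i - 1] += 1
    v[j - 1] -= 1
    return tuple(v) if v[j - 1] >= 0 and v[i - 1] <= m else None

def swap(u, i, j):
    """Swap(u, i, j): exchange the PV counts at positions i and j."""
    v = list(u)
    v[i - 1], v[j - 1] = v[j - 1], v[i - 1]
    return tuple(v)

u = (0, 2, 1)                       # the example of Tables 2 and 3, with m = 2
assert up(u, 1, m=2) == (1, 2, 1)
assert up(u, 2, m=2) is None        # would exceed m, hence absent from the tables
assert up(u, 3, m=2) == (0, 2, 2)
assert move(u, 1, 2, m=2) == (1, 1, 1)   # Table 2
assert move(u, 1, 3, m=2) == (1, 2, 0)
assert swap(u, 1, 2) == (2, 0, 1)        # Table 3
assert swap(u, 1, 3) == (1, 2, 0)
```

Pairs generated this way are then screened with the conditions (UM1)–(UM2) or (US1)–(US2) so that constraints implied by the transitivity of the partial orders can be dropped; this is the reduction whose effect is reported in Tables 6 and 7.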
Table 4. Methods for comparison.

| Abbreviation | Method |
|--------------|--------|
| 2dimEmp | Empirical probability table (1) [11] |
| 2dimMono | Two-dimensional monotonicity model (2)–(5) [11] |
| SeqEmp | Empirical probabilities (6) for PV sequences |
| SeqUM | Our PV sequence model (7)–(9) using (Γ, ⪯UM) |
| SeqUS | Our PV sequence model (7)–(9) using (Γ, ⪯US) |
| LR | L2-regularized logistic regression |
| ANN | Artificial neural network for regression with one fully connected hidden layer of 100 units |
| RF | Random forest of regression trees |
Table 5. Training and validation periods.

| Pair ID | Training start | Training end | Validation |
|---------|----------------|--------------|------------|
| 1 | 21 May 2015 | 18 August 2015 | 19 August 2015 |
| 2 | 31 May 2015 | 28 August 2015 | 29 August 2015 |
| 3 | 10 June 2015 | 7 September 2015 | 8 September 2015 |
| 4 | 20 June 2015 | 17 September 2015 | 18 September 2015 |
| 5 | 30 June 2015 | 27 September 2015 | 28 September 2015 |
Table 6. Problem size of our PV sequence model (7)–(9). Columns give the number of constraints (#Cons) in Equation (8) produced by each constraint-generation procedure (Enumeration, Operation, Reduction) for SeqUM and SeqUS.

| n | m | #Vars | Enum. SeqUM | Enum. SeqUS | Oper. SeqUM | Oper. SeqUS | Red. SeqUM | Red. SeqUS |
|---|---|-------|-------------|-------------|-------------|-------------|------------|------------|
| 5 | 1 | 32 | 430 | 430 | 160 | 160 | 48 | 48 |
| 5 | 2 | 243 | 21,383 | 17,945 | 1890 | 1620 | 594 | 634 |
| 5 | 3 | 1024 | 346,374 | 255,260 | 9600 | 7680 | 3072 | 3546 |
| 5 | 4 | 3125 | 3,045,422 | 2,038,236 | 32,500 | 25,000 | 10,500 | 12,898 |
| 5 | 5 | 7776 | 18,136,645 | 11,282,058 | 86,400 | 64,800 | 28,080 | 36,174 |
| 5 | 6 | 16,807 | 82,390,140 | 48,407,475 | 195,510 | 144,060 | 63,798 | 85,272 |
| 1 | 6 | 7 | 21 | 21 | 6 | 6 | 6 | 6 |
| 2 | 6 | 49 | 1001 | 861 | 120 | 105 | 78 | 93 |
| 3 | 6 | 343 | 42,903 | 32,067 | 1638 | 1323 | 798 | 1018 |
| 4 | 6 | 2401 | 1,860,622 | 1,224,030 | 18,816 | 14,406 | 7350 | 9675 |
| 5 | 6 | 16,807 | 82,390,140 | 48,407,475 | 195,510 | 144,060 | 63,798 | 85,272 |
Table 7. Computation times [s] for our PV sequence model (7)–(9) under each constraint-generation procedure. OM indicates that the computation ran out of memory.

| n | m | #Vars | Enum. SeqUM | Enum. SeqUS | Oper. SeqUM | Oper. SeqUS | Red. SeqUM | Red. SeqUS |
|---|---|-------|-------------|-------------|-------------|-------------|------------|------------|
| 5 | 1 | 32 | 0.00 | 0.01 | 0.00 | 0.00 | 0.00 | 0.00 |
| 5 | 2 | 243 | 2.32 | 1.66 | 0.09 | 0.07 | 0.03 | 0.02 |
| 5 | 3 | 1024 | 558.22 | 64.35 | 3.41 | 0.71 | 0.13 | 0.26 |
| 5 | 4 | 3125 | OM | OM | 24.07 | 13.86 | 1.72 | 5.80 |
| 5 | 5 | 7776 | OM | OM | 180.53 | 67.34 | 9.71 | 36.94 |
| 5 | 6 | 16,807 | OM | OM | 906.76 | 522.84 | 86.02 | 286.30 |
| 1 | 6 | 7 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 2 | 6 | 49 | 0.03 | 0.01 | 0.01 | 0.00 | 0.00 | 0.00 |
| 3 | 6 | 343 | 12.80 | 1.68 | 0.20 | 0.03 | 0.05 | 0.02 |
| 4 | 6 | 2401 | OM | OM | 8.07 | 4.09 | 2.12 | 2.87 |
| 5 | 6 | 16,807 | OM | OM | 906.76 | 522.84 | 86.02 | 286.30 |
Table 8. Computational performance of our PV sequence model (7)–(9) with constraints generated by the reduction procedure. #Cons counts constraints in Equation (8); times are in seconds; F1 scores [%] are computed with N = 3.

| n | m | #Vars | #Cons SeqUM | #Cons SeqUS | Time SeqUM | Time SeqUS | F1 SeqEmp | F1 SeqUM | F1 SeqUS |
|---|---|-------|-------------|-------------|------------|------------|-----------|----------|----------|
| 3 | 30 | 29,791 | 84,630 | 118,850 | 86.72 | 241.46 | 12.25 | 12.40 | 12.40 |
| 4 | 12 | 28,561 | 99,372 | 142,800 | 198.82 | 539.76 | 12.68 | 12.93 | 12.95 |
| 5 | 6 | 16,807 | 63,798 | 85,272 | 86.02 | 286.30 | 12.90 | 13.18 | 13.18 |
| 6 | 4 | 15,625 | 62,500 | 76,506 | 62.92 | 209.67 | 13.14 | 13.49 | 13.48 |
| 7 | 3 | 16,384 | 67,584 | 76,818 | 96.08 | 254.31 | 13.23 | 13.52 | 13.53 |
| 8 | 2 | 6561 | 24,786 | 25,879 | 19.35 | 17.22 | 13.11 | 13.37 | 13.35 |
| 9 | 2 | 19,683 | 83,106 | 86,386 | 244.15 | 256.42 | 13.07 | 13.40 | 13.37 |