Article

Efficiently Exploiting Multi-Level User Initial Intent for Session-Based Recommendation

School of Communications and Information Engineering, Nanjing University of Posts and Telecommunications, New Model Road, Nanjing 210003, China
*
Author to whom correspondence should be addressed.
Electronics 2025, 14(1), 207; https://doi.org/10.3390/electronics14010207
Submission received: 3 December 2024 / Revised: 3 January 2025 / Accepted: 4 January 2025 / Published: 6 January 2025
(This article belongs to the Section Artificial Intelligence)

Abstract

Session-based recommendation (SBR) aims to predict potential user interactions within an anonymous session by exploiting the user interests learned from it. As research has progressed, attention has shifted towards the user initial intent, which can provide practical guidance for item selection. However, a significant limitation of current methodologies is that they often take the first item in a session as the initial intent, neglecting the possibility of a random initial click. Additionally, these methods typically merge the initial intent with the session representation without considering dynamic changes in user interests. To address these challenges, we propose an innovative approach named Efficiently Exploiting Multi-Level User Initial Intent (EMUI) for session-based recommendation. This approach includes a multi-level initial-intent generation module (MIGM) and an interest matching module (IMM). Specifically, the MIGM extracts a more comprehensive representation of user initial intent from multiple levels, effectively mitigating the issue of random initial clicks. Furthermore, we propose the IMM to ensure alignment between dynamic interests and user initial intent. The IMM identifies the components of the multi-level user initial intent that correlate with dynamic interests, thereby enhancing the session representation and, ultimately, improving recommendation performance. In addition, we introduce a contrastive learning task to make the most of the user initial intent at each level. Extensive experiments on three real-world datasets show that EMUI significantly improves recommendation accuracy over state-of-the-art methods.

1. Introduction

Recommender systems help users quickly find valuable content in massive amounts of information, thereby alleviating information overload. Traditional recommendation algorithms use user profile information and long-term historical interaction records stored in the system to recommend content that users may be interested in [1]. However, in real-world scenarios user profiles are often unavailable due to privacy protection, which makes it challenging to extract user tastes from long-term historical interactions. To address this problem, session-based recommendation (SBR) has been proposed and has attracted increasing research interest. As a subclass of sequential recommendation, SBR aims to predict the next item a user is most likely to interact with from an anonymous session [2,3,4,5].
Existing SBR methods treat sessions as sequences of items ordered by click time and typically employ recurrent neural networks (RNN) [6,7,8] or graph neural networks (GNN) [9,10,11,12] to solve the recommendation problem. GNN-based methods propagate information between neighboring items and are able to model complex item transition relations, giving them a significant advantage over RNN-based methods. These methods improve SBR performance by fusing user global and current interests to obtain a more accurate representation of the session. Most previous studies regard the last clicked item as the user's current interest and emphasize its contribution to recommendation performance. However, statistical analysis of real-world datasets shows that the probability of repeat interactions with the first few items is higher than with the last. Taking the Tmall dataset as an example, we examined valid sessions (those with more than two and fewer than 40 interactions) and found that the repeat click rates for the first and second items were 80.2% and 80%, respectively. In stark contrast, the repeat click rate for the final item plummeted to merely 1.4%. Repeat clicks indicate a user's interest in specific items; consequently, it is essential to examine the initially clicked items, as these may reflect the user's initial intent.
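As a rough illustration of how such a repeat-click statistic can be computed, the sketch below counts, for valid sessions, how often the item at a given position occurs more than once in the same session; the session encoding and the exact counting rule are illustrative assumptions, not the authors' original analysis script.

```python
# Hedged sketch: one plausible way to measure the repeat-click rate quoted above.
# `sessions` is an illustrative list of item-id lists.
def repeat_click_rate(sessions, pos):
    valid = [s for s in sessions if 2 < len(s) < 40]       # keep valid sessions only
    hits = sum(1 for s in valid if s.count(s[pos]) > 1)    # item at `pos` appears again
    return hits / max(len(valid), 1)

# repeat_click_rate(sessions, 0)   -> rate for the first item
# repeat_click_rate(sessions, -1)  -> rate for the last item
```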
Qiao et al. first investigated user initial intent by treating the first clicked item in a session as the initial intent and updating the session representation with it, which achieved good results [13]. However, directly regarding the first item as the user's initial intent is overly simplistic. When a user starts a session, their intention may still be unclear and random clicks may occur; after the intention becomes clearer, such accidental clicks may deviate from the user's main intention. Treating the first item directly as the initial intent may therefore compromise the accuracy of the final session representation. In addition, existing research suggests that the interests within a session change dynamically, and the user's initial intent may contradict these dynamically changing interests. Directly fusing the initial intent with the session representation (e.g., by summation or averaging) introduces noise into the session representation and degrades system performance.
To solve the problems mentioned above, we propose a new model named Efficiently Exploiting Multi-Level User Initial Intent (EMUI) for session-based recommendation. Specifically, to address the problem that the first item in a session may be clicked arbitrarily and thus cannot correctly represent the user's initial intent, we design a multi-level initial intent generation module (MIGM). The MIGM synthesizes the first few items the user clicks and mines the user's deep-level initial intent by learning the correlations among the initially clicked items; merging the mined multi-level intents produces a more discriminative initial intent representation. In addition, an interest matching module (IMM) is designed to take into account the dynamically changing interests of users. The IMM retains the part of the initial intent that is compatible with the dynamic interests and fuses it with the session representation to obtain a more feature-rich session representation. Furthermore, we propose a contrastive learning task that maximizes the use of the user initial intent at each level.
Overall, the main contributions of this paper are summarized as follows:
  • In this paper, we propose a new method named EMUI, which further explores the role of user initial intent in SBR performance.
  • We propose the MIGM, which obtains a more discriminative initial intent representation by mining multi-level user initial intent. In addition, a contrastive learning task is constructed to maximize the use of the initial user intent at each level.
  • We design the IMM, which retains the part of the user's initial intent that matches the dynamic interests to enhance session recommendation.
  • Extensive experiments demonstrate that our model outperforms state-of-the-art approaches.

2. Related Works

2.1. Traditional Methods

Traditional SBR methods are mainly based on Markov chains, which measure the transition relationship between the items the user has interacted with and the current interaction through transition probabilities. FPMC combines Markov chains with matrix factorization to learn users' long-term characteristics [14]. It should be emphasized that FPMC adapts to the SBR task by ignoring the user latent representation.
With the wide application of collaborative filtering (CF) techniques, CF-based SBR methods have been widely studied. Item-KNN is a collaborative filtering method that recommends items similar to the last clicked item in the session according to cosine similarity [15]. Zhou et al. correct the scoring bias through a first-order approximation and also verify the validity of the system using item attribute information [16]. Wang et al. use the attention mechanism to exploit the dynamic changes in user interests in a CF-based approach [17]. However, traditional methods rely on a strong independence assumption, which limits recommendation accuracy.

2.2. Deep Learning-Based Methods

With the wide application of deep learning (DL) technology, DL-based methods show clear advantages over traditional methods. RNN-based methods model session data as time series and enhance session recommendation by mining the sequential dependencies in them. GRU4REC models sequential transitions within a session using GRU to capture user preferences [6]. Li et al. propose NARM, which combines the attention mechanism with GRU to capture the user's main intention while preserving the sequential dependencies of the session [7]. Beyond modeling session sequences with RNNs, Liu et al. propose STAMP, a short-term memory network based on the attention mechanism that uses a multilayer perceptron to extract user preferences from the user's long-term and current interests [18]. CoSAN enhances the session representation by exploiting neighborhood session information through a multilayer attention mechanism [19]. However, neither RNN-based nor attention-based methods model the co-occurrence patterns between items well.
Benefiting from the expressive power of graph structures, GNN-based approaches have been widely used in the SBR task in recent years. The SR-GNN model proposed by Wu et al. applies graph neural networks to SBR for the first time [20]; it learns the features of items in the current session with gated graph neural networks. With the combination of attention networks and graph neural networks, graph attention networks began to be applied in SBR. FGNN utilizes a weighted attention layer that assigns different weights to each item in order to learn more accurate item representations [21]. To study the dynamic changes in user interests, GC-SAN models the session graph as a dynamic graph and uses the attention mechanism to aggregate item features into the session representation [22]. However, all the above methods learn item features and session representations only from the current session, which captures a limited range of user interests, even though different sessions share related user interest representations (e.g., different sessions contain the same items). Exploring similar sessions helps to learn richer item features, and global information-based SBR approaches have therefore been proposed. GCE-GNN aggregates global contextual information about sessions using a global graph while retaining the session graph for a more comprehensive session embedding [23]. DHCN transforms sessions into a hypergraph, capturing the complex transitions between items with a hypergraph convolutional network, and additionally incorporates a self-supervised learning approach to enhance recommendations [24]. Beyond modeling the dynamic changes in user interests, some studies have begun to explore the impact of the user initial intent. AMAN treats the first clicked item of a session as the user's original interest and captures the associations of items in different micro-behavioural sequences [13]. However, this method does not consider the random clicks of the user when starting a session, which limits the learned initial user intent. In this paper, we jointly model user initial intent and dynamic interests to obtain a more accurate user initial intent representation.

2.3. Self-Supervised Learning in RS

Self-supervised learning (SSL), often implemented through contrastive learning (CL), has made significant progress in recommender systems. It supervises models by mining information from unlabelled data to better represent semantic information. InfoGraph [25] enhances the feature representation of the same sample by maximizing the mutual information between different views. A novel self-supervised CL framework is proposed to detect time series faults, greatly improving classification performance [26]. SGL applies LightGCN as the encoder and extracts node features from subgraph views [27]; it takes self-supervised learning as an auxiliary task, which can effectively alleviate the long-tail problem of items. To address the data sparsity problem of sessions, DHCN constructs two graph views and incorporates self-supervised learning into training by maximizing the mutual information between the session representations learned from the two views [24]. STGCR constructs temporal graph and temporal hypergraph views and creates a contrastive learning task between them, maximizing their mutual information to improve recommendation performance [28]. In this paper, self-supervised learning is introduced to maximize the role of the initial user intent at each level and thus improve recommendation performance.

3. Methodology

3.1. Problem Statement

There are three main concepts in SBR: items, session sequences, and the session set. The SBR system contains $M$ unique items, denoted as the set $V = \{v_1, v_2, \ldots, v_M\}$. Each item $v_i \in V$ is embedded into a $d$-dimensional space, and the initial feature of item $v_i$ is represented as $v_i \in \mathbb{R}^d$. The user's interactions are denoted as the session sequence $s = \{v_{s_1}, v_{s_2}, \ldots, v_{s_i}\}$, in which items are ordered by click time. The session set $S = \{s_1, s_2, \ldots, s_N\}$ consists of all session sequences. SBR aims to predict the next item that a user is most likely to interact with on the basis of an anonymous session. Common symbols are listed in Table 1.

3.2. The Proposed EMUI

Figure 1 shows the overview of our proposed EMUI method. In this schematic, solid boxes represent established methods or specific outputs, while dashed boxes denote variable components. We utilize the MIGM to extract a more comprehensive representation of user initial intent from various levels. Furthermore, the IMM is proposed to ensure alignment between dynamic interests and user initial intent. In addition, a contrastive learning task is introduced to maximize the use of user initial intent at each level. We present each component in detail below.

3.2.1. Item Feature Learning Based on HGCN

Sessions are first transformed into a hypergraph $G = (V, E)$ to learn item features, where $V$ is the item set containing $M$ unique items and $E$ is the hyperedge set containing $N$ unique hyperedges. Each session $s$ is represented as a hyperedge $e \in E$ and each item as a node $v_i \in V$; any two items in the same session are connected to each other in the hypergraph. After the hypergraph is constructed, a hypergraph convolutional network (HGCN) as proposed in [29,30] is employed to capture item transitions, defined as follows:
$$X_v^{(l+1)} = D^{-1} H W B^{-1} H^{\top} X_v^{(l)}$$
where $W \in \mathbb{R}^{N \times N}$ denotes the diagonal matrix of hyperedge weights, $H \in \mathbb{R}^{M \times N}$ denotes the hypergraph association matrix, and $D \in \mathbb{R}^{M \times M}$ and $B \in \mathbb{R}^{N \times N}$ denote the degree matrices of nodes and hyperedges, respectively. After $l+1$ layers of HGCN, the final embedding of item $v_i$ is represented as $x_i \in X$.
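As a minimal sketch (not the authors' implementation), the propagation rule above can be written with dense tensors as follows; the incidence matrix H, the hyperedge weight vector w, and the function name are illustrative assumptions.

```python
import torch

def hgcn_layer(X, H, w):
    """One hypergraph convolution step: X_next = D^-1 H W B^-1 H^T X.

    X: (M, d) item embeddings, H: (M, N) incidence matrix, w: (N,) hyperedge weights.
    """
    B = H.sum(dim=0).clamp(min=1.0)              # hyperedge degrees
    D = (H * w).sum(dim=1).clamp(min=1.0)        # node degrees (weighted by hyperedges)
    edge_feat = (H.t() @ X) / B.unsqueeze(-1)    # B^-1 H^T X: aggregate nodes into hyperedges
    edge_feat = edge_feat * w.unsqueeze(-1)      # apply diagonal hyperedge weights W
    return (H @ edge_feat) / D.unsqueeze(-1)     # D^-1 H (...): scatter back to nodes
```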

3.2.2. User Initial Intent Learning

(i) Multi-Level User Initial Intent Generation: The user initial intent is a crucial aspect of user behavior analysis, which guides the selection of subsequent items and signifies the commencement of a session. Traditional research methodologies have often oversimplified this concept by equating the first clicked item with the user initial intent, thereby overlooking the potential randomness in the user’s initial clicks. To address this oversimplification, we develop a novel multi-level user initial intent learning module to extract user initial intent from multiple levels, thereby providing a more precise representation of the user initial intent.
To capture the multi-level initial intent of a user, we consider the first k items as the learning scope. For the k-th level of initial intent learning, we compute a similarity score between the k-th item and previous items following the method outlined in [31]. This score is then subjected to a weighted summation to yield a representation of the user initial intent at the k-th level. The process is as follows:
$$i_1 = x_1, \qquad i_2 = (x_1^{\top} x_2)\, x_1 + x_2, \qquad \ldots, \qquad i_k = \sum_{i=1}^{k-1} (x_i^{\top} x_k)\, x_i + x_k$$
where $i_1, i_2, \ldots, i_k$ denote the user initial intent at levels $1$ through $k$, respectively.
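A minimal sketch of this multi-level generation step is given below, assuming x holds the embeddings of the first items of one session as an (n, d) tensor; the function name is illustrative.

```python
import torch

def multi_level_intents(x, k):
    """x: (n, d) embeddings of the session's items in click order; returns (<=k, d) intents."""
    intents = [x[0]]                                  # i_1 = x_1
    for t in range(1, min(k, x.size(0))):
        sims = x[:t] @ x[t]                           # (t,) scores x_i^T x_t for i < t
        intents.append((sims.unsqueeze(-1) * x[:t]).sum(dim=0) + x[t])   # i_{t+1}
    return torch.stack(intents)
```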
(ii) User Initial Intent Matching: Considering the dynamic change of user interests, we need to select the part of the user initial intent at each level that matches the dynamic interests. For session $s = \{v_{s_1}, v_{s_2}, \ldots, v_{s_n}\}$, we design the IMM as shown in Figure 2. It takes the embedding of the last clicked item, $x_{s_n}$, as the query vector of a soft attention mechanism and computes an attention coefficient for the user initial intent at each level. The user initial intents are then fused according to these attention coefficients to obtain the final user initial intent representation. The mathematical formulation of this process is as follows:
$$\alpha_t = q^{\top} \sigma(W_1 i_t + W_2 x_{s_n} + b)$$
$$I = \sum_{t=1}^{k} \alpha_t i_t$$
where $q \in \mathbb{R}^d$, $W_1 \in \mathbb{R}^{d \times d}$, $W_2 \in \mathbb{R}^{d \times d}$, and $b \in \mathbb{R}^d$ are trainable parameters, and $I$ is the final user initial intent representation.
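The sketch below shows the IMM attention in code form, assuming the sigma above is the sigmoid function; the module layout and parameter initialization are illustrative, not the authors' released code.

```python
import torch
import torch.nn as nn

class IMM(nn.Module):
    """Soft attention over multi-level initial intents, queried by the last-click embedding."""
    def __init__(self, d):
        super().__init__()
        self.W1 = nn.Linear(d, d, bias=False)
        self.W2 = nn.Linear(d, d, bias=True)     # the bias term plays the role of b
        self.q = nn.Parameter(torch.randn(d, 1) * 0.01)

    def forward(self, intents, x_last):
        # intents: (k, d) multi-level initial intents; x_last: (d,) last clicked item embedding
        alpha = torch.sigmoid(self.W1(intents) + self.W2(x_last)) @ self.q   # (k, 1) coefficients
        return (alpha * intents).sum(dim=0)                                  # final intent I: (d,)
```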

3.2.3. General Intent Learning

The session-based recommendation system suggests suitable items to the user by discerning the user's intent. For a given session $s = \{v_{s_1}, v_{s_2}, \ldots, v_{s_n}\}$, the item embeddings are denoted as $\{x_{s_1}, x_{s_2}, \ldots, x_{s_n}\}$. We first compute the average representation of the items in the session, $x_s = \frac{1}{n} \sum_{i=1}^{n} x_{s_i}$. We then compute the attention score of each item in the session individually, using $x_s$ as the query vector. Finally, the general intent representation is obtained through a soft attention mechanism. The specifics of this process are as follows:
$$\beta_t = f^{\top} \sigma(W_3 x_{s_t} + W_4 x_s + c)$$
$$h_g = \sum_{t=1}^{n} \beta_t x_{s_t}$$
where $f \in \mathbb{R}^d$, $W_3 \in \mathbb{R}^{d \times d}$, $W_4 \in \mathbb{R}^{d \times d}$, and $c \in \mathbb{R}^d$ are trainable parameters, and $h_g$ is the user general intent representation.
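A minimal functional sketch of the general-intent attention, with the trainable tensors passed in explicitly for brevity; the names and shapes mirror the equations above but are otherwise illustrative.

```python
import torch

def general_intent(x_sess, W3, W4, f, c):
    """x_sess: (n, d) item embeddings; W3, W4: (d, d); f, c: (d,). Returns h_g: (d,)."""
    x_mean = x_sess.mean(dim=0)                                       # average session embedding x_s
    beta = torch.sigmoid(x_sess @ W3.t() + x_mean @ W4.t() + c) @ f   # (n,) attention scores
    return (beta.unsqueeze(-1) * x_sess).sum(dim=0)                   # weighted sum -> h_g
```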

3.2.4. User Intent Dynamic Fusion

After obtaining the user initial intent and the general intent representation separately, a fusion process is undertaken to obtain the final user intent representation. Here, we employ a lightweight gating mechanism to compute an adaptive weight $\gamma$ that dynamically balances the importance of the user general intent and the initial intent, yielding the optimal final user intent representation. The equations are as follows:
$$\gamma = \sigma(W_5 h_g + W_6 I + d)$$
$$h_s = \gamma h_g + (1 - \gamma) I$$
where $W_5 \in \mathbb{R}^{1 \times d}$, $W_6 \in \mathbb{R}^{1 \times d}$, and $d \in \mathbb{R}^1$ are trainable parameters. The final user intent representation $h_s$ is used to make recommendations.
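The gate can be sketched as below; the bias is renamed d_bias only to avoid clashing with the embedding size d, and the function itself is an illustrative assumption.

```python
import torch

def fuse_intents(h_g, I, W5, W6, d_bias):
    """h_g, I: (d,) intents; W5, W6: (1, d); d_bias: (1,). Returns h_s: (d,)."""
    gamma = torch.sigmoid(W5 @ h_g + W6 @ I + d_bias)    # adaptive gate in (0, 1)
    return gamma * h_g + (1.0 - gamma) * I                # convex combination -> h_s
```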

3.2.5. Make Recommendations

We compute the similarity scores between the final user intent representation and the candidate items and normalize them using the softmax function. The details are as follows:
$$\hat{y}_i = \mathrm{softmax}(h_s^{\top} x_i)$$
Items with the top-K similarity scores are recommended to the user.
To optimize our model, we adopt the cross-entropy function as the main objective function:
$$\mathcal{L}_m = -\sum_{i=1}^{N} \left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right]$$
where $y$ is the one-hot encoding vector of the ground truth. To obtain high-quality recommendations, the Adam optimizer is utilized to minimize $\mathcal{L}_m$.
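In code, the scoring and the main loss can be sketched as follows, assuming X is the full item-embedding matrix and target is the index of the ground-truth next item; this is a schematic, not the authors' training loop.

```python
import torch

def score_and_loss(h_s, X, target, top_k=20):
    """h_s: (d,) session representation; X: (M, d) item embeddings; target: int item index."""
    scores = X @ h_s                               # similarity with every candidate item
    y_hat = torch.softmax(scores, dim=0)           # normalized scores
    recommended = torch.topk(scores, top_k).indices
    loss = -torch.log(y_hat[target] + 1e-12)       # cross-entropy for a one-hot target
    return recommended, loss
```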
In order to exploit the user initial intent further and better leverage the role of different levels of user initial intent, we construct a multi-level initial intent contrastive learning (CL) task as an auxiliary loss to maximize the mutual information between the final user initial intent and user initial intent at different levels. The details are as follows:
$$\mathcal{L}_{icl} = -\sum_{(I, i_t) \in S^{+}} \sum_{(I, x_j) \in S^{-}} \log \sigma\left(I^{\top} i_t - I^{\top} x_j\right)$$
where $(I, i_t) \in S^{+}$ denotes the set of positive samples and $(I, x_j) \in S^{-}$ the set of randomly sampled negative samples.
We jointly learn the main loss $\mathcal{L}_m$ and the CL loss $\mathcal{L}_{icl}$, weighted by the hyper-parameter $\lambda$, as the overall loss $\mathcal{L}$:
$$\mathcal{L} = \mathcal{L}_m + \lambda \mathcal{L}_{icl}$$
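A minimal sketch of the contrastive term and the joint objective, assuming level_intents stacks the per-level intents and neg is one randomly sampled negative item embedding; the sampling strategy and names are illustrative.

```python
import torch

def icl_loss(I, level_intents, neg):
    """I: (d,) final initial intent; level_intents: (k, d); neg: (d,) negative item embedding."""
    pos = level_intents @ I                          # I^T i_t for each level t (positives)
    neg_score = neg @ I                              # I^T x_j for the negative sample
    return -torch.log(torch.sigmoid(pos - neg_score) + 1e-12).sum()

# Joint objective with weight lam (the hyper-parameter lambda):
# total_loss = main_loss + lam * icl_loss(I, level_intents, neg)
```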

4. Experiments

In this section, the following questions will be answered:
  • RQ1: How does the performance of EMUI compare with other baseline methods?
  • RQ2: How do variants of EMUI affect the final recommendation?
  • RQ3: How do the depths of EMUI and MIGM affect the model performance?
  • RQ4: How do different fusion methods of user general and initial intent affect the model performance?
  • RQ5: How does hyper-parameter λ impact the EMUI performance?

4.1. Experimental Settings

4.1.1. Datasets

To verify the effectiveness of EMUI, extensive experiments are conducted on three widely used public benchmark datasets: Tmall (https://tianchi.aliyun.com/dataset/dataDetail?dataId=42), RetailRocket (https://www.kaggle.com/retailrocket/ecommerce-dataset), and Taobao (https://tianchi.aliyun.com/dataset/dataDetail?dataId=47). The Tmall dataset contains anonymized users' shopping logs from the Tmall app. RetailRocket is an e-commerce dataset containing six months of user browsing history. The Taobao dataset is a 30-day record of user behavior collected from the Taobao app. For a fair comparison, the datasets are processed following [20] and split into training/testing sets. First, we exclude sessions comprising fewer than two items or more than 40 items. Then, for every retained session $s = \{v_{s,1}, v_{s,2}, \ldots, v_{s,m}\}$, we generate multiple labelled sequences $([v_{s,1}], v_{s,2}), ([v_{s,1}, v_{s,2}], v_{s,3}), \ldots, ([v_{s,1}, v_{s,2}, \ldots, v_{s,m-1}], v_{s,m})$.
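The sequence-splitting step can be sketched as below; the session encoding (a plain list of item ids) is an illustrative assumption.

```python
def split_session(session):
    """Expand one session into (prefix, label) training pairs, as described above."""
    return [(session[:i], session[i]) for i in range(1, len(session))]

# Example: split_session([5, 9, 2]) -> [([5], 9), ([5, 9], 2)]
```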

4.1.2. Competing Methods

To validate the effectiveness of our model, we compared it with three groups of related baseline methods as follows:
Traditional methods
  • Item-KNN [15] is a CF-based method that recommends similar items to the user by the cosine similarity scores.
  • FPMC [14] is a Markov chain-based method for next-basket recommendation. It adapts to the session recommendation task by ignoring the user latent representation.
RNN-based methods
  • GRU4REC [6] models session sequential transitions using GRU to capture user preferences.
  • NARM [7] fuses RNN and attention mechanisms to learn the user’s main interests and fuse them with the sequential features.
  • STAMP [18] utilizes a multilayer perceptron that considers the user’s long-term and current interests to extract user preferences.
GNN-based methods
  • SRGNN [20] models the session as a graph structure and learns item features and user interests using GGNN and a soft attention mechanism.
  • GCE-GNN [23] aggregates global contextual information about sessions using a global graph, while retaining the session graph for more comprehensive session embedding.
  • DHCN [24] transforms sessions into a hypergraph, capturing the complex transitions between items using a hypergraph convolutional network. Externally, the method incorporates a self-supervised learning approach to enhance recommendations.
  • AMAN [13] treats the first clicked item of a session as the user’s original interest and captures the association of items in different micro-behavioural sequences.
  • STGCR [28] constructs temporal graph and temporal hypergraph views and creates a contrastive learning task between the two views to maximize the mutual information between them to improve recommendation performance.

4.1.3. Parameter Settings

Following previous works [20,24], the batch size and embedding size are set to 100. The Adam optimizer is adopted in the training process. The $L_2$ regularization coefficient is set to $10^{-5}$, and the learning rate is initialized to 0.001 and decayed by 0.1 every three epochs.
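For reference, the configuration above maps onto standard PyTorch optimizer and scheduler calls as sketched below; `model` stands for the EMUI network and is assumed.

```python
import torch

# Adam with L2 weight decay 1e-5; lr 0.001 decayed by 0.1 every three epochs (see text).
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-5)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=3, gamma=0.1)
```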

4.1.4. Evaluation Metrics

We adopt P@K (Precision) and MRR@K (Mean Reciprocal Rank) to evaluate the model, with K set to 10 and 20. Higher values of P@K and MRR@K indicate better recommendation performance. For brevity, MRR@K is abbreviated as M@K in the remainder of the paper.
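The two metrics can be computed as in the sketch below, assuming `ranked` holds the recommended item ids per test session in score order and `targets` the true next items; the names and structure are illustrative.

```python
def evaluate(ranked, targets, k=20):
    """Returns (P@K, MRR@K) over a list of sessions."""
    hits, rr = 0, 0.0
    for rec, tgt in zip(ranked, targets):
        rec_k = list(rec[:k])
        if tgt in rec_k:
            hits += 1
            rr += 1.0 / (rec_k.index(tgt) + 1)   # reciprocal rank of the hit
    n = max(len(targets), 1)
    return hits / n, rr / n
```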

4.2. Overall Performance (RQ1)

To validate the effectiveness of our model, extensive experiments are conducted on the three datasets. The experimental results are shown in Table 2, from which the following observations can be made:
  • RNN-based methods (e.g., GRU4Rec, NARM, STAMP) achieve a large improvement in all evaluation metrics over traditional methods (e.g., Item-KNN, FPMC), which demonstrates their effectiveness in session recommendation tasks. The traditional approaches recommend items based on the user's most recently clicked items without considering the order dependency within the session, whereas the deep learning-based methods take it into account. This shows the importance of modeling the sequential dependencies of items in a session.
  • GNN-based approaches (e.g., SRGNN, DHCN) achieve better results than RNN-based approaches (e.g., GRU4Rec, NARM). GNN-based methods benefit from the graph structure to capture long-range dependencies between items; by modeling complex item transition relationships, they obtain richer item representations and improve recommendation performance.
  • Contrastive learning techniques further enhance session recommendation performance (e.g., STGCR). Contrastive learning yields more accurate item features by maximizing the mutual information between the target and positive samples while pushing negative samples further away. AMAN incorporates the initial user intent, which improves recommendation performance to a certain extent compared to the other methods.
  • Our proposed EMUI outperforms all baseline methods in all metrics on all three datasets. EMUI enriches the initial intent representation used in AMAN by extracting multi-level user initial intent through the MIGM. To match the initial intent with the user's dynamic interests, we design the IMM to retain the part of the initial intent that is consistent with the dynamic interests. In addition, to further exploit the multi-level user initial intent, we construct a contrastive learning task that maximizes the mutual information between the final user initial intent and the initial intent at each level. The experimental results show that our model effectively improves recommendation performance.

4.3. Impact of EMUI Components (RQ2)

To investigate the contributions of MIGM, IMM, and contrastive learning tasks, we designed three variants of the model for ablation studies, namely, EMUI w/o MIGM, EMUI w/o IMM, and EMUI w/o CL. The three variants are used to validate the effectiveness of each of the modules by removing the different modules separately. In particular, in EMUI w/o MIGM, we remove the multi-level user initial intent representation and keep the user’s first clicked item as the initial intent. In EMUI w/o IMM, we replace the feature matching with the summation operation to obtain the user initial intent. The experimental results are shown in Table 3.
From the experimental results, we observe that removing any module degrades performance relative to the full EMUI, while every variant still outperforms DHCN, which proves that each module plays an irreplaceable role in our model. The results of EMUI w/o MIGM show that using only the user's first clicked item cannot accurately represent the user's initial intention. The MIGM effectively mitigates the impact of random clicks at the start of a session and obtains a more accurate initial intent representation through multi-level mining of the user's initial intent.
Additionally, the results of EMUI w/o IMM show that removing the IMM reduces system performance. Direct summation does not consider the dynamic changes in user interests, so the resulting initial intent contains components that deviate from the user's current interests, lowering performance. Furthermore, the results of EMUI w/o CL show that the contrastive learning task contributes positively to system performance: removing it decreases every evaluation metric. Overall, the ablation experiments confirm that all modules in our model contribute to system performance.

4.4. Impact of the Depth of EMUI and MIGM (RQ3)

We investigate the depth of EMUI by varying the number of layers of the hypergraph convolutional network from 1 to 5; the experimental results are shown in Figure 3. The best results are achieved with 1, 2, and 3 layers on the Tmall, Taobao, and RetailRocket datasets, respectively. The RetailRocket dataset contains users' browsing behaviors, and more network layers help to capture more accurate user interests. The difference in the layer settings between Tmall and Taobao may be because the average session length in the Taobao dataset is larger than in the Tmall dataset, so more network layers are needed to learn user interests.
In addition, we set k in the range of 1–5 to investigate the depth of MIGM, and the experimental results are shown in Figure 4. It can be seen that the best results are achieved with k set to 3 for the Tmall and RetailRocket datasets, and with k set to 4 for the Taobao dataset. This suggests that multiple levels of initial user intent can improve system performance, but too many levels can introduce redundant features and cause performance degradation. The Taobao dataset requires a larger value of k, probably because its average session length is longer than in the other two datasets.

4.5. Impact of Different Fusion Methods of User General and Initial Intent (RQ4)

In this study, we evaluated the performance of our proposed adaptive fusion method against three established fusion techniques: summation, averaging, and concatenation. The outcomes of these comparative experiments are presented in Table 4.
Our adaptive fusion method demonstrated superior performance across the majority of the experimental metrics, underscoring its efficacy. Interestingly, the summation and averaging fusion methods yielded similar results. This can be attributed to their inability to discern the significance of the overall user intent relative to the initial intent, thereby treating their contributions to the system as equal.
Conversely, the concatenation method exhibited the least effective performance. This could potentially be due to interference between features across different dimensions during training, leading to a degradation in system performance. Further investigation is warranted to confirm this hypothesis.

4.6. Impact of the Hyper-Parameters λ (RQ5)

The hyper-parameter λ controls the weight of the contrastive learning task: a lower value permits the model to fit the training data more closely, whereas a higher value strengthens the regularization effect, thereby mitigating overfitting. Drawing on prior research, we set λ to 0, 0.0001, 0.001, 0.005, and 0.01. The experimental results presented in Table 5 show that good results are obtained when the recommendation task and the contrastive learning task are trained together, and the best results are obtained when λ is set to 0.001. As λ grows larger, all metrics decrease, which may be due to gradient conflicts between the two tasks. Session characteristics also influence the selection of λ: the three datasets used in this paper are all e-commerce datasets and, after preprocessing, exhibit similar session characteristics, so the same value (λ = 0.001) is optimal for all of them, which is consistent with the experimental results.

5. Conclusions

In this paper, we propose a novel model called EMUI to improve session recommendation performance by exploiting multi-level user initial intent. In EMUI, we design the MIGM to extract user initial intent at different levels, mitigating the effect of arbitrary clicks that arises when only the first item is used as the initial intent. Additionally, to adapt to the dynamic change of user interests during a session, we develop the IMM, which matches the multi-level user initial intent with the user's dynamic interests, retains the parts of the initial intent that match those interests, and reduces the introduction of redundant information into the model. Finally, to further leverage the multi-level user initial intent, we create a contrastive learning task that maximizes the mutual information between the final user initial intent and the initial intent at each level. The experimental results show that our model achieves encouraging results, with all evaluation metrics higher than those of the baseline models. In future work, we plan to introduce side information to assist in generating a more accurate user initial intent and to further explore the role of user initial intent in session recommendation tasks.

Author Contributions

Formal analysis, J.D.; methodology, J.D. and J.W.; software, J.D.; supervision, G.L.; validation, J.W.; writing—original draft, J.D.; Writing—review and editing, J.W. and G.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by Postgraduate Research & Practice Innovation Program of Jiangsu Province (Grant No.KYCX22_0950), Natural Science Research Start-up Foundation of Recruiting Talents of Nanjing University of Posts and Telecommunications (Grant No.NY223030), Nanjing Science and Technology Innovation Foundation for Overseas Students (Grants No.RK002NLX23004), and Natural Science Foundation of the Higher Education Institutions of Jiangsu Province (Grants No.24KJB520022).

Data Availability Statement

The data generated during the current study is available from the first author or corresponding author on reasonable request.

Acknowledgments

The authors thank the reviewers for their helpful and insightful comments.

Conflicts of Interest

The authors declare that they have no competing financial or personal interests that could have influenced this work.

Abbreviations

The following abbreviations are used in this manuscript:
SBR  | Session-based recommendation
RNN  | Recurrent neural network
GNN  | Graph neural network
EMUI | Efficiently Exploiting Multi-Level User Initial Intent
MIGM | Multi-level initial intent generation module
IMM  | Interest matching module

References

  1. Wang, S.; Cao, L.; Wang, Y.; Sheng, Q.Z.; Orgun, M.A.; Lian, D. A survey on session-based recommender systems. ACM Comput. Surv. (CSUR) 2021, 54, 1–38. [Google Scholar] [CrossRef]
  2. Jannach, D.; Ludewig, M. When recurrent neural networks meet the neighborhood for session-based recommendation. In Proceedings of the eleventh ACM Conference on Recommender Systems, Como, Italy, 27–31 August 2017; pp. 306–310. [Google Scholar]
  3. Pan, Z.; Cai, F.; Ling, Y.; de Rijke, M. An intent-guided collaborative machine for session-based recommendation. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual, 24–25 July 2020; pp. 1833–1836. [Google Scholar]
  4. Song, W.; Xiao, Z.; Wang, Y.; Charlin, L.; Zhang, M.; Tang, J. Session-based social recommendation via dynamic graph attention networks. In Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining, Melbourne, Australia, 11–15 February 2019; pp. 555–563. [Google Scholar]
  5. Wang, M.; Ren, P.; Mei, L.; Chen, Z.; Ma, J.; De Rijke, M. A collaborative session-based recommendation approach with parallel memory modules. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, Paris, France, 21–25 July 2019; pp. 345–354. [Google Scholar]
  6. Hidasi, B.; Karatzoglou, A.; Baltrunas, L.; Tikk, D. Session-based recommendations with recurrent neural networks. arXiv 2015, arXiv:1511.06939. [Google Scholar]
  7. Li, J.; Ren, P.; Chen, Z.; Ren, Z.; Lian, T.; Ma, J. Neural attentive session-based recommendation. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, Singapore, 6–10 November 2017; pp. 1419–1428. [Google Scholar]
  8. Tan, Y.K.; Xu, X.; Liu, Y. Improved recurrent neural networks for session-based recommendations. In Proceedings of the 1st Workshop on Deep Learning for Recommender Systems, Boston, MA, USA, 15 September 2016; pp. 17–22. [Google Scholar]
  9. Xu, C.; Zhao, P.; Liu, Y.; Sheng, V.S.; Xu, J.; Zhuang, F.; Fang, J.; Zhou, X. Graph contextualized self-attention network for session-based recommendation. In Proceedings of the IJCAI, Macao, China, 10–16 August 2019; Volume 19, pp. 3940–3946. [Google Scholar]
  10. Chen, T.; Wong, R.C.W. Handling information loss of graph neural networks for session-based recommendation. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Virtual, 6–10 July 2020; pp. 1172–1180. [Google Scholar]
  11. Wang, J.; Ding, K.; Zhu, Z.; Caverlee, J. Session-based recommendation with hypergraph attention networks. In Proceedings of the 2021 SIAM International Conference on Data Mining (SDM), SIAM, Virtual, 29 April–1 May 2021; pp. 82–90. [Google Scholar]
  12. Li, A.; Cheng, Z.; Liu, F.; Gao, Z.; Guan, W.; Peng, Y. Disentangled graph neural networks for session-based recommendation. IEEE Trans. Knowl. Data Eng. 2022, 35, 7870–7882. [Google Scholar] [CrossRef]
  13. Qiao, J.; Wang, L. Modeling user micro-behaviors and original interest via Adaptive Multi-Attention Network for session-based recommendation. Knowl.-Based Syst. 2022, 244, 108567. [Google Scholar] [CrossRef]
  14. Rendle, S.; Freudenthaler, C.; Schmidt-Thieme, L. Factorizing personalized markov chains for next-basket recommendation. In Proceedings of the 19th International Conference on World Wide Web, Raleigh, NC, USA, 26–30 April 2010; pp. 811–820. [Google Scholar]
  15. Sarwar, B.; Karypis, G.; Konstan, J.; Riedl, J. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, Hong Kong, China, 1–5 May 2001; pp. 285–295. [Google Scholar]
  16. Zhou, X.; Shu, W.; Lin, F.; Wang, B. Confidence-weighted bias model for online collaborative filtering. Appl. Soft Comput. 2018, 70, 1042–1053. [Google Scholar] [CrossRef]
  17. Wang, R.; Jiang, Y.; Lou, J. Attention-based dynamic user preference modeling and nonlinear feature interaction learning for collaborative filtering recommendation. Appl. Soft Comput. 2021, 110, 107652. [Google Scholar] [CrossRef]
  18. Liu, Q.; Zeng, Y.; Mokhosi, R.; Zhang, H. STAMP: Short-term attention/memory priority model for session-based recommendation. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, London, UK, 19–23 August 2018; pp. 1831–1839. [Google Scholar]
  19. Luo, A.; Zhao, P.; Liu, Y.; Zhuang, F.; Wang, D.; Xu, J.; Fang, J.; Sheng, V.S. Collaborative Self-Attention Network for Session-based Recommendation. In Proceedings of the IJCAI, Yokohama, Japan, 11–17 July 2020; pp. 2591–2597. [Google Scholar]
  20. Wu, S.; Tang, Y.; Zhu, Y.; Wang, L.; Xie, X.; Tan, T. Session-based recommendation with graph neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; Volume 33, pp. 346–353. [Google Scholar]
  21. Qiu, R.; Li, J.; Huang, Z.; Yin, H. Rethinking the item order in session-based recommendation with graph neural networks. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, Beijing, China, 3–7 November 2019; pp. 579–588. [Google Scholar]
  22. Zeng, J.; Xie, P. Contrastive self-supervised learning for graph classification. In Proceedings of the AAAI conference on Artificial Intelligence, Virtual, 2–9 February 2021; Volume 35, pp. 10824–10832. [Google Scholar]
  23. Wang, Z.; Wei, W.; Cong, G.; Li, X.L.; Mao, X.L.; Qiu, M. Global context enhanced graph neural networks for session-based recommendation. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual, 24–25 July 2020; pp. 169–178. [Google Scholar]
  24. Xia, X.; Yin, H.; Yu, J.; Wang, Q.; Cui, L.; Zhang, X. Self-supervised hypergraph convolutional networks for session-based recommendation. In Proceedings of the AAAI conference on Artificial Intelligence, Virtual, 2–9 February 2021; Volume 35, pp. 4503–4511. [Google Scholar]
  25. Sun, F.Y.; Hoffmann, J.; Verma, V.; Tang, J. Infograph: Unsupervised and semi-supervised graph-level representation learning via mutual information maximization. arXiv 2019, arXiv:1908.01000. [Google Scholar]
  26. Ghosh, S.; Roy, M.; Ghosh, A. Semi-supervised change detection using modified self-organizing feature map neural network. Appl. Soft Comput. 2014, 15, 1–20. [Google Scholar] [CrossRef]
  27. He, X.; Deng, K.; Wang, X.; Li, Y.; Zhang, Y.; Wang, M. Lightgcn: Simplifying and powering graph convolution network for recommendation. In Proceedings of the 43rd International ACM SIGIR conference on research and development in Information Retrieval, Virtual, 24–25 July 2020; pp. 639–648. [Google Scholar]
  28. Wang, H.; Yan, S.; Wu, C.; Han, L.; Zhou, L. Cross-view temporal graph contrastive learning for session-based recommendation. Knowl.-Based Syst. 2023, 264, 110304. [Google Scholar] [CrossRef]
  29. Feng, Y.; You, H.; Zhang, Z.; Ji, R.; Gao, Y. Hypergraph neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; Volume 33, pp. 3558–3565. [Google Scholar]
  30. Wu, F.; Souza, A.; Zhang, T.; Fifty, C.; Yu, T.; Weinberger, K. Simplifying graph convolutional networks. In Proceedings of the International Conference on Machine Learning, PMLR, Long Beach, CA, USA, 10–15 May 2019; pp. 6861–6871. [Google Scholar]
  31. Rendle, S.; Freudenthaler, C.; Gantner, Z.; Schmidt-Thieme, L. BPR: Bayesian personalized ranking from implicit feedback. In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, UAI’09, Arlington, VA, USA, 18–21 June 2009; pp. 452–461. [Google Scholar]
Figure 1. An overview of the proposed EMUI method.
Figure 2. The details of the proposed IMM.
Figure 3. Impact of layers of HGCN (λ = 0.001).
Figure 4. Impact of the depth of MIGM (λ = 0.001).
Table 1. Symbol table.

Notation | Description
V        | Item set containing all items
S        | Session set containing all sessions
v_i      | An item in the item set
s_i      | A session in the session set
E        | Hyperedge set
e        | A hyperedge in the hyperedge set
W        | The diagonal matrix of hyperedge weights
H        | The hypergraph association matrix
D        | The degree matrix of nodes
B        | The degree matrix of hyperedges
X        | Item embedding set
x_i      | An item embedding
I        | User initial intent set
i_k      | k-th level user initial intent in a session
x_s      | The average item embedding in a session
h_g      | User general intent representation
h_s      | The final user intent representation
Table 2. Experimental results (%) on the three datasets.

Methods      |          Tmall          |       RetailRocket      |          Taobao
             | P@10  M@10  P@20  M@20  | P@10  M@10  P@20  M@20  | P@10  M@10  P@20  M@20
Item-KNN     |  6.68  3.12  9.20  3.34 | 21.41  9.78 35.26 16.58 |  0.13  0.08  0.14  0.08
FPMC         | 13.05  7.11 16.08  7.34 | 20.59  8.53 32.37 13.82 |  0.80  0.71  0.80  0.72
GRU4Rec      |  9.50  5.75 10.98  5.92 | 31.01 15.37 44.01 23.67 |  0.87  0.77  0.89  0.77
NARM         | 19.21 10.39 23.35 10.68 | 44.74 25.54 50.22 24.59 |  0.90  0.75  0.94  0.75
STAMP        | 22.46 13.08 26.44 13.35 | 43.14 26.65 50.96 25.17 |  0.77  0.44  0.84  0.44
SRGNN        | 23.49 13.47 27.65 13.76 | 44.88 26.95 50.32 26.65 | 17.17 11.96 23.48 12.03
GCE-GNN      | 28.03 15.07 33.41 15.43 | 46.38 27.96 54.58 28.09 | 27.46 13.98 32.75 14.14
DHCN         | 26.24 14.63 31.51 15.08 | 46.32 27.85 53.66 27.30 | 25.98 13.75 30.63 14.16
STGCR        | 29.13 16.45 34.28 16.86 | 48.84 29.40 56.45 29.40 | 30.13 17.26 32.14 17.35
AMAN         | 28.64 16.13 33.52 16.37 | 47.45 28.92 55.84 29.17 | 26.00 15.55 36.20 18.17
ours (EMUI)  | 31.19 18.58 36.64 18.86 | 49.30 29.47 56.82 29.99 | 34.72 23.61 37.37 23.78
The best results are boldfaced.
Table 3. Performance comparison of the model variants.

Variant        |          Tmall          |       RetailRocket      |          Taobao
               | P@10  M@10  P@20  M@20  | P@10  M@10  P@20  M@20  | P@10  M@10  P@20  M@20
DHCN           | 26.24 14.63 31.51 15.08 | 46.32 27.85 53.66 27.30 | 25.98 13.75 30.63 14.16
EMUI w/o MIGM  | 28.92 16.93 34.38 17.17 | 47.98 29.14 55.97 29.32 | 32.59 21.87 35.44 22.53
EMUI w/o IMM   | 30.12 18.03 35.99 18.34 | 48.57 28.66 55.89 28.94 | 33.75 22.74 36.11 22.81
EMUI w/o CL    | 30.25 18.12 36.24 18.55 | 48.88 28.92 56.13 29.08 | 33.94 22.81 36.16 23.05
EMUI           | 31.19 18.58 36.64 18.86 | 49.30 29.47 56.82 29.99 | 34.72 23.61 37.37 23.78
The best results are boldfaced.
Table 4. Impact of different fusion methods.

Methods        |          Tmall          |       RetailRocket      |          Taobao
               | P@10  M@10  P@20  M@20  | P@10  M@10  P@20  M@20  | P@10  M@10  P@20  M@20
ours           | 31.19 18.58 36.64 18.86 | 49.30 29.47 56.82 29.99 | 34.72 23.61 37.37 23.78
averaging      | 30.86 18.09 36.78 18.35 | 49.76 29.26 56.53 29.45 | 34.32 24.45 37.41 23.66
summation      | 30.89 17.97 36.46 18.57 | 49.35 29.34 56.38 29.60 | 33.15 23.54 36.11 23.39
concatenation  | 26.89 15.17 32.45 15.30 | 46.30 27.59 54.38 27.76 | 30.97 20.22 33.79 20.95
The best results are boldfaced.
Table 5. Impact of hyper-parameter λ.

λ       |          Tmall          |       RetailRocket      |          Taobao
        | P@10  M@10  P@20  M@20  | P@10  M@10  P@20  M@20  | P@10  M@10  P@20  M@20
0       | 30.25 18.12 36.24 18.55 | 48.88 28.92 56.13 29.08 | 33.94 22.81 36.16 23.05
0.0001  | 30.98 18.48 36.55 18.79 | 48.29 28.79 55.52 28.92 | 33.82 23.07 36.69 23.18
0.001   | 31.19 18.58 36.64 18.86 | 49.30 29.47 56.82 29.99 | 34.72 23.61 37.37 23.78
0.005   | 30.64 18.31 36.41 18.68 | 48.56 28.85 55.80 28.99 | 33.87 22.95 36.45 23.12
0.01    | 30.35 18.17 36.28 18.58 | 48.80 28.90 56.04 29.06 | 33.92 22.85 36.24 23.07
The best results are boldfaced.
