Article

Skip-Gram and Transformer Model for Session-Based Recommendation

by Enes Celik 1,2,*,† and Sevinc Ilhan Omurca 1,†

1 Computer Engineering Department, Faculty of Engineering, Kocaeli University, Kocaeli 41380, Türkiye
2 Computer Science Department, Babaeski Vocational School, Kirklareli University, Kirklareli 39200, Türkiye
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Appl. Sci. 2024, 14(14), 6353; https://doi.org/10.3390/app14146353
Submission received: 25 June 2024 / Revised: 18 July 2024 / Accepted: 19 July 2024 / Published: 21 July 2024
(This article belongs to the Section Computing and Artificial Intelligence)

Abstract:
Session-based recommendation uses the past clicks and interaction sequences of anonymous users to predict the item most likely to be clicked next. Predicting a user's subsequent behavior in online transactions is challenging mainly because of the lack of user information and the limited behavioral information available. Existing methods, such as recurrent neural network (RNN)-based models that model a user's past behavior sequence and graph neural network (GNN)-based models that capture potential relationships between items, miss the different time intervals in the past behavior sequence and can capture only certain types of user interest patterns owing to the characteristics of neural networks. Graph models created to augment the current session can reduce the model's success because of the addition of irrelevant items. Moreover, the attention mechanisms in recent approaches have been insufficient owing to weak representations of users and products. In this study, we propose a model based on the combination of skip-gram and a transformer (SkipGT) to solve the above-mentioned drawbacks of session-based recommendation systems. In the proposed method, skip-gram both captures chained user interest within the session through item-specific subsequences and learns complex interaction information between items. The proposed method captures short-term and long-term preference representations to predict the next click with the help of a transformer. The transformer in our proposed model overcomes many limitations of recurrence-based models and models longer contextual connections between items more effectively. Because the transformer receives item embeddings already trained by the skip-gram model as input, it performs better, as it does not have to learn item representations from scratch. By conducting extensive experiments with three real-world datasets, we confirm that SkipGT significantly outperforms state-of-the-art solutions, with an average MRR improvement of 5.58%.

1. Introduction

In recent times, mainly due to the impact of the pandemic, there has been a dramatic increase in individuals’ use of the Internet for information searches and shopping. This trend has led to a significant rise in the volume and variety of product offerings on e-commerce platforms, driving the rapid proliferation of recommendation systems designed to enhance users’ shopping experiences and present more attractive products [1]. Recommendation systems have become critical across various industries in the era of information abundance, aiming to reduce information overload for users [2,3]. Typically, these systems provide recommendations based on data from three primary sources: products, user characteristics, and current status [4,5]. However, the lack of profile information for new users hampers the effectiveness of recommendation systems. To address this drawback, session-based recommendation (SBR) systems, which rely on short-term interactions with anonymous users, have gained prominence. SBR has become an increasingly popular research topic and the focus of many scholars [6,7]. The basic principle of SBR is to leverage short interaction sequences of anonymous users, analyze the conversion patterns within these sequences, and predict the next most likely item to be clicked. Early implementations of SBR used a neighborhood-based model, providing recommendations based on statistical similarities between items within the current session.
With increasingly stringent information laws, users are becoming more concerned about the privacy of their personal data, making it difficult to collect user characteristics and past purchase data. This challenge highlights the importance of session-based recommendation (SBR) systems, which aim to understand and predict users' preferences based on their most recent interactions. SBR systems operate by analyzing a specific set of user interaction records. Given the dynamic changes over time and across different contexts, effectively modeling the limited behaviors observed during a session and predicting the user's next likely action has become crucial.
Solving the session-based recommendation problem requires significant effort [8,9,10]. Initially, many recommendation methodologies utilize interaction-based timestamps to rank items, focusing on sequence patterns to predict the next item of interest. A simple Markov chain-based approach aims to predict the user’s next click based on previous behavior using the independence assumption, which has limitations in complex interaction scenarios [11,12]. Subsequently, deep learning-based recommendation systems, including recurrent neural network (RNN) methods, which can model longer-term historical behavior based on contextual information, have become mainstream [13,14,15]. However, effectively modeling complex needs within limited user behaviors remains a challenge. In SBRs, RNN-based approaches struggle to determine item relationships and accurately assess user preferences due to limited user interaction [16]. Additionally, graph neural network (GNN) methods focus on investigating relationships between items but do not account for the user’s time-varying behavioral patterns during the discovery process [17,18,19]. These methods often break down the sequential structure of the session, concluding that priority depends on the last item in the current session. Consequently, they fail to represent the temporal dynamics in recommender systems, reducing their effectiveness. Thus, the order of clicks and the time intervals between clicks throughout a user session must be considered.
Several key factors need to be considered to better understand user behavior in session-based recommendation systems. First, sessions with the same order but different time intervals can be detected as similar behavioral patterns by RNN-based methods, even though high click activity occurring at shorter intervals may indicate intense user interest in particular content. Second, sessions with identical interaction structures but different click orders often cannot be distinguished by GNN-based methods. Third, users' interest over short periods suggests significant time sensitivity, making time intervals a critical marker in determining user interests. Although extensive research has been conducted on session-based recommendation systems, user behavior patterns within session sequences have not been comprehensively examined. The user's primary goal in the current session sequence is often to find the next satisfying item through an exploratory approach, and users may sometimes click on the same item more than once. The click sequence can range from broad research to individual product comparisons and can be viewed as a long-term activity. In this process, it becomes clear that not all items in the click sequence directly contribute to the user's decision-making. Hence, there is a need to model user behavior in more detail.
Briefly, we highlight the main challenges encountered in session-based recommendation (SBR) systems as follows: (1) How to explicitly account for the impact of time intervals on the user’s long-term and short-term preferences. (2) How to model the user’s active exploration behavior throughout a session without disrupting the temporal order of clicked items. (3) How can these behavioral patterns be simulated from an appropriate perspective and emphasized in terms of next-click prediction, considering the unequal importance users attach to each item during the product research or comparison process? (4) How to represent instances in the session data where the user does not make any purchases or makes incorrect clicks.
In order to overcome the above-mentioned challenges and improve session-based recommendation, we propose the SkipGT model, which combines the skip-gram and transformer models. First, we apply the skip-gram model to the session data to create item embeddings. Then, the item embeddings are fed to a transformer model. In our study, the skip-gram model, which is used to predict the context items of a specific target item, effectively captures representation relationships in a dense vector space where each item is represented as a vector. The transformer model, on the other hand, is designed with a non-recurrent structure for processing sequential data. The applied transformer leverages self-attention mechanisms to evaluate the importance of items in a session regardless of their positional distance, allowing it to handle context and connections in the data more effectively than previous models. Where existing methods fall short, we improve the performance of the recommender system by using skip-gram to model the context of an item within the session and a transformer that takes into account the long-term dependencies between items in the session.
We have conducted comprehensive experiments on real-world datasets comprising retail transaction sessions and browsing records. These experiments demonstrate that our proposed SkipGT model significantly outperforms existing state-of-the-art solutions. Consequently, the main contributions of our work are summarized as follows:
  • We proposed the SkipGT model to identify users’ global and current interests, behavioral patterns, and critical preferences.
  • We utilized skip-gram pre-training for the first time to generate session vector embeddings in session-based recommender systems.
  • To the best of our knowledge, this is the first model in session-based recommendation systems that combines the skip-gram model and transformer in natural language processing.
  • We introduced a model that captures bidirectional transitions in session subsequences and represents users with different interest patterns.
  • The proposed model’s efficacy is substantiated through rigorous experimentation, where it is compared against contemporary benchmark models across three distinct datasets.
The rest of the paper is organized as follows. Section 2 presents a literature review of session-based recommendation studies. Section 3 describes the problem definition, model optimization, and the proposed hybrid model. Section 4 details the experimental design, the datasets used, the evaluation metrics, and the comparison of experimental results with baseline methods, discusses these findings in the context of existing research, and presents the ablation experiment and the hyperparameter analysis. Section 5 concludes the study and outlines the main points and recommendations for future work.

2. Related Work

This section presents a review of the relevant literature. Four main categories of techniques commonly used in session-based recommendation systems (SBRs) are conventional SBR, RNN-based methods, self-attention network (SAN)-based methods, and GNN-based methods. In related studies, each of these techniques has been studied and classified separately.

2.1. Conventional Session-Based Recommendation Methods

Many methods based on Collaborative Filtering (CF) have been proposed, where the user’s next click action is predicted based on past click behavior [20]. A simple approach that combines Markov chain and matrix factorization methods is used for the next recommendation [12]. Early work in session-based recommendation relies on traditional machine learning methods to explore recommendation dependencies. For instance, Item-KNN computes recommendations based on the cosine similarity between common items across sessions [21]. S-KNN determines the K most similar sessions by calculating the similarity of the current session to other sessions using methods such as Jaccard or cosine similarity [22]. However, these methods assume that all items in a session contribute equally to suggestions [7]. Additionally, traditional approaches completely neglect the transformation between items. The methods above focus on models that require past clicks of the current user, which is unsatisfactory for real-world scenarios. The CSRM model employs a dual parallel memory architecture, integrating both session-specific data and collaborative neighborhood information to facilitate the implementation of session-based recommendation systems [23].

2.2. RNN-Based Methods

Due to the limitations of conventional methods in session-based recommendation systems, solutions have been developed using better-performing deep neural network models. GRU4Rec employs multiple GRUs to model the user's click sequence and recent engagement; it is also one of the earliest RNN-based methods applied to session-based recommendation [13]. To improve the efficiency of recommendations, a sequential data augmentation method has been proposed [24]. HRNN, a hierarchical RNN approach, captures both intra-session and inter-session dependencies to represent subsequent behavior [25]. However, RNN-based methods face challenges in modeling item sequences within sessions due to the gradient explosion and gradient vanishing problems inherent in their recursive structure. NARM leverages attention mechanisms and recurrent neural networks, emphasizing the hidden state corresponding to the final time step as the user's predominant interest. Additionally, NARM employs a hybrid encoder to enhance the capabilities of RNNs [26].

2.3. SAN-Based Methods

Transformer models [27] are frequently used in various fields, including self-attention mechanism-based networks (SANs) for recommendation systems, owing to their superior performance in natural language processing [28,29]. SASRec is a model that analyzes users' transaction histories with individual attention blocks to predict the next recommendation [30]. BERT4Rec employs a bidirectional self-attention network to model user behavioral sequences, allowing each item in the sequence to integrate information from both the left and right contexts for sequential recommendations [31]. CoSAN focuses on session-based representations and predicts the last item of the current session by considering session neighbors [32]. However, SAN-based methods may only be suitable for domains with tightly coupled dependencies, such as natural language. STAMP explicitly prioritizes the most recent interest indicated by the last click, integrating an attention mechanism and a Multilayer Perceptron (MLP) to capture both short-term and long-term event histories within a session [14]. SR-IEM represents an enhanced self-attention mechanism designed to more precisely gauge the significance of items, generating conclusive recommendations by amalgamating both the user's enduring preferences and immediate interests [9].
A graph-enhanced and collaborative attention network (GCAN) addresses user interest modeling issues by utilizing graph-enhanced attention to capture user interest over item-specific subsequences and collaborative attention to model item representations across sessions [33]. CTA is an attention-driven sequential neural framework that concurrently integrates temporal and contextual cues for modeling sequential behaviors [34]. The IGT solution for session-based recommendation considers item relations and time intervals, including an interval-enhanced session graph, a Graph Transformer with time intervals to learn item interactions and a preference representation for predicting the next click based on user preferences [35]. MTD introduces a multi-task learning framework with multi-level transition dynamics for learning intra- and inter-session item transitions, incorporating a position-aware attention mechanism and a graph-structured hierarchical relation encoder for capturing item transitions [36]. GC-SAN is structured using a GNN model followed by a SAN [37]; however, the structural information from GNNs is significantly weakened when converted into arrays and passed through the SAN layer. GNN-based approaches often focus on local transformation relationships between pairs of items while tending to neglect the broader evolutionary processes occurring in users’ behavior chains.

2.4. GNN-Based Methods

Since graph neural networks (GNNs) show high performance in areas such as knowledge graph extraction and next-location prediction, a large number of GNN-based recommendation system approaches have been developed recently [38,39]. SR-GNN is the first approach to use GNNs in session-based recommendation [16]. The SGNN-HN model uses a Star Graph Neural Network (SGNN) to capture the complex transition relationships between items within an active user session; to prevent overfitting, Highway Networks (HNs) are utilized to dynamically select the most relevant embeddings from the item representations [40]. FGNN employs weighted graph-based attention networks to determine relationships between items by structuring sessions as weighted graphs [19]. LESSR evaluates the information loss in the graph generation process and proposes a lossless information encoding method [41]. DHCN designs a hypergraph-based convolutional network to improve recommendation tasks [42].
ICM-SR uses representation vectors of neighboring items to model the user’s latent intent and applies a session representation encoder to consider the user’s global and recent preferences [43]. CGL builds a global graph from all sessions and adopts a self-supervised learning strategy to train the recommendation model using information from current and past sessions [44]. DGS-MGNN proposes a dynamic global multi-channel GNN approach that models the global, local, and joint representation of items [45]. LDGC-SR uses an adaptive weight fusion mechanism to combine the user’s long-term trends with global contextual information [46]. GCE-GNN includes collaborative information by incorporating similar sessions or items, which may harm personalized modeling [10]. HG-GNN formulates a heterogeneous global graph neural network architecture that integrates present session preferences with informative item transitions derived from historical sessions of other users, enhancing the inference of user preferences [47].
A session-based recommendation system can be characterized in terms of how it captures sequential session features, its loss function, and the properties of its architecture. The characterization of the related methods is shown in Table 1.

3. Materials and Methods

This section presents a detailed description of the proposed SkipGT model. First, the scope and characteristics of the SBR are explained, and then the essential components and functioning of the proposed model are described.

3.1. Problem Definition

The general task of an SBR is to predict the next item that an anonymous user will engage with based on short-term clicks or purchases. Let $I = \{i_1, i_2, \ldots, i_{n-1}, i_n\}$ denote the set of all unique items across all sessions. A session $S = [s_1, s_2, \ldots, s_{n-1}, s_n]$ is a chronologically ordered sequence in which $s_i \in I$ is the $i$th item clicked by the user and $n$ specifies the length of the session. The goal of the SBR is to predict the next click $s_{n+1}$ for $S$. Given a session $S$, the proposed model SkipGT outputs an ordered list $\hat{y} = \{y_1, y_2, \ldots, y_{n-1}, y_n\} \in \mathbb{R}^n$, where $y_j$ ($1 \le j \le n$) denotes the probability that the user selects item $i_j$ as the next click. Since the SBR offers users more than one recommendation item, it selects the top $K$ items of the list $\hat{y}$. The main motivation of the SBR is therefore to estimate the click probability of every candidate item in the current session and recommend the $K$ items with the highest probability.
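As a concrete illustration of this formulation, the following minimal Python sketch scores every candidate item for a session and selects the top K; the item ids are hypothetical, and random scores stand in for a trained model's output.

```python
# Minimal sketch of the SBR task defined above; item ids are hypothetical
# and random scores stand in for a trained model's predictions.
import numpy as np

n_items = 5                     # |I|, the number of unique items
session = [2, 4, 1, 4]          # clicked item ids s_1, ..., s_n

rng = np.random.default_rng(0)
scores = rng.random(n_items)    # y_j: estimated probability that item j is clicked next

K = 3
top_k = np.argsort(scores)[::-1][:K]   # indices of the K highest-scoring items
print("recommended items:", top_k.tolist())
```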

3.2. Overview of the Proposed Model

The SkipGT model consists of two modules, as shown in Figure 1: (1) the skip-gram module, which examines user interest across multiple item-specific subsequences and captures the contexts of items in the session, and (2) the transformer module, which uses the attention mechanism to capture long-distance dependencies in the session more effectively.

3.3. Skip-Gram Layer

The skip-gram module, a novel component in this setting, is designed to capture sequential relationships between items within a user session in SBR tasks. It leverages the skip-gram model's core principle to learn informative representations of items based on their co-occurrence patterns within a session. The skip-gram module takes a sequence of items representing a user's session as input; this sequence can be padded or truncated to a fixed length if necessary. During training, the target item within a session is the last clicked or purchased item. This target item represents the item whose context the model aims to predict, given the surrounding items. In skip-gram, the context window size is set to five. This window captures a fixed number of items before and after the target item, representing the user's recent browsing history within the related session. Each item in the session sequence is mapped to its corresponding dense embedding vector using an embedding lookup table. These embedding vectors represent the items in a low-dimensional space, capturing their inherent properties and relationships. The skip-gram layer then computes the dot product between the embedding vector of the target item and the embedding vectors of the items within the context window; intuitively, items with similar contexts have higher dot product values, indicating a stronger relationship. A negative log-likelihood loss function measures the discrepancy between the predicted probabilities and the actual occurrence of items in the context window. Backpropagation propagates the loss signal and updates the embedding lookup table, effectively adjusting the embedding vectors of items based on their co-occurrence patterns within sessions. The numerical values depicted in the diagram signify the weight matrices, which modulate the successive transformations applied to the input data to generate the output. Training this network involves optimizing these weight matrices so that the output closely approximates the training data. Upon completion of the training phase, the output layer is typically discarded, while the hidden layer, referred to as the embeddings, is used in subsequent processing. These embeddings (d = 128) are vector representations of each item, in which similar items are encoded as vectors positioned close together in the embedding space. The item embedding vectors are then fed to the transformer's input and output embeddings. The skip-gram probabilities for items are calculated as in Equation (1):
$$ p(item_{c,j} = item_{O,c} \mid item_{I}) = \frac{\exp(u_{c,j})}{\sum_{j'=1}^{V} \exp(u_{c,j'})} \quad (1) $$
where $item_{c,j}$ is the $j$th item predicted at the $c$th context position; $item_{O,c}$ is the actual item present at the $c$th context position; $item_{I}$ is the single input item; and $u_{c,j}$ is the $j$th value of the score vector $u$ when predicting the item for the $c$th context position. The skip-gram model is trained with the negative log-likelihood loss given in Equations (2)-(5):
$$
\begin{aligned}
L &= -\log P(item_{c,1}, item_{c,2}, \ldots, item_{c,C} \mid item_{O}) && (2) \\
  &= -\log \prod_{c=1}^{C} P(item_{c,j_c^{*}} \mid item_{O}) && (3) \\
  &= -\log \prod_{c=1}^{C} \frac{\exp(u_{c,j_c^{*}})}{\sum_{j'=1}^{V} \exp(u_{c,j'})} && (4) \\
  &= -\sum_{c=1}^{C} u_{c,j_c^{*}} + \sum_{c=1}^{C} \log \sum_{j'=1}^{V} \exp(u_{c,j'}) && (5)
\end{aligned}
$$
Since we want to maximize the probability of predicting the actual item $item_{c,j_c^{*}}$ at each context position $c$, minimizing the loss function $L$ achieves exactly this objective.
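For illustration, embeddings of this kind can be obtained by treating each session as a "sentence" whose tokens are item ids. The sketch below uses the gensim library with the settings described above (window size 5, d = 128); the session data are hypothetical, and note that gensim's skip-gram uses negative sampling by default rather than the full softmax of Equations (2)-(5).

```python
# Minimal sketch of learning item embeddings with skip-gram; sessions are
# hypothetical and hyperparameters follow the paper (window = 5, d = 128).
from gensim.models import Word2Vec

# Each session is treated like a "sentence" whose tokens are item ids.
sessions = [["i12", "i7", "i7", "i3"],
            ["i3", "i12", "i9"],
            ["i7", "i9", "i12", "i5"]]

model = Word2Vec(
    sentences=sessions,
    vector_size=128,   # embedding dimension d
    window=5,          # context window size
    sg=1,              # 1 = skip-gram (predict context items from a target item)
    min_count=1,
    epochs=20,
)

item_vec = model.wv["i12"]                        # 128-dim vector fed to the transformer
similar = model.wv.most_similar("i12", topn=3)    # items with similar session contexts
```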

3.4. Transformer Layer

The transformer layer leverages a self-attention mechanism, enabling the model to dynamically assign weights to individual input tokens within sequences derived from the skip-gram embeddings. This mechanism facilitates the capture of long-range dependencies and relationships between tokens across the entire input sequence. We use the transformer presented by Vaswani et al. [27]. The transformer comprises a sequence of sub-layers, including multi-head self-attention and position-wise feedforward networks. The multi-head self-attention mechanism projects the input sequence into multiple parallel representations through concurrent applications of attention, enabling the model to attend to diverse aspects of the input simultaneously. The representations are subsequently concatenated and linearly transformed to generate the final output. The position-wise feedforward network introduces non-linearity, further augmenting the model's capacity to capture intricate patterns within the data. Overall, the transformer layer offers recommendation engines a scalable and efficient architecture for processing sequential data.
Inherent to the transformer architecture is the absence of a native mechanism for capturing sequential order within its input. Positional encodings are incorporated to address this limitation. The encodings are added by summation to the input embeddings before they are presented to the transformer layers. This summation empowers the model to distinguish between tokens based on their relative positions within the sequence. A prevalent approach for generating positional encodings leverages trigonometric functions, namely sine and cosine. The calculation of these encoding vectors considers both the position of a token and the dimensionality of the embedding space. The positional encoding is calculated with Equations (6) and (7):
$$ PE_{(pos,\,2i)} = \sin\!\left(\frac{pos}{10000^{2i/d_{model}}}\right) \quad (6) $$

$$ PE_{(pos,\,2i+1)} = \cos\!\left(\frac{pos}{10000^{2i/d_{model}}}\right) \quad (7) $$
where $pos$ is the position of the item in the sequence, $i$ is the dimension index of the positional encoding, and $d_{model}$ is the dimensionality of the model's embeddings.
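The following NumPy sketch computes the sinusoidal encodings of Equations (6) and (7) for the sequence length (32) and embedding dimension (128) used in this work; the function name is illustrative.

```python
# Sketch of the sinusoidal positional encodings of Equations (6) and (7).
import numpy as np

def positional_encoding(max_len: int, d_model: int) -> np.ndarray:
    pos = np.arange(max_len)[:, None]                  # (max_len, 1)
    i = np.arange(d_model // 2)[None, :]               # (1, d_model/2)
    angle = pos / np.power(10000.0, 2 * i / d_model)   # pos / 10000^(2i/d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angle)                        # even dimensions: sine
    pe[:, 1::2] = np.cos(angle)                        # odd dimensions: cosine
    return pe

pe = positional_encoding(max_len=32, d_model=128)      # matches the paper's settings
# These encodings are summed with the skip-gram item embeddings before the
# transformer layers.
```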
These sinusoidal positional encodings are fixed rather than learned, furnishing the model with explicit positional information without the burden of supplementary trainable parameters. The combined application of positional encodings with skip-gram embeddings empowers the model to exploit both the contextual information gleaned from the skip-gram embeddings and the positional context conveyed by the encodings. The mathematical model of the transformer layer can be described as follows:
As depicted in Equation (8), the self-attention mechanism is mathematically defined as:
$$ \mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V \quad (8) $$

where $Q = XW^{Q}$, $K = XW^{K}$, $V = XW^{V}$, and $d_k$ is the dimension of the key vectors. As depicted in Equation (9), multi-head attention is mathematically defined as:

$$ \mathrm{MultiHead}(X) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\,W^{O} \quad (9) $$

where $\mathrm{head}_i = \mathrm{Attention}(XW_i^{Q}, XW_i^{K}, XW_i^{V})$ and $W^{O}$ is the output projection matrix.

As depicted in Equation (10), the position-wise feedforward network is mathematically defined as:

$$ \mathrm{FFN}(X) = \mathrm{ReLU}(XW_1 + b_1)\,W_2 + b_2 \quad (10) $$

where $W_1$ and $W_2$ are weight matrices, and $b_1$ and $b_2$ are bias vectors. As depicted in Equation (11), layer normalization with a residual connection is mathematically defined as:

$$ \mathrm{LayerNorm}(X + \mathrm{MultiHead}(X)) \quad (11) $$

Here, $X = \{x_1, x_2, \ldots, x_n\}$ is the input item sequence; $W^{Q}$, $W^{K}$, and $W^{V}$ are the weight matrices for queries, keys, and values, respectively; $\mathrm{softmax}$ is the softmax function; $\mathrm{ReLU}$ is the rectified linear unit activation function; $\mathrm{FFN}(\cdot)$ is the position-wise feedforward network; $\mathrm{LayerNorm}(\cdot)$ is the layer normalization operation; and $\mathrm{Attention}(\cdot)$ is the scaled dot-product attention mechanism.
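As a minimal sketch (not the authors' exact implementation), a single encoder block corresponding to Equations (8)-(11) can be assembled from standard Keras layers in TensorFlow, the framework used in this work; the head count and feedforward width shown are illustrative rather than the tuned values.

```python
# Sketch of one transformer encoder block mirroring Equations (8)-(11).
import tensorflow as tf

d_model, num_heads, d_ff = 128, 4, 512                # illustrative sizes

inputs = tf.keras.Input(shape=(32, d_model))          # (seq_len, d_model)

# Multi-head self-attention (Equations (8) and (9)).
attn = tf.keras.layers.MultiHeadAttention(
    num_heads=num_heads, key_dim=d_model // num_heads
)(inputs, inputs)
x = tf.keras.layers.LayerNormalization()(inputs + attn)   # Equation (11)

# Position-wise feedforward network (Equation (10)).
ffn = tf.keras.Sequential([
    tf.keras.layers.Dense(d_ff, activation="relu"),   # ReLU(X W1 + b1)
    tf.keras.layers.Dense(d_model),                   # ... W2 + b2
])
x = tf.keras.layers.LayerNormalization()(x + ffn(x))  # residual + LayerNorm

encoder_block = tf.keras.Model(inputs, x)             # stack several such blocks
```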

3.5. Model Optimization

In the context of transformer models applied to recommendation tasks, the cross-entropy loss function is both feasible and beneficial. Cross-entropy loss is widely adopted for optimizing recommendation models, as it quantifies the dissimilarity between predicted recommendations and actual user preferences. By penalizing deviations between predicted probabilities and ground-truth labels, this loss function guides the training process toward more accurate predictions. Additionally, in transformer architectures, cross-entropy loss helps mitigate common optimization challenges such as vanishing gradients, thereby enhancing the model's robustness and effectiveness. Mathematically, the cross-entropy loss between the actual value $y$ and the predicted outcome $\hat{y}$ is defined as the negative sum of the element-wise products of the actual values and the logarithms of the predicted probabilities, providing a comprehensive measure of prediction accuracy across all items or labels. The cross-entropy loss is calculated with Equation (12):
$$ L(y, \hat{y}) = -\frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{C} y_{ij}\,\log \hat{y}_{ij} \quad (12) $$
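In TensorFlow, this loss corresponds to the standard sparse categorical cross-entropy; with one-hot ground-truth labels it reduces to Equation (12). The batch and vocabulary sizes in the sketch below are illustrative.

```python
# Sketch of the cross-entropy loss of Equation (12) with illustrative shapes.
import tensorflow as tf

logits = tf.random.normal((64, 5000))    # (batch N, candidate items C)
labels = tf.random.uniform((64,), maxval=5000, dtype=tf.int32)  # next-click ids

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
loss = loss_fn(labels, logits)   # averages -log softmax(logits)[label] over the batch
```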

4. Experiments

We conducted a series of comprehensive experiments to evaluate the effectiveness and practical applicability of our session-based recommendation model. This section details the experimental methodology, followed by a presentation and analysis of the results. Firstly, we provide a detailed description of the experimental framework. This study investigates the following key research questions (RQs):
  • RQ1: Comparative Performance: How does the proposed model perform compared to existing state-of-the-art approaches?
  • RQ2: Component Impact: To what degree do the individual components of the model influence its effectiveness? Does each significant module contribute to improved recommendation performance?
  • RQ3: Transformer Parameterization: How do different parameter configurations within the transformer module affect the model’s performance?
  • RQ4: Hyperparameter Influence: What is the impact of the main hyperparameters on the experimental outcomes?

4.1. Dataset

To evaluate the effectiveness of our proposed model and the benchmark approaches, we leverage two publicly available real-world datasets, DIGINETICA and YOOCHOOSE, which capture user interaction data from distinct application domains. Table 2 summarizes the key statistics of each dataset. The DIGINETICA dataset, originating from the CIKM 2016 conference, comprises user purchase behaviors on a website. The YOOCHOOSE dataset, released at the RecSys 2015 conference, contains user click data collected over a six-month period on an e-commerce website; following common practice, its most recent 1/64 and 1/4 fractions are used, yielding three benchmark datasets in total.
Following prior research methodologies [16,42], sessions comprising only one item are excluded from the benchmark datasets. To ensure equitable comparisons, items occurring fewer than five times are filtered out across the three datasets. Subsequently, data from the final days are designated as the test set, while the remaining data constitute the training set. A sequence data augmentation technique is implemented whereby, for a session $S = [s_1, s_2, \ldots, s_{N-1}, s_N]$, the input sessions and corresponding labels $([s_1], s_2), ([s_1, s_2], s_3), \ldots, ([s_1, \ldots, s_{N-1}], s_N)$ are generated. Each label is the last click or purchase within the corresponding prefix of the session and serves to validate the accuracy of the predicted item.
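The augmentation step can be sketched as follows: every session of length N yields N - 1 (input prefix, label) pairs; the item ids are hypothetical.

```python
# Sketch of the prefix-based sequence augmentation described above.
def augment(session):
    """[s1, ..., sN] -> ([s1], s2), ([s1, s2], s3), ..., ([s1..sN-1], sN)."""
    return [(session[:k], session[k]) for k in range(1, len(session))]

pairs = augment(["i3", "i7", "i7", "i12"])
# [(['i3'], 'i7'), (['i3', 'i7'], 'i7'), (['i3', 'i7', 'i7'], 'i12')]
```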

4.2. Evaluation Metrics

Leveraging prior research [10,42], this study employs two well-established evaluation metrics: Precision@K (P@K) and Mean Reciprocal Rank@K (MRR@K). The P@K metric measures whether the test item appears in the recommendation list. The MRR@K metric focuses on ranking quality, assigning a greater score to relevant items positioned higher in the list. These metrics are calculated as in Equations (13) and (14), respectively.
$$ P@K = \frac{n_{hit}}{N} \quad (13) $$

$$ MRR@K = \frac{1}{N}\sum_{v_{target} \in S_{test}} \frac{1}{\mathrm{Rank}(v_{target})} \quad (14) $$
where $v_{target}$ represents the target item and $\mathrm{Rank}(v_{target})$ denotes the ordinal position of the target item within the list of recommended items. Following the previous studies, we consider top-K recommendations with K = 20. All experimental results are reported as percentages (%).
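Both metrics can be computed directly from the ranked recommendation lists, as in the following sketch with hypothetical item ids.

```python
# Sketch of P@K and MRR@K (Equations (13) and (14)) over a set of test cases.
def precision_at_k(ranked_lists, targets, k=20):
    hits = sum(t in r[:k] for r, t in zip(ranked_lists, targets))
    return hits / len(targets)

def mrr_at_k(ranked_lists, targets, k=20):
    total = 0.0
    for r, t in zip(ranked_lists, targets):
        if t in r[:k]:
            total += 1.0 / (r.index(t) + 1)   # reciprocal rank; 0 if outside top K
    return total / len(targets)

ranked = [["i7", "i3", "i9"], ["i2", "i5", "i7"]]
print(precision_at_k(ranked, ["i3", "i8"], k=3))  # 0.5
print(mrr_at_k(ranked, ["i3", "i8"], k=3))        # 0.25
```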

4.3. Implementation Details

We leverage the TensorFlow framework to implement the SkipGT model. All experiments are conducted on a server equipped with an NVIDIA A100 GPU. The Adam optimizer is consistently employed for all gradient-based optimization tasks. An embedding size of 128 is uniformly applied to items across all models, including the initial embedding layer.
We conduct a grid search for the learning rate and batch size, evaluating values within the ranges {0.0001, 0.001, …, 0.1} and {32, 64, …, 512}, respectively. Regarding hyperparameter fine-tuning, the hidden size is explored within the range {32, 64, …, 512}, the attention distance within {8, 1, 12}, the smoothing factor within {0.25, 0.2, 0.2}, and the dropout ratio across {0.0, 0.1, …, 0.9}. For the transformer module specifically, the number of multi-head attention units is selected from {1, 2, …, 16}, and the number of encoder–decoder stacks is explored within {1, 2, …, 16}. Default hyperparameters are used for all baseline methods.
The model is configured with a maximum input sequence length of 32 and a hidden dimension of 128. While default hyperparameters are employed for all baseline methods, it is worth mentioning that the NARM method [26] utilizes 128-dimensional item embeddings, an initial learning rate of 0.001, and a fixed mini-batch size of 512. An early-stopping strategy is implemented during training: the training process terminates and outputs the recommendation results if no improvement in the P@20 or MRR@20 metrics is observed on the validation dataset for 20 consecutive epochs.
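This early-stopping rule can be sketched as follows; validate() is a hypothetical stand-in for evaluating P@20 and MRR@20 on the validation set.

```python
# Sketch of the early-stopping rule: stop when neither P@20 nor MRR@20
# improves on the validation set for 20 consecutive epochs.
import random

def validate():
    """Hypothetical stand-in for computing (P@20, MRR@20) on validation data."""
    return random.random(), random.random()

patience, stale = 20, 0
best_p20 = best_mrr20 = 0.0

for epoch in range(200):
    p20, mrr20 = validate()
    if p20 > best_p20 or mrr20 > best_mrr20:   # any improvement resets the counter
        best_p20, best_mrr20 = max(p20, best_p20), max(mrr20, best_mrr20)
        stale = 0
    else:
        stale += 1
    if stale >= patience:
        break                                  # output the recommendation results
```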
The optimal hyperparameter settings identified through experimentation on the three datasets are as follows: layer numbers {4, 5, 6, 7}, learning rate {0.0001, 0.0005, 0.001, 0.005}, and hidden layer {64, 128, 256, 512}.

4.4. Baseline Methods

To evaluate the effectiveness of the proposed model, we compare it with the following state-of-the-art methods:
  • Item-KNN [21] employs a nearest-neighbor approach based on cosine similarity to recommend items within a session.
  • FPMC [12] leverages a combination of Markov chains and matrix decomposition for recommendation purposes.
  • GRU4Rec [13] models user interest sequences and utilizes the final state for making recommendations.
  • NARM [26] employs attention mechanisms and recurrent neural networks (RNNs) to capture users’ primary interests.
  • STAMP [14] incorporates the user’s general intents from the current session and the most recent click.
  • SASRec [30] utilizes multiple transformer layers to model user interests across session sequences.
  • SR-GNN [16] leverages gated GNNs to learn complex interactions between items within session graphs.
  • GC-SAN [37] employs GNNs to capture intricate item transformations and long-term dependencies within sessions while additionally incorporating self-attention networks (SANs).
  • GCE-GNN [10] constructs two distinct graphs to capture local and global contextual information for recommendation purposes.
  • DHCN [42] utilizes a hypergraph convolutional network to represent item-session-item relationships for recommendations.
  • MTD [36] extracts intra-session and inter-session transformation relationships using a multi-tasking and multi-level network architecture.
  • ICM-SR [43] retrieves representations of neighboring items while considering both a user’s general and recent preferences.
  • CGL [44] integrates information from the current session with global information from all sessions to generate recommendations.
  • DGS-MGNN [45] incorporates a dynamic global multi-channel GNN that models items at three levels: global, local, and consensus.
  • LDGC-SR [46] utilizes an adaptive weight fusion mechanism to combine long-term dependencies with global contextual information.
  • GCAN [33] employs graph-enhanced attention to capture user interest in item-specific subsequences and collaborative attention to model item representations across sessions.
  • CSRM [23] leverages two parallel memory modules to incorporate both current session information and collaborative neighborhood data for session-based recommendation.
  • FGNN [19] employs a weighted attention graph layer to learn item embeddings within a session and utilizes a readout function to obtain a session sequence embedding representing the current session’s intent.
  • CTA [34] implements an attention-based sequential neural architecture that jointly models temporal and contextual information for sequential behavior modeling.
  • SR-IEM [9] introduces an improved self-attention mechanism that more accurately estimates item importance, generating final recommendations by combining a user’s long-term preferences with their current interests.
  • HG-GNN [47] cleverly combines information about current session preferences with valuable item-transition data extracted from other users’ historical sessions, leading to improved user preference inference.
  • IGT [35] utilizes an interval-enhanced session graph along with a graph transformer with time intervals.

4.5. Results and Discussion

This subsection presents the performance of the proposed method against the baseline methods. The results are shown in Table 3, with the best overall performing method highlighted in bold.
Examining the results given in Table 3, the following observations can be made:
The proposed SkipGT model: achieves superior performance on all metrics compared to the baseline methods, thereby substantiating the efficacy of the proposed approach.
Traditional baselines: Traditional methods generally exhibit the weakest performance among all approaches evaluated. Item-KNN, which neglects the potential transformations between items, demonstrates the lowest average performance. Similarly, FPMC’s limitation to modeling only first-order item relationships hinders its effectiveness compared to neural network-based techniques. Overall, traditional methods suffer from a lack of expressiveness in capturing complex item relationships, ultimately leading to inferior performance compared to neural network methods. Consistent with prior findings [14,16], FPMC exhibits high memory requirements for processing large-scale datasets. This memory limitation precluded the evaluation of FPMC’s performance on the Yoochoose 1/4 dataset.
RNN-based baselines: Pioneering the application of deep neural networks (DNNs) to session-based recommendation, GRU4Rec exhibits the lowest performance among DNN-based approaches. This limitation stems from its recurrent neural network (RNN) architecture, which can only capture one-directional transformations between adjacent items within a session. Overall, neural network-based methods like GRU4Rec, NARM, and STAMP outperform traditional methods. Specifically, the effectiveness of session-based recommendation is attributed to DNN techniques, which include the recurrent structure of GRUs, attention mechanisms employed by NARM, and the prioritization of short-term memory in STAMP. Notably, STAMP outperforms both GRU4Rec and NARM on the two datasets evaluated. By incorporating both global and local memory to represent the current prediction interest, STAMP can achieve a superior recommendation effect.
SAN-based baseline: SASRec surpasses previous methods by effectively modeling users' sequential features with a self-attention network (SAN). The SAN offers a more robust fitting capability compared to recurrent neural networks (RNNs). However, SASRec is limited to users exhibiting sequential interest patterns and may struggle with non-sequential user behavior. The SR-IEM model, equipped with an improved self-attention module, outperforms the SR-GNN model. Notably, both SR-GNN and GC-SAN achieve superior performance compared to NARM across all datasets. SR-GNN aggregates click sequences into a graph, enabling the capture of richer item transformation information, thereby highlighting the effectiveness of graph neural networks (GNNs) in session-based recommendation. GC-SAN builds upon SR-GNN by incorporating the weight advantage of its attention mechanism, leading to further enhanced recommendations. The performance of CTA is statistically indistinguishable from GNN-based methods across all datasets, demonstrating CTA's ability to effectively combine temporal and contextual information. IGT surpasses SR-GNN, showcasing the significant benefit of incorporating time intervals for superior next-click prediction. Furthermore, IGT outperforms CTA to a certain degree, underlining the importance of its interval-enhanced graph transformer module in capturing user behavior patterns. GCAN achieves the best performance among all baseline methods on the DIGINETICA dataset, emphasizing the value of a collaborative approach for modeling item representations across sessions, allowing it to engage users with relevant items within specific subsequences.
GNN-based baselines: In contrast to RNN-based approaches, SR-GNN, FGNN, and GC-SAN leverage graph neural networks (GNNs) to aggregate session sequences into graph structures. This enables the capture of intricate transformation information among items, leading to enhanced recommendation performance. However, experimental results demonstrate that simply stacking GNNs and self-attention networks (SANs) to capture user features does not yield significant performance improvements compared to SR-GNN. Algorithms that incorporate collaborative information, such as GCE-GNN, MTD, DHCN, ICM-SR, CGL, DGS-MGNN, and LDGC-SR, exhibit notably better performance than SR-GNN. This finding suggests that information from other sessions can enrich the current session's data, consequently improving recommendation accuracy. Notably, GCE-GNN outperforms DHCN, ICM-SR, and CGL, highlighting the potential negative impact of introducing irrelevant item embeddings into the current session's recommendation process. Furthermore, LDGC-SR and MTD achieve superior results compared to GCE-GNN. This indicates that the combination of long-range dependencies and global context, as employed by LDGC-SR, and the fusion of current session interests with collaborative information through multi-task learning, as implemented by MTD, are beneficial for recommendation tasks. HG-GNN exhibits inferior performance on all datasets compared to other GNN-based methods. Meanwhile, CGL and CTA demonstrate comparable effectiveness.
Finally, our proposed SkipGT model distinguishes itself by comprehensively incorporating both the user’s behavioral patterns and the crucial factors influencing their next click. The experimental results on three datasets demonstrably position SkipGT as the superior model when compared to the alternatives. This finding confirms the effectiveness and superiority of our proposed approach.

4.6. Ablation Study

Experiments have shown that the SkipGT model outperforms the baseline methods in session-based recommender systems. Furthermore, to isolate the individual contribution of the skip-gram module to SkipGT, an ablation study is performed in which only the transformer, without the proposed module, is evaluated. Comparative results for the three datasets are given in Figure 2.
  • Transformer: the SkipGT model without the skip-gram layer.
Figure 2 shows that SkipGT outperforms the transformer-only variant on both evaluation metrics across all three datasets. This confirms that the proposed skip-gram module is an important component for capturing relevant items; it also underscores the value of pre-training and transfer learning.

4.7. Sensitivity Analysis of the Hyperparameters

The effect of the number of encoder and decoder layers is shown in Figure 3. As the number of layers increases, performance first rises and then falls. More layers generally increase the model's capacity to learn complex relationships but carry a risk of overfitting, whereas too few layers prevent the model from capturing long-term dependencies, reducing performance.
The effect of the learning rate is shown in Figure 4. During training, a high learning rate leads to faster convergence, while a low learning rate leads to slower convergence. The model is relatively stable and robust around the optimal learning rate of approximately 0.001.
The effect of the hidden layer size is shown in Figure 5. Performance improves as the hidden size increases: with a larger hidden dimension, the model learns more complex relationships, achieves higher accuracy, and forms more meaningful representations with strong abstraction capabilities. However, performance decreases once the hidden size exceeds 256.

5. Conclusions

Session-based recommendation is a challenging task in which recommendations are made based on sequences of items clicked by the user. In this paper, we proposed SkipGT, a novel skip-gram- and transformer-based model for session-based recommendation. Existing methods ignore the time intervals between user interactions during a session, which makes it difficult to capture higher-order item-item information. SkipGT mainly addresses the problems caused by a lack of user information, hidden behavioral information, and misclicks. In particular, SkipGT captures the user's different interactions and interest patterns. The contribution of a pre-trained model to session-based recommendation is demonstrated using skip-gram. Furthermore, SkipGT improves item representations by combining two models that capture item relationships in ordered sequences. Experimental results on three real datasets confirm that SkipGT outperforms all state-of-the-art baseline methods.
In the future, we plan to investigate side information such as time and category to help improve the performance of session-based recommender systems. Also, we plan to integrate current methods in natural language processing and bioinformatics into recommendation systems. Another aspect is that session-based recommendation systems can be improved by incorporating location information in the time-series data stream. Such recommendation systems analyze the user’s interactions at a particular time and place and provide personalized recommendations.

Author Contributions

Conceptualization, E.C. and S.I.O.; Methodology, E.C. and S.I.O.; Software, E.C.; Formal analysis, E.C. and S.I.O.; Validation, E.C. and S.I.O.; Investigation, S.I.O.; Resources, E.C. and S.I.O.; Data curation, E.C.; Writing—original draft preparation, E.C. and S.I.O.; Writing—review and editing, E.C. and S.I.O.; Visualization, E.C.; Supervision, S.I.O. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data available in a publicly accessible repository. The original data presented in the study are openly available in [DIGINETICA] at [http://cikm2016.cs.iupui.edu/cikm-cup or https://competitions.codalab.org/competitions/11161, accessed on 24 June 2024]. The original data presented in the study are openly available in [YOOCHOOSE] at [http://2015.recsyschallenge.com/challege.html or https://www.kaggle.com/datasets/chadgostopp/recsys-challenge-2015, accessed on 24 June 2024].

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Gu, Y.; Song, J.; Liu, W.; Zou, L. HLGPS: A home location global positioning system in location-based social networks. In Proceedings of the 2016 IEEE 16th International Conference on Data Mining (ICDM), Barcelona, Spain, 12–15 December 2016; pp. 901–906.
  2. Covington, P.; Adams, J.; Sargin, E. Deep neural networks for YouTube recommendations. In Proceedings of the 10th ACM Conference on Recommender Systems, Boston, MA, USA, 15–19 September 2016; pp. 191–198.
  3. Gu, Y.; Ding, Z.; Wang, S.; Yin, D. Hierarchical user profiling for e-commerce recommender systems. In Proceedings of the 13th International Conference on Web Search and Data Mining, Virtual Event, 10–13 July 2020; pp. 223–231.
  4. Wu, L.; He, X.; Wang, X.; Zhang, K.; Wang, M. A survey on accuracy-oriented neural recommendation: From collaborative filtering to information-rich recommendation. IEEE Trans. Knowl. Data Eng. 2022, 35, 4425–4445.
  5. Wu, S.; Sun, F.; Zhang, W.; Xie, X.; Cui, B. Graph neural networks in recommender systems: A survey. ACM Comput. Surv. 2022, 55, 1–37.
  6. Wang, S.; Zhang, Q.; Hu, L.; Zhang, X.; Wang, Y.; Aggarwal, C. Sequential/session-based recommendations: Challenges, approaches, applications and opportunities. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, Madrid, Spain, 11 July 2022; pp. 3425–3428.
  7. Wang, S.; Cao, L.; Wang, Y.; Sheng, Q.Z.; Orgun, M.A.; Lian, D. A survey on session-based recommender systems. ACM Comput. Surv. 2021, 54, 1–38.
  8. Feng, C.; Shi, C.; Hao, S.; Zhang, Q.; Jiang, X.; Yu, D. Hierarchical social similarity-guided model with dual-mode attention for session-based recommendation. Knowl.-Based Syst. 2021, 230, 107380.
  9. Pan, Z.; Cai, F.; Ling, Y.; de Rijke, M. Rethinking item importance in session-based recommendation. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual Event, 25–30 July 2020; pp. 1837–1840.
  10. Wang, Z.; Wei, W.; Cong, G.; Li, X.L.; Mao, X.L.; Qiu, M. Global context enhanced graph neural networks for session-based recommendation. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual Event, 25–30 July 2020; pp. 169–178.
  11. Koren, Y.; Bell, R.; Volinsky, C. Matrix factorization techniques for recommender systems. Computer 2009, 42, 30–37.
  12. Rendle, S.; Freudenthaler, C.; Schmidt-Thieme, L. Factorizing personalized markov chains for next-basket recommendation. In Proceedings of the 19th International Conference on World Wide Web, Raleigh, NC, USA, 26–30 April 2010; pp. 811–820.
  13. Hidasi, B.; Karatzoglou, A.; Baltrunas, L.; Tikk, D. Session-based recommendations with recurrent neural networks. arXiv 2015, arXiv:1511.06939.
  14. Liu, Q.; Zeng, Y.; Mokhosi, R.; Zhang, H. STAMP: Short-term attention/memory priority model for session-based recommendation. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, London, UK, 19–23 August 2018; pp. 1831–1839.
  15. Khoali, M.; Tali, A.; Laaziz, Y. Advanced recommendation systems through deep learning. In Proceedings of the 3rd International Conference on Networking, Information Systems & Security, Marrakech, Morocco, 31 March–2 April 2020; pp. 1–8.
  16. Wu, S.; Tang, Y.; Zhu, Y.; Wang, L.; Xie, X.; Tan, T. Session-based recommendation with graph neural networks. In Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; Volume 33, pp. 346–353.
  17. Agrawal, N.; Sirohi, A.K.; Kumar, S. No Prejudice! Fair Federated Graph Neural Networks for Personalized Recommendation. In Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 20–27 February 2024; Volume 38, pp. 10775–10783.
  18. Liu, L.; Wang, L.; Lian, T. CaSe4SR: Using category sequence graph to augment session-based recommendation. Knowl.-Based Syst. 2021, 212, 106558.
  19. Qiu, R.; Li, J.; Huang, Z.; Yin, H. Rethinking the item order in session-based recommendation with graph neural networks. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, Beijing, China, 3–7 November 2019; pp. 579–588.
  20. Rendle, S.; Freudenthaler, C.; Gantner, Z.; Schmidt-Thieme, L. BPR: Bayesian personalized ranking from implicit feedback. arXiv 2012, arXiv:1205.2618.
  21. Ludewig, M.; Jannach, D. Evaluation of session-based recommendation algorithms. User Model. User-Adapt. Interact. 2018, 28, 331–390.
  22. Bonnin, G.; Jannach, D. Automated generation of music playlists: Survey and experiments. ACM Comput. Surv. 2014, 47, 1–35.
  23. Wang, M.; Ren, P.; Mei, L.; Chen, Z.; Ma, J.; de Rijke, M. A collaborative session-based recommendation approach with parallel memory modules. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, Paris, France, 21–25 July 2019; pp. 345–354.
  24. Tan, Y.K.; Xu, X.; Liu, Y. Improved recurrent neural networks for session-based recommendations. In Proceedings of the 1st Workshop on Deep Learning for Recommender Systems, Boston, MA, USA, 15 September 2016; pp. 17–22.
  25. Quadrana, M.; Karatzoglou, A.; Hidasi, B.; Cremonesi, P. Personalizing session-based recommendations with hierarchical recurrent neural networks. In Proceedings of the Eleventh ACM Conference on Recommender Systems, Como, Italy, 27–31 August 2017; pp. 130–137.
  26. Li, J.; Ren, P.; Chen, Z.; Ren, Z.; Lian, T.; Ma, J. Neural attentive session-based recommendation. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, Singapore, 6–10 November 2017; pp. 1419–1428.
  27. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. arXiv 2017, arXiv:1706.03762.
  28. Fan, Z.; Liu, Z.; Wang, Y.; Wang, A.; Nazari, Z.; Zheng, L.; Peng, H.; Yu, P.S. Sequential recommendation via stochastic self-attention. In Proceedings of the ACM Web Conference 2022, Lyon, France, 25–29 April 2022; pp. 2036–2047.
  29. Zhao, J.; Zhao, P.; Zhao, L.; Liu, Y.; Sheng, V.S.; Zhou, X. Variational self-attention network for sequential recommendation. In Proceedings of the 2021 IEEE 37th International Conference on Data Engineering (ICDE), Chania, Greece, 19–22 April 2021; pp. 1559–1570.
  30. Kang, W.C.; McAuley, J. Self-attentive sequential recommendation. In Proceedings of the 2018 IEEE International Conference on Data Mining (ICDM), Singapore, 17–20 November 2018; pp. 197–206.
  31. Sun, F.; Liu, J.; Wu, J.; Pei, C.; Lin, X.; Ou, W.; Jiang, P. BERT4Rec: Sequential recommendation with bidirectional encoder representations from transformer. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, Beijing, China, 3–7 November 2019; pp. 1441–1450.
  32. Luo, A.; Zhao, P.; Liu, Y.; Zhuang, F.; Wang, D.; Xu, J.; Fang, J.; Sheng, V.S. Collaborative Self-Attention Network for Session-based Recommendation. In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence (IJCAI-20), Yokohama, Japan, 11–17 July 2020; pp. 2591–2597.
  33. Zhu, X.; Zhang, Y.; Wang, J.; Wang, G. Graph-enhanced and collaborative attention networks for session-based recommendation. Knowl.-Based Syst. 2024, 289, 111509.
  34. Wu, J.; Cai, R.; Wang, H. Déjà vu: A contextualized temporal attention mechanism for sequential recommendation. In Proceedings of the Web Conference 2020, Taipei, Taiwan, 20–24 April 2020; pp. 2199–2209.
  35. Wang, H.; Zeng, Y.; Chen, J.; Han, N.; Chen, H. Interval-enhanced graph transformer solution for session-based recommendation. Expert Syst. Appl. 2023, 213, 118970.
  36. Huang, C.; Chen, J.; Xia, L.; Xu, Y.; Dai, P.; Chen, Y.; Bo, L.; Zhao, J.; Huang, J.X. Graph-enhanced multi-task learning of multi-level transition dynamics for session-based recommendation. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 2–9 February 2021; Volume 35, pp. 4123–4130.
  37. Xu, C.; Zhao, P.; Liu, Y.; Sheng, V.S.; Xu, J.; Zhuang, F.; Fang, J.; Zhou, X. Graph contextualized self-attention network for session-based recommendation. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI-19), Macao, China, 10–16 August 2019; Volume 19, pp. 3940–3946.
  38. Luo, J.; He, M.; Pan, W.; Ming, Z. BGNN: Behavior-aware graph neural network for heterogeneous session-based recommendation. Front. Comput. Sci. 2023, 17, 175336.
  39. Tang, G.; Zhu, X.; Guo, J.; Dietze, S. Time enhanced graph neural networks for session-based recommendation. Knowl.-Based Syst. 2022, 251, 109204.
  40. Pan, Z.; Cai, F.; Chen, W.; Chen, H.; de Rijke, M. Star graph neural networks for session-based recommendation. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management, Galway, Ireland, 19–23 October 2020; pp. 1195–1204.
  41. Chen, T.; Wong, R.C.W. Handling Information Loss of Graph Neural Networks for Session-based Recommendation. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Virtual Event, CA, USA, 23–27 August July 2020; pp. 1172–1180. [Google Scholar] [CrossRef]
  42. Xia, X.; Yin, H.; Yu, J.; Wang, Q.; Cui, L.; Zhang, X. Self-Supervised Hypergraph Convolutional Networks for Session-based Recommendation. In Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, Virtual Event, 2–9 February 2021; Volume 35, pp. 4503–4511. [Google Scholar] [CrossRef]
  43. Pan, Z.; Cai, F.; Ling, Y.; de Rijke, M. An Intent-guided Collaborative Machine for Session-based Recommendation. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’20, Xi’an, China, 25–30 July 2020; pp. 1833–1836. [Google Scholar] [CrossRef]
  44. Pan, Z.; Cai, F.; Chen, W.; Chen, C.; Chen, H. Collaborative Graph Learning for Session-based Recommendation. ACM Trans. Inf. Syst. 2022, 40, 1–26. [Google Scholar] [CrossRef]
  45. Zhu, X.; Tang, G.; Wang, P.; Li, C.; Guo, J.; Dietze, S. Dynamic global structure enhanced multi-channel graph neural network for session-based recommendation. Inf. Sci. 2023, 624, 324–343. [Google Scholar] [CrossRef]
  46. Qiu, N.; Gao, B.; Tu, H.; Huang, F.; Guan, Q.; Luo, W. LDGC-SR: Integrating long-range dependencies and global context information for session-based recommendation. Knowl.-Based Syst. 2022, 248, 108894. [Google Scholar] [CrossRef]
  47. Pang, Y.; Wu, L.; Shen, Q.; Zhang, Y.; Wei, Z.; Xu, F.; Chang, E.; Long, B.; Pei, J. Heterogeneous global graph neural networks for personalized session-based recommendation. In Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining, Tempe, AZ, USA, 21–25 February 2022; pp. 775–783. [Google Scholar] [CrossRef]
Figure 1. Overview of the proposed SkipGT model.
Figure 2. Results of the ablation study: (a) ablation results in terms of precision for each dataset; (b) ablation results in terms of mean reciprocal rank for each dataset.
Figure 3. Effect of the number of layers: (a) effect on the precision value for each dataset; (b) effect on the mean reciprocal rank value for each dataset.
Figure 4. Effect of the learning rate: (a) effect on the precision value for each dataset; (b) effect on the mean reciprocal rank value for each dataset.
Figure 5. Effect of the number of hidden layers: (a) effect on the precision value for each dataset; (b) effect on the mean reciprocal rank value for each dataset.
Table 1. Technical comparison between related methods and our method.

| Models | Session/Sequential | Loss Function | Methodology |
|---|---|---|---|
| Item-KNN [21] | Session | BPR | KNN |
| FPMC [12] | Session | S-BPR | LFM+Markov Chains |
| GRU4Rec [13] | Session | BPR, TOP1 | GRU |
| NARM [26] | Sequential | BCE | Attention |
| STAMP [14] | Session | BCE | Attention |
| SASRec [30] | Sequential | BCE | Transformer |
| SR-GNN [16] | Session | BCE | GNN |
| GC-SAN [37] | Sequential | BCE | GNN+Attention |
| GCE-GNN [10] | Session | BCE | GNN+Attention |
| DHCN [42] | Session | BCE | Graph Convolutional Network |
| MTD [36] | Sequential | BCE | Graph+Attention |
| ICM-SR [43] | Session | BCE | Graph+Collaborative |
| CGL [44] | Session | KLD | Graph+Collaborative |
| DGS-MGNN [45] | Session | BCE | GNN+Attention |
| LDGC-SR [46] | Session | BCE | Graph+Attention |
| GCAN [33] | Sequential | BCE | Graph+Collaborative |
| CSRM [23] | Session | BCE | Encoder+RNN |
| FGNN [19] | Session | BCE | GNN |
| CTA [34] | Sequential | NLL, BPR | Attention |
| SR-IEM [9] | Session | BCE | Attention |
| HG-GNN [47] | Sequential | BCE | GNN+Encoder |
| IGT [35] | Sequential | BCE | Graph+Transformer |
| SkipGT [Ours] | Both | BCE | Skip-gram+Transformer |
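As the last row of Table 1 summarizes, SkipGT pretrains item embeddings with skip-gram and feeds them to a transformer. Purely as an illustrative sketch (not the authors' implementation), skip-gram embeddings can be trained on session sequences by treating each session's item IDs as a "sentence"; the item IDs and hyperparameter values below are placeholders, not the tuned settings of the paper.

```python
from gensim.models import Word2Vec

# Each session is a sequence of clicked item IDs, treated as a "sentence"
# so that items that co-occur within sessions receive similar embeddings.
sessions = [
    ["214718", "214719", "214720"],  # hypothetical item IDs
    ["214718", "214821"],
]

# sg=1 selects the skip-gram architecture (vs. CBOW); all hyperparameters
# here are illustrative placeholders.
model = Word2Vec(sessions, vector_size=64, window=5, min_count=1, sg=1, epochs=10)

# The learned vectors can then initialize the transformer's item embedding table.
item_vector = model.wv["214718"]
print(item_vector.shape)  # (64,)
```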
Table 2. A summary of the datasets' key statistics.

| Statistic | Diginetica | Yoochoose 1/64 | Yoochoose 1/4 |
|---|---|---|---|
| Items | 43,097 | 16,766 | 29,618 |
| Clicks | 982,961 | 557,248 | 8,326,407 |
| Training Sets | 719,470 | 369,895 | 5,917,745 |
| Testing Sets | 60,858 | 55,898 | 55,898 |
| Avg. Length | 5.12 | 6.16 | 5.71 |
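The training-set counts in Table 2 exceed the number of raw sessions because, following common practice in session-based recommendation (e.g., [24]), each session of length n is typically expanded into n−1 (prefix, target) pairs. The sketch below illustrates this convention as an assumption about the standard preprocessing, not the exact pipeline used here.

```python
def augment_sessions(sessions):
    """Expand each session [i1, ..., in] into (prefix, target) pairs:
    ([i1], i2), ([i1, i2], i3), ..., ([i1, ..., i(n-1)], in)."""
    pairs = []
    for session in sessions:
        for t in range(1, len(session)):
            pairs.append((session[:t], session[t]))
    return pairs

# A session of length 3 yields 2 training pairs:
print(augment_sessions([["a", "b", "c"]]))
# [(['a'], 'b'), (['a', 'b'], 'c')]
```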
Table 3. Performance comparison of the proposed SkipGT with the baseline methods.

| Methods | Diginetica P@20 | Diginetica MRR@20 | Yoochoose 1/64 P@20 | Yoochoose 1/64 MRR@20 | Yoochoose 1/4 P@20 | Yoochoose 1/4 MRR@20 |
|---|---|---|---|---|---|---|
| Item-KNN | 35.75 | 11.57 | 51.60 | 21.81 | 52.31 | 21.70 |
| FPMC | 26.53 | 6.95 | 45.62 | 15.01 | – | – |
| GRU4Rec | 29.45 | 8.33 | 60.64 | 22.89 | 59.53 | 22.60 |
| NARM | 49.70 | 16.17 | 68.32 | 28.63 | 69.73 | 29.23 |
| STAMP | 45.64 | 14.32 | 68.74 | 29.67 | 70.44 | 30.00 |
| SASRec | 52.97 | 18.43 | 68.39 | 29.26 | 68.27 | 29.22 |
| SR-GNN | 50.73 | 17.59 | 70.57 | 30.94 | 71.36 | 31.89 |
| GC-SAN | 52.33 | 18.37 | 69.49 | 30.25 | 70.37 | 31.05 |
| GCE-GNN | 54.22 | 19.04 | 70.90 | 31.26 | 71.40 | 31.49 |
| DHCN | 53.18 | 18.44 | 70.74 | 31.05 | 71.58 | 31.72 |
| MTD | 53.18 | 18.33 | 71.12 | 31.28 | 71.83 | 30.83 |
| ICM-SR | 52.28 | 17.63 | 71.11 | 31.23 | – | – |
| CGL | 52.11 | 18.64 | 69.32 | 29.96 | 70.05 | 31.13 |
| DGS-MGNN | 54.13 | 19.02 | 72.62 | 32.49 | 73.10 | 33.55 |
| LDGC-SR | 54.38 | 18.96 | – | – | – | – |
| GCAN | 54.73 | 19.69 | – | – | – | – |
| CSRM | 49.63 | 16.47 | 69.20 | 29.18 | 69.61 | 29.64 |
| FGNN | 51.21 | 17.21 | 71.03 | 30.74 | 71.49 | 31.08 |
| CTA | 50.51 | 17.19 | 70.25 | 30.65 | 71.25 | 31.11 |
| SR-IEM | 50.35 | 17.06 | 70.03 | 30.16 | 70.45 | 30.11 |
| HG-GNN | 49.88 | 16.29 | 69.31 | 29.02 | 70.37 | 30.86 |
| IGT | 40.13 | 17.94 | 71.82 | 31.35 | 72.13 | 31.93 |
| SkipGT (Ours) | 55.72 | 20.69 | 76.78 | 34.59 | 77.32 | 35.29 |
| Improv. (%) | 1.8 | 5.1 | 5.73 | 6.46 | 5.77 | 5.19 |
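In Table 3, P@20 is the proportion of test sessions whose target item appears among the top-20 recommendations, MRR@20 is the mean reciprocal rank of the target (zero when it falls outside the top 20), and the Improv. (%) row gives SkipGT's relative gain over the best baseline in each column. The following is a minimal sketch of these computations, assuming a hypothetical `ranks` list holding the 1-based rank of each test target (None if unranked); it is an illustration of the metrics, not the authors' evaluation code.

```python
def precision_at_k(ranks, k=20):
    # Share of sessions whose target item appears in the top-k list, in %.
    hits = sum(1 for r in ranks if r is not None and r <= k)
    return 100.0 * hits / len(ranks)

def mrr_at_k(ranks, k=20):
    # Mean reciprocal rank in %; targets ranked beyond k contribute zero.
    total = sum(1.0 / r for r in ranks if r is not None and r <= k)
    return 100.0 * total / len(ranks)

def improvement(ours, best_baseline):
    # Relative gain over the strongest baseline, as in the Improv. (%) row.
    return 100.0 * (ours - best_baseline) / best_baseline

# e.g., Diginetica P@20: SkipGT 55.72 vs. best baseline (GCAN) 54.73
print(round(improvement(55.72, 54.73), 1))  # 1.8
```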