Article

Contrastive Learning-Based Personalized Tag Recommendation

1
School of Computer Science, Jiangsu University of Science and Technology, Zhenjiang 212000, China
2
College of Tongda, Nanjing University of Posts and Telecommunication, Yangzhou 225127, China
3
School of Computer Science, Hubei University of Technology, Wuhan 430068, China
4
Department of Computer Science, Royal Holloway University of London, Egham TW20 0EX, UK
*
Author to whom correspondence should be addressed.
Sensors 2024, 24(18), 6061; https://doi.org/10.3390/s24186061
Submission received: 10 July 2024 / Revised: 13 September 2024 / Accepted: 14 September 2024 / Published: 19 September 2024
(This article belongs to the Section Intelligent Sensors)

Abstract

Personalized tag recommendation algorithms generate personalized tag lists for users by learning the tagging preferences of users. Traditional personalized tag recommendation systems are limited by the problem of data sparsity, making the personalized tag recommendation models unable to accurately learn the embeddings of users, items, and tags. To address this issue, we propose a contrastive learning-based personalized tag recommendation algorithm, namely CLPTR. Specifically, CLPTR generates augmented views of user–tag and item–tag interaction graphs by injecting noises into implicit feature representations rather than dropping nodes and edges. Hence, CLPTR is able to greatly preserve the underlying semantics of the original user–tag or the item–tag interaction graphs and avoid destroying their structural information. In addition, we integrate the contrastive learning module into a graph neural network-based personalized tag recommendation model, which enables the model to extract self-supervised signals from user–tag and item–tag interaction graphs. We conduct extensive experiments on real-world datasets, and the experimental results demonstrate the state-of-the-art performance of our proposed CLPTR compared with traditional personalized tag recommendation models.

1. Introduction

As an important subfield of recommendation systems [1,2,3], personalized tag recommendation (PTR) systems [4,5,6,7,8] have become increasingly popular in both academia and industry. By modeling historical behaviors among entities, a PTR system generates a personalized tag list for each user. These tags help users manage and retrieve items. Classic PTR algorithms include HOSVD [4], RTF [5], and PITF [6]. Although classic PTR algorithms model the third-order interaction relationships among entities within a unified framework, they ignore high-order collaborative signals among users, items, and tags, which limits the expressive capacity of PTR models. Recently, deep learning techniques have been widely used in computer vision [9,10,11] and natural language processing [12,13,14]. Meanwhile, some researchers [15,16,17] utilize graph convolutional networks (GCNs) [18] to capture high-order collaborative signals among entities and propose GCN-based PTR models. Essentially, these methods capture the similarities among high-order neighbors through GCNs, which improves the performance of PTR models. Although these works have shown that integrating GCNs into PTR models is a promising approach, GCN-based PTR models are still limited by data sparsity.
For recommendation systems, the contrastive learning technique [19,20,21] is an effective solution to alleviate the problem of data sparsity. Most contrastive learning-based item recommendation models adopt the principle of InfoMax [20,22,23], which maximizes the consistency between positive pairs and pulls them closer while minimizing the consistency between negative pairs and pushing them apart. However, they usually adopt random augmentation strategies (e.g., node dropping and edge dropping) to obtain two augmented views, which cannot guarantee that the two augmentations are positively correlated. As shown in Figure 1, benzoic acid is a common food preservative. If we drop a carbon atom from benzoic acid through a node-dropping operation, benzoic acid turns into benzene. However, benzene is a highly toxic chemical substance with completely different chemical properties from those of benzoic acid. In addition, if we break the chemical bond between the oxygen and carbon atoms in benzoic acid through an edge-dropping operation, benzoic acid transforms into benzaldehyde. Unlike benzoic acid, benzaldehyde is widely utilized in food seasoning. Therefore, adopting random augmentation strategies to obtain augmented views changes the structural information of the original graphs.
For personalized item recommendation, random augmentation strategies would change the underlying semantics of the original user–item bipartite graph. Moreover, random augmentation strategies may also introduce false negative samples. For instance, assume user $u_1$ has purchased items $I_1 = \{\mathrm{Chair}, \mathrm{VR}, \mathrm{Mug}\}$ and user $u_2$ has purchased items $I_2 = \{\mathrm{VR}, \mathrm{Mug}, \mathrm{Toy}\}$. According to the historical records of users $u_1$ and $u_2$, the original graph is presented in Figure 2. By utilizing the edge-dropping operation, we can obtain the first augmented view, i.e., $\mathrm{Graph}'_{u_1}$ and $\mathrm{Graph}'_{u_2}$, and the second augmented view, i.e., $\mathrm{Graph}''_{u_1}$ and $\mathrm{Graph}''_{u_2}$. According to the principle of contrastive learning, $u'_1$ and $u''_1$ should be a positive pair. Meanwhile, $u'_1$ and $u''_2$ should be a negative pair. However, $u_1$ and $u_2$ share common preferences since they visit the same set of items. Hence, $u'_1$ and $u''_2$ form a false negative pair. In general, random augmentation may introduce false negative samples, leaving the recommendation models unable to accurately distinguish positive and negative pairs.
To tackle the above issues, we propose a contrastive learning-based personalized tag recommendation model, namely CLPTR. Specifically, we inject noise into the embeddings of users, items, user-specific tags, and item-specific tags to generate augmented views. In this way, we preserve the underlying semantic structures of the original user–tag and item–tag interaction graphs and avoid the problem of false negatives caused by inappropriate augmentation strategies. In addition, we integrate the contrastive learning module into PTR and learn the embeddings of entities by maximizing the consistency between augmented views. We summarize the main contributions of this paper as follows:
  • We utilize a noise augmentation strategy to generate augmented views of user–tag and item–tag interaction graphs, which effectively guarantees that the underlying semantics of original interaction graphs remain unchanged and avoids the problem of false negatives.
  • We integrate the contrastive learning module into PTR, which is able to effectively alleviate the problem of data sparsity.
  • We conduct extensive experiments on real-world datasets, and the experimental results demonstrate the superior performance of our proposed CLPTR compared with traditional PTR models.

2. Related Work

2.1. Personalized Tag Recommendation Algorithms

Classic PTR algorithms include HOSVD [4], RTF [5], and PITF [6]. For example, HOSVD [4] utilizes a high-order singular value decomposition technique to factorize the third-order user–item–tag tensor and captures the latent semantic associations among entities. However, HOSVD focuses on the user ratings for tags and ignores the rankings between tags. To tackle this issue, Rendle et al. [5] proposed ranking with tensor factorization (RTF). However, HOSVD and RTF are unable to scale to large datasets due to their high computational complexity. In order to reduce the computational cost, Rendle et al. [6] extended the BPR criterion to PTR and proposed PITF. In addition, Fang et al. [7] proposed NLTF, which utilizes Gaussian kernels to enhance the capacity of modeling the complex relations among entities. Although the above works model underlying semantic relationships among entities by tensor decomposition, they only consider first-order collaborative signals and ignore higher-order collaborative signals. Recently, some researchers have utilized deep learning techniques to boost the performance of PTR models. For instance, Yuan et al. [8] proposed ABNT, which employs a multi-layer perceptron to capture the nonlinear relationships among entities. In addition, Chen et al. [16] proposed GNN-PTR, which integrates GCN into PTR models. To reduce the training difficulty of the PTR model, Yu et al. [17] proposed LNGTR, which is a lightweight variant of GNN-PTR. However, GCN-based PTR models still suffer from data sparsity.

2.2. Contrastive Learning-Based Recommendation Models

Contrastive learning is able to extract self-supervised signals from raw data, effectively reducing the problem of data sparsity in recommendation systems. For personalized item recommendations, Wu et al. [19] proposed self-supervised graph learning for recommendation, namely SGL, which integrates the contrastive learning module into a personalized item recommendation model. In addition, SGL utilizes a random augmentation strategy to generate augmented views and employs InfoNCE to learn the embeddings of nodes of the user–item bipartite graph. However, it is necessary for SGL to reconstruct the adjacency matrix in each training epoch, which is time consuming. To address this issue, Yu et al. [21] proposed SimGCL. SimGCL utilizes the noise augmentation strategy to improve the efficiency of model training. In order to reduce the popularity bias, Liu et al. [24] proposed a popularity-aware debiased contrastive loss for collaborative filtering. For sequential recommendation, Xie et al. [25] proposed contrastive learning for sequential recommendation, namely CL4SRec, which integrates the contrastive learning module into the transformer-based sequential recommendation model. Moreover, CoSeRec [26] utilizes substitute and insert operations to generate robust augmented sequences. Chen et al. [27] proposed intent contrastive learning for sequential recommendation, employing clustering methods to learn the potential intentions of users in item sequences, and injecting the user intentions into contrastive learning-based sequential recommendation. For tag-enhanced item recommendation, Wu et al. [28] proposed intent-aware multi-source contrastive alignment for item recommendation, namely IMCAT. IMCAT utilizes a user intent-aware self-supervised pairing process to make the pairing process more fine-grained and avoid embedding collapse. Moreover, Xu et al. [29] proposed tag-aware graph contrastive learning, namely TAGCL. 
TAGCL is jointly optimized with a negative tag loss and TransT regularization, which helps the model learn high-quality features. Although most existing works enhance the performance of recommendation models to a certain extent, they destroy the underlying semantic relationships of user–item interaction information and introduce false negative samples, since they utilize random augmentation strategies to obtain augmented views. Moreover, the contrastive learning technique has not been explored in PTR systems. In this paper, we integrate the contrastive learning module into the PTR model to alleviate the problem of data sparsity.

3. Contrastive Learning-Based Personalized Tag Recommendation

The contrastive learning technique is a promising approach to alleviate the problem of data sparsity and has been investigated in other recommendation tasks, such as item recommendation and sequential recommendation. For item recommendation, SGL [19] and SimGCL [21] utilize contrastive learning to improve the performance of models. For sequential recommendation, representative contrastive learning-enhanced methods include CL4SRec [25] and CoSeRec [26]. These methods demonstrate that contrastive learning can improve the performance of recommendation models.
In this section, we introduce the proposed contrastive learning-based personalized tag recommendation model, which is called CLPTR. CLPTR mainly consists of two important components, i.e., a graph convolution module and a contrastive learning module. We utilize the graph convolution module to capture the collaborative signals and inject the collaborative signals into the embeddings of entities. In addition, the contrastive learning module generates augmented views of user–tag and item–tag interaction graphs by injecting noises into implicit feature representations rather than dropping nodes and edges. Moreover, the contrastive learning module utilizes InfoNCE to learn the embeddings of entities by pulling the embeddings of positive pairs close and pushing the embeddings of negative pairs away. The framework of the contrastive learning-based personalized tag recommendation model is illustrated in Figure 3.

3.1. Problem Description

In this article, we focus on PTR tasks. A PTR system typically includes three types of entities: the set of users $U = \{u_1, u_2, \ldots, u_{|U|}\}$, the set of items $I = \{i_1, i_2, \ldots, i_{|I|}\}$, and the set of tags $T = \{t_1, t_2, \ldots, t_{|T|}\}$. We utilize $S \subseteq U \times I \times T$ to denote the users’ historical tagging behaviors. A triple $(u, i, t) \in S$ indicates that the item $i$ is annotated with the tag $t$ by user $u$. The main purpose of PTR systems is to compute the probability of the user $u$ annotating item $i$ with each tag and to recommend the top-$N$ tags with the highest probabilities to user $u$.

3.2. Graph Convolution Module

The graph convolution module is mainly utilized to capture high-order collaborative signals among entities and enrich the semantic information of the representations of entities. Each component of the graph convolution module is described in the following sections.

3.2.1. Embedding Layer

Given a triplet $(u, i, t)$, we map entities (i.e., users, items, and tags) to a low-dimensional embedding space based on their IDs, formally,
$$e_u^{(0)} = \mathrm{lookup}(\mathbf{U}, u), \quad e_i^{(0)} = \mathrm{lookup}(\mathbf{I}, i), \quad e_{u^t}^{(0)} = \mathrm{lookup}(\mathbf{T}^U, t), \quad e_{i^t}^{(0)} = \mathrm{lookup}(\mathbf{T}^I, t), \tag{1}$$
where the $\mathrm{lookup}(\cdot)$ operation retrieves the latent feature vector from the embedding matrix based on the ID of the entity. $\mathbf{U} \in \mathbb{R}^{|U| \times K}$ is the latent user feature matrix, $\mathbf{I} \in \mathbb{R}^{|I| \times K}$ is the latent item feature matrix, $\mathbf{T}^U \in \mathbb{R}^{|T| \times K}$ is the latent user-specific tag feature matrix, and $\mathbf{T}^I \in \mathbb{R}^{|T| \times K}$ is the latent item-specific tag feature matrix. $K$ denotes the embedding size.

3.2.2. Embedding Propagation Layer

PTR systems include user–item, user–tag, and item–tag interaction graphs. However, the user–item interaction term vanishes when predicting rankings under Bayesian Personalized Ranking (BPR) optimization. Hence, similar to PITF, GNN-PTR, and LNGTR, we only take the user–tag and item–tag interaction graphs into account. The main purpose of the embedding propagation layer is to utilize the message-passing mechanism to capture high-order collaborative signals among entities on the interaction graphs. Taking the user–tag interaction graph as an example, we employ the lightweight GCN model to capture the high-order collaborative signals. At the $l$-th layer, the embeddings of user $e_u^{(l)}$ and user-specific tag $e_{u^t}^{(l)}$ are formulated as follows:
$$e_u^{(l)} = \mathrm{Agg}_u\big(\{e_{u^t}^{(l-1)},\ t \in N_u\}\big), \quad e_{u^t}^{(l)} = \mathrm{Agg}_{u^t}\big(\{e_u^{(l-1)},\ u \in N_t\}\big), \tag{2}$$
where $e_u^{(l-1)}$ and $e_{u^t}^{(l-1)}$ indicate the embeddings of user $u$ and user-specific tag $t$ at the $(l-1)$-th embedding propagation layer, respectively. $\mathrm{Agg}_u(\cdot)$ and $\mathrm{Agg}_{u^t}(\cdot)$ are the user aggregation function and the user-specific tag aggregation function, respectively. $N_u$ is the set of tags that connect to user $u$ in the user–tag interaction graph, while $N_t$ is the set of users that interact with tag $t$. Similar to LightGCN [30], the weighted sum strategy is utilized to compute the user and user-specific tag aggregation functions, formally,
$$e_u^{(l)} = \sum_{t \in N_u} \frac{1}{\sqrt{|N_u|\,|N_t|}}\, e_{u^t}^{(l-1)}, \quad e_{u^t}^{(l)} = \sum_{u \in N_t} \frac{1}{\sqrt{|N_u|\,|N_t|}}\, e_u^{(l-1)}. \tag{3}$$
According to Equation (3), $e_u^{(1)}$ and $e_{u^t}^{(1)}$ capture the one-hop connectivity information by aggregating direct neighbor information. By recursively stacking more embedding propagation layers, we are able to inject the high-order collaborative signals into the embeddings of users and tags. Similarly, on the item–tag interaction graph, we also recursively stack the embedding propagation layer to capture the high-order collaborative signals between items and tags.
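As a hedged illustration of the propagation rule in Equation (3), the following sketch performs one symmetric-normalized aggregation step on a toy user–tag graph using plain Python lists. The function name `propagate`, the dictionary-based data layout, and the toy embeddings are assumptions made for this example and are not taken from the authors' code.

```python
import math
from collections import defaultdict

def propagate(user_emb, tag_emb, edges):
    """One LightGCN-style propagation step on a user-tag bipartite graph.

    user_emb / tag_emb map an ID to its embedding (a list of floats);
    edges is an iterable of (user, tag) pairs.
    """
    tags_of = defaultdict(set)   # N_u: tags connected to each user
    users_of = defaultdict(set)  # N_t: users connected to each tag
    for u, t in edges:
        tags_of[u].add(t)
        users_of[t].add(u)

    dim = len(next(iter(user_emb.values())))
    new_user = {u: [0.0] * dim for u in user_emb}
    new_tag = {t: [0.0] * dim for t in tag_emb}
    for u, tags in tags_of.items():
        for t in tags:
            # Symmetric normalization 1 / sqrt(|N_u| * |N_t|)
            norm = 1.0 / math.sqrt(len(tags) * len(users_of[t]))
            for k in range(dim):
                new_user[u][k] += norm * tag_emb[t][k]
                new_tag[t][k] += norm * user_emb[u][k]
    return new_user, new_tag

# Toy graph (hypothetical data): u1 uses tags a and b, u2 uses tag b.
users = {"u1": [1.0, 0.0], "u2": [0.0, 1.0]}
tags = {"a": [1.0, 1.0], "b": [2.0, 0.0]}
edges = [("u1", "a"), ("u1", "b"), ("u2", "b")]
users_1, tags_1 = propagate(users, tags, edges)
```

Calling `propagate` again on `users_1` and `tags_1` would give the second-layer embeddings; the same routine could be reused for the item–tag graph with item-specific tag embeddings.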

3.2.3. Prediction Layer

By stacking $L$ embedding propagation layers, we obtain the sets of embeddings of entities:
$$\{e_u^{(1)}, e_u^{(2)}, \ldots, e_u^{(L)}\}, \quad \{e_i^{(1)}, e_i^{(2)}, \ldots, e_i^{(L)}\}, \quad \{e_{u^t}^{(1)}, e_{u^t}^{(2)}, \ldots, e_{u^t}^{(L)}\}, \quad \{e_{i^t}^{(1)}, e_{i^t}^{(2)}, \ldots, e_{i^t}^{(L)}\}. \tag{4}$$
Each element of a set describes a different aspect of the entity's characteristics. We obtain the final representation of each entity by combining all corresponding elements to simultaneously encode both low-order and high-order collaborative signals, formally,
$$e_u^{*} = \sum_{l=0}^{L} \alpha_l \cdot e_u^{(l)}, \quad e_i^{*} = \sum_{l=0}^{L} \alpha_l \cdot e_i^{(l)}, \quad e_{u^t}^{*} = \sum_{l=0}^{L} \alpha_l \cdot e_{u^t}^{(l)}, \quad e_{i^t}^{*} = \sum_{l=0}^{L} \alpha_l \cdot e_{i^t}^{(l)}, \tag{5}$$
where $\alpha_l = \frac{1}{L+1}$ is the weight coefficient for the embedding of entities obtained at the $l$-th layer. In this way, we enrich the semantics of entities involved in third-order interaction relationships. Finally, given the final representations of user $u$, item $i$, and tag $t$, we predict the score of user $u$ annotating item $i$ with the tag $t$ as follows:
$$\hat{p}_{u,i,t} = e_u^{*} \odot e_{u^t}^{*} + e_i^{*} \odot e_{i^t}^{*}, \tag{6}$$
where $\odot$ denotes the dot product operation.
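The layer combination and scoring step can be sketched in a few lines. This is a minimal illustration under stated assumptions: the helper names (`combine_layers`, `score`, `top_n_tags`) and the toy tag embeddings are hypothetical, and a real implementation would operate on batched tensors rather than Python lists.

```python
def combine_layers(layers):
    """Average embeddings e^(0), ..., e^(L) with equal weights alpha_l = 1 / (L + 1)."""
    alpha = 1.0 / len(layers)
    return [alpha * sum(vals) for vals in zip(*layers)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def score(e_u, e_ut, e_i, e_it):
    """Predicted score: <e_u*, e_ut*> + <e_i*, e_it*>."""
    return dot(e_u, e_ut) + dot(e_i, e_it)

def top_n_tags(e_u, e_i, user_tag_emb, item_tag_emb, n):
    """Rank all tags for the pair (u, i) and return the n highest-scoring tag IDs."""
    scores = {t: score(e_u, user_tag_emb[t], e_i, item_tag_emb[t])
              for t in user_tag_emb}
    return sorted(scores, key=scores.get, reverse=True)[:n]

# Hypothetical toy data: three layers for one user, a fixed item embedding,
# and user-specific / item-specific embeddings for three tags.
e_u = combine_layers([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])
e_i = [1.0, 0.0]
user_tag_emb = {"rock": [1.0, 0.0], "jazz": [0.0, 3.0], "pop": [0.5, 0.5]}
item_tag_emb = {"rock": [1.0, 1.0], "jazz": [0.0, 0.0], "pop": [0.0, 1.0]}
ranked = top_n_tags(e_u, e_i, user_tag_emb, item_tag_emb, n=2)
```

Because the score is a sum of two inner products, the user side and the item side contribute independently, which is why only the user–tag and item–tag graphs are needed.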

3.3. Contrastive Learning Module

The contrastive learning module firstly generates augmented views of user–tag and item–tag interaction graphs by injecting noises into implicit feature representations. Then, the contrastive learning module utilizes InfoNCE to learn the embeddings of entities by pulling the embeddings of positive pairs close and pushing the embeddings of negative pairs away.

3.3.1. Noise Augmentation

To avoid destroying the original structural relationships of the interaction graphs and introducing false negatives, CLPTR injects noise into the implicit feature representations to obtain augmented views. On the user–tag interaction graph, the embeddings of the user and the user-specific tag under two noise augmentation operations are as follows:
$${e_u^{*}}' = \sum_{l=0}^{L} \alpha_l \cdot \big[e_u^{(l)} + (\Delta_u^{(l)})'\big], \quad {e_u^{*}}'' = \sum_{l=0}^{L} \alpha_l \cdot \big[e_u^{(l)} + (\Delta_u^{(l)})''\big], \quad {e_{u^t}^{*}}' = \sum_{l=0}^{L} \alpha_l \cdot \big[e_{u^t}^{(l)} + (\Delta_{u^t}^{(l)})'\big], \quad {e_{u^t}^{*}}'' = \sum_{l=0}^{L} \alpha_l \cdot \big[e_{u^t}^{(l)} + (\Delta_{u^t}^{(l)})''\big], \tag{7}$$
where $(\Delta_u^{(l)})'$, $(\Delta_u^{(l)})''$, $(\Delta_{u^t}^{(l)})'$, and $(\Delta_{u^t}^{(l)})''$ are noise vectors, with $\|\Delta^{(l)}\|_2 = \varepsilon$ and $\Delta^{(l)} = \bar{\Delta}^{(l)} \odot \mathrm{sign}(e_{*}^{(l)})$, where $\odot$ here acts element-wise. $\varepsilon$ controls the magnitude of the noise $\Delta^{(l)}$, and $\Delta^{(l)}$ is regarded as a point on a hypersphere with radius $\varepsilon$. $\bar{\Delta}^{(l)}$ follows a uniform distribution, i.e., $\bar{\Delta}^{(l)} \sim U(0, 1)$. In addition, for the item–tag interaction graph, we utilize similar operations to generate the embeddings of items and item-specific tags, i.e., ${e_i^{*}}'$, ${e_i^{*}}''$, ${e_{i^t}^{*}}'$, and ${e_{i^t}^{*}}''$. In effect, noise augmentation rotates the original feature vectors by small angles. This augmentation scheme is able to effectively preserve the original semantics of entities and keep the structure of the user–tag and item–tag interaction graphs unchanged.
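The noise augmentation described above can be sketched as follows. This is an illustrative reading of the formula as written here (noise added to each layer's embedding before averaging), not the authors' implementation; the names `perturb` and `augmented_view`, the seed, and the toy vectors are assumptions. Note that a zero-valued embedding component has sign 0 and therefore receives no noise in this sketch.

```python
import math
import random

def sign(x):
    return (x > 0) - (x < 0)

def perturb(embedding, eps, rng):
    """Add a noise vector of L2 norm eps whose components share the sign of the embedding."""
    raw = [rng.random() for _ in embedding]              # samples from U(0, 1)
    norm = math.sqrt(sum(r * r for r in raw)) or 1.0     # guard against an all-zero draw
    return [e + eps * (r / norm) * sign(e) for e, r in zip(embedding, raw)]

def augmented_view(layers, eps, rng):
    """Perturb every layer embedding, then average with alpha_l = 1 / (L + 1)."""
    alpha = 1.0 / len(layers)
    noisy = [perturb(layer, eps, rng) for layer in layers]
    return [alpha * sum(vals) for vals in zip(*noisy)]

rng = random.Random(0)
e = [0.5, -1.0, 2.0]
view = perturb(e, 0.1, rng)
delta = [v - x for v, x in zip(view, e)]

layers = [e, [1.0, -0.5, 1.5]]
view_a = augmented_view(layers, 0.1, rng)   # first augmented view
view_b = augmented_view(layers, 0.1, rng)   # second view, independent noise
```

Because each noise component carries the sign of the embedding component it perturbs, the augmented vector stays in the same orthant as the original, which is what keeps the rotation angle small when $\varepsilon$ is small relative to the embedding norm.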

3.3.2. Contrastive Loss

We utilize InfoNCE to calculate the contrastive loss of CLPTR. Specifically, we treat two augmented views derived from the same node as a positive pair and views derived from different nodes as negative pairs. InfoNCE maximizes the similarities among positive pairs and minimizes the similarities among negative pairs. For the user–tag interaction graph, we calculate the contrastive loss of the user side $\mathcal{L}_{cl}^{U}$ and the contrastive loss of the tag side $\mathcal{L}_{cl}^{UT}$ as follows:
$$\mathcal{L}_{cl}^{U} = \sum_{u \in U} -\log \frac{\exp(\mathrm{sim}({e_u^{*}}', {e_u^{*}}'')/\tau)}{\sum_{v \in U} \exp(\mathrm{sim}({e_u^{*}}', {e_v^{*}}'')/\tau)}, \quad \mathcal{L}_{cl}^{UT} = \sum_{t \in T} -\log \frac{\exp(\mathrm{sim}({e_{u^t}^{*}}', {e_{u^t}^{*}}'')/\tau)}{\sum_{v \in T} \exp(\mathrm{sim}({e_{u^t}^{*}}', {e_{u^v}^{*}}'')/\tau)}, \tag{8}$$
where $\mathrm{sim}(\cdot, \cdot)$ is the cosine similarity and $\tau$ is the temperature parameter. Moreover, for the item–tag interaction graph, we also compute the contrastive loss of the item side $\mathcal{L}_{cl}^{I}$ and the contrastive loss of the tag side $\mathcal{L}_{cl}^{IT}$, formally,
$$\mathcal{L}_{cl}^{I} = \sum_{i \in I} -\log \frac{\exp(\mathrm{sim}({e_i^{*}}', {e_i^{*}}'')/\tau)}{\sum_{v \in I} \exp(\mathrm{sim}({e_i^{*}}', {e_v^{*}}'')/\tau)}, \quad \mathcal{L}_{cl}^{IT} = \sum_{t \in T} -\log \frac{\exp(\mathrm{sim}({e_{i^t}^{*}}', {e_{i^t}^{*}}'')/\tau)}{\sum_{v \in T} \exp(\mathrm{sim}({e_{i^t}^{*}}', {e_{i^v}^{*}}'')/\tau)}. \tag{9}$$
Hence, the contrastive loss of CLPTR is defined as
$$\mathcal{L}_{cl} = \mathcal{L}_{cl}^{U} + \mathcal{L}_{cl}^{UT} + \mathcal{L}_{cl}^{I} + \mathcal{L}_{cl}^{IT}. \tag{10}$$
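Each of the four InfoNCE terms above has the same form, so a single routine suffices for illustration. The sketch below is a minimal, unbatched version that assumes the two views are given as aligned lists of vectors (index $k$ in both lists refers to the same node); the function names and toy inputs are hypothetical.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def info_nce(view1, view2, tau):
    """Sum over nodes of -log(softmax score of the positive pair among all nodes in view2)."""
    loss = 0.0
    for idx, anchor in enumerate(view1):
        logits = [cosine(anchor, other) / tau for other in view2]
        m = max(logits)  # log-sum-exp shift for numerical stability
        log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
        loss += log_denominator - logits[idx]
    return loss

# Toy check: consistent views give a small loss, mismatched views a large one.
view1 = [[1.0, 0.0], [0.0, 1.0]]
aligned = info_nce(view1, [[1.0, 0.0], [0.0, 1.0]], tau=0.2)
shuffled = info_nce(view1, [[0.0, 1.0], [1.0, 0.0]], tau=0.2)
```

The full contrastive loss would then be the sum of this routine applied to the user, user-specific tag, item, and item-specific tag views.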

3.4. Objective Function

If the item $i$ is annotated with the tag $t$ by user $u$, we assume that user $u$ prefers tag $t$ over other tags $t'$ ($t' \in T \setminus t$), i.e., the partial relationship $t >_{u,i} t'$ holds. We employ the BPR criterion [31] to learn model parameters. Supposing that all users and all partial relationships are independent, CLPTR optimizes the parameters of the model by minimizing $\mathcal{L}_{rec}$:
$$\mathcal{L}_{rec} = -\ln \prod_{(u,i,t) \in S} \prod_{t' \in T \setminus t} P(t >_{u,i} t' \mid \Theta)\, P(\Theta) = -\sum_{(u,i,t) \in S} \sum_{t' \in T \setminus t} \ln \sigma(\hat{p}_{u,i,t} - \hat{p}_{u,i,t'}) + \lambda_{\Theta} \|\Theta\|_F^2, \tag{11}$$
where $P(t >_{u,i} t' \mid \Theta)$ denotes the probability that user $u$ prefers tag $t$ over tag $t'$ when annotating item $i$. $\Theta = \{\mathbf{U}, \mathbf{I}, \mathbf{T}^U, \mathbf{T}^I\}$ is the set of model parameters. Similar to traditional contrastive learning-based recommendation models, we jointly optimize the recommendation task $\mathcal{L}_{rec}$ and the contrastive learning task $\mathcal{L}_{cl}$ via the multi-task training strategy. Hence, the objective function of CLPTR is formalized as
$$\mathcal{L} = \mathcal{L}_{rec} + \lambda_{cl} \mathcal{L}_{cl}, \tag{12}$$
where $\lambda_{cl}$ balances the contribution of the contrastive loss in our proposed method.
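To make the joint objective concrete, the following sketch evaluates the BPR term for a handful of (positive, negative) score pairs and combines it with a contrastive loss value. It is an assumption-laden illustration: the function names are hypothetical, the scores are supplied directly rather than computed from embeddings, and negatives are given as a short list, whereas the formula sums over all tags not assigned by the user (practical implementations usually sample negatives instead).

```python
import math

def bpr_loss(pos_scores, neg_scores, params, reg):
    """-sum ln sigmoid(p_pos - p_neg) + reg * squared L2 norm of the parameters."""
    rank_term = 0.0
    for p, n in zip(pos_scores, neg_scores):
        x = p - n
        # -ln sigmoid(x) = ln(1 + exp(-x)), written in a numerically stable form
        rank_term += max(-x, 0.0) + math.log1p(math.exp(-abs(x)))
    l2 = sum(w * w for w in params)
    return rank_term + reg * l2

def total_loss(l_rec, l_cl, lambda_cl):
    """Joint objective: recommendation loss plus weighted contrastive loss."""
    return l_rec + lambda_cl * l_cl

# Correctly ordered pairs are penalized less than wrongly ordered ones.
good = bpr_loss([2.0], [0.0], params=[], reg=0.0)
bad = bpr_loss([0.0], [2.0], params=[], reg=0.0)
tie = bpr_loss([0.0], [0.0], params=[], reg=0.0)
joint = total_loss(l_rec=1.0, l_cl=2.0, lambda_cl=0.1)
```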

4. Experiment

We conduct several groups of experiments on real-world datasets to evaluate the effectiveness of CLPTR by comparing it against other state-of-the-art PTR models.

4.1. Datasets

We selected Last.fm and ML-10M (https://grouplens.org/datasets/hetrec-2011/, accessed on 1 September 2024) for experimental analysis. Specifically, the ML-10M dataset is an extension of the MovieLens 10M dataset, which was published by the GroupLens research group. This dataset contains rating and tagging information assigned to movies by users. In addition, the Last.fm dataset is collected from the Last.fm online music system and contains social network, tagging, and music listening information from around 2K users. Each user has a list of their most listened artists, tag assignments, i.e., tuples (user, artist, tag), and friend relations within the social network. Moreover, we remove users, items, and tags that appear fewer than p times for all datasets. In our experiment, we set p to 5 or 10 and obtained four datasets, i.e., lastfm-5, lastfm-10, ml10m-5, and ml10m-10. Table 1 summarizes the statistics of all datasets.
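The frequency filter can be sketched as below. The text does not state whether the filter is applied once or repeated until every remaining user, item, and tag meets the threshold; this sketch assumes the repeated (p-core style) variant, and the function name and toy triples are hypothetical.

```python
from collections import Counter

def filter_min_count(triples, p):
    """Repeatedly drop (user, item, tag) triples whose user, item, or tag occurs fewer than p times."""
    triples = list(triples)
    while True:
        users = Counter(u for u, _, _ in triples)
        items = Counter(i for _, i, _ in triples)
        tags = Counter(t for _, _, t in triples)
        kept = [(u, i, t) for u, i, t in triples
                if users[u] >= p and items[i] >= p and tags[t] >= p]
        if len(kept) == len(triples):   # nothing removed: the filter has converged
            return kept
        triples = kept

data = [
    ("u1", "i1", "t1"), ("u1", "i2", "t1"),
    ("u2", "i1", "t1"), ("u2", "i2", "t1"),
    ("u3", "i1", "t2"),                     # u3 and t2 each occur only once
]
filtered = filter_min_count(data, p=2)
```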

4.2. Evaluation Metrics and Experimental Settings

Two widely used rank-oriented metrics, i.e., Precision@N and Recall@N, are chosen to evaluate all compared methods since PTR is essentially a ranking problem. For both metrics, we set N to 3, 5, and 10. In addition, we choose the following baselines for comparison:
  • NGCF [32]: NGCF integrates GCN into personalized item recommendation models. In our experiment, we utilize user–tag interaction information as the input.
  • PITF [6]: PITF models the three-order interactions among entities and utilizes BPR criteria to optimize the model parameters.
  • NLTF [7]: NLTF utilizes Gaussian kernel to enhance the capacity of modeling the complex relations among entities.
  • ABNT [8]: ABNT models the nonlinearity relationships among entities through multi-layer perceptron.
  • GNN-PTR [16]: GNN-PTR integrates GCN into the PTR model and utilizes GCN to capture high-order collaborative signals among entities.
  • LNGTR [17]: LNGTR utilizes the lightweight GCN to alleviate the training difficulty for GNN-PTR.
  • GHPTR [33]: GHPTR explicitly injects higher-order relevance into entity representation through the message propagation and aggregation mechanism of GNN and leverages hyperbolic embedding to alleviate the problem of embedding distortion.
For all models, the learning rate $\eta$ is selected from {0.001, 0.005, 0.01, 0.05, 0.1}, the embedding size $K$ is set to 64, and the batch size is 1024. In addition, the regularization coefficient $\lambda_{\Theta}$ is tuned within {0, 0.00001, 0.0001, 0.001, 0.01}. For NGCF, GNN-PTR, LNGTR, CLPTR, and GHPTR, the number of embedding propagation layers $L$ is chosen from {1, 2, 3}. For GHPTR, the curvature is set to 1, and the target point is set to the origin of hyperbolic space. For CLPTR, the contrastive coefficient $\lambda_{cl}$ and noise coefficient $\varepsilon$ both vary in {0.001, 0.01, 0.1, 0.2, 0.5, 1, 2, 5}. In addition, the Adam optimizer is utilized to optimize all model parameters. Each dataset is split into two parts, i.e., the training set $S_{train}$ and the test set $S_{test}$, with a ratio of 8:2.
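For reference, Precision@N and Recall@N for a single (user, item) pair can be computed as in the sketch below. This follows the common definition of the two metrics; how the per-pair values are averaged over the test set is not specified here, so the sketch stops at a single pair. The function name and toy tags are hypothetical.

```python
def precision_recall_at_n(ranked_tags, relevant_tags, n):
    """Precision@N and Recall@N for one ranked tag list against the held-out tags."""
    top = ranked_tags[:n]
    hits = sum(1 for t in top if t in relevant_tags)
    precision = hits / n
    recall = hits / len(relevant_tags) if relevant_tags else 0.0
    return precision, recall

ranked = ["rock", "jazz", "pop", "indie", "metal"]
relevant = {"jazz", "metal", "folk"}
p3, r3 = precision_recall_at_n(ranked, relevant, 3)
p5, r5 = precision_recall_at_n(ranked, relevant, 5)
```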

4.3. Performance Analysis

Table 2 presents the results of the performance comparison, where bold values indicate the best performance among all compared methods.
We have the following main observations:
  • ABNT performs the worst on all datasets. One possible reason is that ABNT employs the multi-layer perceptron to capture the nonlinear relationships among entities, which introduces a large number of trainable parameters. However, with sparse interactions, ABNT is unable to accurately learn the embeddings of entities.
  • NGCF performs better than ABNT. Although NGCF only models user–tag interaction information, NGCF utilizes GCN to extract high-order collaborative signals from the interaction behaviors, which enriches the embeddings of entities. This observation indicates that capturing high-order collaborative signals among entities is beneficial to PTR models.
  • Compared to NGCF, NLTF achieves better performance. In fact, NLTF captures the third-order interaction among three entities rather than the second-order interaction.
  • PITF generally outperforms NLTF. This indicates that explicitly modeling the pairwise interaction among entities is a promising approach for PTR systems.
  • GNN-PTR is superior to PITF, because GNN-PTR utilizes the graph convolution module to effectively capture high-order collaborative signals among entities.
  • The performance of LNGTR is better than that of GNN-PTR. This observation confirms that the cumbersome GCN may hinder the learning process of model parameters.
  • Compared to LNGTR, GHPTR is better. One possible reason is that GHPTR utilizes hyperbolic distances to measure similarities between entities and leverages hyperbolic embedding to alleviate the problem of embedding distortion.
  • On all datasets, our proposed PTR model achieves the best performance compared against all baselines. For instance, in terms of Pre@3 and Rec@3, CLPTR outperforms LNGTR by 11.6% and 16.15% on ml10m-10. On ml10m-5, the improvements over LNGTR are 7.5% and 7.7%, respectively. This observation demonstrates that integrating the contrastive learning module into the PTR model helps to accurately learn the embeddings of entities by capturing the invariances among the augmented views.

4.4. Ablation Analysis

In order to evaluate the contribution of each component in CLPTR, we conduct an ablation study. Our proposed method contains two important components, i.e., a graph convolution module and a contrastive learning module. If we remove the contrastive learning module, we term the resulting variant CLPTR-cl. In addition, if we drop both the contrastive learning module and the graph convolution module, we term the resulting variant CLPTR-gcn. In fact, CLPTR-cl is equivalent to LNGTR and CLPTR-gcn is the same as PITF. The performance comparison among CLPTR, CLPTR-cl, and CLPTR-gcn is presented in Table 3, where the best performance is shown in bold.
From Table 3, we observe that each component is beneficial to the performance of CLPTR. Specifically, compared to the graph convolution module, the contrastive learning module has greater contributions to the performance of the PTR model. One possible reason is that the contrastive learning module helps the PTR model accurately learn the embeddings of entities via capturing the invariances among the augmented views.

4.5. Impact of Noise Combination

In this section, we discuss the impact of different noise combinations that generate augmented views on the performance of CLPTR. Specifically, CLPTR$_{uu}$/CLPTR$_{gg}$ indicates that the two augmentation operations both utilize uniform noise/Gaussian noise to generate the two augmented views. Moreover, CLPTR$_{ug}$ indicates that one augmented view is generated by injecting uniform noise and the other is created by adding Gaussian noise. Uniform noise and Gaussian noise are randomly sampled from the uniform distribution $U(0, 1)$ and the normal distribution $N(0, 1)$, respectively.
As shown in Table 4, the performance of CLPTR$_{uu}$ is generally better than that of CLPTR$_{ug}$, while CLPTR$_{gg}$ performs the worst. This observation indicates that employing Gaussian noise to generate augmented views may harm the performance of PTR models. One possible reason is that the directions of the augmented vectors of entities may be significantly changed since the noise sampled from the Gaussian distribution $N(0, 1)$ can take negative values. Hence, the vectors augmented with negative elements may destroy the underlying semantics of the original view.

4.6. Parameter Sensitivity Analysis

4.6.1. The Impact of τ

In this section, a group of experiments is conducted to analyze the sensitivity of $\tau$. Figure 4 plots the performance changes of CLPTR with different values of $\tau$. As shown in Figure 4, the performance of CLPTR is sensitive to the value of $\tau$. In addition, when the value of $\tau$ is small, CLPTR shows competitive performance. For example, on ml10m-5 and lastfm-5, CLPTR achieves the best performance when $\tau = 0.03$. One possible reason is that a small value of $\tau$ makes CLPTR pay more attention to the differences among entities, and capturing individual differences is beneficial for the performance of CLPTR.
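One way to see why a small temperature emphasizes differences among entities is to look at how the softmax inside InfoNCE distributes weight over negatives. The sketch below is a generic illustration of that mechanism, not an analysis taken from the experiments; the similarity values and the function name are made up for the example.

```python
import math

def softmax_weights(similarities, tau):
    """Softmax over similarities scaled by 1 / tau."""
    scaled = [s / tau for s in similarities]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical cosine similarities between an anchor and three negatives.
negatives = [0.9, 0.5, 0.1]
w_large = softmax_weights(negatives, tau=1.0)    # weight spread fairly evenly
w_small = softmax_weights(negatives, tau=0.03)   # weight concentrated on the most similar negative
```

With a small temperature, nearly all of the weight falls on the most similar negative, so the loss focuses on separating entities that are hard to tell apart.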

4.6.2. The Impact of $\lambda_{cl}$

We investigate how $\lambda_{cl}$ affects the performance of our proposed model. The contrastive learning coefficient $\lambda_{cl}$ is tuned within {0.001, 0.01, 0.1, 0.2, 0.5, 1, 2, 5} and other parameters remain unchanged. The results are presented in Figure 5. As shown in Figure 5, the performance of CLPTR gradually reaches its peak when $\lambda_{cl} = 0.01$ on ml10m-10, 0.5 on ml10m-5, 0.1 on lastfm-10, and 0.1 on lastfm-5. Then, the performance of CLPTR begins to decline. This indicates that we should carefully tune the parameter $\lambda_{cl}$ for different datasets. In addition, a large value of $\lambda_{cl}$ makes CLPTR focus on the contrastive learning module and ignore the interaction patterns among entities, leading to suboptimal performance.

4.6.3. The Impact of ε

For the sensitivity of parameter $\varepsilon$, we change $\varepsilon$ within {0.001, 0.01, 0.1, 0.2, 0.5, 1, 2, 5}, and other parameters remain unchanged. From Figure 6, when $\varepsilon \leq 1$, the performance of CLPTR continuously improves as we increase the value of $\varepsilon$ on lastfm-10 and lastfm-5. Moreover, on ml10m-10 and ml10m-5, CLPTR achieves the best performance when $\varepsilon$ is 0.01 and 0.5, respectively. When $\varepsilon$ is large, the performance of CLPTR clearly degrades. This is because the similarities among the augmented vectors are then dominated by the noise vectors, making it impossible for CLPTR to extract the correct self-supervised signals.

5. Conclusions

In this paper, we propose a contrastive learning-based personalized tag recommendation algorithm, namely CLPTR. CLPTR generates augmented views by injecting noise into implicit feature representations, which not only largely preserves the underlying semantics of the original interaction graphs but also avoids introducing false negatives. In addition, we utilize the contrastive learning module to extract self-supervised signals from the user–tag and item–tag interaction graphs, which helps the model learn the representations of entities more accurately. We conduct extensive experiments on real-world datasets, and the experimental results demonstrate the superior performance of our proposed CLPTR compared with traditional personalized tag recommendation models.

Author Contributions

A.Z.: Conceptualization, Software, Methodology, Validation, Writing – original draft, Writing – review and editing. Y.Y.: Conceptualization, Formal analysis, Investigation, Methodology, Resources, Validation, Writing – original draft, Writing – review and editing. S.L.: Investigation, Formal analysis, Methodology. L.Z.: Supervision, Validation, Writing – original draft, Writing – review and editing. R.G.: Visualization, Writing – review and editing. S.G.: Supervision, Validation, Writing – review and editing. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Future Network Scientific Research Fund Project (FNSRFP-2021-YB-54), Tongda College of Nanjing University of Posts and Telecommunications (XK203XZ21001), Qing Lan Project of Jiangsu Province, Chunhui Plan Collaborative Research Project, Ministry of Education, China (HZKY20220350), Jiangsu Province Innovation and Entrepreneurship Project (KYCX24_4128) and National Natural Science Foundation of China project (62376109).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original data utilized in this paper are available at https://grouplens.org/datasets/hetrec-2011/, accessed on 1 September 2024. The preprocessed data and code are available on GitHub: https://github.com/zar123123/CLPTR, accessed on 1 September 2024.

Conflicts of Interest

The authors have no competing interests to declare that are relevant to the content of this article.

References

  1. Adomavicius, G.; Tuzhilin, A. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Trans. Knowl. Data Eng. 2005, 17, 734–749.
  2. Guo, Q.; Zhuang, F.; Qin, C.; Zhu, H.; Xie, X.; Xiong, H.; He, Q. A Survey on Knowledge Graph-Based Recommender Systems. IEEE Trans. Knowl. Data Eng. 2022, 34, 3549–3568.
  3. Wu, L.; He, X.; Wang, X.; Zhang, K.; Wang, M. A Survey on Accuracy-Oriented Neural Recommendation: From Collaborative Filtering to Information-Rich Recommendation. IEEE Trans. Knowl. Data Eng. 2023, 35, 4425–4445.
  4. Symeonidis, P.; Nanopoulos, A.; Manolopoulos, Y. Tag recommendations based on tensor dimensionality reduction. In Proceedings of the 2008 ACM Conference on Recommender Systems (RecSys 2008), Lausanne, Switzerland, 23–25 October 2008; pp. 43–50.
  5. Rendle, S.; Balby Marinho, L.; Nanopoulos, A.; Schmidt-Thieme, L. Learning optimal ranking with tensor factorization for tag recommendation. In Proceedings of the SIGKDD, Paris, France, 28 July 2009; pp. 727–736.
  6. Rendle, S.; Schmidt-Thieme, L. Pairwise interaction tensor factorization for personalized tag recommendation. In Proceedings of the 3rd ACM International Conference on Web Search and Data Mining (WSDM), New York, NY, USA, 3–6 February 2010; pp. 81–90.
  7. Fang, X.; Pan, R.; Cao, G.; He, X.; Dai, W. Personalized tag recommendation through nonlinear tensor factorization using gaussian kernel. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence (AAAI-15), Austin, TX, USA, 25–30 January 2015; pp. 439–445.
  8. Yuan, J.; Jin, Y.; Liu, W.; Wang, X. Attention-Based Neural Tag Recommendation. In Proceedings of the DASFAA, Chiang Mai, Thailand, 22–25 April 2019; pp. 350–365.
  9. Shelhamer, E.; Long, J.; Darrell, T. Fully Convolutional Networks for Semantic Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 640–651.
  10. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
  11. He, K.; Fan, H.; Wu, Y.; Xie, S.; Girshick, R. Momentum Contrast for Unsupervised Visual Representation Learning. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 9726–9735.
  12. Kim, Y. Convolutional Neural Networks for Sentence Classification. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, Doha, Qatar, 25–29 October 2014; pp. 1746–1751.
  13. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is all you need. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 6000–6010.
  14. Gao, T.; Yao, X.; Chen, D. SimCSE: Simple Contrastive Learning of Sentence Embeddings. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP 2021), Punta Cana, Dominican Republic, 7–11 November 2021; pp. 6894–6910.
  15. Wu, F.; de Souza, A.H., Jr.; Zhang, T.; Fifty, C.; Yu, T.; Weinberger, K.Q. Simplifying Graph Convolutional Networks. In Proceedings of the 36th International Conference on Machine Learning, Long Beach, CA, USA, 10–15 June 2019; pp. 6861–6871.
  16. Chen, X.; Yu, Y.; Jiang, F.; Zhang, L.; Gao, R.; Gao, H. Graph Neural Networks Boosted Personalized Tag Recommendation Algorithm. In Proceedings of the 2020 International Joint Conference on Neural Networks (IJCNN 2020), Glasgow, UK, 19–24 July 2020; pp. 1–8.
  17. Yu, Y.; Chen, X.; Zhang, L.; Gao, R.; Gao, H. Neural Graph for Personalized Tag Recommendation. IEEE Intell. Syst. 2022, 37, 51–59.
  18. Scarselli, F.; Gori, M.; Tsoi, A.C.; Hagenbuchner, M.; Monfardini, G. The Graph Neural Network Model. IEEE Trans. Neural Netw. 2009, 20, 61–80.
  19. Wu, J.; Wang, X.; Feng, F.; He, X.; Chen, L.; Lian, J.; Xie, X. Self-supervised Graph Learning for Recommendation. In Proceedings of the SIGIR, Virtual Event, 11–15 July 2021; pp. 726–735.
  20. Jing, M.; Zhu, Y.; Zang, T.; Wang, K. Contrastive Self-supervised Learning in Recommender Systems: A Survey. ACM Trans. Inf. Syst. 2023, 42, 1–39.
  21. Yu, J.; Yin, H.; Xia, X.; Chen, T.; Cui, L.; Nguyen, Q.V.H. Are Graph Augmentations Necessary? Simple Graph Contrastive Learning for Recommendation. In Proceedings of the SIGIR, Madrid, Spain, 11–15 July 2022; pp. 1294–1303.
  22. Wang, F.; Liu, H. Understanding the Behaviour of Contrastive Loss. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 2495–2504.
  23. Zhang, O.; Wu, M.; Bayrooti, J.; Goodman, N. Temperature as Uncertainty in Contrastive Learning. arXiv 2021, arXiv:2110.04403.
  24. Liu, Z.; Li, H.; Chen, G.; Ouyang, Y.; Rong, W.; Xiong, Z. PopDCL: Popularity-aware Debiased Contrastive Loss for Collaborative Filtering. In Proceedings of the Conference on Information and Knowledge Management (CIKM), Birmingham, UK, 25 October 2023; pp. 1482–1492.
  25. Xie, X.; Sun, F.; Liu, Z.; Wu, S.; Gao, J.; Zhang, J.; Ding, B.; Cui, B. Contrastive Learning for Sequential Recommendation. In Proceedings of the 2021 IEEE 37th International Conference on Data Engineering (ICDE), Chania, Greece, 19–22 April 2021; pp. 1259–1273.
  26. Liu, Z.; Chen, Y.; Li, J.; Yu, P.S.; McAuley, J.; Xiong, C. Contrastive Self-supervised Sequential Recommendation with Robust Augmentation. arXiv 2021, arXiv:2108.06479.
  27. Chen, Y.; Liu, Z.; Li, J.; McAuley, J.; Xiong, C. Intent Contrastive Learning for Sequential Recommendation. In Proceedings of the ACM Web Conference 2022 (WWW), Lyon, France, 25–29 April 2022; pp. 2172–2182.
  28. Wu, H.; Zhang, Y.; Ma, C.; Guo, W.; Tang, R.; Liu, X.; Coates, M. Intent-aware Multi-source Contrastive Alignment for Tag-enhanced Recommendation. In Proceedings of the 2023 IEEE 39th International Conference on Data Engineering (ICDE), Anaheim, CA, USA, 3–7 April 2023; pp. 1112–1125.
  29. Xu, C.; Zhang, Y.; Chen, H.; Dong, L.; Wang, W. A fairness-aware graph contrastive learning recommender framework for social tagging systems. Inf. Sci. 2023, 640, 119064.
  30. He, X.; Deng, K.; Wang, X.; Li, Y.; Zhang, Y.; Wang, M. LightGCN: Simplifying and Powering Graph Convolution Network for Recommendation. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual Event, China, 25–30 July 2020; pp. 639–648.
  31. Rendle, S.; Freudenthaler, C.; Gantner, Z.; Schmidt-Thieme, L. BPR: Bayesian Personalized Ranking from Implicit Feedback. In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence (UAI), Montreal, QC, Canada, 18–21 June 2009; pp. 452–461.
  32. Wang, X.; He, X.; Wang, M.; Feng, F.; Chua, T.S. Neural Graph Collaborative Filtering. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, Paris, France, 21–25 July 2019; pp. 165–174.
  33. Zhang, C.; Zhang, A.; Zhang, L.; Yu, Y.; Zhao, W.; Geng, H. A Graph Neural Networks-Based Learning Framework with Hyperbolic Embedding for Personalized Tag Recommendation. IEEE Access 2024, 12, 339–350.
Figure 1. A toy example of random augmentations for benzoic acid.
Figure 2. An example of random augmentations for user–item interaction graph.
Figure 3. The framework of the contrastive learning-based personalized tag recommendation model.
Figure 4. The impact of τ.
Figure 5. The impact of λ_cl.
Figure 6. The impact of ε.
Table 1. Statistics of all datasets used in our experimental evaluation.

| Dataset | # Users | # Items | # Tags | Interactions |
|---|---|---|---|---|
| lastfm-5 | 1348 | 6927 | 2132 | 162,047 |
| lastfm-10 | 966 | 3870 | 1204 | 133,945 |
| ml10m-5 | 990 | 3247 | 2566 | 61,688 |
| ml10m-10 | 469 | 1524 | 1017 | 37,414 |

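Table 2. Performance analysis.

| Dataset | Metric | NGCF | PITF | NLTF | ABNT | GNN-PTR | LNGTR | GHPTR | CLPTR |
|---|---|---|---|---|---|---|---|---|---|
| ml10m-10 | Pre@3 | 0.1244 | 0.1699 | 0.1436 | 0.0896 | 0.1933 | 0.2267 | 0.2507 | 0.2530 |
| ml10m-10 | Pre@5 | 0.0896 | 0.1173 | 0.1143 | 0.0759 | 0.1390 | 0.1748 | 0.1706 | 0.1833 |
| ml10m-10 | Pre@10 | 0.0571 | 0.0744 | 0.0714 | 0.0501 | 0.0842 | 0.1064 | 0.0960 | 0.1106 |
| ml10m-10 | Rec@3 | 0.3198 | 0.3770 | 0.3388 | 0.2210 | 0.4602 | 0.4990 | 0.5713 | 0.5795 |
| ml10m-10 | Rec@5 | 0.3837 | 0.4523 | 0.4334 | 0.3017 | 0.5461 | 0.6319 | 0.6343 | 0.6653 |
| ml10m-10 | Rec@10 | 0.4688 | 0.5205 | 0.5341 | 0.3858 | 0.6398 | 0.7698 | 0.6898 | 0.7775 |
| ml10m-5 | Pre@3 | 0.0950 | 0.1398 | 0.1323 | 0.0822 | 0.1455 | 0.1795 | 0.1915 | 0.1929 |
| ml10m-5 | Pre@5 | 0.0683 | 0.1021 | 0.0972 | 0.0628 | 0.1055 | 0.1388 | 0.1375 | 0.1455 |
| ml10m-5 | Pre@10 | 0.0438 | 0.0641 | 0.0596 | 0.0400 | 0.0672 | 0.0892 | 0.0797 | 0.0908 |
| ml10m-5 | Rec@3 | 0.2463 | 0.3208 | 0.2974 | 0.2089 | 0.3331 | 0.3819 | 0.4158 | 0.4111 |
| ml10m-5 | Rec@5 | 0.2820 | 0.3910 | 0.3560 | 0.2538 | 0.3965 | 0.4817 | 0.4799 | 0.4992 |
| ml10m-5 | Rec@10 | 0.3495 | 0.4623 | 0.4270 | 0.3039 | 0.4852 | 0.6008 | 0.5391 | 0.6116 |
| lastfm-10 | Pre@3 | 0.1739 | 0.2513 | 0.2443 | 0.1605 | 0.2647 | 0.3240 | 0.3382 | 0.4244 |
| lastfm-10 | Pre@5 | 0.1468 | 0.2088 | 0.2064 | 0.1367 | 0.2143 | 0.2652 | 0.2658 | 0.3371 |
| lastfm-10 | Pre@10 | 0.1140 | 0.1458 | 0.1249 | 0.0943 | 0.1462 | 0.1833 | 0.1772 | 0.2168 |
| lastfm-10 | Rec@3 | 0.2180 | 0.3204 | 0.2849 | 0.1579 | 0.3479 | 0.3949 | 0.4339 | 0.5250 |
| lastfm-10 | Rec@5 | 0.2878 | 0.4158 | 0.4017 | 0.2190 | 0.4529 | 0.5208 | 0.5367 | 0.6587 |
| lastfm-10 | Rec@10 | 0.4289 | 0.5654 | 0.5541 | 0.3034 | 0.5874 | 0.6830 | 0.6119 | 0.7998 |
| lastfm-5 | Pre@3 | 0.1679 | 0.2127 | 0.1949 | 0.1563 | 0.2324 | 0.2789 | 0.3043 | 0.3591 |
| lastfm-5 | Pre@5 | 0.1395 | 0.1789 | 0.1678 | 0.1353 | 0.1913 | 0.2325 | 0.2390 | 0.2918 |
| lastfm-5 | Pre@10 | 0.1023 | 0.1274 | 0.1191 | 0.1018 | 0.1327 | 0.1596 | 0.1547 | 0.1911 |
| lastfm-5 | Rec@3 | 0.2191 | 0.2571 | 0.2275 | 0.1569 | 0.3244 | 0.3415 | 0.3914 | 0.4490 |
| lastfm-5 | Rec@5 | 0.2907 | 0.3479 | 0.3239 | 0.2194 | 0.4170 | 0.4511 | 0.4776 | 0.5736 |
| lastfm-5 | Rec@10 | 0.4013 | 0.4814 | 0.4523 | 0.3298 | 0.5454 | 0.5861 | 0.5673 | 0.6990 |
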
Table 3. Ablation analysis.

| Dataset | Metric | CLPTR-gcn | CLPTR-cl | CLPTR |
|---|---|---|---|---|
| ml10m-10 | Pre@3 | 0.1699 | 0.2267 | 0.2530 |
| ml10m-10 | Pre@5 | 0.1173 | 0.1748 | 0.1833 |
| ml10m-10 | Rec@3 | 0.3770 | 0.4990 | 0.5795 |
| ml10m-10 | Rec@5 | 0.4523 | 0.6319 | 0.6653 |
| ml10m-5 | Pre@3 | 0.1398 | 0.1795 | 0.1929 |
| ml10m-5 | Pre@5 | 0.1021 | 0.1388 | 0.1455 |
| ml10m-5 | Rec@3 | 0.3208 | 0.3819 | 0.4111 |
| ml10m-5 | Rec@5 | 0.3910 | 0.4817 | 0.4992 |
| lastfm-10 | Pre@3 | 0.2513 | 0.3240 | 0.4244 |
| lastfm-10 | Pre@5 | 0.2088 | 0.2652 | 0.3371 |
| lastfm-10 | Rec@3 | 0.3204 | 0.3949 | 0.5250 |
| lastfm-10 | Rec@5 | 0.4158 | 0.5208 | 0.6587 |
| lastfm-5 | Pre@3 | 0.2127 | 0.2789 | 0.3591 |
| lastfm-5 | Pre@5 | 0.1789 | 0.2325 | 0.2918 |
| lastfm-5 | Rec@3 | 0.2571 | 0.3415 | 0.4490 |
| lastfm-5 | Rec@5 | 0.3479 | 0.4511 | 0.5736 |

Table 4. Performance comparison among different CLPTR variants.

| Dataset | Metric | CLPTR_uu | CLPTR_ug | CLPTR_gg |
|---|---|---|---|---|
| ml10m-10 | Pre@3 | 0.2530 | 0.2431 | 0.2424 |
| ml10m-10 | Pre@5 | 0.1833 | 0.1774 | 0.1770 |
| ml10m-10 | Rec@3 | 0.5795 | 0.5678 | 0.5607 |
| ml10m-10 | Rec@5 | 0.6653 | 0.6696 | 0.6687 |
| ml10m-5 | Pre@3 | 0.1929 | 0.1323 | 0.1283 |
| ml10m-5 | Pre@5 | 0.1455 | 0.1085 | 0.1012 |
| ml10m-5 | Rec@3 | 0.4111 | 0.2687 | 0.2620 |
| ml10m-5 | Rec@5 | 0.4992 | 0.3597 | 0.3378 |
| lastfm-10 | Pre@3 | 0.4244 | 0.3647 | 0.3192 |
| lastfm-10 | Pre@5 | 0.3371 | 0.2969 | 0.2631 |
| lastfm-10 | Rec@3 | 0.5250 | 0.4277 | 0.3769 |
| lastfm-10 | Rec@5 | 0.6587 | 0.5654 | 0.4981 |
| lastfm-5 | Pre@3 | 0.3591 | 0.3244 | 0.2915 |
| lastfm-5 | Pre@5 | 0.2918 | 0.2690 | 0.2436 |
| lastfm-5 | Rec@3 | 0.4490 | 0.4088 | 0.3709 |
| lastfm-5 | Rec@5 | 0.5736 | 0.5335 | 0.4876 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Zhang, A.; Yu, Y.; Li, S.; Gao, R.; Zhang, L.; Gao, S. Contrastive Learning-Based Personalized Tag Recommendation. Sensors 2024, 24, 6061. https://doi.org/10.3390/s24186061
