1. Introduction
Since the inception of the Internet, users and hardware devices have generated an enormous amount of data. The problem of “information overload” has inevitably emerged, and recommendation algorithms have to solve the problem of how to efficiently extract valuable information from these massive amounts of unorganized data. Therefore, there is no doubt that recommendation systems, as the engines of Internet development, provide great convenience and benefits to users and Internet companies in the “information overload” era. They have also seen success in e-commerce and social networking as well as other fields.
The most traditional and widely used recommendation algorithm is the collaborative filtering algorithm (abbreviated as CF), which is further divided into item-based collaborative filtering (abbreviated as itemCF) and user-based collaborative filtering (abbreviated as userCF) algorithm based on item similarity and user similarity, so as to calculate the ratings of unknown items as well as predict user preferences. For example, for a target item whose rating needs to be estimated, the userCF chooses to calculate the similarity between users, and then the unknown rating is predicted by averaging the (weighted) known ratings of the target item by similar users. On the other hand, the itemCF calculates the similarity between items first and then predicts the unknown rating by averaging the (weighted) known ratings of similar items by other users [
1,
2]. As shown in
Figure 1, according to user’s movie-watching history, user
and user
share preferences, so the userCF recommends movie 3 and movie 6, which user
likes, to user
. Meanwhile, the itemCF recommends movie 2, which is similar to movie 6 that user
liked before, to user
.
CF algorithms are all designed to recommend items to users based on the preferences of similar users or the user’s history, and the process is done by calculating the similarity between users or items. However, traditional CF algorithms suffer from the problem of data sparsity and inadequate exploration of mechanisms of how user preferences decay. In fact, several prior studies [
3,
4,
5] pointed out that preference decay and memory decay are very similar. Thus, this paper proposes an improved CF algorithm based on retroactive inhibition of preferences (abbreviated as RICF) to capture the evolution of user’s preferences. There are a number of papers in the field of recommendation systems that measure the memory forgetting process based directly on the decay of time [
6,
7,
8,
9]. However, according to the [
10,
11,
12] researches in psychology, memory attenuation mainly stems from memory inhibition. Therefore, RICF adopts the theory of retroactive inhibition, which measures the retention of the user’s preference on item
by calculating the strength of inhibition that item
suffered within the corresponding time period, and consequently affects the weight of the contribution that item
makes to the target item’s rating prediction.
For example, given samples as in
Table 1, to predict user
’s rating of movie 4 on 11/12/2018, traditional itemCF calculates the contribution of each similar movie directly (namely, rating × similarity), so user
’s rating of movie 1 has contributed 2.5 × 0.9 = 2.25 rating to the prediction of user
’s rating of movie 4. However, we need to adjust the user’s previous ratings to fit real scenarios because of the evolution of user’s preference. Specifically, on 11/12/2018, for the preference that user
had to movie 1, it suffered retroactive inhibition from 05/04/2018 to 11/12/2018, namely the inhibition from movie 3 rated by user
, which exists in a different preference cluster and has a higher rating. Movie 3 subsequently affects movie 1’s contribution to the prediction of movie 4’s rating (2.5 × 0.9 ×
). In this way, RICF measures the evolution of user’s preferences more interpretatively and accurately from the perspective of preference inhibition.
The contributions of this paper are described as follows,
This paper introduces the theory of retroactive inhibition into recommendation systems to more interpretably measure the decay of a user’s preference over time. Specifically, we modified the application of the forgetting mechanism of brain memory to the recommendation system by using retroactive inhibition instead of using forgetting over time directly, in order to calculate the change of user’s preference more accurately.
The proposed RICF algorithm not only takes into account the evolution of user preferences but also uses more powerful item embeddings by fusing user, item and rating information to alleviate the problem of data sparsity and improve the accuracy of rating prediction. In addition, the embeddings trained by the model (using rating-prediction as an optimization goal) can help to reduce the number of similar neighbors in the collaborative process.
Differing from previous related studies, this paper proposes to use a clustering of embeddings to obtain a preference points model. Meanwhile, RICF combines the Canopy and K-Means algorithms to overcome the problem that the clustering efficiency decreases with the increasing size of the dataset, as well as the feature dimension.
To show the practicability of the proposed algorithm, this paper applies real datasets with real timestamps, which are the live movie rating dataset collected from Twitter [
13] and the digital music dataset collected from Amazon [
14]. The experiment results show that RICF performs better and is more interpretable than the traditional itemCF as well as the state-of-the-art sequential algorithms that focus on the research of preference decay.
The remainder of this paper is organized as follows.
Section 2 summarizes the related work.
Section 3 provides some preliminaries and describes the proposed RICF algorithm.
Section 4 presents results from experiments conducted on two evaluation datasets.
Section 5 concludes the paper.
3. Proposed Model: RICF
Similar to the example of learning words in
Section 2.2, a user’s recall of the older preferences can be affected by the competing information in the memory of newer preferences, which leads to a memory bias towards the older preferences. Thus, in order to explore the bias of user’s preferences memory and improve the accuracy of rating prediction, this paper mainly focuses on the phenomenon of memory decay caused by competition-induced retroactive inhibition. Specifically, this paper introduces the retroactive inhibition factor (RI) and proposes the RICF algorithm to improve the traditional CF algorithm. The whole algorithm is divided into the following steps: (1) training embedding vectors; (2) embedding clustering; (3) preference-retention calculation; (4) preference prediction. The whole process is shown in the
Figure 2.
To tackle the data sparsity problem, this paper introduces a deep learning technique, called embedding training, to convert high-dimensional sparse vectors of items into low-dimensional dense vectors. The distance between these trained embeddings reflects the similarity between them. Then, we clustered embeddings, and the clustered results represent the user’s preferences. After that, the evolution of the user’s preferences is shown by calculating the inhibition intensity and preference retention for each user’s historical preference. Finally, we calculated the user’s future preferences based on the evolution of the user’s preferences and the similarity between item embeddings.
3.1. Preliminary
Suppose that there are a user set
and an item set
, we define the rating pair
, where
represents the rating for user
to item
and
is the time when user
rates item
. The vector
represents the rating pair set of user
. If user
does not rate item
, then the corresponding rating pair is empty. Finally, the rating matrix
is defined as follows,
Definition 1 (Retroactive Inhibition Strength) is given an item set
and a user set
, then we can assign each item
a label
to record the cluster (we call it the preference point)
it belongs to after concluding the clustering of items in the item set, considering the result of cluster shows the preference distribution of users. After that, the retroactive inhibition strength for item
(belongs to the preference point
) rated by user
from time
to time
can be defined as follows,
represents the total impact of retroactive inhibition on
’s preference on the preference point
that
belongs to, where
in the Equation (2) is a collection of preference points to which
’s rated items belong from time
to time
. Moreover, retroactive inhibition is caused by the influence of other stronger preferences or the same preference from time
to time
, we defined the
function to calculate preference distance between
and
for user
, as shown in Equation (3).
where function
aims to measure the cumulative strength of a user’s preference for a particular preference point within the specified time period. Specifically,
is the function to get the average rating among the rating records which rated from time
to time
with the preference point
, and
is the function to get the average rating among the rating records that rated earlier than
with the preference point
.
For example, assuming that user
rated items {
} sequentially, and we know that items fell into preference points
, respectively, as shown in
Figure 3. Then, according to Equation (3), we can calculate the impact of retroactive inhibition on user’s preference on
(which belongs to
), namely
.
3.2. Embedding Training
According to
Section 2.1, traditional collaborative filtering techniques also have the problem of data sparsity and insufficient expression of features of data. That is, a small proportion of items rated by a few users, and other information such as interactions between users and items, sequential information, fail to be represented in traditional CF algorithms. Therefore, inspired by the “item2vec” method [
36], this paper proposes to pre-train a model based on the feedforward deep neural network with ratings as an optimization target to get the dense numeric representations for users/items, which aims to obtain more accurate representations of users/items in the rating space.
More generally, the conversion that makes the original features of an item be transformed into a dense item embedding vector is called “item2vec”. Thus, we train the aforementioned model which inputs users and items information into it, and makes ratings as optimization targets. Consequently, the original features of users/items are transformed into a dense user/item embedding vector, namely
and
, where
and
are the dimensions of the embedding vector. The whole architecture is shown in
Figure 4b. Specifically, for example, the proposed model maps each movie in the dataset to a dense (embedded) vector in a unified Euclidean space where the distance inside represents some kind of correlation between movies, as in
Figure 4a, exploring some implicit information between embeddings in vector space by the tool Gensim [
47]. It can be seen that the distance vectors from the movie
Wonder Woman (a female-themed sci-fi movie) to movie
Iron Man (a male-themed sci-fi movie) and from movie
Cinderella (a female-themed fantasy movie) to movie
Coco (a male-themed fantasy movie) are almost identical, suggesting that the embedding operation in this example can contain the rating relationship and some semantic information among rating history.
We have applied the “item2vec” method to CF to overcome the sparse data problem and take full advantage of the powerful expression of embeddings at the same time to accurately calculate the similarity between items, so that we can achieve more accurate preference predictions that will be discussed in the next sections.
3.3. Embedding Clustering
In addition to solving the data sparsity problem and measuring the similarity between items more precisely, we use clustering of embeddings to better explore the user’s preference partitions, and thus obtain the set of user’s preference points, see Definition 1. Furthermore, in contrast to the K-Means algorithm, the Canopy+K-Means algorithm is used to solve the problem where the effect of clustering for multidimensional data is limited by the K-value and the initial cluster centers, as well as the problem of slow computation of using the K-Means algorithm.
- (1)
Canopy coarse clustering
Canopy is an extremely simple and fast pre-processing algorithm. It was first proposed by Andrew McCallum, Kamal Nigam and Lyle Ungar in 2000 [
48]. It is often used for coarse clustering before K-Means algorithm to find appropriate K-value and initial cluster centers [
49] for K-Means algorithm. Specifically, it applies the inexpensive distance method to rough clustering and rigorous distance method to standard clustering. In this way, the Canopy algorithm can make large and high-dimensional data clusters efficiently and practically [
50]. Inspired by the Canopy algorithm, this paper proposes to pre-process the embeddings using the Canopy algorithm to obtain the appropriate cluster K-value and the initial cluster centers which will be treated as input to the K-Means algorithm in the next step, so as to get better cluster results in less time.
As shown in
Figure 5, given that the set of item embeddings
(e.g., movie embeddings) and the heuristic threshold
,
for the Canopy algorithm. First, sample
is selected randomly from
to initialize Canopy
, then the distance between
and the remaining samples in
is calculated. Second, assign those samples of the distance within
to Canopy
and remove those samples of the distance within
from
. Repeat this process until the set
is empty. Last, return the number of canopies and centroids as the K-value as well as initial centroids of the K-Means algorithm.
Algorithm 1. Pseudocode for Canopy clustering algorithm for embeddings. |
1: | Input: the set of item embeddings }. |
2: | Output: the K-value and initial centroids } of cluster. |
3: | Initialize |
4: | whiledo |
5: | Select sample from randomly |
6: | Initialize Canopy |
7: | Remove from |
8: | for remaining sample do |
9: | compute |
10: | if then |
11: | Canopy |
12: | else ifthen |
13: | remove from |
14: | end if |
15: | end for |
16: | end while |
17: | return K-value, initial centroids |
- (2)
K-Means clustering
After coarse clustering by the Canopy algorithm, the number of clusters K and cluster centers
are obtained, where each cluster center
is a multidimensional vector with the same latitude as the item embedding. Then, we apply the traditional K-Means algorithm [
51,
52] by the incorporation of known K-value and initial centroids to cluster embeddings. Eventually, we get the result of the embeddings cluster, namely the set of user preference points.
3.4. Preference Retention Calculation
After Canopy+K-Means clustering, RICF uses the clustering results and the theory of retroactive inhibition to explore the evolution of user’s preferences. In addition, this is assuming that user
rated items {
} sequentially, as shown in
Figure 3, and these items were partitioned in clusters
by Canopy+K-Means algorithm, respectively. Then, according to the definition of retroactive inhibition strength, we can calculate the suffered RI strength of each item. Consequently,
,
,
,
,
.
Next, we need to define a RI-based decay function to simulate the process of user’s preferences decay due to memory inhibition. We name the described function as
to record the proportion of user’s preference retention. In addition, inspired by the Ebbinghaus Curve [
11], we address several methods to find out the most appropriate simulation curve, including Power function, Exponential function and Parabolic function as a control for extreme conditions. As an instance shown in
Figure 6.
So far, we can calculate the proportion of user’s preference retention on specific items by the defined decay function and RI strength. Consequently, user’s real preference can be measured to improve the accuracy as well as interpretation of the recommendation.
3.5. Preference Prediction
RICF improved the traditional itemCF algorithm based on the idea of retroactive inhibition to take user preference evolution into consideration, and it introduced baseline rating as well as adjusted methods to calculate the similarity between items directly on the item embeddings obtained from the previous training. The experiment results are shown in the next section.
Definition 2 Embedding Similarity
Based on the items for which the embeddings have been trained, this paper uses the Cosine method to calculate the similarity between the two embeddings
and
. The original Cosine is used to calculate the angle
between two vectors
and
, ranging from −1 to 1, as defined in Equation (4). The proposed RICF uses the normalized Cosine method to measure the similarity between two embedding vectors, as shown in Equation (5).
Consequently, this paper derives the RICF algorithm as follows, to predict user
’s preference for item
.
where
is the heuristic parameter which we chose by stochastic selection strategy. Part 1 of this equation is the baseline rating, where
is the mean rating for all users.
and
are the rating bias of user
and item
, as defined in Equations (7) and (8), respectively. Moreover, in Equations (7) and (8),
(
) represents all rating records associated with user
(item
),
(
) represents the mean rating of item
(user
). Part 2 of Equation (6) is an improved itemCF algorithm that introduces embedding similarity
to calculate the similarity between item
and item
, as shown in Definition 2, and introduces preference decay factor
to calculate the decay degree of user
’s preference on older items
when user
is going to rate newer item
, as seen in
Section 3.4. Meanwhile,
(
) is the simplified base rating prediction for item
(
), which equals
(
). What is more, the KNN part in Equation (6) chooses
, which selects k items most similar to item
from the rating history of user
.
4. Results and Discussion
This section shows the comparison between traditional methods and our proposed RICF algorithm. Several state-of-the-art algorithms are also implemented as baselines to compare with the proposed RICF algorithm.
4.1. Dataset and Experimental Setup
Specifically, to evaluate the performance of the proposed RICF algorithm, we designed a total of seven algorithms to be used as comparisons for the experiment: (1) traditional itemCF, (2) RICF excluding RI effects, namely = 1 in Equation 6 (without-RI), (3) RICF with Power decay function (Power-RICF), (4) RICF with Exponential decay function (Exponential-RICF), (5) RICF with Parabolic decay function (Parabolic-RICF), (6) sequential model LSTM (Embedding+LSTM), and (7) sequential model GRU (Embedding+GRU). All algorithms are coded in python and tested on the Kaggle cloud-based workbench.
Furthermore, in order to ensure that each rating record is rated on user’s real rating habits and scenarios (not rated by volunteers at once such as in the Movielens dataset [
53]) and thus to ensure the authenticity of user’s preference evolution, we have specially selected three well-known datasets based on real timestamps, MovieTweetings-100k, MovieTweetings-latest [
13], and DigitalMusic [
14] to conduct the experiment. MovieTweetings is an up-to-date dataset that collects all tweets from Twitter having the format “*I rated #IMDB*”. Originally, the MovieTweetings-100k contained 16,554 unique users and 10,506 unique movies that were rated from 28/02/2013 to 01/09/2013. Likewise, the MovieTweetings-latest contains 68,332 unique users and 35,931 unique items that were rated from 28/02/2013 to 10/07/2020. All of the movie ratings are scaled from 1 to 10, which will be re-scaled to 1 to 5 in this paper. The DigitalMusic dataset contains reviews and metadata from Amazon. It contains 478,235 unique users and 266,414 unique digital music that were rated from 20/01/1998 to 23/07/2014. All of the music ratings are scaled from 1 to 5. The description of these three datasets is as shown in
Table 2.
4.2. Evaluation Metrics
Based on the common metrics of rating prediction in the recommendation system, this paper uses the mean absolute error (MAE) and the root mean squared error (RMSE), as metrics to evaluate how well we predict. They are defined as follows.
Assuming there are prediction values and target values . Therefore, 4.3. Results
4.3.1. Recommendation Quality
First, in
Section 3.2, we conducted the experiment on datasets with the training–testing ratio 90–10% to find the appropriate dimension for embedding vectors. For the movie dataset, as shown in
Figure 7a, the lowest RMSE occurred when the dimension of embedding vectors was between 50 and 100, which does not depend on the experimental algorithm to be evaluated. Meanwhile, considering the number of dimensions of a movie’s own attributes, such as classes, user preferences and so on, we chose 64 to be the embedding’s dimension for movies. Similarly, as shown in
Figure 7b, we chose 32 to be the embedding vector’s dimension for music.
Moreover, the second experiment was conducted to discuss the selection of K-neighbors for RICF. The effects of K-neighbors in RICF are compared with that in traditional itemCF, and the experiment results are shown in
Figure 8. We found that only a few K-neighbors are required to achieve good results in RICF and we speculate that the reason is related to the embedding model trained to optimize based on the rating target, because embeddings contain the latent information about the requirement of optimal rating-prediction, so that the more similar the neighbor, the greater the contribution to the rating prediction. Thus, this experiment proved that embedding trained using the rating as the optimization target is able to reduce the number of K-neighbors and hence reduce the computation of the collaborative part. Finally, we chose K = 10 (the minimum K value defined in this paper) as the number of neighbors in the RICF algorithm.
Most notably, this paper investigates the effects of different decay functions in the RICF as well as the comparisons between the traditional itemCF and the state-of-art sequential model that deals with temporal information. In addition, to better explore the process of user preference decay with inhibition, we selected four different ways to model the decay of preferences: (1) without RI effects (without-RI), (2) Power function: obviously decay “quickly then slowly”, (3) Exponential function: no obvious “quickly then slowly”, (4) Parabolic function: as a contrast, it is clear that decay “slowly then quickly”. Finally, the main experiment results on the two datasets are seen in
Table 3,
Table 4 and
Table 5.
A comprehensive comparison of the
Figure 9,
Figure 10 and
Figure 11 reveals: (1) The RICF algorithm has lower MAE and RMSE than traditional itemCF algorithms, the state-of-art sequential models LSTM and GRU on three datasets used in the experiment. (2) Power-RICF has the lowest MAE and RMSE among the three RICF algorithms. Taking
Figure 6 into consideration, the Power function can better fit the process of forgetting in contrast to the Exponential function and Parabolic function, and also it is in tune with the character of preference decay that “first quickly and then slowly”.
4.3.2. Embedding Clustering Visualization
For the trained multi-dimensional embeddings, we applied t-SNE (t-Distributed Stochastic Neighbor Embedding) to perform dimensional visualization. The t-SNE is a technique for dimensionality reduction that is particularly well suited for the visualization of high-dimensional datasets [
54]. According to the previous experiment on the Canopy+K-means algorithm, we clustered the embeddings, as shown in
Figure 12, and each color in it represents an embedding cluster. Different clusters represent different user preferences. From the visualization of this figure, we can distinguish the categories of embeddings more clearly. For example, in
Figure 12a, the bar on the right shows seventeen colors, representing seventeen different clusters. The coordinate system on the left shows the clustering results in three dimensions. It can be seen that the similar embeddings are clearly clustered, and the boundaries between different clusters are relatively clear.
4.3.3. The stability of RICF
Furthermore, to verify the stability of the RICF algorithm, we separately experimented on three datasets and at the same time, varied the ratio of the training-testing portions from 50% to 90% to observe the results. As can be seen in the
Figure 13,
Figure 14 and
Figure 15, Power-RICF has consistently better performance compared to other experimental algorithms.
As is shown in
Figure 13, the
x-axis represents different ratios of trainset-testset for MovieTweetings-latest dataset; the
y-axis represents the root mean squared error (RMSE) of different algorithms. The left plot shows how the RMSE of the seven different algorithms varies with the ratio of trainset-testset. The plot on the right is a partial enlargement of the plot on the left. As can be seen from
Figure 13, the RMSEs of the three RICF algorithms are lower than other algorithms and Power-RICF’s is the lowest. Moreover, for different ratios of trainset-testset, the RMSEs of the RICF algorithms are stable around 0.63, which are the most stable among the seven algorithms.
Figure 14 and
Figure 15 show the RMSE of MovieTweetings-100k dataset and DigitalMusic dataset, respectively. Conclusions obtained from
Figure 13 and
Figure 14 are similar to those from
Figure 12. Therefore, for different datasets, the RICF algorithm has better stability and performance than other algorithms, especially Power-RICF.
4.4. Discussion
Currently, there are few studies applying the theory of memory inhibition to Computer Science [
11], and studies on Cognitive Psychology have mainly focused on activation propagation, and evolution of preferences from the temporal perspective only, ignoring the competition and inhibition within memories. Thus, in order to fill this gap, we firstly conducted the classical rating-prediction experiment on the traditional itemCF algorithm, and introduced the theory of memory inhibition to explore the evolution of users’ preferences. Secondly, to better take into account the multi-semantic information, we introduced the embedding pre-training technique on the traditional itemCF and used the more efficient Canopy+K-Means algorithm to cluster the multidimensional embeddings to construct the user preference points model, so as to simulate the process of user preference decay and build recommendation models more comprehensively and accurately.
The experiment results show that the memory decay based on retroactive inhibition is consistent with existing memory decay processes (decay first quickly and then slowly) [
10,
55], and the incorporation of a strong representation of embedding makes the recommendation mechanism more interpretable and has higher prediction accuracy in comparison to traditional itemCF and sequential models. What is more, only fewer neighbors yielded good results when using embeddings to compute similarity and select K neighbors accordingly, unlike traditional CF algorithms that are more reliant on K neighbor selection. Here, the embeddings are trained from a model with rating as an optimization target. We speculate that is because the trained embedding implicitly contains rating-optimized information, which is worthy of further research.
Here, we focused on the problem of rating prediction accuracy of the recommendation algorithm based on retroactive inhibition. We found that in terms of rating prediction accuracy, deep learning-based baseline algorithms perform worse than other baseline algorithms. Similar to most other modern recommendation algorithms, sequence algorithms are not designed for the rating prediction problem, but rather serve to perform the recommendation prediction task better. Therefore, as a future work, we plan to use the proposed algorithm to verify the performance of other recommendation tasks such as prediction ranking.
5. Conclusions
This paper proposed a novel approach called RICF to explore the evolution of user preference based on the theory of retroactive inhibition in cognitive psychology. In RICF, to tackle the problem of data sparsity, each item is represented as a dense numerical vector by training a feedforward deep neural network to predict user preferences for items. Moreover, the Canopy+K-Means clustering algorithm was used to more efficiently cluster multidimensional embedding vectors, and the results of cluster were used to construct a model of users’ points of preference. Evaluation experiments were conducted using three datasets which reflect user’s real rating timestamp (rather than volunteers’ pooled ratings), and the results indicate that the proposed algorithm is better at exploring the evolution of user preference with better accuracy and interpretability. Furthermore, the proposed approach has produced better performance than state-of-the-art techniques on both accuracy and novelty.
Further work consists of three directions. The first direction is to use more sequential datasets to further validate the RICF algorithm. The second direction is to incorporate more diverse information and even the Knowledge Graph technique to train more accurate embedding vectors. The third direction is to explore deeper into the mechanisms of memory inhibition to provide inspiration for sequential modeling algorithms in the field of Deep Learning as well as for the study of other recommendation tasks such as prediction ranking.