An Improved Collaborative Filtering Recommendation Algorithm Based on Retroactive Inhibition Theory

Collaborative filtering (CF) is the most classical and widely used recommendation algorithm; it predicts user preferences by mining users' historical data. CF algorithms fall into two main categories: user-based CF, which recommends items based on rating information from similar user profiles, and item-based CF, which recommends items based on the similarity between items. However, since users' preferences are not static, it is vital to take their changing preferences into account to achieve more accurate recommendations. In recent years, studies have used memory as a factor to measure changes in preference and explored the retention of preference based on the relationship between the forgetting mechanism and time. Nevertheless, according to the theory of memory inhibition, the main factors that cause forgetting are retroactive inhibition and proactive inhibition, not mere evolution over time. Therefore, our work proposes a method that combines the theory of retroactive inhibition with the traditional item-based CF algorithm (namely, RICF) to accurately explore the evolution of user preferences. Meanwhile, embedding training is introduced to represent features better and alleviate the problem of data sparsity; the item embeddings are then clustered into preference points to measure the preference inhibition between different items. Moreover, we conducted experiments on real-world datasets to demonstrate the practicability of the proposed RICF. The experiments show that the RICF algorithm performs better and is more interpretable than the traditional item-based collaborative filtering algorithm, as well as state-of-the-art sequential models such as LSTM and GRU.


Introduction
Since the inception of the Internet, users and hardware devices have generated an enormous amount of data. The problem of "information overload" has inevitably emerged, and recommendation algorithms have to solve the problem of how to efficiently extract valuable information from these massive amounts of unorganized data. Therefore, there is no doubt that recommendation systems, as the engines of Internet development, provide great convenience and benefits to users and Internet companies in the "information overload" era. They have also seen success in e-commerce and social networking as well as other fields.
The most traditional and widely used recommendation algorithm is the collaborative filtering algorithm (abbreviated as CF), which is further divided into item-based collaborative filtering (abbreviated as itemCF) and user-based collaborative filtering (abbreviated as userCF) according to whether item similarity or user similarity is used to calculate the ratings of unknown items and predict user preferences. For example, for a target item whose rating needs to be estimated, userCF calculates the similarity between users, and then the unknown rating is predicted by averaging the (weighted) ratings that similar users gave to the target item. CF algorithms are all designed to recommend items to users based on the preferences of similar users or the user's history, and the process is done by calculating the similarity between users or items. However, traditional CF algorithms suffer from the problem of data sparsity and inadequate exploration of the mechanisms by which user preferences decay. In fact, several prior studies [3][4][5] pointed out that preference decay and memory decay are very similar. Thus, this paper proposes an improved CF algorithm based on retroactive inhibition of preferences (abbreviated as RICF) to capture the evolution of users' preferences. A number of papers in the field of recommendation systems measure the memory forgetting process directly through the decay of time [6][7][8][9]. However, according to research in psychology [10][11][12], memory attenuation mainly stems from memory inhibition. Therefore, RICF adopts the theory of retroactive inhibition, which measures the retention of the user's preference on item a by calculating the strength of inhibition that item a suffered within the corresponding time period, which consequently affects the weight of the contribution that item a makes to the target item's rating prediction.
For example, given the samples in Table 1, to predict user a's rating of movie 4 on 11/12/2018, traditional itemCF calculates the contribution of each similar movie directly (namely, rating × similarity), so user a's rating of movie 1 contributes 2.5 × 0.9 = 2.25 to the prediction of user a's rating of movie 4. However, we need to adjust the user's previous ratings to fit real scenarios because of the evolution of the user's preferences. Specifically, on 11/12/2018, the preference that user a had for movie 1 suffered retroactive inhibition from 05/04/2018 to 11/12/2018, namely the inhibition from movie 3 rated by user a, which exists in a different preference cluster and has a higher rating. Movie 3 subsequently affects movie 1's contribution to the prediction of movie 4's rating (2.5 × 0.9 × Decay(RI)). In this way, RICF measures the evolution of the user's preferences more interpretably and accurately from the perspective of preference inhibition. The contributions of this paper are described as follows:

1.
This paper introduces the theory of retroactive inhibition into recommendation systems to more interpretably measure the decay of a user's preference over time. Specifically, we modified the application of the forgetting mechanism of brain memory to the recommendation system by using retroactive inhibition instead of forgetting over time directly, in order to calculate the change of the user's preferences more accurately.

2.
The proposed RICF algorithm not only takes into account the evolution of user preferences but also uses more powerful item embeddings, fusing user, item, and rating information to alleviate the problem of data sparsity and improve the accuracy of rating prediction. In addition, the embeddings trained by the model (using rating prediction as an optimization goal) can help to reduce the number of similar neighbors needed in the collaborative process.

3.
Differing from previous related studies, this paper proposes to cluster item embeddings to obtain a model of preference points. Meanwhile, RICF combines the Canopy and K-Means algorithms to overcome the problem that clustering efficiency decreases as the size of the dataset and the feature dimension increase.

4.
To show the practicability of the proposed algorithm, this paper applies real datasets with real timestamps: a live movie rating dataset collected from Twitter [13] and a digital music dataset collected from Amazon [14]. The experimental results show that RICF performs better and is more interpretable than the traditional itemCF as well as the state-of-the-art sequential algorithms that focus on preference decay.
The remainder of this paper is organized as follows. Section 2 summarizes the related work. Section 3 provides some preliminaries and describes the proposed RICF algorithm. Section 4 presents results from experiments conducted on two evaluation datasets. Section 5 concludes the paper.

Collaborative Filtering Recommendation
The Recommendation System (RS) plays an important role in today's Internet; it aims to filter relevant and important information for users from previous feedback. The demand for such systems is gradually increasing along with the overload of data on the Internet. The roots of RS can be traced back to extensive research in cognitive science [15], information retrieval [16], management science [17], and consumer choice modeling in marketing [18]. Later on, RS developed into an independent research field in the mid-1990s to tackle problems in the structure of explicit ratings [19]. Naturally, one of the most common processes is to turn recommendation into the operation of predicting an unknown item's rating for users based on their previous behaviors, namely, the rating-prediction problem. Then, we can recommend the highest-rated items to the user based on the predictions. In addition, there are top-N based recommendations, but this paper mainly focuses on the problem of rating prediction. Moreover, the approaches of RS can be divided into three categories [19]: (1) content-based recommendation, (2) collaborative recommendation, and (3) hybrid approaches.
CF algorithms belong to the collaborative recommendation category, making recommendations that are similar to a user's previous preferences or derived from similar users. According to [20], algorithms for collaborative recommendation can be grouped into two general classes: memory-based (or heuristic-based) and model-based. Memory-based algorithms [20][21][22] are essentially heuristics that make recommendations based on the entire dataset of users' previous behaviors, and they can be further divided into userCF and itemCF. Furthermore, [23,24] present empirical evidence that itemCF can provide better computational performance as well as better recommendation quality than traditional userCF. On the other hand, model-based algorithms [25][26][27][28][29][30] focus on model learning and then apply the trained model to the recommendation. Meanwhile, there are two popular error functions for CF algorithms to evaluate performance, especially for the task of rating prediction: mean absolute error (MAE) and root mean squared error (RMSE). Since the absolute value function is not differentiable at all points, RMSE is more desirable as an objective function [31].
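As a concrete reference, the two error metrics can be computed as in the following minimal sketch (plain Python, no external dependencies):

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error over paired lists of true and predicted ratings."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error; penalizes large errors more heavily than MAE."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))
```

Because the squared error is smooth everywhere, the inner sum of RMSE is differentiable, which is what makes it preferable as a training objective.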
However, traditional CF algorithms suffer from several problems: sparsity, high dimensionality of data [32], and an inability to capture the evolution of user preferences over time. Accordingly, a large body of research in the recommendation community has emerged to tackle these difficulties. For example, to tackle the first problem, [33] directly reduces the number of users/items, [34,35] extract information from clustered groups, and, more advanced, [36,37] propose to learn a latent space vector for each user/item using deep learning algorithms. On the second problem, to the best of our knowledge, there are two main solutions that take users' dynamic preferences into consideration. One way is to directly apply time as a factor [7,8,38,39], e.g., Ebbinghaus's forgetting curve. Another approach is driven by the development of sequential recommendation, which treats user-item interactions as a series of sequential events and takes the sequential dependencies into account to capture the user's current or recent preferences for more accurate recommendations [40]; state-of-the-art techniques here include Long Short-Term Memory (LSTM) [41,42] and the Gated Recurrent Unit (GRU) [43].
In summary, reducing the influence of data sparsity and capturing users' dynamic preferences have gained increasing attention in the recommendation field. The classic CF algorithm still faces these challenges, and these factors should be taken into account to make recommendations not only interpretable but also more accurate. That is the direction we follow in this paper.

Retroactive Inhibition Theory
Inhibition is one of the core concepts in Cognitive Psychology [11]. The idea that inhibition actively impairs the representations of the human mind has inspired a great deal of research in various fields. Specifically, among the several theories of the forgetting mechanism, the overall evidence suggests that inhibition among similar objects is by far the most critical factor in the forgetting process, rather than the direct time factor [11,44]. In addition, inhibition theory, also known as interference theory, proposes that information competition can take the form of either proactive inhibition (PI) (interference from previous similar objects) or retroactive inhibition (RI) (interference from subsequent similar objects) [45]. For example, after you learn a vocabulary list containing the word "dairy" and the word "diary" (where "dairy" was learned first and "diary" was learned later), your recall of the two similar words will be affected by mutual inhibition. Proactive inhibition is the inhibition of newer memories by the retrieval of older memories; that is, your memory of "dairy" will produce proactive inhibition when you recall "diary". Retroactive inhibition is the inhibition of older memories by the retrieval of newer memories; that is, your memory of "diary" will produce retroactive inhibition when you recall "dairy". Compared with proactive inhibition, retroactive inhibition may have larger effects [46].
According to [11], algorithms should take cognitive principles into consideration to be more in tune with the process of forgetting. However, in Computer Science, to the best of our knowledge, while the cognitive concepts of activation propagation and forgetting have been adopted in different types of approaches more recently, the explicit adoption of the concept of inhibition has not yet been investigated [11].

Proposed Model: RICF
Similar to the example of learning words in Section 2.2, a user's recall of older preferences can be affected by the competing information in the memory of newer preferences, which leads to a memory bias against the older preferences. Thus, in order to explore the bias of a user's preference memory and improve the accuracy of rating prediction, this paper mainly focuses on the phenomenon of memory decay caused by competition-induced retroactive inhibition. Specifically, this paper introduces the retroactive inhibition factor (RI) and proposes the RICF algorithm to improve the traditional CF algorithm. The whole algorithm is divided into the following steps: (1) training embedding vectors; (2) embedding clustering; (3) preference-retention calculation; (4) preference prediction. The whole process is shown in Figure 2.
To tackle the data sparsity problem, this paper introduces a deep learning technique, called embedding training, to convert high-dimensional sparse vectors of items into low-dimensional dense vectors. The distance between these trained embeddings reflects the similarity between them. Then, we cluster the embeddings, and the clustered results represent the user's preferences. After that, the evolution of the user's preferences is shown by calculating the inhibition intensity and preference retention for each of the user's historical preferences. Finally, we calculate the user's future preferences based on the evolution of the user's preferences and the similarity between item embeddings.

Preliminary
Suppose that there are a user set U = {u_1, u_2, …, u_n} and an item set O = {o_1, o_2, …, o_m}. We define the rating pair ⟨r_ij, t_ij⟩, where r_ij represents the rating given by user u_i to item o_j and t_ij is the time when user u_i rates item o_j. The vector u_i represents the set of rating pairs of user u_i; if user u_i does not rate item o_j, the corresponding rating pair is empty. Finally, the rating matrix M is defined as the n × m matrix whose (i, j) entry is the rating pair ⟨r_ij, t_ij⟩.

Definition 1 (Retroactive Inhibition Strength). Given an item set O = {o_1, o_2, …, o_m} and a user set U = {u_1, u_2, …, u_n}, after clustering the items in the item set we can assign each item o_j a label k recording the cluster it belongs to (we call this cluster a preference point), since the clustering result reflects the preference distribution of users. The retroactive inhibition strength for item o_n (belonging to preference point k) rated by user u_i from time t_in to time t_im is then defined in Equation (2): RI_in→im represents the total impact of retroactive inhibition on u_i's preference for the preference point k that o_n belongs to, where K_in→im in Equation (2) is the collection of preference points to which u_i's rated items belong from time t_in to time t_im. Moreover, since retroactive inhibition is caused by the influence of other stronger preferences or the same preference from time t_in to time t_im, we define the Dist function to calculate the preference distance between k and k′ for user u_i, as shown in Equation (3).
Here, the function r̄ measures the cumulative strength of a user's preference for a particular preference point within the specified time period. Specifically, r̄(k′) is the average rating among the rating records rated from time t_in to time t_im with preference point k′, and r̄(k) is the average rating among the rating records rated earlier than t_in with preference point k.
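Definition 1 can be sketched in code. Since the exact form of Dist in Equation (3) is not reproduced here, the positive rating gap used below is only an assumption consistent with the surrounding description ("other stronger preferences or the same preference"):

```python
def ri_strength(history, k, t_in, t_im):
    """Sketch of the retroactive inhibition strength RI_in->im.

    history: list of (cluster_label, rating, time) tuples for one user.
    k: preference point of the item rated at time t_in.
    ASSUMPTION: Dist(k, k') = max(mean_rating(k') - mean_rating(k), 0).
    """
    # r_bar(k): average rating of records earlier than t_in with point k
    before = [r for c, r, t in history if c == k and t < t_in]
    r_k = sum(before) / len(before) if before else 0.0
    # K_in->im: preference points of items rated in the window (t_in, t_im]
    strength = 0.0
    for kp in {c for c, _, t in history if t_in < t <= t_im}:
        within = [r for c, r, t in history if c == kp and t_in < t <= t_im]
        r_kp = sum(within) / len(within)  # r_bar(k') within the window
        strength += max(r_kp - r_k, 0.0)  # assumed Dist(k, k')
    return strength
```

For instance, if the user's older preference point has a mean rating of 4.0 and a later, different point is rated 5.0, the sketch yields an inhibition strength of 1.0.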

Embedding Training
According to Section 2.1, traditional collaborative filtering techniques suffer from data sparsity and insufficient expression of the features of the data. That is, only a small proportion of items are rated by a few users, and other information, such as the interactions between users and items and sequential information, fails to be represented in traditional CF algorithms. Therefore, inspired by the "item2vec" method [36], this paper proposes to pre-train a model based on a feedforward deep neural network with ratings as the optimization target to get dense numeric representations of users/items, which aims to obtain more accurate representations of users/items in the rating space.
More generally, the conversion that transforms the original features of an item into a dense item embedding vector is called "item2vec". Thus, we train the aforementioned model, which takes user and item information as input and uses ratings as the optimization target. Consequently, the original features of users/items are transformed into dense user/item embedding vectors, namely Vec(u_i) = {w_1, w_2, …, w_n} and Vec(o_i) = {w_1, w_2, …, w_m}, where n and m are the dimensions of the embedding vectors. The whole architecture is shown in Figure 4b. For example, the proposed model maps each movie in the dataset to a dense (embedded) vector in a unified Euclidean space in which distance represents some kind of correlation between movies, as in Figure 4a, and we explore implicit information between embeddings in vector space with the tool Gensim [47]. It can be seen that the distance vectors from the movie Wonder Woman (a female-themed sci-fi movie) to the movie Iron Man (a male-themed sci-fi movie) and from the movie Cinderella (a female-themed fantasy movie) to the movie Coco (a male-themed fantasy movie) are almost identical, suggesting that the embedding operation in this example captures the rating relationships and some semantic information in the rating history. We have applied the "item2vec" method to CF to overcome the sparse data problem and, at the same time, take full advantage of the expressive power of embeddings to accurately calculate the similarity between items, so that we can achieve the more accurate preference predictions discussed in the next sections.
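The pre-training step can be illustrated with a minimal rating-supervised factorization standing in for the paper's feedforward network; the dimension, learning rate, and epoch count below are illustrative assumptions:

```python
import random

def train_embeddings(ratings, n_users, n_items, dim=8, lr=0.02, epochs=1000, seed=0):
    """Learn user/item embedding vectors whose dot product approximates the
    observed rating, via SGD on squared error (a stand-in for the paper's
    feedforward model with rating prediction as the optimization target)."""
    rng = random.Random(seed)
    U = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(n_users)]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(a * b for a, b in zip(U[u], V[i]))
            for d in range(dim):  # simultaneous gradient step on both vectors
                U[u][d], V[i][d] = (U[u][d] + lr * err * V[i][d],
                                    V[i][d] + lr * err * U[u][d])
    return U, V
```

After training, distances between the learned item vectors can be used directly as the similarity signal in the collaborative step.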

Embedding Clustering
In addition to solving the data sparsity problem and measuring the similarity between items more precisely, we cluster the embeddings to better explore the user's preference partitions and thus obtain the set of the user's preference points (see Definition 1). Furthermore, instead of the plain K-Means algorithm, the Canopy+K-Means algorithm is used, which addresses both the sensitivity of clustering multidimensional data to the K-value and the initial cluster centers and the slow computation of the K-Means algorithm.
(1) Canopy coarse clustering. Canopy is an extremely simple and fast pre-processing algorithm, first proposed by Andrew McCallum, Kamal Nigam and Lyle Ungar in 2000 [48]. It is often used for coarse clustering before the K-Means algorithm to find an appropriate K-value and initial cluster centers [49]. Specifically, it applies an inexpensive distance method for rough clustering and a rigorous distance method for standard clustering. In this way, the Canopy algorithm can cluster large, high-dimensional data efficiently and practically [50]. Inspired by this, this paper pre-processes the embeddings with the Canopy algorithm to obtain an appropriate K-value and initial cluster centers, which are treated as input to the K-Means algorithm in the next step, so as to get better clustering results in less time.
As shown in Figure 5, the Canopy algorithm is given the set of item embeddings E (e.g., movie embeddings) and the heuristic thresholds T1 and T2. First, a sample a is selected randomly from E to initialize Canopy c, and the distance between a and the remaining samples in E is calculated. Second, the samples within distance T1 are assigned to Canopy c, and the samples within distance T2 are removed from E. This process is repeated until the set E is empty. Last, the number of canopies and their centroids are returned as the K-value and initial centroids for the K-Means algorithm.

3: Initialize T1, T2
4: while E ≠ ∅ do
5:     Select sample a from E randomly
6:     Initialize Canopy c ← a
7:     Remove a from E
8:     for remaining sample e ∈ E do
9:         compute d(a, e)
10:        if d(a, e) < T1 then assign e to Canopy c
11:        if d(a, e) < T2 then remove e from E

(2) K-Means clustering. After coarse clustering by the Canopy algorithm, the number of clusters K and the cluster centers {s_1, s_2, …, s_k} are obtained, where each cluster center s_i is a multidimensional vector with the same dimensionality as the item embeddings. Then, we apply the traditional K-Means algorithm [51,52], incorporating the known K-value and initial centroids, to cluster the embeddings. Eventually, we obtain the clustering of the embeddings, namely the set of user preference points.
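The two-stage procedure can be sketched as follows. The Euclidean distance and the threshold values are illustrative choices, and the full Canopy algorithm also keeps the T1-memberships, which are omitted here because only the K-value and the centroids feed into K-Means:

```python
import math
import random

def canopy_centroids(points, t2, seed=0):
    """Canopy coarse pass: pick a random sample, start a canopy with it, and
    remove every point within the tight threshold t2; repeat until the set is
    empty. Returns the canopy centers (K-value = len(result))."""
    pts = list(points)
    rng = random.Random(seed)
    centers = []
    while pts:
        a = pts.pop(rng.randrange(len(pts)))
        centers.append(a)
        pts = [p for p in pts if math.dist(a, p) >= t2]
    return centers

def kmeans(points, centers, iters=20):
    """Standard K-Means seeded with the Canopy centers."""
    cents = [list(c) for c in centers]
    labels = [0] * len(points)
    for _ in range(iters):
        # assignment step: nearest current center
        labels = [min(range(len(cents)), key=lambda j: math.dist(p, cents[j]))
                  for p in points]
        # update step: move each center to the mean of its members
        for j in range(len(cents)):
            members = [p for lbl, p in zip(labels, points) if lbl == j]
            if members:
                cents[j] = [sum(xs) / len(members) for xs in zip(*members)]
    return labels, cents
```

On two well-separated groups of embeddings, the Canopy pass returns one center per group, so K-Means starts with the right K and converges quickly.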

Preference Retention Calculation
After Canopy+K-Means clustering, RICF uses the clustering results and the theory of retroactive inhibition to explore the evolution of the user's preferences. For illustration, assume that a user rated five items sequentially, as shown in Figure 3, and that these items were assigned to preference clusters by the Canopy+K-Means algorithm. Then, according to the definition of retroactive inhibition strength, we can calculate the RI strength suffered by each item. Next, we need to define an RI-based decay function to simulate the process of the user's preference decay due to memory inhibition. We name this function Decay(RI); it records the proportion of the user's preference that is retained. In addition, inspired by the Ebbinghaus curve [11], we test several candidate simulation curves, including a power function, an exponential function, and a parabolic function as a control for extreme conditions; an instance is shown in Figure 6. So far, we can calculate the proportion of the user's preference retention on specific items from the defined decay function and the RI strength. Consequently, the user's real preference can be measured to improve the accuracy as well as the interpretability of the recommendation.
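The three candidate curves can be written down as follows. The parameter values are illustrative assumptions (the paper selects the best-fitting curve experimentally); each maps an RI strength of zero to full retention and decreases as inhibition grows:

```python
import math

def decay_power(ri, alpha=1.0):
    """Power-law retention: heavy tail, in the spirit of the Ebbinghaus curve."""
    return (1.0 + ri) ** (-alpha)

def decay_exponential(ri, lam=0.5):
    """Exponential retention: constant relative decay per unit of RI strength."""
    return math.exp(-lam * ri)

def decay_parabolic(ri, beta=0.01):
    """Parabolic retention, clipped at zero, as a control for extreme conditions."""
    return max(1.0 - beta * ri * ri, 0.0)
```

The retained proportion Decay(RI) then directly scales an older item's contribution in the prediction step.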

Preference Prediction
RICF improved the traditional itemCF algorithm based on the idea of retroactive inhibition to take user preference evolution into consideration, and it introduced baseline rating as well as adjusted methods to calculate the similarity between items directly on the item embeddings obtained from the previous training. The experiment results are shown in the next section.

Definition 2 Embedding Similarity
Based on the items for which the embeddings have been trained, this paper uses the Cosine method to calculate the similarity between two embeddings Vec(O_x) = {x_1, x_2, ..., x_n} and Vec(O_y) = {y_1, y_2, ..., y_n}. The original Cosine measures the angle θ between the two vectors Vec(O_x) and Vec(O_y), ranging from −1 to 1, as defined in Equation (4). The proposed RICF uses the normalized Cosine method to measure the similarity between two embedding vectors, as shown in Equation (5).
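A minimal sketch of the two similarity measures follows. Since Equation (5) is not reproduced here, the (cos θ + 1) / 2 mapping to [0, 1] is an assumption; it is the most common way to normalize a cosine similarity.

```python
import math

def cosine(x, y):
    """Original cosine similarity in [-1, 1], as in Equation (4)."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny)

def embedding_similarity(x, y):
    """Normalized cosine in [0, 1]; the (cos + 1) / 2 mapping is assumed
    here, since Equation (5) is not reproduced in this excerpt."""
    return (cosine(x, y) + 1.0) / 2.0
```

For example, two identical embeddings get similarity 1.0, opposite embeddings get 0.0, and orthogonal embeddings get 0.5.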
Consequently, this paper derives the RICF algorithm as follows, to predict user u_i's preference for item o_j.
where ρ is a heuristic parameter chosen by a stochastic selection strategy. Part 1 of this equation is the baseline rating, where Ū is the mean rating over all users, and b_u and b_j are the rating biases of user u_i and item o_j, as defined in Equations (7) and (8), respectively. In Equations (7) and (8), N(u) (N(j)) represents all rating records associated with user u (item o_j), and r̄_i (r̄_u) represents the mean rating of item o_i (user u). Part 2 of Equation (6) is an improved itemCF algorithm: it introduces the embedding similarity Sim_ij to calculate the similarity between item o_i and item o_j, as shown in Definition 2, and the preference decay factor Decay(RI_ui→uj) to calculate the degree to which user u's preference on the older item o_i has decayed by the time user u rates the newer item o_j, as described in Section 3.4. Meanwhile, b_ui (b_uj) is the simplified baseline rating prediction for item o_i (o_j), which equals r̄_u + b_i (r̄_u + b_j). Finally, the KNN part of Equation (6) selects N_k^j(u), the k items most similar to item o_j from the rating history of user u.
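Since Equation (6) itself is not reproduced in this excerpt, the sketch below is one plausible reading of it: a baseline term plus a ρ-weighted collaborative term computed over the K nearest neighbors, each neighbor's deviation from its own baseline weighted by embedding similarity and preference decay. The exact way ρ combines the two parts is an assumption.

```python
def predict_ricf(mean_rating, b_u, b_j, history, rho=0.5, k=10):
    """Hedged sketch of the RICF prediction in Equation (6).

    `history` holds one tuple (r_ui, b_ui, sim_ij, decay_ri) per item the
    user already rated: the observed rating, its simplified baseline, its
    embedding similarity to the target item o_j, and its Decay(RI) factor.
    """
    baseline = mean_rating + b_u + b_j                        # Part 1
    # KNN part: keep the k neighbors most similar to the target item o_j.
    neighbors = sorted(history, key=lambda t: t[2], reverse=True)[:k]
    num = sum(sim * decay * (r - b_ui) for r, b_ui, sim, decay in neighbors)
    den = sum(sim * decay for _, _, sim, decay in neighbors)
    collab = num / den if den > 0 else 0.0                    # Part 2
    return baseline + rho * collab
```

With an empty history the prediction falls back to the pure baseline, which matches the role of Part 1 in the equation.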

Results and Discussion
This section shows the comparison between traditional methods and our proposed RICF algorithm. Several state-of-the-art algorithms are also implemented as baselines to compare with the proposed RICF algorithm.
Furthermore, in order to ensure that each rating record reflects users' real rating habits and scenarios (rather than being rated by volunteers all at once, as in the MovieLens dataset [53]), and thus to ensure the authenticity of user preference evolution, we specially selected three well-known datasets with real timestamps, MovieTweetings-100k, MovieTweetings-latest [13], and DigitalMusic [14], to conduct the experiment. MovieTweetings is an up-to-date dataset that collects all tweets from Twitter having the format "*I rated #IMDB*". The MovieTweetings-100k dataset contains 16,554 unique users and 10,506 unique movies rated from 28/02/2013 to 01/09/2013. Likewise, MovieTweetings-latest contains 68,332 unique users and 35,931 unique items rated from 28/02/2013 to 10/07/2020. All of the movie ratings are on a scale from 1 to 10, which is re-scaled to 1 to 5 in this paper. The DigitalMusic dataset contains reviews and metadata from Amazon; it contains 478,235 unique users and 266,414 unique digital music items rated from 20/01/1998 to 23/07/2014. All of the music ratings are on a scale from 1 to 5. The description of these three datasets is shown in Table 2.
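The paper does not specify how the 1-10 movie ratings are re-scaled to 1-5; a simple linear mapping, shown below as one plausible choice, preserves the endpoints and relative spacing.

```python
def rescale_rating(r, old_max=10, new_max=5):
    """Linearly map a rating in [1, old_max] onto [1, new_max].

    This is one plausible re-scaling; the exact mapping used in the
    paper is not specified in this excerpt.
    """
    return 1 + (r - 1) * (new_max - 1) / (old_max - 1)
```

Under this mapping, a 1 stays a 1, a 10 becomes a 5, and the midpoint 5.5 maps to 3.0.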

Evaluation Metrics
Based on the common metrics for rating prediction in recommendation systems, this paper uses the mean absolute error (MAE) and the root mean squared error (RMSE) as metrics to evaluate prediction quality. They are defined as follows.
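The two metrics have their standard definitions, which can be written directly:

```python
import math

def mae(actual, predicted):
    """Mean absolute error: average absolute deviation over paired ratings."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error: like MAE, but penalizes large errors more."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )
```

For both metrics, lower values indicate more accurate rating predictions.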

Recommendation Quality
First, in Section 3.2, we conducted the experiment on datasets with the training-testing ratio 90-10% to find the appropriate dimension for embedding vectors. For the movie dataset, as shown in Figure 7a, the lowest RMSE occurred when the dimension of embedding vectors was between 50 and 100, which does not depend on the experimental algorithm to be evaluated. Meanwhile, considering the number of dimensions of a movie's own attributes, such as classes, user preferences and so on, we chose 64 to be the embedding's dimension for movies. Similarly, as shown in Figure 7b, we chose 32 to be the embedding vector's dimension for music.

Moreover, the second experiment was conducted to discuss the selection of K-neighbors for RICF. The effect of the number of K-neighbors in RICF is compared with that in the traditional itemCF, and the experiment results are shown in Figure 8. We found that only a few K-neighbors are required to achieve good results in RICF. We speculate that this is because the embedding model is trained to optimize the rating target: the embeddings contain latent information about what an optimal rating prediction requires, so the more similar the neighbor, the greater its contribution to the rating prediction. Thus, this experiment showed that embeddings trained with the rating as the optimization target reduce the number of K-neighbors needed, and hence reduce the computation of the collaborative part.
Finally, we chose K = 10 (the minimum K value defined in this paper) as the number of neighbors in the RICF algorithm.
Most notably, this paper investigates the effects of different decay functions in RICF, as well as comparisons with the traditional itemCF and the state-of-the-art sequential models that deal with temporal information. In addition, to better explore the process of user preference decay under inhibition, we selected four different ways to model the decay of preferences: (1) without RI effects (without-RI); (2) the Power function, which clearly decays "quickly then slowly"; (3) the Exponential function, with no obvious "quickly then slowly" pattern; and (4) the Parabolic function, which, as a contrast, clearly decays "slowly then quickly". Finally, the main experiment results on the three datasets are shown in Tables 3-5.
From these tables, we can observe the following: (1) The RICF algorithm has lower MAE and RMSE than the traditional itemCF algorithm and the state-of-the-art sequential models LSTM and GRU on the three datasets used in the experiment. (2) Power-RICF has the lowest MAE and RMSE among the three RICF variants. Taking Figure 6 into consideration, the Power function fits the process of forgetting better than the Exponential and Parabolic functions, and it matches the "first quickly, then slowly" character of preference decay.

Embedding Clustering Visualization
For the trained multi-dimensional embeddings, we applied t-SNE (t-Distributed Stochastic Neighbor Embedding) to visualize them in reduced dimensions. t-SNE is a dimensionality-reduction technique that is particularly well suited to visualizing high-dimensional datasets [54]. Following the previous experiment on the Canopy+K-Means algorithm, we clustered the embeddings, as shown in Figure 12, where each color represents an embedding cluster and different clusters represent different user preferences. From this visualization, we can distinguish the categories of embeddings more clearly. For example, in Figure 12a, the bar on the right shows seventeen colors, representing seventeen different clusters, and the coordinate system on the left shows the clustering results in three dimensions. It can be seen that similar embeddings are clearly clustered together, and the boundaries between different clusters are relatively clear.
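The cluster-then-visualize pipeline can be sketched as follows. The synthetic 64-dimensional embeddings and the use of plain K-Means (standing in for Canopy+K-Means, which has no off-the-shelf scikit-learn implementation) are assumptions for illustration only.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Stand-in for trained item embeddings: three loose groups in 64 dimensions.
emb = np.vstack(
    [rng.normal(loc=c, scale=0.1, size=(40, 64)) for c in (-1.0, 0.0, 1.0)]
)

# Cluster in the original embedding space (the paper uses Canopy+K-Means;
# plain K-Means with a fixed k stands in for it here).
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(emb)

# Project to 3-D with t-SNE purely for visualization, as in Figure 12;
# the cluster labels supply the colors of the scatter points.
coords = TSNE(n_components=3, perplexity=20, random_state=0).fit_transform(emb)
print(coords.shape)
```

Note that clustering is done in the original embedding space and t-SNE is applied only afterwards for display, since t-SNE distorts distances and should not be used as the clustering input.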


The stability of RICF
Furthermore, to verify the stability of the RICF algorithm, we experimented on the three datasets separately while varying the training-testing ratio from 50% to 90%. As can be seen in Figures 13-15, Power-RICF consistently performs better than the other experimental algorithms.

As shown in Figure 13, the x-axis represents the different trainset-testset ratios for the MovieTweetings-latest dataset, and the y-axis represents the RMSE of the different algorithms. The left plot shows how the RMSE of the seven algorithms varies with the trainset-testset ratio; the plot on the right is a partial enlargement of the left one. As can be seen from Figure 13, the RMSEs of the three RICF algorithms are lower than those of the other algorithms, and Power-RICF's is the lowest. Moreover, across the different trainset-testset ratios, the RMSEs of the RICF algorithms remain stable at around 0.63, the most stable among the seven algorithms. Figures 14 and 15 show the RMSE on the MovieTweetings-100k and DigitalMusic datasets, respectively, and the conclusions obtained from them are similar to those from Figure 13. Therefore, across different datasets, the RICF algorithm, and Power-RICF in particular, has better stability and performance than the other algorithms.

Discussion
Currently, there are few studies applying the theory of memory inhibition to Computer Science [11]; studies drawing on Cognitive Psychology have mainly focused on activation propagation and on the evolution of preferences from the temporal perspective only, ignoring the competition and inhibition within memories. To fill this gap, we first conducted the classical rating-prediction experiment on the traditional itemCF algorithm and introduced the theory of memory inhibition to explore the evolution of users' preferences. Second, to better account for multi-semantic information, we introduced embedding pre-training into the traditional itemCF and used the more efficient Canopy+K-Means algorithm to cluster the multidimensional embeddings into the user preference points model, so as to simulate the process of user preference decay and build recommendation models more comprehensively and accurately.
The experiment results show that memory decay based on retroactive inhibition is consistent with known memory decay processes (first quickly, then slowly) [10,55], and that incorporating a strong embedding representation makes the recommendation mechanism more interpretable and yields higher prediction accuracy than the traditional itemCF and the sequential models. What is more, when embeddings are used to compute similarity and to select the K neighbors, only a few neighbors are needed to achieve good results, unlike traditional CF algorithms, which rely more heavily on K-neighbor selection. Here, the embeddings are trained by a model with the rating as its optimization target; we speculate that this is because the trained embeddings implicitly contain rating-optimized information, which is worth further research.
Here, we focused on the problem of rating prediction accuracy of the recommendation algorithm based on retroactive inhibition. We found that in terms of rating prediction accuracy, deep learning-based baseline algorithms perform worse than other baseline algorithms. Similar to most other modern recommendation algorithms, sequence algorithms are not designed for the rating prediction problem, but rather serve to perform the recommendation prediction task better. Therefore, as a future work, we plan to use the proposed algorithm to verify the performance of other recommendation tasks such as prediction ranking.

Conclusions
This paper proposed a novel approach called RICF to explore the evolution of user preferences based on the theory of retroactive inhibition in cognitive psychology. In RICF, to tackle the problem of data sparsity, each item is represented as a dense numerical vector by training a feedforward deep neural network to predict user preferences for items. Moreover, the Canopy+K-Means clustering algorithm was used to cluster the multidimensional embedding vectors more efficiently, and the clustering results were used to construct a model of users' points of preference. Evaluation experiments were conducted using three datasets that reflect users' real rating timestamps (rather than volunteers' pooled ratings), and the results indicate that the proposed algorithm is better at exploring the evolution of user preferences, with better accuracy and interpretability. Furthermore, the proposed approach outperforms state-of-the-art techniques on both accuracy and novelty.
Further work consists of three directions. The first direction is to use more sequential datasets to further validate the RICF algorithm. The second direction is to incorporate more diverse information and even the Knowledge Graph technique to train more accurate embedding vectors. The third direction is to explore deeper into the mechanisms of memory inhibition to provide inspiration for sequential modeling algorithms in the field of Deep Learning as well as for the study of other recommendation tasks such as prediction ranking.
Author Contributions: The conceptualization of the original idea, formal analysis, and the performance of the experiment, as well as the original draft preparation, were completed by N.Y. Review and editing were completed by L.C. and Y.Y. All authors have read and agreed to the published version of the manuscript.
Funding: This research was funded by the National Natural Science Foundation of China, grant number 91118002.