Re-Enrichment Learning: Metadata Saliency for the Evolutive Personalization of a Recommender System

Many studies have been conducted on recommender systems in both the academic and industrial fields, as they are currently broadly used in various digital platforms to make personalized suggestions. Despite the improvement in the accuracy of recommenders, the diversity of interest areas recommended to a user tends to be reduced, and the sparsity of explicit feedback from users has been an important issue for making progress in recommender systems. In this paper, we introduce a novel approach, namely re-enrichment learning, which effectively leverages the implicit logged feedback from users to enhance user retention in a platform by enriching their interest areas. The approach consists of (i) graph-based domain transfer and (ii) metadata saliency, which (i) find an adaptive and collaborative domain representing the relations among many users' metadata and (ii) extract attentional features from a user's implicit logged feedback, respectively. The experimental results show that our proposed approach has a better capacity to enrich the diversity of interests of a user by means of implicit feedback and to help recommender systems achieve more balanced personalization. Finally, our approach helps recommenders improve user retention, i.e., it encourages users to click more items or dwell longer on the platform.


Introduction
Online contents and services have been rapidly growing in recent decades. To help users make decisions when faced with overwhelming options and to achieve commercial success in making profit, both academic and industrial research on finding specific sets of items that meet the personalized interests of users has been intensely conducted. For this purpose, recommender systems have been suggested and designed that are based on a user's history, including whether the user has bought an item, which ratings the user has given to items, etc. Previous studies have reported that recommendations account for 60% of the clicks on the main screen of Youtube [1], 75% of what people watch through Netflix [2], and 35% of the sales through Amazon [3].
Recommendation is obviously not an easy task. A recommender system needs to provide a personalized user experience with long-term satisfaction under the condition that users' preferences can change over time [4]. Behaviors based on the preferences of users can be either positively or negatively influenced by recommendation results. Traditionally, the recommendation problem has been considered equivalent to the problem of precisely predicting the rating that a user would leave on an item. However, recent academic studies and industrial providers have both emphasized that one of the main measurement targets is improved user retention [5,6]. In spite of the gains in theoretical accuracy, it remains unclear whether the strategy that wins on accuracy always results in increased business value [7].
Still, recommender systems are mostly optimized toward the accuracy of predictions of item ratings, and are likely to gradually narrow down the interest area of a user, which is the so-called diversity problem [8-11]. That is to say, recommendations in which the lesser interests get squeezed out by the main interest ironically help the system minimize prediction errors. This not only impedes the diversity of experiences of users, but also causes filter bubbles [12,13], e.g., a reduced spectrum of user consumption and political bias, which limit discovery or neglect the potential for promoting new items from the long tail [14,15]. Moreover, as the training of a recommender system mostly relies on explicit feedback (typically the ratings of items), it inevitably faces the fact that only a small portion of users leave ratings, which is the so-called sparsity problem [16,17].
In this paper, we propose a novel data interpretation approach for recommender systems, which includes graph-based domain transfer and metadata saliency, and show its effective application to a personalized recommender, namely re-enrichment learning. To address the problems of diversity and sparsity, the goal of re-enrichment learning is to find conspicuous features of the implicit logged feedback [18,19] that reveal the behavior of a user, e.g., how frequently the user clicks on a certain item or how long the user stays on the platform, and to promote inherently correlated recommendations. Unlike explicit feedback, implicit logged feedback is relatively easy to collect spontaneously from users. Metadata saliency describes the confidence of a recommendation in the transferred domain, which reflects the graph-based relationships among items. It is generally known that visual saliency represents position-based visual preference [20]. As such, methods for predicting eye fixation maps on websites or mobile interfaces [21-23] have been studied, and the relation between the probability of the movement of a mouse pointer and the eye fixation has been discussed [24]. Inspired by those works, metadata saliency is a novel approach to interest fixation for distinguishing a certain set of items from the others. Based on the establishment of links between one group of items and their closely related ones, re-enrichment learning is used to train a recommender system based on the metadata saliency, thus enriching the potential interest areas of a user rather than narrowing them down. The experimental results show that our approach is effective in interpreting implicit logged feedback and helpful in persuading a user to click more or stay longer on a commercial platform.
The organization of the rest of this paper is as follows: Section 2 presents a review of the related work, Section 3 describes our proposed approach in detail, and Section 4 provides the comparative experiments and the discussion. Finally, our conclusion is given in Section 5.

Background
Recommender systems can be largely classified into three categories: the collaborative approach, the content-based approach, and the hybrid approach [16,17,25]. Firstly, the collaborative approach is based on the idea that people with similar preferences are likely to agree on an evaluation of an item. This approach consists of neighborhood-based methods, which directly use the users' item ratings to estimate the classification of a new item, and model-based methods, which use the ratings to acquire knowledge and train a predictive model. The advantage of this approach is that new data can easily be added in an incremental manner, whereas its disadvantages are its vulnerability to the cold-start problem of new users and to the sparsity problem. Koren et al. (SVD) [26] suggested a singular value decomposition (SVD)-based matrix factorization algorithm that uses stochastic gradient descent for optimization. Rendle et al. (Bayesian personalized ranking (BPR)) [27] introduced a generic optimization criterion and learning algorithm for personalizing a recommender system. The optimization criterion is based on the maximum posterior estimator derived from Bayesian analysis, while the learning algorithm is based on stochastic gradient descent with bootstrap sampling. He et al. (L-GCN) [28] adopted the neighborhood aggregation of a graph convolution network (GCN) [29] and proposed learning user and item embeddings by linearly propagating them with respect to the interaction graph of the user and item. They used the weighted summation of the embeddings learned at all layers as the final embedding.
Then, the content-based approach is based on the combination of a user's profile and an item description. In other words, the similarity between the items that the user has liked in the past and the detailed content of an item play important roles. This approach shows strength when recommending an item that is not yet rated by any user, whereas it suffers from the overspecialization problem. Wang et al. (deep knowledge-aware network (DKN)) [30] proposed a multi-channel and word-entity-aligned knowledge-aware convolutional neural network for semantic and knowledge-based representation of news as well as an attention module to dynamically compute the aggregated historical representation of a user. Wu et al. (neural news recommendation with personalized attention (NPA)) [31] suggested personalized convolutional neural networks that adopt the embedding of user IDs as the queries of the correlations between words and news. The meta-context dimension tree (Meta CDT) was proposed for the selection of the most suitable contents and services for a user in a certain context, and was used in a practical context-aware application by Colace et al. [32] and Casillo et al. [33]. The approach has strength in tailoring the information domain according to the user's needs as well as in analyzing relevant features of context models.
Lastly, there are also hybrid approaches that combine the collaborative and content-based approaches to exploit the advantages of the individual approaches [16,34]. Many types have been proposed, such as aggregating the predictions of collaborative and content-based approaches or integrating one approach's characteristics into the other. Kula [...]

Many recent studies have discussed the problem that unbalanced or biased recommendations can shrink the diversity of interests of a user. Practical evaluation scenarios and techniques have been introduced to produce unbiased estimators in spite of biased or missing data [37,38]. Research has been carried out on adopting the relationships among items or entities as well as among users [39]. For instance, each user has an individual graph-based representation of their explicit rating feedback, and the graph can provide insights into how to make good use of the hidden information for personalized recommendations. Still, most of the previous approaches have difficulty with the aforementioned sparsity and diversity problems. For the sparsity problem, we need to further study how to incorporate the feedback information that is naturally obtainable from users. In addition, many alternatives remain to be explored for alleviating the diversity problem that results from unbalanced recommendations.

Re-Enrichment Learning
The two main objectives of re-enrichment learning are to enrich the interest areas recommended to a user and to do so by extracting meaningful features from naturally acquirable feedback data. In other words, re-enrichment learning is directly designed to tackle the diversity and sparsity problems of a recommender system. Our proposed approach is composed of two core components: graph-based domain transfer and metadata saliency. The domain of metadata saliency is determined by the graph-based domain transfer, while the graph itself can be changed recursively by the metadata saliency from new feedback. To be more concrete, every implicit logged feedback of a user causes a change in the user's metadata saliency; the update in the metadata saliency is promptly reflected in the user's graph description, as well as in the universal domain determined from a large set of users; and the new graph environment, in turn, recursively influences the next feedback of the user.

Graph-Based Domain Transfer
An undirected weighted graph is assigned to every user in order to build a data structure that consists of item categories as the finite set of nodes and the similarities between categories as the finite set of edges.
As noted in Equations (1)-(3), the graph of the $k$-th user among $N$ users at time $t$, $G_{k,t}$, is made up of $M$ nodes $V_{k,t}$ with their attributes $v_{k,t}$ and $\binom{M}{2}$ edges $E_{k,t}$ with their attributes $e_{k,t}$. The edge attribute between the two nodes $v_{k,t}(i)$ and $v_{k,t}(j)$ is $e_{k,t}(i, j)$, which can be computed as in Equation (4), where $i, j \in \{1, ..., M\}$. The attribute of an edge represents the combination of the influences and the similarity of two nodes. A node is determined to be influential when its attribute is large, and two nodes are determined to be similar when they have similar attribute levels for the same user. Here, the attribute of a node is the metadata saliency, which represents the attentional prominence of the node within the entire set of nodes. Equation (4) is composed of two terms: (i) the likelihood for the influence of a node and (ii) the weight for the similarity between two nodes. A softmax function has been adopted in the first term to produce the influence of a node as a probabilistic value within $[0, 1]$ (see Equations (5) and (6)), where $h \in \{1, ..., M\}$, and $\alpha$ is the influence factor, which determines the degree of importance of an individual feedback (s.t. $\alpha \geq 1$). In the second term, the inverted difference between the two nodes' attributes, i.e., one minus their difference, is adopted to produce the similarity as a weight value (see Equations (7) and (8)). The attributes of nodes have been normalized to remove scale dependency.
Then, the edges can be denoted by a $|V_{k,t}| \times |V_{k,t}|$ matrix, i.e., the adjacency matrix $E_{k,t} = [e_{k,t}(i, j)]$ [40], where each element indicates the attribute of an edge, and the elements can be stored in a triangular matrix because $e_{k,t}(i, j) = e_{k,t}(j, i)$ (as shown in Figure 1). Finally, the attributes of nodes and edges, as well as the adjacency matrix that allows their sequential representation, are able to respond to every feedback event.
In the end, a universal domain $D_t$ at time $t$ can be determined based on the graphs of $N$ users, where $D_t(h)$ is the $h$-th domain, in which the $M$ nodes are sorted by the level of the attribute of the edge connected to the $h$-th node (see Equation (9)). That is to say, in $D_t(h)$, the $h$-th node itself appears first, the node with the largest attribute of the edge connected to the $h$-th node appears second, and so on. Understandably, the domain changes over time as the individual graphs vary, e.g., with fluctuating attributes or with newly inserted nodes starting with zero-level attributes. Empirically, two nodes that have large attributes and share an edge with a large attribute become adjacent neighbors in the transferred domain. Here, being neighbors has important consequences with respect to metadata saliency.
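The construction above can be illustrated with a minimal Python sketch. Equations (4)-(9) are not reproduced in this text, so several choices below are assumptions rather than the exact formulation: the two terms are combined as a product of both nodes' softmax influences and the similarity weight, node attributes are min-max normalized, and the universal domain averages edge attributes over users. All function names are hypothetical.

```python
import math


def softmax_influence(values, alpha=4.0):
    """Softmax likelihood of each node's influence, sharpened by alpha."""
    peak = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(alpha * (x - peak)) for x in values]
    total = sum(exps)
    return [e / total for e in exps]


def edge_attributes(saliency, alpha=4.0):
    """Symmetric matrix of edge attributes for one user's graph.

    ASSUMPTION: e(i, j) = p(i) * p(j) * (1 - |v(i) - v(j)|), where v is the
    min-max normalized saliency and p is its softmax influence.
    """
    lo, hi = min(saliency), max(saliency)
    span = (hi - lo) or 1.0  # avoid division by zero for flat saliencies
    norm = [(x - lo) / span for x in saliency]
    p = softmax_influence(norm, alpha)
    m = len(saliency)
    edges = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i):  # fill the lower triangle and mirror it
            similarity = 1.0 - abs(norm[i] - norm[j])
            edges[i][j] = edges[j][i] = p[i] * p[j] * similarity
    return edges


def universal_domain(user_saliencies, alpha=4.0):
    """For each node h, list all nodes with h first, then by edge attribute.

    ASSUMPTION: edge attributes are averaged over all users before sorting.
    """
    graphs = [edge_attributes(v, alpha) for v in user_saliencies]
    m = len(user_saliencies[0])
    domain = []
    for h in range(m):
        mean_edge = [sum(g[h][j] for g in graphs) / len(graphs) for j in range(m)]
        others = sorted((j for j in range(m) if j != h),
                        key=lambda j: mean_edge[j], reverse=True)
        domain.append([h] + others)
    return domain


# Illustrative saliencies for two users over four genres (made-up values).
genres = ["Action", "Sci-Fi", "Adventure", "Romance"]
users = [[0.9, 0.8, 0.4, 0.1], [0.7, 0.9, 0.5, 0.0]]
for h, order in enumerate(universal_domain(users)):
    print(genres[h], "->", [genres[j] for j in order])
```

Under these assumptions, two genres that are both salient and similar for many users end up adjacent in each other's domain, which is the property the metadata saliency relies on.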

Metadata Saliency
A user's behavioral history can be used not only as primary cumulative data, but also as information from which secondary features are extracted. Metadata saliency is one of the secondary features that can play an important role, especially in leveraging implicit logged feedback data. A few interesting items usually account for the majority of interest fixation, and they can be quantified by extracting metadata saliency from the logged feedback. Among the various types of feedback, we focus on the ones that contain the retention data of a user, and we start with the simplest type: the clicks on recommended items. The metadata saliency of the $i_d$-th category in the transferred domain at time $t$, $v_{k,t}(i_d)$, is computed based on the $i_d^*$-th clicked category, as in Equation (10). A Gaussian fixation model has been adopted [21,24,41,42], of which the mean and standard deviation are $i_d^*$ and 1, respectively, in order not only to reinforce the metadata saliency of the clicked category, but also to enrich its nearest neighbors with respect to the transferred domain. The Gaussian fixation is also scaled based on the metadata saliency of the $i_d$-th category at time $t - 1$, $v_{k,t-1}(i_d)$, and the influence factor $\alpha$ (see Equation (11)). The very first metadata saliency, $v_{k,0}$, can be set by a preference survey for new users for the cold start (discussed in Section 4.2.1). Otherwise, for users who refuse the survey, the average of the metadata saliencies of many random users can be an alternative for $v_{k,0}$ (discussed in Section 4).

Suppose that a graph has been built to represent the metadata saliencies of movie genres for a user (see Figure 2). A node is presented as a circle, of which the radius indicates its attribute level. An edge is drawn as a line, of which the thickness is proportional to its attribute level.
The user evidently prefers Action movies, as well as Sci-Fi movies, because the two nodes corresponding to those genres are the two biggest circles, as shown in Figure 2. Provided that the user clicked an item that belongs to the Action category at time t, not only does the metadata saliency of the Action node get larger at time t + 1, but that of the Sci-Fi node also benefits from it to a certain extent due to the edge connecting the two nodes and the Gaussian fixation effect. Moreover, the Fantasy and Adventure nodes also receive a small benefit from the feedback. Because of the new feedback, the attributes of the nodes and edges change simultaneously and incrementally create a new condition in the graph representation. As such, the metadata saliency naturally changes over time by means of the newly collected feedback. The change in metadata saliency is promptly reflected in a user's graph and, by extension, also in the universal domain, which is built upon many users' graphs. Consequently, the evolutive characteristic based on the mutual interaction in the graph makes re-enrichment learning prompt and flexible.
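The effect of a single click can be sketched in a few lines of Python. Equations (10) and (11) are not reproduced in this text, so the update rule below is only an assumption: a Gaussian bump with unit standard deviation, centered on the clicked category's position in the transferred domain, is added to the previous saliency and the result is renormalized. The `gain` parameter is a hypothetical stand-in for the scaling by the previous saliency and the influence factor, and the genre ordering is illustrative.

```python
import math


def update_saliency(prev, domain_order, clicked, gain=0.5, sigma=1.0):
    """Return the saliencies after one click (illustrative update rule).

    prev:         dict mapping category -> saliency at time t - 1
    domain_order: categories ordered as in the transferred domain, so that
                  closely related categories sit next to each other
    clicked:      the category of the clicked item
    gain:         HYPOTHETICAL scale of the Gaussian fixation
    """
    centre = domain_order.index(clicked)
    updated = {}
    for position, category in enumerate(domain_order):
        # Gaussian fixation: largest at the clicked category, decaying
        # with distance in the transferred domain.
        fixation = math.exp(-((position - centre) ** 2) / (2 * sigma ** 2))
        updated[category] = prev[category] + gain * fixation
    total = sum(updated.values())
    return {category: value / total for category, value in updated.items()}


# Illustrative example: uniform starting saliency, then a click on Action.
order = ["Action", "Sci-Fi", "Fantasy", "Adventure", "Romance"]
before = {genre: 0.2 for genre in order}
after = update_saliency(before, order, clicked="Action")
for genre in order:
    print(f"{genre:10s} {before[genre]:.3f} -> {after[genre]:.3f}")
```

With this rule, the clicked genre gains the most, its nearest neighbors in the domain gain a smaller share, and distant genres lose relative weight after normalization, which mirrors the Action, Sci-Fi, Fantasy, and Adventure example above.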

Summary
In summary, Figure 3 depicts a diagram that connects all the steps of the process, which can be listed as follows:
• Step 1: Obtain the user's implicit logged feedback $i_d^*$ at time $t$ from the recommender.
• Step 2: Update nodes $V_{k,t-1} \rightarrow V_{k,t}$: Calculate the node attribute $v_{k,t}(i_d)$ at time $t$ using the universal domain $D_{t-1}$ and the node attribute $v_{k,t-1}(i_d)$ at time $t - 1$ (as shown in Equation (10)).
• Step 3: Update edges $E_{k,t-1} \rightarrow E_{k,t}$: Calculate the edge attribute $e_{k,t}(i, j)$ at time $t$ using the node attribute $v_{k,t}(i_d)$ at time $t$ (as in Equation (4), which consists of the two following terms).
  - Compute the likelihood for the influence of a node (see Equation (5)).
  - Compute the weight for the similarity between two nodes: $1 - v_{k,t}(i, j)$ (see Equation (7)).
• Step 4: Update the universal domain $D_{t-1} \rightarrow D_t$ (as in Equation (9)): Sort domain $D_t(h)$ by edge attribute.
• Step 5: Apply the node attribute $v_{k,t}(i_d)$ at time $t$, i.e., the metadata saliency, to the recommender.
Furthermore, we provide pseudocode to present how the different steps of our proposed approach are organized (see Table 1).

Figure 3. The process of re-enrichment learning.

Evaluation
Learning from implicit user feedback, e.g., click and dwell time, has been an important factor in improving recommender systems [18]. The common implicit retention metric [19], the click-through rate (CTR) [6,46-49], was adopted to intuitively and directly evaluate the effectiveness of the methods for user satisfaction or retention on digital platforms. Specifically, the metric was used to measure the ratio of clicks to recommendations, as in Equation (12).

$$\mathrm{CTR} = \frac{\text{Number of click-throughs}}{\text{Number of recommendations}} \times 100\ (\%) \qquad (12)$$

Furthermore, the metric of mean average precision [50], MAP@K, supports the consistency of an experiment. This metric indicates how many items a user engages with among those recommended to them, as CTR does, but also uses the sequence of feedback (see Equation (13), where $S$ indicates the number of samples). To begin with, average precision, AP@K, is defined as the summation of the engaged precision values divided by the number of engagements, $m$, where $P(i)$ is the precision at $i$ and $\delta(i)$ is a binary indicator function for engagement (as in Equations (14)-(16)).
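As a minimal sketch of the two metrics, the following Python functions follow the verbal definitions above. Equations (13)-(16) are not reproduced in this text, so the code assumes the standard forms: AP@K sums the precision at each engaged position within the top K and divides by the number of engagements m, and MAP@K averages AP@K over S samples. The function names and the click sequences are illustrative.

```python
def ctr(click_throughs, recommendations):
    """Click-through rate in percent, as in Equation (12)."""
    return 100.0 * click_throughs / recommendations


def average_precision_at_k(engaged, k):
    """AP@K for one recommended set.

    engaged: list of 0/1 flags in recommended order (1 = user engaged).
    Returns 0.0 when the user engaged with nothing in the top K.
    """
    hits = 0
    precision_sum = 0.0
    for position, flag in enumerate(engaged[:k], start=1):
        if flag:
            hits += 1
            precision_sum += hits / position  # P(i) at an engaged position
    return precision_sum / hits if hits else 0.0


def map_at_k(samples, k):
    """MAP@K: the mean of AP@K over all samples."""
    return sum(average_precision_at_k(s, k) for s in samples) / len(samples)


# Made-up click sequences for two recommended sets of four items each.
samples = [[1, 0, 1, 0], [0, 0, 0, 0]]
print(ctr(click_throughs=2, recommendations=8))
print(map_at_k(samples, k=4))
```

Unlike CTR, AP@K rewards engagements that occur early in the recommended order, which is why the two metrics are reported side by side.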
A/B experiments [5,19,51] were conducted to verify the difference between applying and not applying re-enrichment learning to baseline methods. Five baseline methods were used for comparison: SVD [26], BPR [27], L-GCN [28], L-FM [35], and W&D [36]. The baseline methods are either collaborative or hybrid approaches because our proposed method treats the category preferences of users, e.g., movie genre and product category, rather than the internal content of an item, e.g., the words in a news article. We collected data from 32 participants, who clicked on all the items that seemed to be interesting. For fairness, the participants were informed of which method was used in each experiment. The time required to finish one sequence of collecting implicit feedback ranged from 10 to 30 min depending on the participant. The sequence of collecting feedback was repeated four times for every participant. A total of 4532 feedbacks were collected for the MovieLens dataset and 4830 feedbacks were collected for the Amazon dataset.
Lastly, a non-parametric statistical hypothesis test, the Wilcoxon test [52], was performed to check the significance of improvements. We adopted a significance level α of 0.05, which indicates a 5% risk of concluding that a difference exists when there is no actual difference. The interpretation of a result in the Wilcoxon test is that the null hypothesis, "The population median (η) equals the hypothesized median (η0)", is rejected when the p-value is smaller than the significance level (i.e., the difference is significant).
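For readers unfamiliar with the test, the following is a minimal, standard-library sketch of a paired Wilcoxon signed-rank test. It is not the implementation used in the experiments: it assumes the plain normal approximation for the two-sided p-value, without tie or continuity corrections, and the paired click counts are made up for illustration.

```python
import math


def wilcoxon_signed_rank(a, b):
    """Paired Wilcoxon signed-rank test (two-sided, normal approximation).

    Returns (W, p), where W is the smaller of the positive and negative
    rank sums. Zero differences are discarded; tied absolute differences
    share their average rank.
    """
    diffs = [x - y for x, y in zip(a, b) if x != y]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda idx: abs(diffs[idx]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of tied absolute differences.
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for t in range(i, j + 1):
            ranks[order[t]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return w, p


# Made-up click counts per participant with (B) and without (A) the method.
clicks_b = [12, 15, 11, 18, 14, 16, 13, 17]
clicks_a = [10, 11, 10, 13, 11, 10, 12, 12]
w, p = wilcoxon_signed_rank(clicks_b, clicks_a)
print("W =", w, "significant at 0.05:", p < 0.05)
```

For small samples an exact p-value is preferable to the normal approximation; statistical packages typically offer both.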

Configuration
The baseline methods were applied with the default parameter settings given in their open-source implementations. Based on the scaling property of signal processing, the influence factor α in Equation (4), which operates as the scaling factor of the softmax function, was set to 4 in all cases. This parameter determines the agility of re-enrichment learning in adapting to recent short-term feedback.

Recommending Movies
At the beginning, users were asked to respond to a survey on their preferred categories as a cold start. Alternatively, users could skip it and simply use the given averaged preference model built upon the feedback collected from previous users. When a user responded to the survey, the selected categories directly created an initial metadata saliency. Then, a set of 10 movies, with their images and titles, was recommended to a user at a time, and the user was allowed to click on any items they were interested in. The user could also skip the entire set and go on to the next set.

Table 2 shows our A/B experiments, and it compares the results of all baseline comparison methods with respect to CTR and MAP@K for the MovieLens dataset. As demonstrated in Figure 6, our proposed re-enrichment learning helps baseline methods improve the resulting retention of users by providing a feature that enriches the interest areas of the user instead of overspecializing them. Compared to the intuitive metric, CTR, MAP@K additionally reflects the order in which the user engages with items within a recommended set, so we can infer that the improvement does not come merely from placing more attractive items at the front. Still, the results of CTR and MAP@K show a consistent tendency for improvement. In addition, Table 3 presents the results of the Wilcoxon test, which essentially calculates the differences between sets of paired samples and analyzes these differences to establish whether they are statistically significantly different from one another. The resulting p-values in all cases were much smaller than the significance level α, which allows the conclusion that the differences between the population distribution and the hypothesized distribution are statistically significant.

This observation is supported by Figure 7, which shows an example comparing the initial metadata saliency of a user to the metadata saliency variations over 30 implicit feedbacks. The x-axis presents the node of a graph (i.e., movie genre), while the y-axis shows the attribute of a node (i.e., the level of metadata saliency). As can be observed in Figure 7a, the user started with two large saliencies at Action and Sci-Fi. Despite the fact that the Action and Sci-Fi genres accounted for the majority of the 30 feedbacks and the two still had the two largest saliencies, the metadata saliencies of other genres (Adventure and Thriller) prominently increased because they had relatively large edge attributes with Action and Sci-Fi (see the thicker lines in Figure 7a). We depict all the edges connected to Action as red-colored lines and all the edges connected to Sci-Fi as blue-colored lines, which show that Adventure and Thriller are fairly interrelated with Action and Sci-Fi with respect to the thickness of the lines. It is worth observing that Adventure and Thriller were encouraged to be recommended, and were consequently selected quite a few times.

Collaborative approaches find new items in which a particular user is most likely to be interested by modeling the task as a regression or classification using the rating data. When provided with a recommendation of an item that satisfies them, the user is likely to engage more with the platform. By replacing the rating with metadata saliency, the retention results of collaborative approaches increased. This observation directly shows that users spent more time exploring the enriched interest areas. In addition, only small subsets of the available movies were rated by users [16], which made it difficult to successively collect explicit cues for training. In contrast, metadata saliency is based on implicit logged feedback, which can be naturally and continuously obtained. Moreover, hybrid approaches already merge different techniques or features to avoid limitations, but they can also be further assisted by re-enrichment learning.
Based on our graph-based domain transfer, the relations between the categories were effectively incorporated, and that allowed the metadata saliency to emphasize the salient interest areas of the user, as well as the interest areas implicitly linked to them. As a result, applying re-enrichment learning led to significantly improved click-through rates for the hybrid approaches. L-FM is based on the latent representation approach, which aims to learn user and item representations from the interaction data [35], while the wide linear model of W&D aims to memorize sparse feature interactions based on cross-product feature transformation [36]. These characteristics allow L-FM and W&D to be run together with graph-based domain transfer and metadata saliency.

Recommending Goods
Likewise, the preference survey was followed by a set of 10 recommendations, and users could click on any items they wished. Experiments were carried out using the Amazon dataset [44,45], which contains Amazon products' review data as well as the products' metadata. Table 4 reports the results of our A/B experiments, and Figure 8 demonstrates their visual comparisons. We observed that the methods with re-enrichment learning outperformed their baselines, generally producing increased CTR and MAP@K. The results of the Wilcoxon test on the Amazon dataset show more or less the same tendency as those of the MovieLens dataset. As shown in Table 5, the p-values were considerably smaller than the significance level α in all cases, which means that the differences between A and B were significant.

Figure 9 demonstrates an example of the variations in metadata saliency over 30 feedbacks by a user. This example is interesting because the user did not participate in the preference survey at the beginning. As such, the average metadata saliency from 70M previous feedbacks was used for the initial metadata saliency (see Figure 9a). The two categories, Digital music and Gift cards, initially obtained relatively large metadata saliencies. On the other hand, the user mostly selected Appliances, Cell phones and accessories, Industrial and scientific, and especially Sports and outdoors. Consequently, the metadata saliency levels of those categories increased during the later feedback (see Figure 9b). The rest of the categories moved opposite to the most selected ones, yet the saliency levels of the categories whose edge attributes were large (see the thicker lines in Figure 9a) were slightly raised instead.
Re-enrichment learning noticeably raised the click-through rates of the collaborative approaches. The advantage of collaborative approaches is that adding new data in an incremental manner is relatively easy, and re-enrichment learning is well matched with this advantage. It is easy to add a new node and its edges, starting with initial attributes set to zero. Furthermore, our proposed approach compensates for the weakness of collaborative approaches when only very few explicit feedbacks, i.e., ratings, can be collected from users. We also observed that hybrid approaches with re-enrichment learning outperformed their baselines. Re-enrichment learning effectively exploits the implicit logged feedback information and led to better click-through rate results. The observation from Figure 9 intuitively conveys that the interest areas that have close relations in the graph-based transferred domain to the areas selected by a user were nourished due to re-enrichment learning. In consequence, the recommendations based on re-enrichment learning encouraged users to consume more content.

Conclusions
In this paper, we introduced a novel approach that effectively exploits the implicit logged feedback from users to enrich their potential interest areas and to increase user retention in recommender systems for digital platforms. Our approach includes graph-based domain transfer and metadata saliency, which are incorporated to operate as re-enrichment learning. A universal domain was built based on the graph representation of item categories and their interrelations as nodes and edges, respectively. While capturing the attentional prominence of a node, metadata saliency also confers benefits on the nearest neighbors of the node in the universal domain. Every implicit logged feedback of a user causes a change in the user's metadata saliency; the update in the metadata saliency is promptly reflected in the user's graph description, as well as in the universal domain determined by a number of users; and the new graph environment, in turn, recursively influences the next feedback of the user. The eventual goal is the improvement of user retention, rather than accurately predicting ratings, as there are many better ways to help people find interesting items than focusing only on those with high predictions of ratings.
It should be pointed out that re-enrichment learning consistently shows both internal and external advantages. Internally, it has a better capacity to enrich the diversity of possible interest areas of users and to help recommender systems achieve more balanced personalization. In addition, to address the issue of sparsity of explicit feedback, it extracts the inherent features from implicit logged feedback, which is much more naturally collectable. Externally, it derives an intuitive interpretation of the relation between the recommender system and profitability. In other words, it helps find industrial value as well as a better solution that makes users click on more items or dwell longer on platforms.
In future work, we plan to extend this approach both to create other types of saliency in metadata and to enrich the graph representation by incorporating various user information, e.g., gender, age, and occupation. We believe re-enrichment learning to be promising for industrial applications.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: