Priors for Diversity and Novelty on Neural Recommender Systems †

Abstract: PRIN is a neural-based recommendation method that allows the incorporation of item prior information into the recommendation process. In this work we study how the system behaves in terms of novelty and diversity under different configurations of item prior probability estimations. Our results show the versatility of the framework and how its behavior can be adapted to the desired properties, whether accuracy is preferred, diversity and novelty are prioritized, or a balance between them is achieved with the proper selection of prior estimations.


Introduction
In recent years, there has been a shift in the way users interact with information services, from a proactive approach, where users actively looked for content, to one where users take a more passive role and content is suggested to them. Recommender Systems have played a pivotal role in this transformation, and advances in the field have driven increases in user engagement and revenue.
Among the properties of a recommender, accuracy is usually the most desirable one. However, there are other properties, such as novelty and diversity [1], that are also important and that, depending on the task, can be of higher priority. Diversity is the ability of the system to produce recommendations that include items from the whole catalog. This property is usually desirable for vendors [2,3]. Novelty is the ability of the system to produce unexpected recommendations and is associated with serendipity, a property linked to higher user engagement and satisfaction [4]. These properties are related to accuracy insofar as increasing the latter lowers the best achievable values of those metrics [5].
Recently, a neural-based recommendation method, PRIN [6], was presented that allows the incorporation of prior knowledge about the items into the recommendation process. That previous work focused on the accuracy of the system. In this work we study the behavior of the system in terms of diversity and novelty, and how the choice of prior information affects its performance on these metrics.

Materials and Methods
We conducted a series of experiments in order to analyze the trade-off between accuracy, diversity and novelty when choosing different configurations of item prior probabilities with the PRIN recommender framework.

PRIN and Prior Probabilities of Items
PRIN [6] is a recommendation framework composed of two components: a neural model trained to predict the probability of a user given an item, p(u|i), and a graph model of the data where centrality measures are applied to calculate the prior probability distribution of the items, p(i). The objective of the system is to produce a ranking of items for a user. We cannot use p(u|i) directly to produce such a ranking: p(u|i) and p(u|j) for two items i ≠ j are not comparable, as they belong to different event spaces. Bayes' rule can be used to obtain p(i|u), ∀i ∈ I, which is then used to produce the ranking.
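The ranking step can be sketched in a few lines. Since p(u) is constant for a given user, Bayes' rule reduces to scoring items by p(u|i)·p(i) and sorting. The arrays below are toy values, not outputs of the actual PRIN model:

```python
import numpy as np

def rank_items(p_u_given_i, p_i):
    """Rank items for one user via Bayes' rule.

    p(i|u) ∝ p(u|i) · p(i); the denominator p(u) is constant
    across items for a fixed user, so it can be dropped when
    only the ranking is needed.
    """
    scores = p_u_given_i * p_i      # unnormalized posterior p(i|u)
    return np.argsort(-scores)      # item indices, best first

# Toy example: likelihoods from the neural model and a prior
# estimated from, e.g., a centrality measure (illustrative values).
p_u_given_i = np.array([0.30, 0.10, 0.25, 0.20])
p_i = np.array([0.10, 0.60, 0.20, 0.10])
ranking = rank_items(p_u_given_i, p_i)
```

Note that the most likely item under p(u|i) alone (item 0) need not top the ranking once the prior is factored in, which is exactly the lever this work exploits.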
Previously reported results focused on the accuracy of the system [6]. In this work we analyze the behavior of the system with respect to diversity and novelty. We take the neural model that obtains the best accuracy results and explore how the system responds to different prior probability estimations. This showcases how the same model can be reused for different needs without retraining. We report the results obtained when using indegree, PageRank [7], Katz's Index [8], eigenvector, HITS [9] and closeness [10] centrality measures. We also report the results obtained when using a uniform prior and the complement of indegree. Lastly, we report the results of PRIN's dual model, PRN [6], which is trained to estimate p(i|u) and whose output is used directly to produce a ranking.
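To make the prior configurations concrete, here is a minimal sketch of how some of them could be computed from an item graph. The 4-node adjacency matrix and the graph construction are hypothetical (the exact graph model used by PRIN may differ), and PageRank is implemented with plain power iteration rather than a library call:

```python
import numpy as np

# Hypothetical directed item graph as an adjacency matrix:
# A[i, j] = 1 means an edge from item i to item j.
A = np.array([[0, 1, 1, 0],
              [0, 0, 1, 0],
              [1, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)

def indegree_prior(A):
    """p(i) proportional to each item's indegree."""
    deg = A.sum(axis=0)
    return deg / deg.sum()

def pagerank_prior(A, d=0.85, iters=100):
    """p(i) from PageRank, computed by simple power iteration."""
    n = A.shape[0]
    out = A.sum(axis=1, keepdims=True)
    M = A / np.where(out == 0, 1, out)   # row-stochastic transition matrix
    p = np.full(n, 1 / n)
    for _ in range(iters):
        p = (1 - d) / n + d * (M.T @ p)
    return p / p.sum()

def uniform_prior(n):
    """Baseline: every item equally likely a priori."""
    return np.full(n, 1 / n)

def complement_indegree_prior(A):
    """Complement of indegree, renormalized to a distribution."""
    comp = 1 - indegree_prior(A)
    return comp / comp.sum()
```

Each function returns a valid probability distribution over items, so any of them can be plugged into the Bayes'-rule ranking as p(i).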

Evaluation Protocol
We report our results on the MovieLens 20M dataset, a popular public dataset. It contains 20,000,263 ratings that 138,493 users gave to 26,744 items. We split the data into training and test sets, with 80% of each user's ratings in the training set and the remaining 20% in the test set.
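A per-user split of this kind can be sketched as follows. The tuple layout, shuffling, and rounding are assumptions for illustration, not necessarily the exact protocol used in the experiments:

```python
import random
from collections import defaultdict

def per_user_split(ratings, train_frac=0.8, seed=42):
    """Split (user, item, rating) tuples so that each user keeps
    train_frac of their ratings in the training set and the rest
    in the test set."""
    by_user = defaultdict(list)
    for r in ratings:
        by_user[r[0]].append(r)
    rng = random.Random(seed)
    train, test = [], []
    for user_ratings in by_user.values():
        rng.shuffle(user_ratings)            # random per-user holdout
        cut = int(len(user_ratings) * train_frac)
        train.extend(user_ratings[:cut])
        test.extend(user_ratings[cut:])
    return train, test
```

Splitting per user, rather than globally, guarantees that every user has both training history and held-out test items.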
To evaluate the accuracy of the recommendations we used the Normalized Discounted Cumulative Gain (nDCG), using the standard formulation as described in [11], with ratings as graded relevance judgments. We only considered items with a rating of 4.0 or more as relevant during evaluation. We used the inverse of the Gini index [2] to assess diversity and mean self-information (MSI) [12] to evaluate novelty. All metrics are evaluated at a cut-off of 100, because of its better robustness to sparsity and popularity biases and its greater discriminative power compared with shallower cut-offs [13].
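The diversity and novelty metrics can be sketched as follows. This is a minimal illustration of the usual definitions (Gini index over how recommendations are spread across the catalog; MSI as average negative log popularity), and may differ in detail from the exact formulations of [2,12]:

```python
import numpy as np

def gini(counts):
    """Gini index over per-item recommendation counts:
    0 = recommendations spread perfectly evenly over the catalog,
    values near 1 = recommendations concentrated on few items.
    Its inverse is reported as the diversity score."""
    x = np.sort(np.asarray(counts, dtype=float))   # ascending
    n = len(x)
    cum = np.cumsum(x)
    return (n + 1 - 2 * np.sum(cum) / cum[-1]) / n

def msi(recommended, popularity):
    """Mean self-information of a recommendation list:
    average of -log2 p(i), where p(i) is item popularity,
    so recommending rarer items yields higher novelty."""
    p = np.array([popularity[i] for i in recommended])
    return float(np.mean(-np.log2(p)))
```

Under these definitions, a recommender that always pushes the same blockbusters scores high Gini (low diversity) and low MSI (low novelty), which is the trade-off the experiments measure.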

Results
Results for all the centrality measures and other prior estimations, including the results for PRN, can be seen in Figure 1. The systems are ordered by their accuracy, except for PRN, which is shown as the last bar in the graphs.
We can see that as accuracy decreases we obtain better results in novelty and diversity. An exception to this is the HITS centrality measure, which has the worst results in novelty. The uniform prior and the complement of indegree obtain similar results in all metrics. This can be explained by the long-tail distribution of item popularity, which leads to similar estimations for most of the items.
The results of PRIN's dual model, PRN, are also noteworthy. It obtains competitive results in diversity and the best result in novelty, while achieving better accuracy than the systems with similar behavior on those metrics.

Discussion
We have shown how the properties of the recommendations produced by PRIN can be adjusted through the selection of the estimation of the prior probabilities of the items. This is done without retraining the neural model under a different configuration of hyper-parameters. It should be possible to further fine-tune the trade-off between accuracy and novelty/diversity by tuning the hyper-parameters of the neural model independently, but that process is computationally intensive.
The behavior of the system when using HITS shows that while increases in diversity and novelty usually go together, this is not always the case. In this particular case, the results in diversity are competitive with the best systems on this metric, while accuracy is higher. On the other hand, the results in terms of novelty are the worst of all the prior configurations.