How the Multiplicity of Suggested Information Affects the Behavior of a User in a Recommender System

Abstract: Many researchers have suggested improving user retention on digital platforms by using a recommender system. Recent studies show that there are many potential ways to assist users in finding interesting items, other than high-precision rating predictions. In this paper, we study how the diverse types of information suggested to a user can influence their behavior. The types are divided into visual information, evaluative information, categorial information, and narrational information. Based on our experimental results, we analyze how the different types of supplementary information affect the performance of a recommender in terms of encouraging users to click on more items or spend more time on the digital platform.


Introduction
Recommender systems have been broadly used to provide a personalized user experience on various digital platforms for video streaming, online shopping, etc. Users face overwhelming options on a platform, and a recommender system helps them make decisions. Many studies have been conducted on finding recommendable items that can satisfy a user with a certain set of interests. These studies have not only academic value but also industrial value, as the satisfaction of a user usually results in increased retention, thereby increasing the profitability of a digital platform. Nowadays, most influential platforms rely on their recommendations, which account for many of the users' selections. A recommendation is based on how a user has behaved, for instance, clicking on items, watching content, and buying goods. It has to consider the long-term satisfaction of a user under the condition that the user's tastes usually change over time [1]. Traditionally, the recommendation task has been regarded as equivalent to accurately predicting the ratings of a user. However, many recent practical studies focus on how to increase user retention [2,3], because it is often unclear to what extent rating-prediction precision measures correlate with the business success of a recommender [3].
The information used for recommending an item is usually stored in large databases which dynamically grow and build the source of knowledge regarding user behavior, so that recommendations of items can be suggested [4]. The decision of a user on a digital platform is likely to follow the cost-benefit theory of information presentation, which suggests that the strategy for a decision is readily adapted so that the joint cost of effort and errors in making the decision is minimized [5]. In general, the numerosity of different types of information tends to benefit certain sorts of options more than others and therefore has systematic effects on the choice of a user [6]. Zhang et al. [7] proposed Joint Representation Learning (JRL) as a general framework for a recommender that learns user-item representations in a unified space. The framework can incorporate different types of information, and its extendable version can integrate new types of information. Liu et al. [8] suggested a novel attention neural network which exploits an item's multimodal features to estimate the user's attention to various aspects of the item and eventually improve the accuracy of recommendations. They introduced a weight vector for each user-item pair, which uniquely describes the user's attention to the different aspects of an item. Chang et al. [9] proposed a novel aspect-aware latent factor model which first uses both textual reviews and images to learn the preferences of users on different aspects and then integrates the learned preferences into a rating-based matrix factorization model for accurate rating prediction. Still, these works propose methods for incorporating different features to improve the accuracy of a recommender, rather than studying the systematic effects that suggesting various types of information about a recommended item has on user retention and engagement in a recommender-based platform.
In this paper, we study how suggesting various types of information about a recommended item influences the behavior of a user. The various types of information include visual, evaluative, categorial, and narrational information. Based on movie datasets, the effect of each type of information has been verified by providing additional images, providing averaged rating information, providing genre information, and providing shortened synopsis information. The experimental results show how each type of information has affected user retention as well as user engagement and whether combining various types always leads to betterment. Eventually, our study can be useful in designing the digital platform based on a recommender system in which users can click on more items or spend more time on the platform.
The rest of this paper is organized as follows: Section 2 reviews related work, Section 3 presents our proposed approach, the comparative experiments and discussion can be found in Section 4, and our conclusion is given in Section 5.

Background
Recommenders are roughly divided into three categories: a collaborative approach, a content-based approach, and a hybrid approach [10,11]. A collaborative approach bases its predictive process on the similarity among users measured from historical interactions, under the assumption that similar users show similar patterns of preference and that similar items receive similar ratings [12]. For instance, a collaborative recommender for TV shows could predict which TV show a user should like based on the behavioral histories of other users' feedback. Aggarwal et al. [13] suggested the Simple Algorithm for Recommendation (SAR), a fast and scalable method for personalized recommendations using user transaction history, whose recommendations are explainable and interpretable. SAR recommends items similar to the ones that the user has already liked; two items are determined to be similar if many users who rated one item well are also likely to leave positive feedback on the other. This method provides fast training and fast scoring: only simple counting is needed to construct the matrices used for training, and scoring only involves the multiplication of a similarity matrix with an affinity vector. He et al. [14] introduced Neural Collaborative Filtering (NCF), a neural matrix factorization model ensembling Generalized Matrix Factorization (GMF) and a Multi-Layer Perceptron (MLP) to unify the linearity of matrix factorization and the non-linearity of the MLP for modeling the latent structures between users and items. The authors had GMF and the MLP learn separate embeddings to allow for flexibility in the fused model and combined the two models by concatenating their last hidden layers. He et al. [15] proposed L-GCN, which learns user and item embeddings by linearly propagating them over the user-item interactions, based on the neighborhood aggregation of a graph convolution network (GCN) [16]; the weighted summation of the embeddings from all layers is used.
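SAR's counting-based training and scoring can be sketched in a few lines. The interaction matrix below is a toy example of our own; real SAR implementations additionally rescale the raw co-occurrence counts (e.g., with Jaccard similarity), which this sketch omits:

```python
import numpy as np

# Toy user-item interaction matrix (rows: users, cols: items); 1 = positive feedback.
A = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 1],
], dtype=float)

# Training is simple counting: C[i, j] = number of users who liked both item i and j.
C = A.T @ A

# Scoring is the multiplication of the similarity matrix with a user's affinity vector.
user_affinity = A[0]             # user 0's interaction history
scores = user_affinity @ C       # higher score = closer to already-liked items

# Mask out items the user has already engaged with before recommending.
scores[user_affinity > 0] = -np.inf
recommended = int(np.argmax(scores))
print(recommended)               # item 2, co-liked by users similar to user 0
```

Because both steps reduce to matrix products over counts, training and scoring stay fast even at large scale.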
The content-based approach is based on the description of an item and the profile of a user's preferences. This approach treats the recommendation task as a user-specific classification and learns a classifier for the user's preferences based on the features of an item. For example, keywords are used to describe a set of movies, and a user profile is built to indicate the type of movie the user prefers. Yu et al. [17] proposed SLi-Rec, a deep learning-based model that captures both long-term and short-term user preferences. It adopts the attentive feature of asymmetric singular value decomposition [18] for long-term modeling and considers both time irregularity and semantic irregularity by modifying the gating logic in an LSTM [19]. It also uses the attention mechanism to dynamically fuse the long-term and short-term components. Juan et al. [20] introduced the Field-aware Factorization Machine (FFM), an extension of the Factorization Machine (FM) [21]. Unlike the FM, this method uses different factorized latent factors for different groups (fields) of features. By grouping features into fields, they address the issue that latent factors shared across features describing intuitively different categories of information may not generalize the correlations well.
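The field-aware scoring rule can be illustrated with a minimal sketch. Each feature keeps one latent vector per field, and a pair of active features interacts through the vectors dedicated to each other's field. The dimensions, the `field_of` layout, and the randomly initialized (untrained) factors below are our own toy assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

n_features, n_fields, k = 6, 3, 4
field_of = [0, 0, 1, 1, 2, 2]            # which field each feature belongs to
# One latent vector per (feature, field) pair -- the "field-aware" part.
V = rng.normal(scale=0.1, size=(n_features, n_fields, k))

def ffm_score(x):
    """Sum of pairwise interactions with field-aware latent factors."""
    active = [j for j, v in enumerate(x) if v != 0.0]
    s = 0.0
    for a, j1 in enumerate(active):
        for j2 in active[a + 1:]:
            # j1 interacts with j2 via the factor dedicated to j2's field, and vice versa.
            s += V[j1, field_of[j2]] @ V[j2, field_of[j1]] * x[j1] * x[j2]
    return s

x = np.array([1.0, 0, 1.0, 0, 0, 1.0])   # one active feature per field
print(round(float(ffm_score(x)), 6))
```

A plain FM would use a single latent vector per feature for every interaction; the per-field vectors are what let FFM model, say, a genre feature differently when paired with a user field versus an item field.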
Hybrid approaches combine the collaborative and content-based approaches [10,22]. Many studies have been done on aggregating the predictions of the two approaches or integrating the characteristics of one approach into the other. The W&D framework [24] jointly trains feed-forward neural networks with embeddings and a feature transformation-based linear model for generic recommenders with sparse inputs.
Many recent studies have focused on enhancing the accuracy of rating prediction rather than improving user engagement or retention. As such, it is also important to study how the strategy for presenting a recommendation can influence the behavior of a user in recommender systems.

Research Question
Before detailing the strategy of suggesting different types of supplementary information, we state the main research question to clarify our proposed approach. Information cues can have different dimensions with varying influences on information load and, accordingly, on the decision of a user [25]. YouTube, for example, presents an item by using the title, the image, the upload time, the number of hits, the label of a channel, etc. The way an item is presented has an indirect but obviously strong influence on the behavior of a user.
We start by questioning whether different types of information lead to different user responses in a recommender system, and if so, how the degree of influence differs among the various types of suggested information. Therefore, we study the effects of suggesting different types of information, namely visual, evaluative, categorial, and narrational information, as listed in Table 1, when the title and the poster image of a movie are provided to the user by default.

Table 1. The types of supplementary information suggested to a user.
Visual information: The images of an item (e.g., movie scenes)
Evaluative information: The rating of an item (e.g., movie rating)
Categorial information: The category of an item (e.g., movie genre)
Narrational information: The linguistic description of an item (e.g., movie synopsis)

Visual Information
Visual information helps a user understand increasingly rich databases of recommended items and biases the decision of a user by focusing attention on a limited set of options [9,26]. Many studies report that the contents of a movie poster are generally designed to match the kind of motivation a user has when selecting the movie [27]. To verify the impact of the quantity of visual information, presenting a single movie poster was compared to presenting multiple images that include different versions of movie posters and scenes from the movie. (A Python script was used to automatically obtain the additional images by searching Google with the title and release year.) While exploring the additionally suggested images, a user has the chance to find out that a specific actor appears in the movie or that a certain background scene is interesting. Users tend to be more interested in a movie when they actively explore more visual information about it. In this experiment, the supplementary visual information of an item was suggested to users in order to study its effect on their behavior. The Netflix dataset was used, and Figure 1 shows an example of the additional images, as well as the poster, of a movie that a participant can explore during the experiment.

Evaluative Information
A number of platforms allow users to submit reviews of a movie and aggregate the collected ratings into an average. Users interactively express their opinions on movies through community-driven reviews, and the results are presented as an overall rating for each movie. The ratings from a community lead to a bandwagon effect [28], which means that an item with a positive rating is likely to be clicked more frequently [29]. As such, ratings as evaluative information play an important role when a user makes a decision in the recommendation environment: items with a high rating are perceived as more credible than items with a low rating. In this experiment, the Tweetings dataset was used, which contains around 894K ratings obtained from around 70K users' tweets on the Twitter platform. The average rating of each item was calculated, and a user could refer to the rating information during the experiment. Figure 2 depicts examples of items with their average rating scores.
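Deriving the average-rating cue from raw community feedback is a simple aggregation. The sketch below uses made-up user IDs and IMDb-style movie IDs on the Tweetings 0-10 scale (both illustrative, not taken from the dataset):

```python
import pandas as pd

# Toy stand-in for Tweetings-style feedback: (user_id, movie_id, rating on a 0-10 scale).
ratings = pd.DataFrame({
    "user_id":  [1, 2, 3, 1, 2],
    "movie_id": ["tt0111161", "tt0111161", "tt0111161", "tt0068646", "tt0068646"],
    "rating":   [9, 8, 10, 10, 9],
})

# Aggregate the community ratings into the single evaluative cue shown next to each item.
avg_rating = ratings.groupby("movie_id")["rating"].mean().round(1)
print(avg_rating.to_dict())      # {'tt0068646': 9.5, 'tt0111161': 9.0}
```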

Categorial Information
The category of a movie, the so-called genre, is a motion-picture category based on similarities in narrative, aesthetic, or emotional responses to the movie. Users are generally sensitive to within-category correlation [30], such that an item from a preferred category is more likely to be selected by a user. The poster and title of a movie often do not sufficiently convey the impression of the movie. The genre of the movie can subsidiarily provide general information before the user makes a decision, as it affects the familiarity association as well as the visual and emotional associations in the decision-making process of a user [31]. MovieLens contains the genre information of each movie, and a movie can belong to more than one genre. In this experiment, we presented the categorial information of each recommended item to the participant, who could then better guess what a movie might be about. Figure 3 shows examples of movies with their genre information.
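MovieLens encodes a movie's multiple genres as a pipe-separated string. A minimal sketch of parsing the labels and checking them against a user's preferred genres (the titles and the helper function are our own illustrative choices):

```python
# MovieLens stores multi-genre labels as one pipe-separated string per movie.
raw = {
    "Toy Story (1995)": "Adventure|Animation|Children|Comedy|Fantasy",
    "Heat (1995)": "Action|Crime|Thriller",
}

# Split each string into the movie's list of genres.
genres = {title: g.split("|") for title, g in raw.items()}

def matches_preference(title, preferred):
    """A movie matches if it shares at least one genre with the user's preferences."""
    return bool(set(genres[title]) & set(preferred))

print(matches_preference("Heat (1995)", ["Thriller", "Romance"]))  # True
```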

Narrational Information
A synopsis is a brief summary that gives users a descriptive idea of what a movie is about. It provides an overview of the storyline and other defining factors of the movie for the benefit of a potential user. Consistency between movie genres and linguistic cues in a movie synopsis positively affects the decision of a user when the synopsis confirms the expectancy of the user; on the other hand, a synopsis might negatively affect the decision when the linguistic cues disconfirm that expectancy [32].
In this experiment, a Yahoo dataset, which contains the synopsis data of movies, was used to provide a user with the narrational information of a movie. Figure 4 shows examples of providing the synopsis of a movie.

Synopsis :
A colorful animated version of the screen classic. Phineas Fogg, an English nobleman, makes a wager that he can circumnavigate the globe in 80 days. His world voyage is filled with wonder and excitement.

Synopsis :
Contains five animated BATMAN cartoons: "How Many Herrings in a Wheelbarrow," "A Bird Out of Hand," "From Catwoman With Love," "The 1,001 Faces of the Riddler," and "The Cool Cruel Mr. Freeze."

Synopsis :
One of the greatest action movies of the late eighties, DIE HARD, ushered in a new standard for action films. Although DIE HARD contains many action movie cliches (one-liners, pyrotechnics), it also broke new ground in its genre.

Synopsis :
Based on a true story, SCHINDLER'S LIST is Steven Spielberg's epic drama of World War II Holocaust survivors and the man who unexpectedly came to be their savior. Spielberg's glorious film is wondrously evocative, visually stunning, and emotionally stirring.

Datasets
• Netflix dataset [33]: The Netflix Prize dataset contains over 100 million ratings from 480 thousand randomly chosen customers for over 17 thousand movies and shows (see Figure 5). The data were collected between 1998 and 2005 and provide ratings on a scale from 1 to 5. Each customer id was replaced by a random id to protect privacy. The dataset includes the date of a rating, the title, and the year of release for each movie.
• Tweetings dataset [34]: The Movie-Tweetings dataset was automatically gathered and therefore depends on the continuation of the IMDb apps and the Twitter API (see Figure 6). The dataset consists of 893,866 ratings extracted from tweets by 69,832 users. It contains 36,737 different movies and shows, and the ratings are scaled from 0 to 10.
• MovieLens dataset [35]: The MovieLens (25M) dataset was made by the GroupLens Research Project (University of Minnesota) and consists of 25,000,095 ratings ranging from 1 to 5 (see Figure 7). The feedback was collected between 1995 and 2019 from 162,541 users on 62,423 movies through the MovieLens website. Users who wrote fewer than 20 ratings were excluded. A movie can belong to multiple genres among 18 different genres.
• Yahoo dataset [36]: The Yahoo movie dataset was collected by Yahoo! Research through the Yahoo Movies website (see Figure 8). 7642 users rated movies using a 13-level rating scale ranging from A+ to F (or from 1 to 13). 5808 movies were used, along with a large amount of descriptive information about the movies, including synopsis, genre, ratings, etc.
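For reference, the Yahoo 13-level letter scale can be mapped onto the numeric 1-13 range as sketched below. The exact letter ordering and the direction of the mapping (A+ as the top value, 13) are our own assumptions about the dataset's convention:

```python
# Assumed 13-level Yahoo letter scale, best grade first.
LETTERS = ["A+", "A", "A-", "B+", "B", "B-", "C+", "C", "C-", "D+", "D", "D-", "F"]

# Map each letter to a numeric value, with A+ -> 13 down to F -> 1 (assumption).
TO_NUMERIC = {letter: 13 - i for i, letter in enumerate(LETTERS)}

print(TO_NUMERIC["A+"], TO_NUMERIC["F"])  # 13 1
```

Normalizing the letter grades this way makes the Yahoo ratings comparable with the numeric scales of the other three datasets.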

Evaluation
The common user retention metric [37], the click-through rate (CTR) [38-41], was adopted to directly evaluate the impact of the methodology. The metric measures the ratio of clicks to recommendations, as presented in Equation (1):

CTR = N_click / N_recommend, (1)

where N_click is the number of clicked items and N_recommend is the number of recommended items.
Additionally, the mean average precision [42,43], MAP@K, reflects both the sequence of feedback and the total number of item engagements of a user (see Equation (2), where S is the number of samples):

MAP@K = (1/S) Σ_{s=1}^{S} AP@K_s. (2)

The average precision, AP@K, was first defined as the summation of the engaged precision values divided by the number of item engagements, m, where P(i) indicates the precision at i, and δ(i) indicates the bivariate function for engagement (as in Equations (3)-(5)):

AP@K = (1/m) Σ_{i=1}^{K} P(i) δ(i), (3)
P(i) = (number of engaged items among the top i recommendations) / i, (4)
δ(i) = 1 if the i-th item is engaged, and 0 otherwise. (5)
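Both metrics can be sketched as follows (the function and variable names are ours; `clicked` is a per-user list of the engagement flags δ(i) in rank order):

```python
def ctr(n_clicks, n_recommendations):
    """Click-through rate: ratio of clicks to recommendations."""
    return n_clicks / n_recommendations

def ap_at_k(clicked, k):
    """AP@K: sum of precisions P(i) at each engaged rank, divided by m engagements."""
    clicked = clicked[:k]
    m = sum(clicked)
    if m == 0:
        return 0.0
    hits, total = 0, 0.0
    for i, c in enumerate(clicked, start=1):
        if c:
            hits += 1
            total += hits / i          # P(i): precision at rank i
    return total / m

def map_at_k(samples, k):
    """MAP@K: mean of AP@K over S samples."""
    return sum(ap_at_k(s, k) for s in samples) / len(samples)

print(ap_at_k([1, 0, 1, 0], 4))        # (1/1 + 2/3) / 2 ≈ 0.8333
```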
A/B experiments [2,37,44] were conducted to verify the effect of diversifying the information suggested to a user based on different baseline methods. We collected data from 33 participants who were asked to click on all the items that they were interested in. Seven baseline methods were compared: SAR [13], NCF [14], L-GCN [15], SLi-Rec [17], FFM [20], L-FM [23], and W&D [24]. We did not inform the participants of which method was applied in each experiment. The default parameter settings of each method's open-source implementation were used.

Visual Information
We tested the effect of providing additional visual information on the decision of a user. At most 10 images, including different posters and scenes, were available before making a decision. The posters often contained important keywords, and the scenes provided information about the major figures or backgrounds of the movie. Pictures played an influential role in guiding a user to take an action, showing the picture superiority effect [45], i.e., the conceptual or perceptual processing advantage of visual information. Compared to a single image, users showed a tendency to take more interest in a recommended movie when multiple images were provided. Each user has specific visual preferences that are more likely to be satisfied by multiple images than by a single image. Table 2 presents the averaged results of the click-through rate and the mean average precision, showing the difference obtained by providing more cues of visual information (also see Figure 9). User retention increased by a significant degree for every baseline method, and the tendency of improvement was more or less consistent in terms of both the click-through rate and the mean average precision. It should also be pointed out that the collaborative approach (SAR, NCF, and L-GCN) benefited largely from the complementary visual information compared to the content-based approach (SLi-Rec and FFM) or the hybrid approach (L-FM and W&D).

Evaluative Information
We then tested the effect of providing evaluative information on the decision of a user. The given cue of ratings functioned effectively because users largely place trust in the statistics of other users. Users started paying attention to the movies that had relatively higher ratings or finally made up their minds by finding the rational basis for a decision in the statistics. When making a choice is difficult, rating information from many other users carries as much credibility as an expert recommendation. Compared to the condition excluding the evaluative information, the user retention results were substantially improved by including it. Users were inclined to turn their attention to movies with a high rating, even when they did not find the movies interesting at first. Table 3 shows the averaged results of the click-through rate and mean average precision, which highlight the difference made by providing the evaluative information of a movie. User retention was remarkably improved by the evaluative information in all cases, as shown in Figure 10. Similar to the previous case of visual information, the collaborative approach benefited more from the evaluative information than the content-based approach. In this case, however, the hybrid approach also showed a greater benefit from the supplementary rating data of a movie.

Categorial Information
Next, the effect of providing categorial information was investigated. A movie can be classified into multiple genres, and users take a favorable view of movies that belong to their favorite genres. The genre information efficiently supplements the limited information conveyed by the title and poster image, because users usually begin by identifying whether a movie belongs to their preferred genres. Although the user retention results were improved by the categorial information, it was not as effective as the evaluative or visual information. Still, the additional information helped users become interested in certain movies and induced click-throughs to a greater or lesser extent. Table 4 shows the averaged results of the click-through rate and mean average precision, where the effect of suggesting the categorial information of a movie can be observed. Unlike with the visual and evaluative information, the content-based approach benefited more from the categorial information. Figure 11 also visually conveys that the content-based approach, which learns user-specific classifiers for the user's preferences based on the description of an item, takes advantage of the categorial information.

Narrational Information
The effect of providing narrational information was verified by using the synopsis of a movie. A user was able to open and read the attached synopsis, which satisfies the curiosity of the user about the details of a recommended movie. However, reading straight through a synopsis takes quite a long time; as such, many users skimmed the synopsis for salient keywords.
The results intuitively convey that the additional information helped encourage user retention, but the narrational information was not as fruitful as the visual or evaluative information. Table 5 presents the averaged results of the click-through rate and mean average precision and compares the difference created by providing the synopsis of a movie. The hybrid approach benefited the most from the narrational information (also see Figure 12).

Aggregation of Supplementary Information
Providing additional information about an option can both help and hinder users' evaluation of items [46]. We studied whether multiple types of supplementary information always lead to better user retention in a recommender system. The test was conducted under the condition that a recommender system simultaneously gives users evaluative, categorial, and narrational information. Interestingly, providing evaluative, categorial, and narrational information together produced a lower click-through rate and mean average precision than providing only evaluative information. Figure 13 visualizes the averaged results of the click-through rate and mean average precision of the different methods compared in Table 6. This marginal effect of abundant supplementary information offers suggestive, contrary evidence that adding more information does not always help a user click more or spend more time on a recommender-based platform.
To summarize the experimental results, Figure 14 depicts the level of improvement achieved by the different types of supplementary information listed in Table 7: visual, evaluative, categorial, and narrational information, as well as the aggregation of several types. In short, the click-through rate was enhanced by 12.84% with visual information, by 25.14% with evaluative information, by 8.20% with categorial information, by 14.04% with narrational information, and by 18.02% with aggregated information.

Conclusions
We studied the effects of suggesting different types of supplementary information to a user in terms of the improvement of user retention in a recommender system. By understanding more about these effects, a digital platform will be better able to design a structure that satisfies users, who would click on more items suggested by the recommender system and spend more time on the platform. This study is derived from the idea that there are many potential ways to assist users in finding interesting items, other than high-precision rating prediction. We found that visual, evaluative, categorial, and narrational information affect the decision of a user differently. Firstly, we observed that providing supplementary information is generally effective in improving user retention; it is worth noting that the rating of an item, as evaluative information, plays an important role in giving a user the confidence to click on the item. Secondly, certain types of information can be more effective depending on the approach: for example, visual information better helps the collaborative approach, categorial information has much to do with the content-based approach, and the hybrid approach benefits more from evaluative and narrational information. In addition, we showed a simple counterexample in which the richness of supplementary information does not result in improved user retention. This leaves the open question of how different combinations of supplementary information influence the decision of a user.
In future work, we plan to extend this approach to examine the effects of other types of information, such as acoustic and dynamic information (e.g., video highlights). Furthermore, the effect of different combinations of information should be studied. Moreover, further research is needed on which practical, cognitive-science-based approaches, e.g., in terms of visual saliency and interaction with user feedback, can allow for an improved recommender system.

Conflicts of Interest:
The authors declare no conflict of interest.