Recommender Systems Based on Collaborative Filtering Using Review Texts—A Survey

In e-commerce websites and related micro-blogs, users supply online reviews expressing their preferences regarding various items. Such reviews typically take the form of textual comments and constitute a valuable source of information about user interests. Recently, several works have used review texts and the rich information they contain, such as review words, review topics, and review sentiments, to improve rating-based collaborative filtering recommender systems. These works differ from one another in how they exploit review texts to derive user interests. This paper provides a detailed survey of recent works that integrate review texts and discusses how these texts are exploited to address some of the main issues of standard collaborative filtering algorithms.


Introduction
Nowadays, e-commerce websites are flourishing rapidly and offer millions of items for sale [1]. Choosing an item from such a large number of items makes a supplementary tool, called a recommender system, necessary [2,3]. A recommender system (RS) helps users discover items that they might not have found by themselves. It collects information about the items a user prefers and then suggests items that match these preferences [4].
One of the most widely used recommendation approaches is Collaborative Filtering (CF), which is employed by many e-commerce companies [5], including Yelp (https://www.yelp.com/), Netflix (https://www.netflix.com/), eBay (https://www.ebay.com/), and Amazon (https://www.amazon.com/). Most CF techniques rely on the commonality between users: similar users or items are identified by computing the similarities of the users' common ratings [4]. CF methods perform well when enough rating information is available [6]. However, their effectiveness suffers when ratings are sparse, because users frequently share only a small number of commonly rated items [7]. Another limitation is that CF approaches do not capture the reasons behind a user's ratings, and consequently cannot precisely capture the preferences of a target user [8]. To deal with these problems, several content-based methods have been developed that represent users and items through other kinds of data, including tags [9], item descriptions [10], and social factors [11]. Nevertheless, these techniques remain insufficient, particularly when rating sparsity is high or the target user has few historical ratings [6]. On today's Web, users have become increasingly comfortable expressing themselves and sharing their opinions about items on e-commerce platforms through textual reviews [12]. As a result, user reviews have become a ubiquitous part of e-commerce. Forum websites, like TripAdvisor (https://www.tripadvisor.com/) and Yelp, and online retail websites such as

Table 1. An example of a rating matrix [15].

A CF system generates recommendations based on the relationships and similarities between users or items [17]. These relationships are inferred from the user-item interactions managed by the RS. The system then infers the target user's ratings for the items that he or she has not yet evaluated. Finally, items are ranked according to their estimated rating scores, and the highest-ranked items are recommended to the target user [17].
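The final ranking step described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the predicted scores for one target user are already available as a dictionary; the function name `recommend_top_n` and the toy scores are hypothetical.

```python
def recommend_top_n(predicted, rated, n):
    """Rank the items the user has not rated by predicted score (highest first)
    and return the ids of the top n items."""
    candidates = [(item, score) for item, score in predicted.items() if item not in rated]
    # Sort by descending score; ties are broken by item id so the output is deterministic.
    candidates.sort(key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in candidates[:n]]


# Hypothetical predicted ratings for one target user; "i3" was already rated.
predicted = {"i1": 4.2, "i2": 3.1, "i3": 4.8, "i4": 2.5, "i5": 4.2}
print(recommend_top_n(predicted, rated={"i3"}, n=2))  # ['i1', 'i5']
```

Items the user has already rated are excluded before ranking, so a high predicted score on a known item never displaces a new recommendation.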

Typical Algorithms of CF
CF is considered the most widely studied and implemented approach in RS [4]. Existing CF techniques can be classified into two principal categories: memory-based and model-based techniques [17,18]. In memory-based CF (also called neighborhood-based CF), the rating matrix stored in the system is used directly to predict missing ratings for target items. In contrast, model-based CF uses the values of the matrix to build a model, which is then used to infer the relevance of new items for the target users [17].

Memory-Based CF
The memory-based CF approach leverages the similarities between users or items to infer a user's probable preference for items that he or she has not evaluated previously. Memory-based CF methods are subdivided into two main classes, namely, user-based and item-based methods [2]. User-based CF predicts the unknown ratings of a user on target items based on the ratings that similar users gave to these items [17]. Formally, the predicted rating of user u for item j is calculated as follows:

$$\hat{r}_{u,j} = \bar{r}_u + \frac{\sum_{v \in N_u} sim(u,v)\,\left(r_{v,j} - \bar{r}_v\right)}{\sum_{v \in N_u} \left|sim(u,v)\right|} \tag{1}$$

where $\bar{r}_u$ and $\bar{r}_v$ refer to the average ratings of users u and v, sim(u, v) is the similarity (for a predefined similarity metric) between users u and v, and $N_u$ represents the group of users similar to user u (neighbors) who rated item j. Item-based CF relies on the similarities between items: it predicts a user's rating for an item based on the user's ratings for similar items [17]. In these techniques, two items are similar if many users have evaluated them similarly [4]. The rating prediction for item-based CF is formulated as follows:

$$\hat{r}_{u,j} = \frac{\sum_{k \in N_j} sim(j,k)\, r_{u,k}}{\sum_{k \in N_j} \left|sim(j,k)\right|} \tag{2}$$

where $N_j$ is the group of items similar to item j (among the items rated by user u), and sim(j, k) is the similarity score between items j and k. Computing the similarity among users/items is a critical stage in neighborhood-based CF techniques, since a poor similarity measure can severely decrease their accuracy and performance [19]. Several similarity metrics have been presented in the literature [20], among which the cosine measure (COS) [21], the Pearson correlation coefficient (PCC) [22], and the Jaccard coefficient [23] are popular standard criteria typically adopted for finding the most similar users or items. PCC computes the similarity based on the linear correlation between the rating vectors of two users/items. COS calculates the similarity as the cosine of the angle between rating vectors. Jaccard similarity takes into account the number of common ratings between users/items and ignores the rating values.
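The user-based and item-based prediction formulas described above can be sketched as follows. This is a minimal illustration, assuming ratings are stored as nested dictionaries (`ratings[user][item]`) and that similarities have already been computed; restricting the neighborhood to the k most similar candidates is a common choice and an assumption here. All names and the toy data are hypothetical.

```python
def mean(values):
    """Arithmetic mean of an iterable of ratings."""
    values = list(values)
    return sum(values) / len(values)


def predict_user_based(ratings, user_sim, u, j, k=2):
    """Mean-centred, similarity-weighted average over the k users most
    similar to u among those who rated item j (user-based formula)."""
    r_u_bar = mean(ratings[u].values())
    neighbours = sorted(
        (v for v in ratings if v != u and j in ratings[v]),
        key=lambda v: user_sim[u][v],
        reverse=True,
    )[:k]
    num = sum(user_sim[u][v] * (ratings[v][j] - mean(ratings[v].values())) for v in neighbours)
    den = sum(abs(user_sim[u][v]) for v in neighbours)
    # Fall back to the user's mean rating when no usable neighbour exists.
    return r_u_bar if den == 0 else r_u_bar + num / den


def predict_item_based(ratings, item_sim, u, j, k=2):
    """Similarity-weighted average of u's ratings on the k items most
    similar to j among the items u has rated (item-based formula)."""
    neighbours = sorted(
        (i for i in ratings[u] if i != j),
        key=lambda i: item_sim[j][i],
        reverse=True,
    )[:k]
    num = sum(item_sim[j][i] * ratings[u][i] for i in neighbours)
    den = sum(abs(item_sim[j][i]) for i in neighbours)
    return None if den == 0 else num / den


# Hypothetical toy data with precomputed similarities.
ratings = {"u": {"a": 4, "b": 2}, "v": {"a": 5, "b": 3, "c": 4}, "w": {"a": 1, "c": 2}}
user_sim = {"u": {"v": 0.9, "w": -0.2}}
item_sim = {"c": {"a": 0.8, "b": 0.1}}
print(round(predict_user_based(ratings, user_sim, "u", "c"), 3))  # 2.909
print(round(predict_item_based(ratings, item_sim, "u", "c"), 3))  # 3.778
```

Mean-centring in the user-based variant compensates for users who rate systematically high or low; the item-based variant works directly on the target user's own ratings.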
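The similarity measure should be chosen according to the target dataset [24]. The similarity between two users u and v is computed with these metrics as follows:

$$
\begin{aligned}
PCC(u,v) &= \frac{\sum_{i \in I_{u,v}} (r_{u,i} - \bar{r}_u)(r_{v,i} - \bar{r}_v)}{\sqrt{\sum_{i \in I_{u,v}} (r_{u,i} - \bar{r}_u)^2}\,\sqrt{\sum_{i \in I_{u,v}} (r_{v,i} - \bar{r}_v)^2}}\\
COS(u,v) &= \frac{\sum_{i \in I_{u,v}} r_{u,i}\, r_{v,i}}{\sqrt{\sum_{i \in I_{u,v}} r_{u,i}^2}\,\sqrt{\sum_{i \in I_{u,v}} r_{v,i}^2}}\\
Jaccard(u,v) &= \frac{|I_u \cap I_v|}{|I_u \cup I_v|}
\end{aligned}
\tag{3}
$$

In these equations, $I_{u,v}$ denotes the set of items rated by both users u and v; $\bar{r}_u$ represents the mean rating of user u, and $r_{u,i}$ represents u's rating for item i. $I_u$ and $I_v$ represent the sets of items rated by users u and v, respectively. Similarly, the similarity between two items i and j is computed from the ratings of the users who have evaluated both items:

$$
\begin{aligned}
PCC(i,j) &= \frac{\sum_{u \in U_{i,j}} (r_{u,i} - \bar{r}_i)(r_{u,j} - \bar{r}_j)}{\sqrt{\sum_{u \in U_{i,j}} (r_{u,i} - \bar{r}_i)^2}\,\sqrt{\sum_{u \in U_{i,j}} (r_{u,j} - \bar{r}_j)^2}}\\
COS(i,j) &= \frac{\sum_{u \in U_{i,j}} r_{u,i}\, r_{u,j}}{\sqrt{\sum_{u \in U_{i,j}} r_{u,i}^2}\,\sqrt{\sum_{u \in U_{i,j}} r_{u,j}^2}}\\
Jaccard(i,j) &= \frac{|U_i \cap U_j|}{|U_i \cup U_j|}
\end{aligned}
\tag{4}
$$

where $U_{i,j}$ is the set of users who rated both items i and j, and $\bar{r}_i$ is the average rating received by item i. $U_i$ and $U_j$ are the sets of users who rated items i and j, respectively. The major shortcoming of memory-based CF is that these approaches may incur prohibitive computational costs (the time needed to compute similarities among users or items), which grow with the number of users/items in the system [25]. Nevertheless, they remain popular because they are simple to implement and their predictions are easy to understand [17,26].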
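The three user-user similarity metrics described above (PCC, COS, and Jaccard) can be sketched as below. This is a minimal illustration, assuming each user's ratings are a dictionary from item id to rating and that the user means in PCC are taken over all of that user's ratings; item-item similarities follow by applying the same functions to per-item dictionaries keyed by user id. The toy ratings are hypothetical.

```python
import math


def pcc(ru, rv):
    """Pearson correlation over the co-rated items I_{u,v}."""
    common = ru.keys() & rv.keys()
    if not common:
        return 0.0
    mu = sum(ru.values()) / len(ru)  # mean over all of u's ratings
    mv = sum(rv.values()) / len(rv)
    num = sum((ru[i] - mu) * (rv[i] - mv) for i in common)
    den = math.sqrt(sum((ru[i] - mu) ** 2 for i in common)) * math.sqrt(
        sum((rv[i] - mv) ** 2 for i in common))
    return 0.0 if den == 0 else num / den


def cos(ru, rv):
    """Cosine of the angle between the co-rated parts of two rating vectors."""
    common = ru.keys() & rv.keys()
    num = sum(ru[i] * rv[i] for i in common)
    den = math.sqrt(sum(ru[i] ** 2 for i in common)) * math.sqrt(
        sum(rv[i] ** 2 for i in common))
    return 0.0 if den == 0 else num / den


def jaccard(ru, rv):
    """Share of rated items in common; rating values are ignored."""
    union = ru.keys() | rv.keys()
    return 0.0 if not union else len(ru.keys() & rv.keys()) / len(union)


# Hypothetical ratings of two users.
ru = {"a": 4, "b": 2, "c": 5}
rv = {"a": 5, "b": 1, "d": 3}
print(round(pcc(ru, rv), 3), round(cos(ru, rv), 3), jaccard(ru, rv))  # 0.832 0.965 0.5
```

The example shows how the metrics disagree on the same pair: Jaccard sees only that two of four items overlap, while PCC and COS reward the agreement in how those two items were rated.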

Model-Based CF
Although neighborhood-based CF techniques are simple to implement and effective at inferring users' unknown ratings, model-based CF approaches generally produce more precise predictions [18]. The basic idea of these techniques is to use data mining and machine learning approaches to build prediction models offline. Based on these models, the RS predicts the missing ratings in the user-item matrix [27]. In recent years, various model-based CF techniques have been developed, including Bayesian networks [28], neural networks [29], support vector machines [30], and, very recently, fuzzy-based systems [31] and deep learning techniques [32]. Nevertheless, Matrix Factorization (MF) models [33] are regarded as the state of the art in RS because of their accuracy and scalability [18]. MF algorithms exploit the high-level correlation among the rows and columns (users and items) of a target user-item rating matrix to learn latent representations (also called latent factors) of users and items [34]. More precisely, each item i and each user u are represented by k-dimensional latent factors, namely $q_i \in \mathbb{R}^k$, which represents k characteristics of the item, and $p_u \in \mathbb{R}^k$, which represents the user's preference for these characteristics. Formally, the rating score of user u on item i is computed as follows [4,6]:

$$\hat{r}_{u,i} = q_i^{T} p_u \tag{5}$$

To learn latent factors that better predict $\hat{r}_{u,i}$, the following loss function is minimized:

$$\min_{p,q} \sum_{(u,i) \in T} \left(r_{u,i} - q_i^{T} p_u\right)^2 + \beta\left(\|q_i\|^2 + \|p_u\|^2\right) \tag{6}$$

where T represents the user-item pairs (u, i) whose real ratings $r_{u,i}$ are observed in the training set, and β is a regularization parameter used to limit overfitting. In general, the loss function (Equation (6)) can be minimized with techniques such as gradient-based methods or alternating least squares [16]. Compared to neighborhood-based CF techniques, model-based CF generally returns more accurate predictions.
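A gradient-based minimization of the regularized loss in Equation (6) can be sketched with plain stochastic gradient descent. This is a minimal illustration, not any specific system's implementation; the learning rate, β, number of latent dimensions, epoch count, initialization scale, and toy ratings are all assumptions chosen for the example.

```python
import math
import random


def predict(P, Q, u, i):
    """Predicted rating r_hat_{u,i} = q_i^T p_u."""
    return sum(p * q for p, q in zip(P[u], Q[i]))


def train_mf(triples, n_users, n_items, k=2, lr=0.01, beta=0.02, epochs=2000, seed=0):
    """Minimise sum (r - q_i^T p_u)^2 + beta(||q_i||^2 + ||p_u||^2) by SGD.
    triples: list of (user_index, item_index, rating) observed in T."""
    rng = random.Random(seed)
    P = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_items)]
    order = list(triples)  # copy so the caller's list is not reordered
    for _ in range(epochs):
        rng.shuffle(order)
        for u, i, r in order:
            err = r - predict(P, Q, u, i)
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on one term of the loss; the factor 2 is absorbed into lr.
                P[u][f] += lr * (err * qi - beta * pu)
                Q[i][f] += lr * (err * pu - beta * qi)
    return P, Q


def rmse_on(triples, P, Q):
    return math.sqrt(sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in triples) / len(triples))


# Hypothetical observed ratings (user, item, rating) for 3 users and 3 items.
triples = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
P, Q = train_mf(triples, n_users=3, n_items=3)
print(round(predict(P, Q, 0, 2), 2))  # a predicted rating for an unobserved pair
```

Alternating least squares would instead fix Q and solve for P in closed form (and vice versa); SGD is shown here only because it is the shortest to write.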
Furthermore, the storage requirements of these approaches are frequently lower than those of neighborhood-based techniques [9]. This is because, in neighborhood-based CF, all ratings must be loaded into memory to provide recommendations, whereas model-based CF only needs the learned model, which is generally smaller than the original rating matrix [9]. However, building the model can require more time and training data [35]. Moreover, when new users and/or items (products) are registered in the system, the model must be retrained repeatedly to update it and maintain its accuracy [35].

Evaluation Metrics of CF
Evaluation is an integral part of any system-building process, since it establishes the system's effectiveness for the tasks of interest [36]. Many evaluation measures have been used by the RS research community to evaluate the performance of CF-based RS [37]. These can be broadly categorized into two main approaches: online and offline [37]. The online approach involves providing recommendations to users and then asking them how they assess the recommended items. The offline approach does not involve real user interactions; rather, part of the users' historical data is used for training the system, whereas another part is used for testing the computed predictions. The online approach is considered the best evaluation method because it provides precise feedback, through real users, on how relevant the system is [38]. However, interactions with real users are time-consuming; thus, many works have adopted an offline evaluation approach [20]. Table 2 presents some of the common evaluation metrics used in CF-based RS, together with their definitions and formulas.

Table 2. Evaluation metrics used in CF.

| Metric | Definition | Formula | Reference |
|---|---|---|---|
| Mean Absolute Error (MAE) | Measures the average absolute difference between the predicted ratings and the true values. | $MAE = \frac{1}{\lvert T\rvert}\sum_{(u,i)\in T}\lvert r_{u,i}-\hat{r}_{u,i}\rvert$ | [22] |
| Root Mean Squared Error (RMSE) | Squares the errors between the predictions and the real values before averaging, which emphasizes the contribution of large errors. | $RMSE = \sqrt{\frac{1}{\lvert T\rvert}\sum_{(u,i)\in T}(r_{u,i}-\hat{r}_{u,i})^2}$ | |
| Precision | Computes the proportion of the provided recommendations that are relevant. | $Precision = \frac{\lvert U_u \cap L_{rec}\rvert}{\lvert L_{rec}\rvert}$ | [18] |
| Recall | Computes the proportion of relevant items that are recommended. | $Recall = \frac{\lvert U_u \cap L_{rec}\rvert}{\lvert U_u\rvert}$ | |
| ROC curve | Emphasizes the proportion of recommendations that the user does not prefer; plots the true positive rate against the false positive rate. | | [18] |
| Ranking Score | Measures the quality of recommendations based on their rank position. | $R = \sum_{j}\frac{\max(r_{(ij)} - md,\ 0)}{2^{(j-1)/(\alpha-1)}}$ | [20] |
| Click-Through Rate (CTR) | Computes the proportion of recommendations that are ultimately clicked. | $CTR = \frac{\lvert L_{rec} \cap L_{cons}\rvert}{\lvert L_{rec}\rvert}$ | [39,40] |
| Novelty | Computes the novelty of the provided recommendations. | $nov(L_{rec}) = \sum_{i \in L_{rec}} \min_{j \in L_{his}} dis(class(i), class(j))$ | [18,20] |
| Others | | | [18,20,22,36] |

In these formulas, $r_{u,i}$ is the real rating of user u for item i, $\hat{r}_{u,i}$ is the rating predicted by the CF system, and T = {(u, i)} denotes the set of user-item pairs whose real ratings $r_{u,i}$ are known. $U_u$ is the set of items used by user u, $L_{rec}$ is the list of recommended items, $L_{cons}$ is the list of consumed items, and $L_{his}$ is the user's history list. $r_{(ij)}$ is the rating of item i at rank j, md is the median rating, and α is the half-life decay value. dis is a distance measure, and class(i) and class(j) are the classes of items i and j, respectively.
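The accuracy metrics in Table 2 are straightforward to compute offline. The sketch below covers MAE, RMSE, precision, and recall, assuming predictions are given as (real, predicted) pairs over the test set T and recommendation lists are plain lists of item ids; the toy values are hypothetical. CTR has the same form as precision with the consumed list $L_{cons}$ in place of $U_u$.

```python
import math


def mae(pairs):
    """pairs: list of (real_rating, predicted_rating) for (u, i) in T."""
    return sum(abs(r - r_hat) for r, r_hat in pairs) / len(pairs)


def rmse(pairs):
    return math.sqrt(sum((r - r_hat) ** 2 for r, r_hat in pairs) / len(pairs))


def precision_recall(recommended, relevant):
    """Precision = |U_u & L_rec| / |L_rec|; Recall = |U_u & L_rec| / |U_u|."""
    hits = len(set(recommended) & set(relevant))
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical test data.
pairs = [(4, 3.5), (2, 3.0), (5, 5.0)]
print(mae(pairs), round(rmse(pairs), 3))  # 0.5 0.645
print(precision_recall(["a", "b", "c", "d"], ["b", "d", "e"]))  # (0.5, 0.6666666666666666)
```

On the same errors RMSE exceeds MAE because the single error of 1.0 is weighted more heavily after squaring.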

Main Issues and Challenges on Standard CF Techniques
This subsection examines the most common issues and challenges that arise when deploying CF-based RS and that are considered important in CF-based RS research.

Data Sparsity
Typically, the user-item interaction data contain a large number of missing ratings, and sparsity frequently exceeds 99% [41]. This is due to the difficulty that users encounter when expressing their interests as numerical ratings on products [42], or to the poor coverage of the recommendation space [10]. This problem has a major negative influence on the effectiveness of CF approaches [43]. Because of sparsity, the similarities between users often cannot be calculated, which decreases the effectiveness of CF. Even when the similarities can be calculated, they may be unreliable because the available information is insufficient [43]. The review-based recommendation techniques discussed in Section 4 mitigate this problem in different ways.
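A small sketch makes the sparsity problem concrete: it measures the fraction of unobserved cells in a rating matrix and shows that two users may have no co-rated items at all, in which case PCC and COS have nothing to compare. The toy ratings and function names are hypothetical.

```python
def sparsity(ratings, n_users, n_items):
    """Fraction of the n_users x n_items rating matrix that is unobserved."""
    observed = sum(len(items) for items in ratings.values())
    return 1.0 - observed / (n_users * n_items)


def co_rated(ratings, u, v):
    """Items rated by both u and v; rating-based similarity is undefined if empty."""
    return ratings[u].keys() & ratings[v].keys()


# Hypothetical ratings of 3 users over a catalogue of 10 items.
ratings = {"u1": {1: 5, 2: 3}, "u2": {3: 4, 4: 2}, "u3": {1: 4, 5: 1}}
print(round(sparsity(ratings, n_users=3, n_items=10), 3))  # 0.8
print(co_rated(ratings, "u1", "u2"))  # set()
print(co_rated(ratings, "u1", "u3"))  # {1}
```

Even at 80% sparsity, far below the 99% typical of real data, u1 and u2 already share no item, and u1 and u3 share only one, which is too little for a reliable correlation.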

Cold-Start
This issue arises when new users or items are added to the rating matrix. In such cases, CF methods can neither provide recommendations to these users nor recommend these items, since the system has not yet collected enough ratings about them [44]. To mitigate this problem, the content of user reviews can be combined with scalar ratings (Section 4).

Scalability
In a CF algorithm, computing user similarities is expensive, since the algorithm must search the entire database to determine the target user's potential neighbors [45]. Therefore, with larger datasets, algorithms require more resources, such as memory and computing power, which limits their scalability [46]. Practical solutions to this issue include clustering-based CF approaches, which search for neighbors within small clusters rather than in the complete database [47]; dimensionality reduction through singular value decomposition (SVD) [48]; and combining content analysis and clustering with CF techniques [49]. Another interesting solution to the scalability issue relies on distributed computing mechanisms [50]. Several studies have ported standard CF algorithms to distributed computing engines to improve their computational performance in recommendation applications [50-52], using Apache Hadoop or Spark, which are fast and practical frameworks for large-scale parallel data processing [53].
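The SVD-based dimensionality reduction mentioned above can be sketched with NumPy. This is a minimal illustration, assuming missing ratings are marked with `np.nan`, every user has at least one rating, and missing cells are imputed with the user's mean before decomposition; the imputation choice, the function name, and the toy matrix are assumptions. Neighbors can then be searched in the k-dimensional space rather than over all items.

```python
import numpy as np


def svd_low_rank(R, k):
    """Rank-k reconstruction of a users x items matrix R with np.nan for
    missing ratings. Assumes every user has at least one observed rating."""
    mask = ~np.isnan(R)
    # Impute missing cells with the user's mean rating (an assumption here).
    user_means = np.nanmean(R, axis=1, keepdims=True)
    filled = np.where(mask, R, user_means)
    centered = filled - user_means
    U, s, Vt = np.linalg.svd(centered, full_matrices=False)
    # Keep only the k strongest latent dimensions.
    return user_means + (U[:, :k] * s[:k]) @ Vt[:k, :]


# Hypothetical 3 x 3 rating matrix.
R = np.array([[5.0, 4.0, np.nan],
              [4.0, np.nan, 1.0],
              [np.nan, 2.0, 2.0]])
print(np.round(svd_low_rank(R, k=1), 2))
```

With k equal to the full rank the observed entries are reproduced exactly; smaller k trades some fidelity for a much cheaper representation.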

Limitations of Numerical Explicit Ratings
Typical CF methods suffer from a principal problem: they depend on users' numeric ratings as their only source of user preference information [12]. However, scalar ratings often lack sufficient semantic information to reflect the user's actual preferences, which greatly reduces recommendation accuracy [9]. To address this problem, various recommendation approaches combine ratings with user reviews (see Section 4).

User Review Texts
The growth of electronic commerce has encouraged users to write and share reviews expressing their opinions about items. Typically, these reviews are free text that expresses various dimensions or viewpoints of the user's experience with a given item [3]. They therefore constitute a very valuable source of information on user preferences and may be used to learn fine-grained user profiles and improve personalized recommendations. Chen et al. [6] identified different information elements that can be extracted from review texts and exploited by RS. Among these elements, terms (words), aspects, and opinions (sentiments) have proved effective for user modeling. In the following, we present these elements and briefly discuss how they can be used in CF-based RS.
Review Words: User reviews are unstructured text. The simplest way to mine them is to capture their most representative words. For instance, the TF-IDF weighting measure [54] can be used to indicate the relevance of each word in a review. The extracted review words may then be used to compute the similarity among users in CF, instead of numerical ratings [55].
Review Topics: Topics refer to the aspects of an item that a reviewer discusses in a review. For instance, in the review phrase "The camera's battery life is superb", the mentioned topics include the camera and its battery life. There are various methods for detecting topics in reviews, namely frequency-based and syntax-based methods, Conditional Random Fields [56], and topic modeling approaches such as Latent Dirichlet Allocation (LDA) [57], Latent Semantic Analysis (LSA) [58], and Probabilistic Latent Semantic Analysis (PLSA) [59]. Review topics can then be used to improve real ratings in standard CF [34]. They can also be combined with latent factors in model-based CF [60] and with the similarity measure in neighborhood-based CF [61].
Overall Opinions: They represent the sentiment orientation (i.e., positive or negative) of the user toward the reviewed items. Generally, the overall opinion may be deduced by aggregating the sentiments of all opinion words in the reviews, or by applying a coarse-grained sentiment analysis method based on supervised [62,63], semi-supervised [64], or unsupervised machine learning techniques [65]. The extracted overall opinions can be transformed into scalar ratings that can improve the performance of CF techniques [66-68].
Aspect Opinions: They represent the detailed opinions about an item's particular characteristics. For example, the review phrase "The waiters' attitude is great" expresses a positive opinion on the service aspect. In general, review aspects may refer to a distinct entity, such as the product itself, or to one of its attributes ("attitude of waiters" rather than "service"). Typical techniques for feature extraction include linguistics-based and statistical methods [69-72], and structured models such as Conditional Random Fields (CRF) [73], Hidden Markov Models (HMM), and their variants [73,74]. The opinions associated with aspects (features) are then identified through word distance or pattern mining [69,75]. Alternatively, SVM- or LDA-based classifiers can be used to identify aspect opinions (aspect-sentiment pairs) [6]. In Reference [76], aspect sentiments were used to calculate user similarities in order to cluster users in CF. In Reference [7], they were used to identify user similarities, which were then incorporated into standard user-based CF.
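The word-distance approach to pairing aspects with opinions can be illustrated with a toy sketch. It assumes small hand-written lexicons (the aspect-term table and opinion polarities below are hypothetical; real systems extract or curate them) and pairs each aspect term with the nearest opinion word within a fixed token window, which is an assumed heuristic.

```python
import re

# Hypothetical toy lexicons: aspect term -> aspect category, opinion word -> polarity.
ASPECT_TERMS = {"attitude": "service", "battery": "battery", "food": "food"}
OPINION_LEXICON = {"great": 1, "superb": 1, "bad": -1, "slow": -1}


def aspect_opinion_pairs(sentence, window=3):
    """Pair each aspect term with the closest opinion word at most `window`
    tokens away, returning (aspect, polarity) tuples."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    pairs = []
    for i, tok in enumerate(tokens):
        if tok not in ASPECT_TERMS:
            continue
        best = None
        for j, other in enumerate(tokens):
            if other in OPINION_LEXICON and abs(i - j) <= window:
                if best is None or abs(i - j) < abs(i - best):
                    best = j
        if best is not None:
            pairs.append((ASPECT_TERMS[tok], OPINION_LEXICON[tokens[best]]))
    return pairs


print(aspect_opinion_pairs("The waiters' attitude is great"))  # [('service', 1)]
print(aspect_opinion_pairs("The camera's battery life is superb"))  # [('battery', 1)]
```

Such (aspect, sentiment) pairs are the raw material that the aspect-based CF methods surveyed in Section 4 aggregate into user or item profiles.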

CF Techniques Based on User Review Texts
Recently, many attempts have been made to integrate the valuable information contained in user reviews into the recommendation task [6]. This section summarizes recent works on review-based CF recommender systems (Tables 3-6). These works can be classified into three principal categories, namely, techniques based on review words, on review topics, and on review sentiments (opinions).

Techniques Based on Review Words
These techniques incorporate review words into CF. For instance, Terzi et al. [55] proposed a modification of the user-based technique that computes user similarities from the similarities of their text reviews rather than from their ratings. More precisely, the similarity between two users is calculated by measuring the similarity between the review words of these two users for every co-reviewed item. The computed similarity scores are then used as weights in the rating prediction phase.
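The review-word similarity idea can be sketched as follows. This is an illustration in the spirit of the approach described above, not the authors' exact measure: it assumes TF-IDF vectors with cosine similarity, averaged over co-reviewed items, and the reviews are hypothetical. The resulting score could stand in for sim(u, v) in the user-based prediction formula.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def idf_weights(corpus):
    """IDF = log(N / df) computed over all review texts in the corpus."""
    n = len(corpus)
    df = Counter(t for doc in corpus for t in set(tokenize(doc)))
    return {t: math.log(n / d) for t, d in df.items()}


def tfidf(doc, idf):
    """Sparse TF-IDF vector (raw term count x IDF) for one review."""
    return {t: c * idf.get(t, 0.0) for t, c in Counter(tokenize(doc)).items()}


def cosine(a, b):
    num = sum(w * b.get(t, 0.0) for t, w in a.items())
    den = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return 0.0 if den == 0 else num / den


def review_similarity(reviews_u, reviews_v, idf):
    """Average review-text similarity over the items both users reviewed."""
    common = reviews_u.keys() & reviews_v.keys()
    if not common:
        return 0.0
    return sum(cosine(tfidf(reviews_u[i], idf), tfidf(reviews_v[i], idf))
               for i in common) / len(common)


# Hypothetical reviews keyed by item id.
u = {"hotel1": "great location friendly staff", "hotel2": "noisy room"}
v = {"hotel1": "friendly staff great breakfast", "hotel3": "clean pool"}
w = {"hotel1": "dirty room rude staff"}
idf = idf_weights([*u.values(), *v.values(), *w.values()])
print(review_similarity(u, v, idf) > review_similarity(u, w, idf))  # True
```

Because the similarity is computed from text, two users can be judged similar even when their numerical ratings on the shared item happen to coincide or differ for unrelated reasons.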
Kim et al. [77] proposed a Convolutional Matrix Factorization (ConvMF) model that uses review texts as complementary information. First, this model uses convolutional operations and word embeddings to capture the latent characteristics of items from their review texts. Then, the inferred latent features are integrated into a matrix factorization model to predict users' ratings on target items.
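The core text-encoding step shared by ConvMF-style and DeepCoNN-style models (word embeddings, a 1-D convolution over word windows, then max-over-time pooling) can be sketched with NumPy. This is a forward pass only, with random weights standing in for learned ones; training, dropout, the projection to the latent dimension, and the coupling with MF are not shown, and all names and sizes are assumptions.

```python
import numpy as np


def cnn_text_encoder(token_ids, embeddings, filters, bias):
    """token_ids: length-T word indices (T >= filter width);
    embeddings: (V, d) word-embedding table; filters: (F, w, d); bias: (F,).
    Returns an F-dimensional document vector."""
    X = embeddings[token_ids]                              # (T, d)
    n_filters, w, _ = filters.shape
    T = X.shape[0]
    windows = np.stack([X[t:t + w] for t in range(T - w + 1)])   # (T-w+1, w, d)
    # Each filter is applied to every window of w consecutive word vectors.
    feature_maps = np.einsum("twd,fwd->tf", windows, filters) + bias
    feature_maps = np.maximum(feature_maps, 0.0)           # ReLU
    return feature_maps.max(axis=0)                        # max-over-time pooling


rng = np.random.default_rng(0)
emb = rng.normal(size=(20, 6))        # hypothetical vocabulary of 20 words, d = 6
filters = rng.normal(size=(4, 3, 6))  # 4 filters of width 3
bias = rng.normal(size=4)
vec = cnn_text_encoder([1, 5, 7, 2, 9], emb, filters, bias)
print(vec.shape)  # (4,)
```

Max pooling makes the output size independent of review length, which is what allows a variable-length document to feed a fixed-size latent factor.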
Zheng et al. [78] proposed the Deep Cooperative Neural Networks (DeepCoNN) model, which uses two parallel convolutional neural networks (CNNs) and a word embedding method to capture latent representations from all the review words associated with a target user and item. To perform the prediction task, the model concatenates the user and item representations and feeds them into a regression layer based on a Factorization Machine (FM). Similar to DeepCoNN [78], the model developed by Chen et al. [79] (called NARRE) uses CNNs to derive latent embeddings of users and items from review texts. Unlike DeepCoNN, it scores reviews through an attention network to distinguish their contributions when learning the latent embeddings. To predict missing ratings, NARRE combines the attention-weighted review representations with the users' and items' latent rating factors and incorporates them into an extended MF.
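The Factorization Machine regression layer mentioned above scores a feature vector with a global bias, linear weights, and pairwise interactions factorized through latent vectors. The sketch below uses the standard O(nk) reformulation of the pairwise term; in a DeepCoNN-like setting, x would be the concatenation of the user and item text representations. Weights and inputs here are random placeholders.

```python
import numpy as np


def fm_predict(x, w0, w, V):
    """Factorization Machine score for feature vector x (n,):
    w0 + w.x + sum_{a<b} <V_a, V_b> x_a x_b, computed in O(n k)."""
    linear = w0 + w @ x
    xv = x @ V                                   # (k,)
    pairwise = 0.5 * np.sum(xv ** 2 - (x ** 2) @ (V ** 2))
    return linear + pairwise


rng = np.random.default_rng(0)
user_vec, item_vec = rng.normal(size=3), rng.normal(size=3)  # placeholder text features
x = np.concatenate([user_vec, item_vec])                      # concatenated input, n = 6
w0, w, V = 0.1, rng.normal(size=6), rng.normal(size=(6, 2))   # k = 2 latent dimensions
print(float(fm_predict(x, w0, w, V)))
```

The pairwise term lets every user-side feature interact with every item-side feature, while the factorized weights keep the number of parameters linear in the input size.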
The work in Reference [80] fuses rating and review information in a unified model. The model uses CNNs and an attention mechanism to learn the relevant latent features of users and items from their related reviews. Through a rating-based component, the model constructs latent rating embeddings for users and items from the interaction matrix. To derive the final rating score, the learned content features and latent rating embeddings are integrated into a Factorization Machine (FM).
Very recently, Liu et al. [81] presented a hybrid neural recommendation model (called HRDR) that captures user and item embeddings from reviews and ratings. First, rating-based representations are obtained from the rating data using a Multilayer Perceptron (MLP). Then, CNNs with an attention mechanism are used to derive review-based representations, in which each review is assigned an informativeness score. Finally, an MF model computes users' ratings on items based on their latent rating representations, review features, and ID embeddings.
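The review-level attention used in NARRE- and HRDR-style models assigns each review an informativeness score and pools the review vectors with softmax weights. The sketch below uses a generic additive attention form, score = h^T tanh(W o + b); this is an assumed simplification, and the exact scoring functions in [79] and [81] differ in their inputs. Weights are random placeholders.

```python
import numpy as np


def softmax(z):
    z = z - z.max()          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()


def attention_pool(review_vecs, W, b, h):
    """review_vecs: (L, d), one row per review; W: (a, d); b: (a,); h: (a,).
    Returns the attention-weighted summary (d,) and the weights (L,)."""
    scores = np.tanh(review_vecs @ W.T + b) @ h   # one informativeness score per review
    alpha = softmax(scores)
    return alpha @ review_vecs, alpha


rng = np.random.default_rng(0)
O = rng.normal(size=(5, 4))                       # 5 hypothetical review vectors, d = 4
W, b, h = rng.normal(size=(3, 4)), np.zeros(3), rng.normal(size=3)
pooled, alpha = attention_pool(O, W, b, h)
print(np.round(alpha, 3), pooled.shape)
```

When all scores are equal the weights become uniform and the pooled vector reduces to the plain average used by models without attention, such as DeepCoNN.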

Techniques Based on Review Topics
This type of technique extracts aspects (topics) from reviews and combines them with ratings to generate recommendations. For example, McAuley and Leskovec [60] proposed a Hidden Factor and Topic (HFT) framework that fuses ratings with review topics. First, it models reviews with an LDA-based topic model and ratings with standard MF. Then, a softmax transformation function is used to incorporate the latent topics into the learning phase of the latent factor model. The final rating scores are computed from the trained model.
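The softmax link in HFT maps an item's latent rating factors $\gamma_i$ to a topic distribution $\theta_i$, with $\theta_{i,k} = \exp(\kappa\gamma_{i,k}) / \sum_{k'} \exp(\kappa\gamma_{i,k'})$, where κ controls how peaked the distribution is. The sketch below shows only this transformation; the joint objective coupling it with the topic model is not reproduced, and the factor values are hypothetical.

```python
import numpy as np


def hft_topic_distribution(gamma_i, kappa):
    """Softmax of kappa * gamma_i: turns an item's latent factors into a
    distribution over K topics."""
    z = kappa * np.asarray(gamma_i, dtype=float)
    z -= z.max()          # numerical stability; does not change the result
    e = np.exp(z)
    return e / e.sum()


gamma_i = [0.5, -1.0, 2.0]                # hypothetical item factors, K = 3
print(np.round(hft_topic_distribution(gamma_i, kappa=1.0), 3))
```

Larger κ makes the topic distribution sharper, so the item is described by fewer dominant topics; κ = 0 gives a uniform distribution.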
Like McAuley and Leskovec [60], the model proposed by Tan et al. [86] (called RBLT) uses MF to model rating scores and LDA to represent review texts. In their model, items are represented as topic distributions, and the topics in highly rated reviews are repeated to increase their importance. Likewise, users are represented in the same topic space based on their numerical ratings. To perform rating prediction, the item and user representations are fused into a latent factor model.
Arguing that the LDA technique cannot model the distribution of compound topics, the authors of [88] extended HFT [60] by proposing the TopicMF framework. TopicMF captures topics from user review texts using non-negative matrix factorization (NMF) and uses an MF technique to factorize the rating matrix into latent user/item features. For rating prediction, a transformation function is used to link the topic features with the corresponding latent user/item features.
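The NMF step that TopicMF uses to extract topics from review text can be sketched with the classic Lee and Seung multiplicative updates for the Frobenius loss. Rows of X are reviews and columns are term weights; rows of H can be read as topics over terms and rows of W as per-review topic weights. The coupling with the rating MF through a transformation function is not shown, and the matrix here is random placeholder data.

```python
import numpy as np


def nmf(X, k, iters=300, seed=0, eps=1e-9):
    """Factorise non-negative X (reviews x terms) as W @ H with W, H >= 0,
    using multiplicative updates for the squared Frobenius loss."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(iters):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


X = np.random.default_rng(1).random((6, 8))   # 6 hypothetical reviews, 8 terms
W, H = nmf(X, k=2)
print(round(float(np.linalg.norm(X - W @ H)), 3))
```

Because the updates only multiply by non-negative ratios, W and H stay non-negative, which is what makes the factors interpretable as additive topics.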
More recently, Cheng et al. [89] proposed the Aspect-Aware Latent Factor Model (ALFM), which uses an Aspect-aware Topic Model (ATM) to model aspect-level user/item representations as distributions over composite topics, each of which is represented by a set of words. In ALFM, the representations produced by ATM are fused with latent rating factors to estimate missing ratings with the MF model.
Chin et al. [90] proposed the Aspect-based Neural Recommender (ANR), which uses a neural network to estimate latent aspect ratings and latent aspect importance. The latent aspect ratings are derived through a weighted sum of the word embeddings in the reviews. The latent aspect importance is inferred using a shared similarity between each pair of the user's and the item's latent aspect ratings. Finally, the overall rating for any user-item pair is inferred by combining the associated aspect ratings with the aspect importance in a modified Latent Factor Model (LFM).

Techniques Based on Review Sentiments
Research works in this area use the sentiment that users express in their reviews, toward the item itself or toward its different aspects, to boost rating prediction. For instance, Poirier et al. [66] transform reviews into overall sentiment scores using a machine learning method. To do so, review vectors labeled with the users' real ratings are used to train a Naive Bayes model on negative and positive classes. The learned model is then used to deduce ratings from new reviews. To predict ratings, the review-based ratings are used to construct a rating matrix that is fed into traditional neighborhood-based CF techniques.
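The Naive Bayes step described above can be sketched as a small multinomial classifier with Laplace smoothing, where each training review is labeled positive or negative from its explicit rating. The labeling threshold, the linear mapping from P(positive) onto a 1-5 scale, and the toy reviews are assumptions for illustration, not the authors' exact procedure.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def train_nb(reviews, ratings, threshold=4):
    """Label a review positive if its explicit rating >= threshold (an assumption)."""
    counts = {"pos": Counter(), "neg": Counter()}
    docs = Counter()
    for text, r in zip(reviews, ratings):
        label = "pos" if r >= threshold else "neg"
        docs[label] += 1
        counts[label].update(tokenize(text))
    vocab = set(counts["pos"]) | set(counts["neg"])
    return counts, docs, vocab


def prob_positive(text, model):
    """Posterior P(pos | text) with add-one smoothing; unseen words are ignored."""
    counts, docs, vocab = model
    n_docs = sum(docs.values())
    log_post = {}
    for label in ("pos", "neg"):
        total = sum(counts[label].values())
        lp = math.log(docs[label] / n_docs)
        for tok in tokenize(text):
            if tok in vocab:
                lp += math.log((counts[label][tok] + 1) / (total + len(vocab)))
        log_post[label] = lp
    m = max(log_post.values())
    p_pos, p_neg = math.exp(log_post["pos"] - m), math.exp(log_post["neg"] - m)
    return p_pos / (p_pos + p_neg)


def review_to_rating(text, model, low=1, high=5):
    """Hypothetical mapping of P(pos) onto the rating scale."""
    return low + (high - low) * prob_positive(text, model)


model = train_nb(["great food friendly staff", "excellent service great value",
                  "terrible food rude staff", "awful slow service"], [5, 4, 1, 2])
print(round(review_to_rating("great friendly service", model), 2))
```

The inferred ratings can then fill the rating matrix used by neighborhood-based CF, exactly where explicit ratings would normally go.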
In contrast, Reference [91] developed an Explicit Factor Model (EFM) that transforms user reviews into aspect-sentiment pairs. Based on phrase-level sentiment analysis, it constructs two matrices, namely, user-aspect attention and item-aspect quality, which are factorized jointly with the rating matrix to perform rating prediction in an MF-based model.
The model proposed by Diao et al. [87] (called JMARS) exploits the relationships among review aspects, opinions, and ratings to perform CF. It uses a Dirichlet-multinomial model to capture the word distribution of reviews and MF to generate aspect ratings, which are fused with latent factors to compute the final rating scores.
On the other hand, Ma et al. [7] presented a user-preference-based CF approach that integrates aspect-level information from reviews to reflect user interests. Specifically, they proposed two aspect-interest metrics, namely aspect need and aspect importance, which reflect, respectively, the differences in users' opinions on aspects and the relationship between aspects and explicit ratings. Based on these measures, the authors compute the similarity between users, which is then incorporated into memory-based CF to generate recommendations.
Musto et al. [92] developed multi-criteria user- and item-based CF techniques that integrate the opinions expressed on review aspects. For the user-based and item-based cases, the authors define aspect-based distances between users and between items, respectively, which use the sentiment ratings deduced from review aspects. The similarity between users or items is then computed as the inverse of these distances, and ratings are predicted using the standard CF model. The authors use the SABRE engine [94] to perform aspect extraction.
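The idea of deriving similarity from aspect-level sentiment can be illustrated with a small sketch. It assumes each user (or item) is summarized by a dictionary of aspect sentiment ratings, uses Euclidean distance over shared aspects, and turns distance into similarity with 1 / (1 + d) to avoid division by zero. These are hypothetical choices for illustration; the exact distance used in [92] may differ.

```python
import math


def aspect_distance(a, b):
    """Euclidean distance over the aspects both profiles mention; None if none."""
    shared = a.keys() & b.keys()
    if not shared:
        return None
    return math.sqrt(sum((a[x] - b[x]) ** 2 for x in shared))


def aspect_similarity(a, b):
    """Inverse-distance similarity; 1 / (1 + d) is an assumed variant of 1 / d."""
    d = aspect_distance(a, b)
    return 0.0 if d is None else 1.0 / (1.0 + d)


# Hypothetical aspect sentiment ratings inferred from each user's reviews.
u = {"service": 4.5, "food": 3.0}
v = {"service": 4.0, "food": 3.0, "price": 2.0}
w = {"service": 1.0, "food": 5.0}
print(round(aspect_similarity(u, v), 3), round(aspect_similarity(u, w), 3))  # 0.667 0.199
```

These similarities can replace rating-based PCC or COS in the standard neighborhood prediction formulas without changing the rest of the CF pipeline.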
Shen et al. [68] developed a sentiment-based MF model that incorporates review sentiments. To infer the overall sentiment score of a review, the model sums the sentiment scores of the keywords in the review, as given by a constructed sentiment dictionary. For rating prediction, these sentiment scores are converted into real values and fused with the users' explicit ratings in an extended probabilistic MF.
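The dictionary-based scoring step can be sketched as below: keyword scores are summed per review, mapped onto the rating scale, and blended with the explicit rating. The dictionary entries, the clipping range, the linear mapping, and the fixed blending weight are all hypothetical choices; the cited model instead fuses the sentiment values inside an extended probabilistic MF.

```python
import re

# Hypothetical sentiment dictionary (word -> score).
SENTIMENT_DICT = {"excellent": 2, "good": 1, "bad": -1, "terrible": -2}


def review_sentiment(text):
    """Overall review score: sum of the dictionary scores of its keywords."""
    return sum(SENTIMENT_DICT.get(tok, 0) for tok in re.findall(r"[a-z]+", text.lower()))


def to_rating_scale(score, max_abs=4, low=1, high=5):
    """Clip to [-max_abs, max_abs] and map linearly onto [low, high]."""
    clipped = max(-max_abs, min(max_abs, score))
    return low + (high - low) * (clipped + max_abs) / (2 * max_abs)


def adjusted_rating(explicit, text, weight=0.3):
    """Blend the explicit rating with the review-derived value (assumed weighting)."""
    return (1 - weight) * explicit + weight * to_rating_scale(review_sentiment(text))


print(review_sentiment("Excellent food, good service but bad parking"))  # 2
print(to_rating_scale(2))  # 4.0
print(adjusted_rating(5, "terrible", weight=0.5))  # 3.5
```

Blending rather than replacing keeps the explicit rating as the anchor while letting a strongly worded review pull the training target toward the sentiment it expresses.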
In a recent work [93], the authors proposed a unified model that integrates aspect opinion information into CF. The model uses a multichannel CNN with word embedding and POS-tag embedding layers to extract review aspects. It groups aspects using LDA and then applies a lexicon-based approach to build aspect rating matrices. The aspect ratings are then weighted using a tensor factorization method and integrated with the rating matrix into an LFM to predict the final ratings.

Practical Benefits of Review Incorporation
Tables 3-5 and 7 show that all of the review-based CF algorithms surveyed here report advantages over traditional CF approaches. This section discusses the practical benefits of these review-incorporating techniques for two main issues, namely, rating sparsity and rating prediction improvement.

Rating Sparsity
As indicated in Section 2, a lack of relevant data, such as rating sparsity, considerably reduces the efficiency of CF techniques [7]. To tackle this problem, researchers have exploited user reviews in different ways (see Tables 3-5 and 7):
The works proposed in References [60,86-89] have demonstrated the ability of their approaches to mitigate the rating sparsity issue. These works exploit review topics (aspects) to enrich the latent factor model. They extract aspects from review texts using topic models and learn latent features from ratings using MF methods. The latent topics and latent factors are then combined to boost prediction performance. For example, HFT [60] uses a transformation function to learn the latent factors and latent topics jointly. The JMARS model [87] uses a one-to-one mapping between the latent factors and the learned latent aspects to determine the final ratings. Bao et al. [88] fuse review aspects with the latent factors of the user-item rating matrix through a transformation function. Tan et al. [86] use a linear combination of the two to build the final user/item representations, which are then used for rating prediction. Cheng et al. [89] extract review topics, associate them with aspects, and then use an extended latent factor model to enrich latent ratings with aspects.
Poirier et al. [66] show that user reviews can be converted into text-based ratings that replace users' explicit ratings in the CF process. This approach first infers opinion ratings from reviews using a machine learning model and then applies a neighborhood-based CF method. By inferring ratings from review texts, this work thus mitigates the rating sparsity issue.
On the other hand, Ma et al. [7] use review texts to capture the preference weights that the target user assigns to different aspects. Since aspect preferences are derived from all of a user's reviews, similarities can be computed for every pair of users regardless of how many items they have rated in common, which mitigates data sparsity.

Several CNN-based works, including [93], have demonstrated the ability of their approaches to alleviate the sparsity problem by using rich semantic features extracted from review words through CNNs. Specifically, these studies confirmed that CNNs help adjust latent ratings by effectively representing the contextual features of user/item review texts when rating data are sparse.

Rating Prediction Improvement
Many works (Section 4) propose incorporating user review texts to improve traditional CF techniques (Section 2). These works can be classified into two main categories. The first focuses on modifying standard CF techniques to integrate implicit scores inferred from review texts, adjusting explicit ratings to obtain more reliable and fine-grained ratings. For instance, References [60,86-89,91] presented different modified versions of the standard latent factor model, namely HFT, RBLT, JMARS, TopicMF, ALFM, and EFM, which improve numerical ratings by aligning them with latent topics in reviews. The works in References [77,79-81,93] improved the real ratings in traditional latent factor models by fusing them with latent feature vectors inferred from review words through an integrated CNN architecture. In Reference [7], by contrast, the traditional user similarity in neighborhood-based CF recommenders was improved by considering users' aspect preference vectors inferred from reviews. Moreover, in Reference [68], the standard probabilistic MF was improved by adjusting its real ratings with sentiment scores inferred from reviews.
The second category focuses on replacing the explicit user ratings in standard CF with implicit ones generated from review texts. For example, the text-based ratings inferred from reviews can replace explicit ratings in neighbor-based CF approaches [66,92]. The review words can be used to improve the traditional user-kNN similarity in memory-based CF techniques [55]. The users' and items' latent embeddings obtained by CNNs from reviews can be used as features in an LFM to conduct rating predictions [78].
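For the second category, the following sketch shows a mean-centered user-kNN predictor in which explicit ratings are simply replaced by text-based ratings. The `TEXT_RATINGS` values are made-up toy data, assumed to come from some upstream review-to-rating model. The Pearson similarity and the neighborhood size `k` are common textbook choices rather than the settings of References [66,92]. Reference [66], in particular, is item-based.

```python
import math

# Hypothetical text-based ratings (user -> item -> score on a 1-5 scale),
# assumed to come from an upstream review-to-rating model not shown here.
TEXT_RATINGS = {
    "u1": {"i1": 4.5, "i2": 2.0, "i3": 4.0},
    "u2": {"i1": 4.0, "i2": 1.5, "i3": 4.5, "i4": 5.0},
    "u3": {"i1": 1.0, "i2": 4.5, "i4": 2.0},
}


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def pearson(a, b):
    """Pearson correlation over the items both users have text ratings for."""
    common = set(a) & set(b)
    if len(common) < 2:
        return 0.0
    ma = mean(a[i] for i in common)
    mb = mean(b[i] for i in common)
    num = sum((a[i] - ma) * (b[i] - mb) for i in common)
    den = math.sqrt(sum((a[i] - ma) ** 2 for i in common)) * math.sqrt(
        sum((b[i] - mb) ** 2 for i in common))
    return num / den if den else 0.0


def predict(user, item, ratings, k=2):
    """Mean-centered user-kNN prediction computed from text-based ratings."""
    mu = mean(ratings[user].values())
    neighbors = [
        (pearson(ratings[user], r), r[item] - mean(r.values()))
        for v, r in ratings.items()
        if v != user and item in r
    ]
    # Keep the k neighbors with the strongest (absolute) correlation.
    neighbors.sort(key=lambda t: abs(t[0]), reverse=True)
    top = [(s, d) for s, d in neighbors[:k] if s != 0.0]
    norm = sum(abs(s) for s, _ in top)
    if norm == 0.0:
        return None  # no usable neighbor; a fallback strategy would be needed
    return mu + sum(s * d for s, d in top) / norm


print(round(predict("u1", "i4", TEXT_RATINGS), 2))  # 4.36
```

The prediction code is identical to rating-based user-kNN. Only the source of the ratings changes, which is what makes this replacement strategy easy to retrofit.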
These review-based works have proven effective at exploiting user review texts (see the summaries in Tables 3-5). For instance, in Reference [55], the extended user-based CF approach exploiting text-based ratings has been shown to generate more accurate predictions than traditional rating-based approaches. The item-based CF approach exploiting review-derived ratings [66] has shown accuracy comparable to that of standard CF based on explicit ratings. In References [7,92], the neighborhood-based CF techniques based on inferred sentiment scores have been shown to outperform traditional memory-based CF approaches. As for latent factor models, the modified versions that fuse real ratings with review-based ratings have proven more accurate than traditional models that leverage only real ratings [60,68,77,79-81,86-89,91,93]. This is due to the rich information about user interests and item characteristics contained in reviews, which can usefully complement numerical ratings.
Furthermore, certain works have compared different review-based CF methods. Neural network techniques [68,77-81,90,93] usually outperform methods that combine CF with topic modeling [60,86-88,91]. The reason is the strong representational capacity of neural network architectures, which can capture rich semantic features from review texts to represent users and items. In contrast, techniques relying on topic modeling lose deep textual characteristics through this coarse-grained text mining method.
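To illustrate why CNN features can preserve information that topic models discard, the sketch below implements the core of a text CNN. It applies a 1-D convolution over consecutive word embeddings, then a ReLU, then max-over-time pooling. The two-dimensional embeddings (`EMB`) and the filters are hypothetical and set by hand for readability. In the neural models surveyed here, such parameters are learned jointly with the rating objective.

```python
# Hypothetical 2-d word embeddings; real models learn these (or start from
# pre-trained vectors) jointly with the rating-prediction objective.
EMB = {
    "not": [1.0, 0.0],
    "good": [0.0, 1.0],
    "very": [0.2, 0.2],
    "food": [0.5, 0.5],
}

# Hand-set filters for illustration: each filter has one weight row per
# word position in its sliding window, so len(filter) is the window width.
FILTERS = [
    [[1.0, 0.0], [0.0, 1.0]],  # width 2: responds to the bigram "not good"
    [[0.0, 1.0]],              # width 1: responds to "good" on its own
]
BIAS = [-1.0, 0.0]


def embed(text):
    """Look up embeddings for known words, skipping out-of-vocabulary ones."""
    return [EMB[w] for w in text.lower().split() if w in EMB]


def conv1d_max_pool(embeddings, filters, biases):
    """1-D convolution over word positions, ReLU, then max-over-time pooling.

    Returns one feature per filter; a review shorter than a filter's width
    yields 0.0 for that filter.
    """
    features = []
    for weights, b in zip(filters, biases):
        width = len(weights)
        # Starting from 0.0 and taking the max equals max(ReLU(z)) over windows.
        best = 0.0
        for start in range(len(embeddings) - width + 1):
            window = embeddings[start:start + width]
            z = b + sum(
                w * x
                for w_row, x_row in zip(weights, window)
                for w, x in zip(w_row, x_row)
            )
            best = max(best, z)
        features.append(best)
    return features


print(conv1d_max_pool(embed("food not good"), FILTERS, BIAS))  # [1.0, 1.0]
print(conv1d_max_pool(embed("very good food"), FILTERS, BIAS))
```

The width-2 filter fires only when "not" is immediately followed by "good". A bag-of-words representation, such as the one underlying LDA-style topic models, records only that both words occur. For "very good food" the first feature drops to about 0.2.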
We also observe that techniques leveraging attention networks [79-81,90,93] usually outperform techniques without attention [77,78]. The attention mechanism captures the most significant features in reviews and, consequently, derives user and item representations more precisely.
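The attention step can be sketched as follows. The sketch assumes dot-product scoring between a query vector, which stands in for the target user's latent vector, and each review's feature vector. The surveyed models may use other scoring functions, such as a small feed-forward layer. The review vectors and the query are hypothetical toy values.

```python
import math


def softmax(scores):
    """Numerically stable softmax (subtracts the max before exponentiating)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(review_vecs, query):
    """Weight each review vector by softmax(query . vector) and sum them."""
    scores = [sum(q * h for q, h in zip(query, vec)) for vec in review_vecs]
    weights = softmax(scores)
    dim = len(review_vecs[0])
    pooled = [sum(w * vec[d] for w, vec in zip(weights, review_vecs))
              for d in range(dim)]
    return pooled, weights


# Hypothetical feature vectors for three reviews of one item, and a query
# vector standing in for the target user's latent preferences.
reviews = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
query = [2.0, 0.0]
pooled, weights = attention_pool(reviews, query)
print([round(w, 3) for w in weights])  # [0.787, 0.107, 0.107]
```

The review aligned with the query receives most of the weight. Uniform averaging would instead give each review a weight of 1/3, regardless of its relevance to the user.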

Conclusions
Nowadays, with the emergence of modern text mining techniques, much effort has been devoted to incorporating review texts into the recommendation task. Different types of review elements, such as review words, review topics, and review opinions, have been used to augment classical rating-based CF models because they represent items and users' interests more accurately. In this paper, we survey existing review-based CF recommender systems and categorize them into three main groups, namely systems based on words, on topics, and on sentiments. For each group, we discuss how user review texts have been exploited to enrich rating profiles and derive feature preferences. We also discuss the practical benefits of these review-based recommender systems in alleviating rating sparsity and improving prediction accuracy. Despite the remarkable progress in review-based CF RS research, our survey of different review-based approaches shows that further work is needed. For instance, fusing various review-based CF RS might be more effective than using a single system to predict users' preferences. Another direction for future work is the use of advanced text mining approaches to identify more complex relationships between reviews and ratings.

Conflicts of Interest:
The authors declare no conflict of interest.