Application of Trust in Recommender Systems—Utilizing Naive Bayes Classiﬁer

Abstract: Receiving a recommendation for a certain item or a place to visit is now a common experience. However, the trustworthiness of the recommended items or places remains one of the main concerns. In this paper, we present an implementation of the Naive Bayes classifier, one of the most powerful classes of Machine Learning and Artificial Intelligence algorithms, to improve the accuracy of recommendations and raise the trustworthiness confidence of the users and items within a network. Our approach proved feasible, reaching a prediction accuracy of 89%, with a confidence of approximately 0.89, when applied to an online dataset of a social network. Naive Bayes algorithms are widely used in recommender systems because they are fast and easy to implement. However, the requirement that the predictors be independent remains a challenge, because in real-life scenarios the predictors are usually dependent. For this reason, we used a larger training dataset, so that the response vector offers a larger pool of values to select from, which in turn supports higher prediction accuracy.


Introduction
The information used for the recommendation process can be derived from available content coming from articles and users, or it can be derived from explicit evaluations when users are asked to rate articles. Based on how the information is used, recommender systems are considered to be content-based, collaborative filtering, or hybrid (where both are used) recommender systems [1].
Social network analysis and 'social mining' can be very useful in the context where recommender systems can inherit adequate and useful information from social networks and vice versa, where network formation and evolution can be affected by recommendations.
There are cases where the neighborhood among items matters: two items might be considered neighbors when more than one user has rated them in a somewhat similar way [2]. In these cases, we have a matrix of item ratings, populated either by the similarity between items or by the similarity of the users involved in the review process. The most popular techniques here are Pearson correlation and cosine-based vector similarity calculations, as explained in [3].
Each recommender system intends to achieve an acceptable rate of accurate recommendations based on the predictions made, with results as close as possible to what the prediction model estimated. Moreover, an RS should keep the list of items seen as of particular interest to a certain user, and it must always be updated with new items, so that a user does not miss any new product or feature he/she might be interested in. Following the evolution of RS, it is clear that the majority of recommendations and their rating approval are based on friends' and peers' recommendations and opinions, since people have a greater tendency to consider the opinions of people they know from their social life, which creates a strong tie between RS and social network analysis. However, every RS should pay close attention to the people it involves in the recommendation process, due to the high number of malicious users and tools that seek to steer recommendations toward a biased item.
Following this rationale, in this paper we utilize Social Network Analysis (SNA) metrics, combined with trustworthiness metrics, to enhance the recommendation process.
In our approach, we set the process in three phases, as depicted in Figure 1 below:
• In the first phase, we consider the social profile of a user, which includes social and personal information, such as gender, location, age, and hobbies. We then consider the touristic preferences of a user, such as the category and subcategory of a point of interest that the user prefers to visit.
• In the second phase, we used a dataset from [4] to distinguish certain points of interest that users selected and provided preferences for. From this dataset, we created a training set and a testing set to be used in the next phase.
• In the third phase, we used the training set to test the evaluation, using the Naive Bayes classifier as the classification and prediction model. From this point, not only do we get the recommendation for a certain point of interest based on the categories and subcategories users used in the training set, where we used a vector of dependent features, but we also calculated trustworthiness by considering the confidence of the recommendations; hence, besides recommending a certain point of interest, we assure the user that the recommendation comes from a trusted source.

Related Work
The main intention of each recommender system (RS) is to provide the most accurate and trusted recommendations to its users. For this purpose, there are common classification techniques among the RS used today, as elaborated in [1]:
• Content-based recommendations: the recommendation is based on the user's previous content evaluations.
• Collaborative recommendations: recommendations are based exclusively on user and friend reviews; this class has two subgroups, memory-based and model-based collaborative filtering.
• Hybrid approaches: combine the benefits of the other two approaches (collaborative and content-based recommendations).

Similarity of External Information
In hybrid approaches, when external social information about users, such as profession, location, nationality, and age, is taken into consideration, recommendation results and satisfaction improve [5], even though exposing this personal information to an RS might expose a user to a privacy breach [6,7].
In the literature, another significant class of recommender systems is content-based recommendation [8], where the algorithm calculates and recommends based only on the items that a user of the system has previously evaluated. Within this class, the term frequency-inverse document frequency (TF-IDF) measure is the most used [9]; it weights the frequency of a term t within a document d against the number of documents in the collection |D| that contain t, as follows:

tf-idf(t, d) = tf(t, d) × log(|D| / |{d ∈ D : t ∈ d}|)

Furthermore, the literature describes several combination methods:
• combining the predictions of two RS methods, collaborative and content-based [10,11];
• collaborative models in which content features are taken into consideration [12-14];
• content-based techniques to which collaboration features are added [15];
• a unique model that not only combines but integrates into one solution the common features of collaborative and content-based recommendations [16-18], even though such an approach cannot be used in every RS domain [19,20].
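As a minimal illustration of the TF-IDF measure described above, the following sketch computes the weight of a term for one document; the toy corpus and tokenization are illustrative, not from the paper:

```python
import math

def tf_idf(term, doc, corpus):
    """TF-IDF: the term's frequency in the document, scaled by the inverse
    document frequency over the whole collection."""
    tf = doc.count(term) / len(doc)              # frequency of t in d
    df = sum(1 for d in corpus if term in d)     # documents containing t
    if df == 0:
        return 0.0
    idf = math.log(len(corpus) / df)             # log(|D| / |{d in D : t in d}|)
    return tf * idf

# Illustrative corpus of tokenized "documents" (POI descriptions, say).
corpus = [["museum", "park", "food"],
          ["food", "restaurant", "food"],
          ["park", "trail"]]
print(round(tf_idf("food", corpus[1], corpus), 4))  # 0.2703
```

A term that is frequent in one document but common across the collection receives a low weight, which is exactly the behavior the measure is designed for.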
The labeling of systems that use collaborative filtering began with the arrival of Web 2.0 [21], and has been extended toward e-commerce applications and scenarios [22,23]. Another enhancement of RS is the inclusion of the reviews and opinions of every user and respondent [24], including negative reviews and thoughts [25,26], since even when reviews and recommendations carry negative marks, it is important to find the correlation among the items proposed and the users involved [27]. Such approaches are used in many research works, such as [28,29].
In research groups, recommendations made by a friend or by people we know are taken very much into consideration [30], and social relations and real-life relations among people have significant importance here [31,32]. All this empowers the role of social relations [33] and friendship [34] in recommender systems.
To keep recommendations from being biased, hybrid methods are used in RS to overcome these issues [35,36]; they also address one of the main problems, the cold-start problem, thus diversifying the recommendations [37].
Besides the linear mathematical calculations for hybrid methods defined in [10], Markov Chain Monte Carlo methods [38] are mostly applied today. Complex forecast-combination methods are also in use to reduce overall recommendation errors [39].

Trustworthiness
Trust as a concept is rooted in social life and the social sciences, but it has also gained ground in all fields of life and technology, including the use of Naive Bayes and probability techniques [40,41] in RS, and of trust and semantics [42,43] as notions in RS. Even in the economy and trade, trust is considered the main factor bringing people together [44]; thus, in many definitions, trust is referred to as a belief [45].
In [46], trust is considered a subjective probability that depends on certain actions a person takes, while in [47], trustworthiness is defined as a subjective expectation; different cognitive models were studied in [48]. In [49], it is argued that trust/belief is closely related to and dependent on others' behavior, and the level of trust also varies with the situation [50].
In [51], trust/reliability/belief is studied from the computer science perspective, where it is stated that "trust is a measurable level of risk through which one agent X assesses the likelihood another agent Y will successfully perform a particular action before X monitors such action and in the context in which it affects own actions." Besides belief and trust, disbelief and untrust [52,53] are also considered in other publications, and the inclusion of negative feedback is worth considering in RS [54,55].
Furthermore, [56] summarizes the main computational characteristics of trust, such as asymmetry; dissemination, i.e., trusting the "friend-of-a-friend"; composition, when information received from many sources must be consulted; and nontransitivity, as argued in [57].

Trustworthiness Metrics
The process of trust calculation proved to be very difficult, and hence several approaches are in use today, depending on the models and domains to which the calculations must be applied. In [58], trust is calculated from its features using a fuzzy-logic mathematical model. In another approach [59], the authors proposed a tool that can change the belief accordingly (on the go) as the user reads the content, while the use of the 'historical data' of networks and their peer relations was argued to be a feasible approach in [60]. The Friend-of-a-Friend (FOAF) ontology and relationship trust are used as an approach in [61], while in [62], reliability metrics are applied separately to local groups and bigger social groups.
Some of the main metrics used today are PageRank [63], a well-known metric for calculating the importance of a website, and EigenTrust [64], which is accepted as a global reliability metric for calculating inter-personal relations among users in peer-to-peer networks.
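For illustration, the core of the PageRank computation mentioned above can be sketched as a plain power iteration; the graph, damping factor, and iteration count are illustrative, and EigenTrust uses a related but trust-specific iteration:

```python
def pagerank(links, damping=0.85, iters=50):
    """Plain power iteration: each node's rank flows along its out-links,
    with a (1 - damping) teleport share spread uniformly."""
    nodes = list(links)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        new = {n: (1 - damping) / len(nodes) for n in nodes}
        for n, outs in links.items():
            for m in outs:
                new[m] += damping * rank[n] / len(outs)
        rank = new
    return rank

# Illustrative 3-node graph: "a" and "b" endorse "c"; "c" endorses "a".
ranks = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
print(max(ranks, key=ranks.get))  # c  (the most-endorsed node)
```

The node that receives the most endorsements ends up with the highest rank, which is the intuition behind using such metrics as reliability scores.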

'Trust-Aware' Recommender Systems
When one is uncertain about one's choices, it is normal to seek opinions and reviews from friends and the people around you; this is the overall intention of social networks themselves.
In this respect, personal opinions are considered more trustworthy than the various advertisements that can be made for a certain item, as argued in [65]. In [66], another approach is analyzed, where information gathered from websites is no longer relevant compared to information provided by peers and friends.
'FilmTrust', an RS proposed in [67], seeks direct trust values for certain movies from reviewers, while in [68], the author calculates trustworthiness based on network trust.
'Trust-aware' recommendations were enhanced with ontologies to create indexed content for semantic websites [69]. In another approach, the 'shuffled frog leaping' algorithm was applied to group users of different social contexts [70]. As an overall outcome, the recommendations coming from RS that include trust in the evaluation process proved to be more accurate and raised the satisfaction factor of its users [71,72].

Bayesian Approach in Our Model
As elaborated in [73], when we want to achieve personalized recommendations, we usually have to consider groups along two factors. In our case, let us consider the user-object pair, where we classify users and objects into K_user and K_object classes, respectively. K_user and K_object are parameters of the algorithm, similar to the parameter K in Singular Value Decomposition (SVD). Assume there is a simple Bayesian network; the simplest assumption is that the rating r_iα depends only on the user class c_i and the object class c_α. Thus, the probability of r_iα can be written as:

P(r_iα | c_1, ..., c_n) = P(r_iα | c_i, c_α)

It is obvious from this equation that, in our case, the Bayesian network corresponds to the simplest dependency structure linking ratings to user and object classes.

Naive Bayes
In this work, we opted for a prediction model that can give results based on the observed data. In this context, Naive Bayes (NB) fits our approach, mainly for the following reasons:
• Independence: we can consider all properties independent given the target Y.
• Equality: all attributes are considered to have the same importance.
Following this rationale, we now need to calculate the conditional probability. The algorithm is known as Naive Bayes because it uses the Bayesian theorem, and thus calculates the probability of an event based on the incidence of values in historical data. The Bayesian theorem is defined as follows:

P(A|B) = P(B|A) P(A) / P(B)

where:
A represents the dependent event;
B represents the preceding event, i.e., the predictive attribute;
P(A) represents the probability of the event before the evidence is observed;
P(A|B) represents the probability of the event after the evidence is observed.
The "naive" assumption is that the evidence is divided into pieces that are defined to be independent:

P(A|B_1, ..., B_n) = P(B_1|A) P(B_2|A) ... P(B_n|A) P(A) / P(B_1, ..., B_n)

where B_1, ..., B_n are independent pieces of evidence.
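A small numeric check of the Bayesian theorem above; the probability values are made up for illustration:

```python
def bayes(p_b_given_a, p_a, p_b):
    """Bayes' theorem: posterior P(A|B) = P(B|A) P(A) / P(B)."""
    return p_b_given_a * p_a / p_b

# Illustrative numbers: prior P(A) = 0.3, likelihood P(B|A) = 0.8, evidence P(B) = 0.5.
print(round(bayes(0.8, 0.3, 0.5), 2))  # 0.48
```

Observing the evidence B raises the probability of A from the prior 0.3 to the posterior 0.48.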

Naïve Bayes Classifier
Naive Bayes classifiers are a family of classification algorithms based on the Bayesian theorem. They share a common principle: each pair of features being classified is assumed independent of the others.
To begin, let us look at a data set. We consider an imaginary, random data set describing one prediction attribute and three condition attributes, used to decide whether or not to make a certain recommendation. Given the prediction and the conditions, each row classifies the conditions as suitable for making a recommendation ("Yes") or unsuitable ("No"). Table 1 depicts our data.
The data set is divided into two parts, namely the properties matrix and the response vector.
• The properties matrix consists of the values of the dependent properties. In the data set above, the features are 'Prediction', 'Condition #1', 'Condition #2', and 'Condition #3'.
• The response vector contains the value of the prediction, or result, for each row of the properties matrix. In the data set above, the class variable name is 'Recommendation'.
At this point, we consider the columns of the properties matrix to be totally independent. For example, condition #1 being 'X' has nothing to do with condition #2, and the appearance of 'A' in Prediction does not affect condition #3.
For the data to be treated equally, each attribute has the same weight or relevance. For example, knowing only condition #1 and condition #2, one cannot predict the accuracy of the result. None of the attributes is unnecessary, and we assume that they contribute equally to the result. Now, with our data set, we can apply the Bayesian theorem as follows:

P(y|V) = P(V|y) P(y) / P(V)

where y is the class variable and V = (v_1, ..., v_n) is a vector of dependent properties of size n. To clarify, an example of a feature vector and the corresponding class variable could be:

V = (A, X, H, N) and y = No

So, P(y|V) here means the probability of "Do not recommend" given that "Prediction is A", "condition #1 is X", "condition #2 is H", and "condition #3 is N".

Naïve Assumption
Now it is time to apply the naive assumption to the Bayesian theorem, namely that the properties are mutually independent. We now divide the evidence into independent parts.
If two events A and B are independent, then P(A,B) = P(A) P(B). Since (v_1, v_2, v_3, ..., v_n) are independent, we reach:

P(y|v_1, v_2, ..., v_n) = P(v_1|y) P(v_2|y) ... P(v_n|y) P(y) / (P(v_1) P(v_2) ... P(v_n))

which can be summarized as:

P(y|v_1, v_2, ..., v_n) = P(y) ∏_{i=1}^{n} P(v_i|y) / (P(v_1) P(v_2) ... P(v_n))

Since the denominator remains constant for a given input, we can remove it from the equation. At this point, we have to define a classification model. To achieve this, we consider the maximum probability over all values of y:

y = argmax_y P(y) ∏_{i=1}^{n} P(v_i|y)

As a final step, we calculate P(y) and P(v_i|y), where P(y) is the class probability and P(v_i|y) is the conditional probability.
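The derivation above (class prior times the product of per-feature conditionals, followed by an argmax over the classes) can be sketched as a from-scratch classifier; the toy records below are illustrative, not the paper's dataset:

```python
from collections import Counter, defaultdict

def train(records, labels):
    """Estimate class priors P(y) and per-feature conditionals P(v_i|y) by counting."""
    priors = {y: c / len(labels) for y, c in Counter(labels).items()}
    cond = defaultdict(Counter)              # (feature index, class) -> value counts
    for rec, y in zip(records, labels):
        for i, v in enumerate(rec):
            cond[(i, y)][v] += 1
    return priors, cond

def predict(rec, priors, cond):
    """argmax_y P(y) * prod_i P(v_i|y); the constant denominator is dropped."""
    def score(y):
        s = priors[y]
        for i, v in enumerate(rec):
            counts = cond[(i, y)]
            s *= counts[v] / sum(counts.values())
        return s
    return max(priors, key=score)

# Four toy records with two categorical features each.
records = [("A", "X"), ("A", "Z"), ("C", "X"), ("C", "Z")]
labels = ["No", "Yes", "Yes", "Yes"]
priors, cond = train(records, labels)
print(predict(("C", "X"), priors, cond))  # Yes
```

Note that this bare-counting sketch assigns probability zero to unseen feature values; production implementations usually add Laplace smoothing.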

Preliminary Results with the Initial Random Dataset
Let us manually apply the above formula to our data set. For this, we need to make some calculations on our data.
We find P(v_i|y_j) for each v_i in V and each y_j in y. Figure 2 depicts the calculation process.

From the data in Figure 2, we calculated P(v_i|y_j) for each v_i in V and y_j in y manually. For example, the probability of getting a recommendation given that condition #1 is Z, i.e., P(condition #1 = Z | recommendation = Yes), is 3/9. We also need to find the class probabilities P(y), which are likewise calculated in Figure 2; for example, P(recommendation = Yes) = 9/14.
At this stage, the precalculations are completed, and the classifier is ready. Now, we consider a new set of features (which we call "today"):

today = (C, X, I, S)

The probability of getting a recommendation is given by:

P(Yes|today) = P(C-Prediction|Yes) P(X-cond.#1|Yes) P(I-cond.#2|Yes) P(S-cond.#3|Yes) P(Yes) / P(today)

and the probability of not getting a recommendation is given by the analogous expression for "No". Since P(today) is common to both probabilities, we can ignore it and work with the proportional probabilities, which evaluate to 0.0141 for "Yes" and 0.0068 for "No". Since P(Yes|today) + P(No|today) = 1, these numbers can be converted to probabilities by normalizing them so that they sum to 1:

P(Yes|today) = 0.0141 / (0.0141 + 0.0068) = 0.67
P(No|today) = 0.0068 / (0.0141 + 0.0068) = 0.33

Since P(Yes|today) > P(No|today), the prediction is that we get the recommendation: 'Yes'.
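The normalization step can be checked numerically with the two unnormalized scores from the worked example:

```python
# Unnormalized scores for "Yes" and "No" from the worked example.
p_yes_raw, p_no_raw = 0.0141, 0.0068
total = p_yes_raw + p_no_raw            # P(today) cancels out of the ratio
p_yes = p_yes_raw / total
p_no = p_no_raw / total
print(round(p_yes, 2), round(p_no, 2))  # 0.67 0.33 -> predict "Yes"
```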

Evaluation Results Based on Social Network Data
Considering the dataset that we used from [74], in total there were 11,326 unique users, 2.2 million check-ins, 15 million venues, and more than 47 thousand social relations; the check-ins were accessed through the API of the Foursquare application [75].
To adapt to our approach and model, we took into consideration a total of 724 users, who had more than one and less than 10 check-ins for distinct touristic points of interest.
In the next phase, we organized the data into two sets: our training set consisted of 601 user inputs, while our testing set had 123 users. The response vector contains exactly m = 75 values (distinct points of interest) of the class variable.
The following section depicts the calculation process, where we set a new feature vector, which we call "meal":

meal = (Male, "Collierville, TN", Food, "American Restaurant")

We see that the vector of dependent features has size n = 4. By convention, let us call the probabilities of the feature vector conditioned on the respective class variable the partial probabilities, and denote them by P(x_i|y), i = 1, 2, 3, 4. For k = 1, let the random selection of the class variable be "Wrigley Field". The probability of eating at "Wrigley Field" is then given by:

P(x_1|y) = P(Gender = Male | "Wrigley Field")
P(x_2|y) = P(HomeCity = "Collierville, TN" | "Wrigley Field")
P(x_3|y) = P(Category = Food | "Wrigley Field")
P(x_4|y) = P(Subcategory = "American Restaurant" | "Wrigley Field")
P(y) = P(POI = "Wrigley Field")
P(meal) = 1
P("Wrigley Field"|meal) = P(x_1|y) P(x_2|y) P(x_3|y) P(x_4|y) P(y) / P(meal)

After the first iteration, the next 73 iterations continue similarly, up to k = m = 75.
For k = m = 75, let the remaining selection be the class variable with value "Park Tavern". The probability of eating at "Park Tavern" is given by:

P(x_1|y) = P(Gender = Male | "Park Tavern")
P(x_2|y) = P(HomeCity = "Collierville, TN" | "Park Tavern")
P(x_3|y) = P(Category = Food | "Park Tavern")
P(x_4|y) = P(Subcategory = "American Restaurant" | "Park Tavern")
P(y) = P(POI = "Park Tavern")
P(meal) = 1
P("Park Tavern"|meal) = P(x_1|y) P(x_2|y) P(x_3|y) P(x_4|y) P(y) / P(meal) = (5/6 × 2/6 × 6/6 × 6/6 × 6/601) / 1 ≈ 0.002773

In the next step, we normalize each score by the sum of the probabilities of the response vector, Σ_i P(y_i|meal); for k = 1, the normalized P("Wrigley Field"|meal) = 0.0.
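The product for "Park Tavern" can be verified directly from the counts given in the text:

```python
# Per-feature counts for "Park Tavern" from the worked example: 5/6 male,
# 2/6 from Collierville, 6/6 Food, 6/6 American Restaurant, prior 6/601,
# with P(meal) = 1.
p = (5 / 6) * (2 / 6) * (6 / 6) * (6 / 6) * (6 / 601)
print(round(p, 6))  # 0.002773
```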
For k = 75, P("Park Tavern"|meal) = 0.002773, so the recommended point of interest is "Park Tavern". After applying the algorithm to the training set, we reached a recommendation accuracy of 89%, with a confidence of approximately 0.8943, which was the intention of this study: besides the recommendation itself, we wanted to raise the level of confidence and trustworthiness.
An excerpt of the results is given in Figure 3, where the focus should be on the last two columns. The POI (point of interest) column shows whether there is a POI recommendation for a user (identified by ID in Figure 3), and the last column, Confidence, strengthens the potential recommendation by giving the confidence and trust that the user has in the recommendation made to him/her.
The calculations for every dataset record are realized in the Python programming language, and an overview of the main parts of the algorithm where confidence and prediction are calculated is given in Appendix A.
The process starts with the calculation of similarities between training-set and testing-set users: initially, the social profiles of tourists and reviewers are compared for similarities, and then the categories and subcategories of POIs are compared. Whenever there is a match on these attributes (gender, home city, category, subcategory), the algorithm recommends the POI to the new tourist. The process of calculating the probabilities for all potential POIs, depicted in detail in Section 4.3, is realized using the findPropProb function defined in Appendix A. The process then continues with the confidence calculation, using the function predictAccurancy shown in the second part of Appendix A, where confidence is calculated as full (or 1) in cases where, from a potential set of POIs (for example, 4 or 5), the tourist is most satisfied with the recommended POI, which is in a reviewer's top list.

Conclusions
Many studies have utilized Naive Bayes and probabilistic algorithms in recommender systems, such as [9,40,41], and others have included trust as a notion and semantics in recommender systems [42,43], together with the other studies mentioned in the related work, the majority of which treated these approaches individually. In our work, we presented an approach that correlates the usage of the Naive Bayes classifier with trustworthiness in social networks. Based on the experiments conducted and the evaluation results obtained with our datasets, we find the Naive Bayes algorithm very useful for classifying data properties. Moreover, the implementation of the Naive Bayes classifier is simple, because it assumes only a class node connected by a link to each of the other attribute nodes. The preliminary results show our approach to be feasible, since it reached a recommendation accuracy of 89%, with a confidence of approximately 0.89.
We can conclude that the usage of a Bayesian classifier in a recommender system is known to greatly reduce errors and can involve a large training data set. Bayesian classifiers also need empirical mitigation, and the mitigation technique depends very much on each case. Bayesian classifiers are powerful, but lack of data, or lack of access to proper data, may be the first cause of incorrect classifier implementation.

Data Availability Statement: Data sharing is not applicable to this article.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A
The following code is used to calculate the probabilities for all potential POIs with the function findPropProb, and the confidence is calculated using the function predictAccurancy.
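As the appendix listing itself is not reproduced here, the following is a hedged sketch of what findPropProb and predictAccurancy could look like, based on the description in the evaluation section. The function names come from the paper, but every signature, data structure, and the exact confidence rule are assumptions for illustration, not the authors' implementation.

```python
def findPropProb(user, poi, train_set):
    """Unnormalized naive-Bayes score of a POI for a user: the class prior
    P(poi) times the conditionals P(attribute | poi) over the four profile
    attributes described in the paper (assumed record layout)."""
    rows = [r for r in train_set if r["poi"] == poi]
    if not rows:
        return 0.0
    prob = len(rows) / len(train_set)                  # class prior P(poi)
    for attr in ("gender", "home_city", "category", "subcategory"):
        matches = sum(1 for r in rows if r[attr] == user[attr])
        prob *= matches / len(rows)                    # conditional P(attr | poi)
    return prob

def predictAccurancy(user, pois, train_set):
    """Recommend the highest-scoring POI and report its normalized score
    as the confidence of the recommendation (assumed confidence rule)."""
    scores = {p: findPropProb(user, p, train_set) for p in pois}
    total = sum(scores.values())
    best = max(scores, key=scores.get)
    confidence = scores[best] / total if total else 0.0
    return best, confidence

# Illustrative records: three check-ins at POI "A" matching the user's
# profile, one at POI "B" that does not.
train = 3 * [{"poi": "A", "gender": "Male", "home_city": "Collierville, TN",
              "category": "Food", "subcategory": "American Restaurant"}] + \
        [{"poi": "B", "gender": "Female", "home_city": "Austin, TX",
          "category": "Arts", "subcategory": "Theater"}]
user = {"gender": "Male", "home_city": "Collierville, TN",
        "category": "Food", "subcategory": "American Restaurant"}
print(predictAccurancy(user, ["A", "B"], train))  # -> ('A', 1.0)
```

When only one candidate POI has a nonzero score, its normalized score (and hence the reported confidence) is 1, which mirrors the "full confidence" case described in the text.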
