Measuring Product Similarity with Hesitant Fuzzy Set for Recommendation

Abstract: The processing of sparse matrices is a hot topic in recommendation systems. This paper applies hesitant fuzzy sets to the sparse matrix processing problem. Considering the uncertain factors in the recommendation process, it uses hesitant fuzzy set theory to characterize the historical ratings embedded in the recommendation system and studies the data processing problem of the sparse matrix under the condition of a hesitant fuzzy set. The key is to transform the similarity problem of products in the sparse matrix into the similarity problem of two hesitant fuzzy sets through data conversion, data processing, and data complement. This paper further considers the influence of differences in user ratings on the recommendation results and obtains a user's recommendation list. On the one hand, the proposed method effectively handles the sparse matrix in the recommendation system; on the other hand, it provides a feasible method for calculating similarity in the recommendation system.


Introduction
In the era of Big Data, information overload has become a common phenomenon that plagues people. As an effective way to solve information overload, the recommendation system emphasizes discovering users' hobbies and guiding them to discover their information needs. A sound recommendation system can provide users with personalized services and can establish close relationships with them, making users rely on recommendations. Studies have indicated that the e-commerce recommendation system plays a vital role in consumer decision making and enterprise product sales [1,2].
A recommendation system comprises three parts: the recommendation input, which is based on information about users and products; the recommendation algorithm; and the recommendation output. Among them, the recommendation input is the "source" of the whole system, as it largely determines the user experience and influences the system's recommendation quality. The recommendation algorithm is the "core" of the entire system, reflecting its intelligence. The mainstream recommendation algorithms include content-based recommendations [3,4], collaborative filtering recommendations [5], knowledge-based recommendations [6], and hybrid recommendations [7]. Finally, the recommendation output is the "face" of the system, reflecting its quality.
Irrespective of the recommendation algorithm, obtaining input data is the primary step of recommendations. Two kinds of input data exist in the system: implicit browsing input and explicit rating input. The former requires mining users' browsing time, browsing paths, browsing behavior, and other implicit information, which is highly uncertain and passive in nature; hence, it has been extensively studied by scholars. The latter consists of historical ratings provided by users and is constrained by the amount and effectiveness of the data. Typically, after purchasing a product, the user provides a product score to express their satisfaction and preferences. For instance, the Movielens, Taobao, and JingDong scoring systems adopt a five-level scale. However, the sparsity of the scoring matrix hinders the improvement of recommendation quality.
In the explicit scoring input, the sparsity of the scoring matrix brings much inconvenience to the accurate implementation of recommendations. In e-commerce websites, the number of user-rated products accounts for only a small part of the total number of products, leading to high data sparsity. Consequently, the sparsity of user rating data leads to significant errors in the similarity calculation of users and products, as the accuracy of user score prediction decreases sharply. Based on this, many scholars have proposed many methods to improve the sparse matrix. For example, Zhang et al. [8] used a random algorithm to deal with cold start problems when the data were sparse, and when the data reached a certain level, the hybrid algorithm was applied to an incremental recommendation. Li et al. [9] proposed a hybrid recommendation algorithm based on content and user collaborative filtering. Taking advantage of collaborative filtering, when the number of users and the evaluation level are large, the user scoring data matrix becomes relatively dense to reduce the sparsity of the matrix and to enable a more accurate collaborative filtering. Zhang et al. [10] proposed a new depth variational matrix factorization (DVMF) recommendation method for large-scale sparse datasets, and obtained the potential characteristics of users and items respectively through the depth nonlinear structure. Based on the potential factors combined with the matrix factorization method, an optimization method of the DVMF algorithm has been proposed. Wang et al. [11] put forward a recommendation model of prior relations for low-rank sparse matrix factorization, which predicted users' ratings of the item by learning the sum of the low-rank and sparse matrices, then effectively reducing sparsity and cold start problems using prior information. Liu et al. [12] proposed a new item recommendation algorithm based on a pattern recognition and statistical model to analyze and predict user behaviors. 
This algorithm can be applied to sparse user behavior datasets, avoiding the problems faced by collaborative filtering algorithms when the datasets are sparse. Huang et al. [13] devised a new CDCF algorithm, the low-rank sparse cross-domain (LSCD) recommendation algorithm, which extracts the potential feature matrices of users and items for each domain, instead of decomposing the matrix of each domain into three low-dimensional matrices by three factors, to solve the problem of sparse data. To sum up, there are three primary approaches to improving the sparse matrix: (1) increase data while maintaining an unchanged scale [14,15]; (2) reduce scale and data [16,17]; and (3) use neural networks and deep learning methods to predict user ratings for improving the sparse matrix [18]. These methods have three drawbacks in sparse matrix processing. First, in the process of complementing data, various complement strategies increase the uncertainty of information. Second, part of the useful information may be lost in the process of dimension reduction. Finally, it is relatively challenging to realize the deep mining of user and resource information. In this case, exploring new sparse matrix solutions is one of the core elements of e-commerce recommendation input research.
As an extension of fuzzy sets [19], hesitant fuzzy sets have been applied to many decision-making problems [20,21]. Mardani et al. [22] extended a new fuzzy approach under the hesitant fuzzy set (HFS) approach using stepwise weight assessment ratio analysis (SWARA) and the weighted aggregated sum product assessment (WASPAS) method to evaluate and rank the critical challenges of DT intervention to control the COVID-19 outbreak. Colak et al. [23] proposed an integrated MCDM model consisting of the Delphi, analytic hierarchy process (AHP), and VIsekriterijumska Optimizacija I Kompromisno Resenje (VIKOR) methods to evaluate EST alternatives for Turkey under a hesitant fuzzy environment. Sahu et al. [24] found that the hesitant fuzzy-based symmetrical technique of the analytic hierarchy process (AHP) and the technique for order of preference by similarity to the ideal solution (TOPSIS) is an effective methodology for evaluating web applications' durability. Pratibha et al. [25] proposed a novel framework based on the COPRAS (complex proportional assessment) method and the SWARA approach to evaluate and select the desirable sustainable supplier within the HFS context. Although hesitant fuzzy sets have been used in many fields, they are rarely used in the field of recommendation systems.
Considering the above ideas and the application of fuzzy tools to recommendations [26,27], this paper discusses how to make full use of known data to obtain high-quality recommendation outcomes without losing information in a sparse matrix. Its primary contributions are as follows: (1) The similarity between hesitant fuzzy theory and recommendation system is discussed; (2) the hesitant fuzzy set theory is applied to describe the embedded historical ratings in the recommendation system. Such an idea can ensure the similarity relationships between products without losing the sparse matrix rating information, provide a new way for sparse matrix processing and similarity discussion in the recommendation system, and find a new field for the research of hesitant fuzzy sets.
This paper is structured as follows. Section 2 analyzes the characteristics of sparse matrix data and explores a suitable processing approach according to those characteristics. Section 3 introduces hesitant fuzzy set theory to realize seamless docking between the fuzzy set and the recommendation system. Section 4 constructs a measurement model of product similarity in the e-commerce recommendation system based on the idea of hesitant fuzzy sets. Section 5 extracts data from Movielens to conduct empirical research. Section 6 compares the results based on user recommendation with the results based on product similarity in the preceding part to verify the effectiveness of the proposed method, while Section 7 provides the conclusions and prospects.

Feature Description of Sparse Matrix in the E-commerce Recommendation System
The sparse matrix in the e-commerce recommendation system is the user rating matrix, which is typically sparse. E-commerce recommendation uses the rating matrix to mine the similarity of users or products and then to produce high-quality recommendation strategies. Typically, two different recommendation strategies can be derived from the user rating matrix: one is collaborative filtering recommendation based on users, and the other is collaborative filtering recommendation based on products. The former uses the rating matrix to find the nearest neighbors of the target user, and the latter uses it to find products similar to a given product. The essence of both is seeking similarity, but the former studies the similarity of users, while the latter studies the similarity of products.
In the e-commerce system, the user rating matrix indicates the feature of sparsity, but also shows the 4V feature of Big Data. The formation of these data is accompanied by many uncertain factors, such as the attitude of users when providing rating data, which affects the reliability and authenticity of the data. The personality characteristics of users have a certain impact on the display of data; users may hesitate between different rating levels when rating. Based on the differences in the environment, situation, and time, the rating results may even be contradictory. For the same product, different users may separately use five or four points to express their satisfaction, while the same user in different situations can express their satisfaction or dissatisfaction with four points. Therefore, when the rating data are uncertain, the recommendation system should retain existing data as much as possible to avoid information loss.
Taking product ratings as an example, irrespective of the number of ratings, all data should be used as far as possible in later studies. If a product has multiple ratings, it can be considered that the evaluation of the product hesitates among multiple values, and each value reflects the actual attributes of the product from one angle. In the product-based collaborative filtering recommendation algorithm, it is necessary to describe the similarity between products. Typically, this similarity calculation requires that two products have the same number of ratings, whereas the numbers of ratings received by the many products online rarely agree. Fortunately, the hesitant fuzzy set [28] adequately expresses the idea of "dithering" in the rating process, and it provides a convenient means to solve this problem.
There is much correspondence between hesitant fuzzy set theory and the e-commerce recommendation system, as shown in Table 1. Notably, the similarity between products or between users can be studied based on the rating matrix. Under the condition of a hesitant fuzzy set, the essence of the two is the same. If a product is defined as an element in the hesitant fuzzy set, we can discuss the similarity between the products and then produce the collaborative filtering recommendation algorithm based on products. Meanwhile, if a user is defined as an element in the hesitant fuzzy set, we can explore the similarity between the users and then produce the collaborative filtering recommendation algorithm based on users. Without losing generality, this paper explores the former case.

The Introduction of a Hesitant Fuzzy Set
The hesitant fuzzy set was put forward by Spanish scholar Torra [29] in 2010, and it is a further extension of the fuzzy set theory. The idea of a hesitant fuzzy set is that people hover among multiple possible values when deciding the membership of an element belonging to a certain set, and then the multiple values are listed as membership. Thus, the hesitant fuzzy set can more carefully describe the uncertain characteristics of a decision-maker's understanding of things.

Basic Definition
Definition 1 [29]. Let $X$ be a reference set. A hesitant fuzzy set (HFS) $A$ on $X$ is given by a function $h_A(x)$ that returns, for each $x \in X$, a set of possible membership values in $[0, 1]$.
For ease of understanding, Xu and Xia [30] expressed the HFS by a mathematical symbol:

$$A = \{\langle x, h_A(x) \rangle \mid x \in X\}, \tag{1}$$

where $h_A(x)$ is a set of some different values in $[0, 1]$, each $\gamma \in h_A(x)$ represents a possible membership degree of the element $x \in X$ to $A$, and $h_A(x)$ is called a hesitant fuzzy element (HFE) [30], which is the basic unit of an HFS.
In hesitant fuzzy set theory, given a reference set, the membership function does not provide only one value, but rather a set of them, which provides a way of modeling hesitation. In the e-commerce recommendation system, a large number of users provide a large number of ratings, and each rating carries a certain degree of hesitation. The rating for a product is therefore a set of multiple user ratings, that is, a set of multiple fuzzy numbers, which constitutes a hesitant fuzzy set. Hence, we can interpret the different users' ratings for a given item as the hesitation about the item.

Similarity
Similarity measures are fundamentally important in a variety of scientific fields, including decision making, pattern recognition, machine learning, and market prediction, and many studies have addressed this issue for hesitant fuzzy sets. Xu and Xia [31] originally developed a series of distance measures for hesitant fuzzy sets and, based on them, proposed the corresponding similarity measures.
Definition 2 [32]. Let $A_1$ and $A_2$ be two HFSs on $X$; then the distance between $A_1$ and $A_2$ is defined as $d(A_1, A_2)$, which satisfies the following properties: (1) $0 \le d(A_1, A_2) \le 1$; (2) $d(A_1, A_2) = 0$ if and only if $A_1 = A_2$; (3) $d(A_1, A_2) = d(A_2, A_1)$.
Definition 3 [32]. Let $A_1$ and $A_2$ be two HFSs on $X$; then the similarity between $A_1$ and $A_2$ is defined as $S(A_1, A_2)$, which satisfies the following properties: (1) $0 \le S(A_1, A_2) \le 1$; (2) $S(A_1, A_2) = 1$ if and only if $A_1 = A_2$; (3) $S(A_1, A_2) = S(A_2, A_1)$.
In fact, the calculation of the similarity of hesitant fuzzy sets has a precondition: the number of elements in the two sets must be equal. However, this is hard to guarantee in reality, which is also a manifestation of the diversity of hesitant fuzzy sets. Thus, the shorter set should be complemented before the similarity calculation. There are many approaches to complementing elements in hesitant fuzzy sets, such as the mean value approach and the modal number approach, but how to ensure the quality of the complement is a key problem in this paper.
By analyzing Definitions 2 and 3, it is noted that $S(A_1, A_2) = 1 - d(A_1, A_2)$. Therefore, the distance measurement formula can be used to obtain the measure of similarity in the hesitant fuzzy set. The shorter the distance between the two sets, the higher the similarity between them. The similarity measurement of hesitant fuzzy values is similar to that of hesitant fuzzy sets. In fact, it only needs to measure the distance of each membership degree between two hesitant fuzzy values. Xu et al. [33] provided the similarity calculation formula for two hesitant fuzzy values, $h_1$ and $h_2$, based on the distance measurement formula. The similarity measure of the hesitant fuzzy value adopts the idea of Pearson's similarity measure. This measurement method effectively avoids the uncertain information generated during the measurement of two hesitant fuzzy values.
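The relation between distance and similarity can be illustrated with a minimal Python sketch. It assumes the hesitant normalized Hamming distance, a commonly used distance for hesitant fuzzy elements of equal length; this is an illustrative choice and not necessarily the formula of Xu et al. [33]. The function names are hypothetical.

```python
def hamming_distance(h1, h2):
    """Hesitant normalized Hamming distance between two equal-length HFEs.

    Assumption: both inputs hold membership degrees in [0, 1] and have
    already been complemented to the same length.
    """
    if len(h1) != len(h2):
        raise ValueError("complement the shorter element before comparing")
    a, b = sorted(h1), sorted(h2)  # compare the i-th smallest values
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def similarity(h1, h2):
    """Similarity derived from the distance: S = 1 - d."""
    return 1.0 - hamming_distance(h1, h2)


print(similarity([0.6, 0.8, 1.0], [0.6, 0.8, 1.0]))  # identical elements
print(similarity([0.2, 0.4], [0.8, 1.0]))            # distant elements
```

Because the distance is symmetric, bounded by 0 and 1, and zero for identical inputs, the derived similarity satisfies the properties required of a similarity measure.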

Affiliation of User Ratings
In an e-commerce recommendation system, there may be some differences in the rating results provided by users with different personality characteristics. For instance, some users think that four points is already high, while others think this is poor. Therefore, it is necessary to preprocess the user's rating to eliminate differences and uncertainties in the rating.
Here, $R_{ij}$ indicates the rating of $\text{User}_i$ on $\text{Item}_j$, and $\bar{R}_i = \frac{1}{m}\sum_{j=1}^{m} R_{ij}$ is the average rating of the $m$ products rated by $\text{User}_i$ in the system. Then, the membership degree $\gamma_{ij}$ of $R_{ij}$ is defined as:

$$\gamma_{ij} = \frac{R_{ij} - \min R}{\max R - \min R}, \tag{3}$$

where $R$ represents the rating system in the recommendation system. Generally, Taobao and JingDong adopt a five-point system, so $\max R = 5$ and $\min R = 0$. Therefore, the membership degree of a five-point system is $\gamma_{ij} = R_{ij}/5$. There are also three-point and ten-point evaluation methods in evaluating enterprise after-sales services. In fact, a user's rating of the product is a typical expression of the degree of membership, which intuitively expresses their satisfaction.
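The conversion from ratings to membership degrees can be sketched in a few lines of Python. The sketch assumes min-max scaling over the rating system, which reduces to dividing by 5 when the maximum is 5 and the minimum is 0, as stated above; the function names are hypothetical.

```python
def membership(rating, min_rating=0.0, max_rating=5.0):
    """Map a raw rating to a membership degree in [0, 1] by min-max scaling."""
    if not min_rating <= rating <= max_rating:
        raise ValueError("rating outside the rating system")
    return (rating - min_rating) / (max_rating - min_rating)


def user_mean(ratings):
    """Average rating of one user over the products that user has rated."""
    return sum(ratings) / len(ratings)


print(membership(4))         # 0.8 on a five-point system
print(user_mean([5, 4, 3]))  # 4.0
```

Passing `min_rating` and `max_rating` explicitly would cover the three-point and ten-point systems mentioned above.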

Product Rating Representation
Based on the above discussion, all of the ratings obtained by $\text{Item}_j$ in the system can be expressed as a hesitant fuzzy set:

$$h(\text{Item}_j) = \{\gamma_{1j}, \gamma_{2j}, \ldots, \gamma_{l_j j}\}, \tag{4}$$

where $l_j$ represents the total number of ratings that product $\text{Item}_j$ receives. Obviously, $l_j$ is an uncertain value, reflecting the sparse degree of the rating matrix.
In the collaborative filtering recommendation, the recommendation among products is converted into a recommendation among hesitant fuzzy sets, and the product similarity is converted into the similarity among hesitant fuzzy sets.
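As a minimal sketch of this representation, the following Python code groups rating triples by item and stores each item's membership degrees in ascending order. The five-point scaling and the toy data are assumptions for illustration, and the names are hypothetical.

```python
from collections import defaultdict


def build_item_sets(triples, max_rating=5.0):
    """Collect each item's ratings into an ascending list of memberships."""
    per_item = defaultdict(list)
    for _user, item, rating in triples:
        per_item[item].append(rating / max_rating)
    return {item: sorted(values) for item, values in per_item.items()}


# Toy data (hypothetical, not taken from Movielens).
triples = [("u1", "A", 5), ("u2", "A", 4), ("u3", "A", 4), ("u1", "B", 3)]
item_sets = build_item_sets(triples)
print(item_sets)
# Number of ratings per item, which differs from item to item.
print({item: len(values) for item, values in item_sets.items()})
```

A list is used instead of a Python set so that repeated ratings from different users are kept.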

Horizontal Comparison of Products
Because the rating matrix is objectively sparse, the numbers of elements in $h(\text{Item}_j)$ and $h(\text{Item}_k)$ usually differ, i.e., $l_j \neq l_k$; however, calculating the similarity usually requires the two hesitant fuzzy sets to have the same number of hesitant fuzzy elements. How to fill in the shorter hesitant fuzzy set has therefore become one of the urgent problems to be solved. To solve this problem and enable effective calculation, the following agreement is made.
The elements in $h(\text{Item}_j)$ and $h(\text{Item}_k)$ are arranged in ascending order. $h(\text{Item}_j) = h(\text{Item}_k)$ if and only if $\gamma_{ij} = \gamma_{ik}$ $(i = 1, 2, \ldots, l)$, where $\gamma_{ij}$ and $\gamma_{ik}$ denote the $i$-th elements of $h(\text{Item}_j)$ and $h(\text{Item}_k)$ in ascending order. Elements are added to each hesitant fuzzy set that has fewer elements until it contains $l = \max\{l_1, l_2, \ldots, l_j, l_k, \ldots\}$ elements.
In the personalized recommendation system, the ultimate purpose is to recommend the product $\text{Item}_j$ to $\text{User}_i$; thus, the preference of $\text{User}_i$ determines how to add elements. There are two strategies for adding elements to the hesitant fuzzy set: the forward strategy and the backward strategy.
(1) Forward strategy: If the average $\bar{R}_i$ of all historical ratings $R_{ij}$ $(j = 1, 2, \ldots)$ from $\text{User}_i$ satisfies $\bar{R}_i \le 4$, indicating that $\text{User}_i$ is a pessimistic user, $l - l_j$ copies of the smallest element $\gamma_{1j}$ are added before the first element of $h(\text{Item}_j)$. Hence, after re-indexing, the first $l - l_j$ elements $\gamma_{1j} = \gamma_{2j} = \cdots = \gamma_{l - l_j, j}$ all equal the original smallest element.
(2) Backward strategy: If the average $\bar{R}_i$ of all historical ratings $R_{ij}$ $(j = 1, 2, \ldots)$ from $\text{User}_i$ satisfies $\bar{R}_i > 4$, indicating that $\text{User}_i$ is an optimistic user, $l - l_j$ copies of the largest element $\gamma_{l_j j}$ are added after the last element of $h(\text{Item}_j)$. Hence, $\gamma_{l_j + 1, j} = \gamma_{l_j + 2, j} = \cdots = \gamma_{l, j} = \gamma_{l_j, j}$.
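The two complement strategies can be sketched as follows. The sketch reads the forward strategy as repeating the smallest value in front and the backward strategy as repeating the largest value at the end, with the user's average rating compared against the threshold of 4 given above; the function name and the example values are hypothetical.

```python
def complement(h, target_len, user_mean, threshold=4.0):
    """Pad an ascending HFE to target_len with the forward or backward strategy.

    Forward (user_mean <= threshold, pessimistic user): repeat the smallest
    value in front. Backward (user_mean > threshold, optimistic user):
    repeat the largest value at the end.
    """
    values = sorted(h)
    missing = target_len - len(values)
    if missing < 0:
        raise ValueError("target_len is shorter than the element")
    if user_mean <= threshold:
        return [values[0]] * missing + values
    return values + [values[-1]] * missing


h = [0.6, 0.8]
print(complement(h, 4, user_mean=3.9))  # [0.6, 0.6, 0.6, 0.8]
print(complement(h, 4, user_mean=4.5))  # [0.6, 0.8, 0.8, 0.8]
```

Both strategies keep the list in ascending order and leave the original ratings untouched, so no rating information is discarded.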

Similarity Calculation of Products
Based on the above methods, we can guarantee that all the products in the system have the same number of ratings, from the perspective of hesitant fuzzy set, indicating that the amount of membership in each hesitant fuzzy set is the same. Furthermore, we can calculate the degree of similarity of two products by determining the degree of similarity of two hesitant fuzzy sets.
The common similarity calculation methods are Cosine similarity, Pearson's correlation coefficient, and the Jaccard coefficient. Here, according to the similarity formula of hesitant fuzzy sets proposed by Xu [31], we can obtain the similarity between two products, $h(\text{Item}_j)$ and $h(\text{Item}_k)$, in the recommendation system.
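A minimal end-to-end sketch of this step is given below: every item's set is complemented to the common length and all product pairs are then compared. The similarity used here is one minus the normalized Hamming distance, which is an assumption made for illustration and not necessarily the formula of [31]; the function names and toy data are hypothetical.

```python
from itertools import combinations


def pad(h, length, pessimistic):
    """Complement an HFE to `length` (forward if pessimistic, else backward)."""
    values = sorted(h)
    extra = length - len(values)
    if pessimistic:
        return [values[0]] * extra + values
    return values + [values[-1]] * extra


def similarity(h1, h2):
    """Assumed measure: 1 minus the normalized Hamming distance."""
    return 1.0 - sum(abs(a - b) for a, b in zip(h1, h2)) / len(h1)


def pairwise_similarity(item_sets, pessimistic=True):
    """Similarity of every unordered pair of items after complementing."""
    length = max(len(h) for h in item_sets.values())
    padded = {item: pad(h, length, pessimistic) for item, h in item_sets.items()}
    return {
        (j, k): similarity(padded[j], padded[k])
        for j, k in combinations(sorted(padded), 2)
    }


# Hypothetical toy sets, not the Movielens values of the case study.
item_sets = {"A": [0.8, 0.8, 1.0], "B": [0.6], "C": [0.8, 1.0]}
result = pairwise_similarity(item_sets)
for pair, s in sorted(result.items(), key=lambda kv: -kv[1]):
    print(pair, round(s, 3))
```

In this toy example, items A and C become identical after forward complementing, which shows how strongly the chosen strategy can influence the resulting similarity.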

Algorithm Implementation of Product Recommendation
On the basis of the product similarity obtained above, we can make a recommendation based on products. That is to say, for users who give product $\text{Item}_j$ a high score, we can recommend the first few products with the highest similarity to $\text{Item}_j$, so as to realize the recommendation for $\text{User}_i$.
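The product-based step can be sketched as a simple ranking. The sketch assumes that pairwise similarities are already available in a dictionary keyed by item pairs and that each candidate is scored by its best similarity to any highly rated item; both the scoring rule and the names are assumptions for illustration.

```python
def recommend_by_product(liked, watched, sim, top_n=3):
    """Rank unwatched items by their best similarity to any liked item.

    `sim` maps frozenset({item_a, item_b}) to a similarity in [0, 1].
    """
    scores = {}
    for pair, s in sim.items():
        for seed in pair & liked:
            (other,) = pair - {seed}  # the other item of the pair
            if other not in watched:
                scores[other] = max(scores.get(other, 0.0), s)
    # Highest similarity first; ties broken by item name for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))[:top_n]


# Hypothetical similarities between four items.
sim = {
    frozenset({"A", "B"}): 0.90,
    frozenset({"A", "C"}): 0.70,
    frozenset({"B", "C"}): 0.95,
    frozenset({"A", "D"}): 0.80,
}
print(recommend_by_product(liked={"A"}, watched={"A", "B"}, sim=sim))
```

Here item B is skipped because the user has already watched it, so D and C are returned in order of similarity to A.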
When product similarity is not available, we can make a recommendation based on users. The similarity of $\text{User}_i$ and $\text{User}_x$ can be calculated by the following formula:

$$S(\text{User}_i, \text{User}_x) = \frac{\text{Item}(\text{User}_i \cap \text{User}_x)}{\text{Item}_{\text{All}}}, \tag{6}$$

where $\text{Item}(\text{User}_i \cap \text{User}_x)$ represents the number of products purchased by both $\text{User}_i$ and $\text{User}_x$, and $\text{Item}_{\text{All}}$ represents the number of all products. Obviously, the larger $S(\text{User}_i, \text{User}_x)$ is, the higher the similarity between $\text{User}_i$ and $\text{User}_x$. We can recommend to $\text{User}_i$ the first few products that similar users have purchased and scored highly, so as to realize the recommendation for $\text{User}_i$.
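A minimal sketch of the user-based alternative follows. It assumes that the similarity is the number of products both users have rated divided by the number of all products, in line with the quantities described above, and that recommendations come from the most similar users' highest-rated items; the names and toy data are hypothetical.

```python
def user_similarity(items_i, items_x, n_items_all):
    """Share of all products that both users have rated."""
    return len(set(items_i) & set(items_x)) / n_items_all


def recommend_by_user(target, ratings, n_items_all, top_n=3):
    """Recommend unseen items rated highest by the most similar users.

    `ratings` maps user -> {item: rating}.
    """
    seen = set(ratings[target])
    sims = {
        user: user_similarity(seen, rated, n_items_all)
        for user, rated in ratings.items()
        if user != target
    }
    best = max(sims.values())
    neighbours = [user for user, s in sims.items() if s == best]
    scores = {}
    for user in neighbours:
        for item, rating in ratings[user].items():
            if item not in seen:
                scores[item] = max(scores.get(item, 0), rating)
    return sorted(scores, key=lambda item: (-scores[item], item))[:top_n]


# Hypothetical toy ratings over five products.
ratings = {
    "u58": {"A": 5, "B": 4},
    "u6": {"A": 4, "B": 5, "C": 5, "D": 2},
    "u1": {"A": 3, "E": 5},
}
print(user_similarity(ratings["u58"], ratings["u6"], 5))  # 0.4
print(recommend_by_user("u58", ratings, n_items_all=5))   # ['C', 'D']
```

Because the denominator is the same for every pair of users, ranking by this similarity is equivalent to ranking by the number of co-rated products.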

Case Application
This paper selected the ml-latest-small dataset from Movielens, which contains 100,836 ratings of 9742 movies by 610 users. Considering the complexity and repetition of the computational process, this paper randomly extracted 10 movies (MovieID = 260, 293, 316, 349, 457, 527, 661, 736, 1222, and 2502) from the 9742 movies of "ratings.dat" in the ml-latest-small dataset. The rating data of each movie were obtained, as shown in Table 2.
As Movielens applies the five-point evaluation rules, we transformed the score value into membership according to $\gamma_{ij} = R_{ij}/5$.

Table 2. Movie rating sheet. Columns: ID 260, 293, 316, 349, 457, 527, 647, 736, 1222.

By further applying the hesitant fuzzy set, we can obtain the data description of each product and its number of scores. Obviously, $l = \max\{l_1, l_2, \ldots, l_j, l_k, \ldots\} = 9$. We selected one active user, $\text{User}_{58}$, from Movielens as the object to be evaluated in order to implement recommendations. The user scored a total of 112 movies, and the average score of these movies was 3.90 points. Since $3.90 \le 4$, we can consider the user as pessimistic, and the forward strategy is applied to obtain the complemented hesitant fuzzy sets, sorted in ascending order. The resulting similarity for the worked example is ≈ 0.996080.

In the same way, the similarity between other movies can be calculated, as shown in Table 4. $\text{Item}_{316}$ and $\text{Item}_{1222}$ have the highest similarity in Table 3. After the data addition process, it can be considered that there is a certain degree of substitution between products. Moreover, there is a high similarity between $\text{Item}_{457}$ and $\text{Item}_{527}$, as well as between $\text{Item}_{293}$ and $\text{Item}_{349}$. $\text{Item}_{260}$ and $\text{Item}_{736}$ have the lowest similarity and the weakest substitution.
From the Movielens dataset, it can be seen that the five-point movies watched by $\text{User}_{58}$ are Movie ID = 293, 457, and 527. Among the top five movies most similar to these, those that have not been watched by $\text{User}_{58}$ are taken as the recommended movies: Movie ID = 316, 1222, and 260.

Algorithm Verification
To verify the effectiveness of the proposed algorithm, we used user-based recommendation to recommend movies for $\text{User}_{58}$. From the selected part of the Movielens score table, we know $\text{Item}_{\text{All}} = 10$. According to Equation (6), we can calculate the similarity between $\text{User}_{58}$ and $\text{User}_i$ $(i = 1, 6, 17, 28, 42, 57, 64, 68, 84, 91)$, as shown in Table 5. It is concluded that the users who share the largest number of watched movies with $\text{User}_{58}$ are $\text{User}_6$ and $\text{User}_{28}$. We then identified the movies that $\text{User}_6$ and $\text{User}_{28}$ scored highly and that $\text{User}_{58}$ did not watch: Movie ID = 316, 1222, and 647.
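Comparing the two results, the product-based recommendation (Movie ID = 316, 1222, and 260) and the user-based recommendation (Movie ID = 316, 1222, and 647) share two of their three movies, which indicates that the method proposed in this paper is effective.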
To illustrate the practical effect of the recommendation algorithm based on hesitant fuzzy sets, we randomly extracted ten groups of data from Movielens to show the calculation process. Other data can be verified through similar calculations. The calculation over all data can be carried out by computer, but the data conversion process is not visible in a program run, so this paper only uses part of the data for the calculation.

Conclusions and Prospect
The steps of the method proposed in this paper are as follows: First, the rating matrix of the data for processing is obtained in the recommendation system, which is often a sparse matrix. Then, the sparse matrix in the recommendation system is supplemented by the forward or backward strategies. Consequently, the similarity between the products is calculated using the supplemented sparse matrix and the idea of hesitant fuzzy sets, while the product-based recommendation is obtained according to the similarity. Finally, the algorithm is verified based on user recommendation, and the results indicate that the proposed method is very effective. The main innovations of this paper include two aspects. On the one hand, it proposes making up the sparse matrix through the forward or the backward strategies. It ensures that the similarity between the products is obtained without losing the rating information of the sparse matrix. The effective information of the rating matrix is maximized, thus providing a new approach for the sparse matrix processing and similarity discussion in the recommendation system. On the other hand, beginning from the inherent uncertainty in the recommendation system, the hesitant fuzzy set can solve the recommendation quality problem with the help of the processing tool of uncertain problems, which undoubtedly finds a new field for the study of hesitant fuzzy set.
Nevertheless, there are still some limitations in this research. First, this paper attempted to use hesitant fuzzy set theory to solve the complex problems in the recommendation system. Although the two fit together well, the recommendation quality was compared only to a simple user-based recommendation, so the results may be inaccurate. Determining other more complex and accurate methods for comparison is one of the focuses of the authors' subsequent work. Second, because the calculation over the whole dataset needs to be carried out by computer programs, and such a process cannot show in detail how hesitant fuzzy sets are used for data conversion, this paper does not present the program code.
Author Contributions: Formal analysis, C.C.; methodology, Z.Z.; writing-original draft preparation, J.L. and C.C.; writing-review and editing, Z.Z. All authors have read and agreed to the published version of the manuscript.