Multi-Criteria Recommendation Systems to Foster Online Grocery

With the exponential increase in information, it has become imperative to design mechanisms that allow users to access what matters to them as quickly as possible. Recommendation systems (RSs), intelligent systems enabled by advances in information technology, address this need: various types of data are collected on items of interest to users and presented as recommendations. RSs also play a very important role in e-commerce. The purpose of recommending a product is to identify the most appropriate match for a specific product. The major challenge when recommending products is insufficient information about the products and the categories to which they belong. In this paper, we transform the product data using two methods of document representation: bag-of-words (BOW) and the neural network-based document embedding known as Doc2Vec. For each document representation method, we propose three-criteria recommendation systems (product, package, and health) to foster online grocery shopping; these depend on product characteristics such as composition, packaging, nutrition table, allergens, and so forth. For our evaluation, we conducted a user survey and an expert survey. Finally, we compared the performance of these three criteria for each document representation method, finding that the neural network-based representation (Doc2Vec) performs better and completely alters the results.


I. INTRODUCTION
According to [1], digital transformation facilitates new ways of value creation at all stages of the consumer decision process: pre-purchase (need recognition, information search, consideration or evaluation of alternatives), purchase (choice, ordering, payment), and post-purchase (consumption, use, engagement, service requests). This value creation is especially relevant in retailing to ensure competitiveness and gain a larger market share. Digital transformation came hand in hand with the penetration of mobile devices and data science in e-commerce. Although digital transformation [2] has been addressed from several approaches (multi-channel solutions, user modeling, Internet of Things, etc.), all of them rely to some extent on the availability of information on operations, supply chains, and consumer and shopper behaviors. One of the imperatives in this digital transformation is obtaining customer insights.
Since the early days of e-commerce (Amazon, 2003 [3]), the time needed to select the desired product has been the main issue for customers, especially given the high volume of products and the pace at which new ones are incorporated. For more than two decades, Recommender Systems (RSs) in e-commerce have tried to provide the most suitable products or services, to mitigate the product overload problem and to narrow down the set of choices [4]-[7]. The success of major product and service providers, such as Amazon [3], Netflix [8], and Google [9], relies largely on RSs. RSs improve customer satisfaction by reducing customer search effort and, as a consequence, they increase product and service sales. RSs provide users with items based on their interests, the preferences of other users, and the item attributes. Recommendation can be carried out from several approaches depending on the type of data collected and the way it is used by the RS: Content-Based (CB) filtering, Collaborative Filtering (CF), and hybrid. Both CB and CF are widely used, especially item-based collaborative filtering (developed by Amazon [3]), where the similarity between items is calculated using users' ratings of those items.
Although users rely on RSs regularly in almost all digitalized sectors, their popularization in the grocery market (i.e., retail stores that primarily sell food products) has been delayed as a consequence of the low penetration of online grocery, that is, e-commerce for grocery goods. Recently, as in other sectors, the grocery industry has begun harnessing digital technology to innovate through data-driven business models. Online grocery is considered a central element in the new normal. In this respect, grocery recommendation uses customers' shopping history and product information to address various added-value scenarios: predicting customers' future shopping, selecting best value-for-money products, offering new products the user may like, etc. Besides, the availability of data about products and shopping positively affects the retailer by supporting a sustainable business: offers and featured products, stock management, customer profiling, etc.
To meet the challenges above, in this paper we use two document representation methods, BOW and Doc2Vec, to manage product data. For each document representation model, we also define three-criteria recommendation systems (Product, Package, and Health) for the following specific problem: given a source product P, apply RSs to suggest similar alternative products, where similarity is defined on the basis of a product taxonomy as well as product characteristics (composition, packaging, nutrition table, allergens, etc.). The solution to this problem supports various regular use cases in the grocery market, such as out-of-stock products, inventory clearance, best value options, new products, etc. To obtain the recommender models and to validate them, we use a real grocery dataset, referred to as MDD-DS, provided by Midiadia, a Spanish company that works on grocery catalogs. MDD-DS was constructed by analyzing the products' information (product labeling) and by experts' manual annotation, so that products are assigned to a specific variety in a hierarchical structure for products. The major contributions of this research work are the following:
1) Definition of an appropriate data structure to manage the different kinds of information linked to commercial products (especially in the food industry).
2) Definition and identification of the appropriate document representation that works with MDD-DS to represent the products.
3) Design and implementation of an RS that automatically provides alternative products when the user's choice is not available. The RS does not work with the user's profile; it is exclusively based on the product's characteristics and the available catalogue.
4) Design of three recommendation approaches based on the product's characteristics: composition, packaging, nutritional table, allergens, etc.
5) Proof of concept and validation to test the RS performance. We have conducted a survey for users and for experts to evaluate the RS approaches.
The rest of this document is organized as follows. In Section II, we briefly review RSs and the document representation methods used to manage product data in RSs. The grocery dataset MDD-DS is described in Section III. In Section IV, the recommendation methodology is introduced with three specific approaches to product similarity, based on product composition, packaging, and health characteristics. To implement these three approaches, we deployed two kinds of document representation techniques: a simple BOW (Bag of Words) in Section V, and a neural network-based document embedding, Doc2Vec, in Section VI. For the two product representation models, experimental evaluation and discussion are described in Section VII. Finally, in Section VIII, we conclude the current work with some future research directions.

II. RECOMMENDER SYSTEMS
RSs are fundamental to e-commerce: a personalized RS recommends items or products that satisfy the interests of different users, and also recommends items unknown to the users that match their interests [10], [11]. As mentioned above, the three most commonly used methods in RSs are CB filtering, CF, and hybrid approaches.
CB filtering [12]-[14] is one of the standard techniques used by RSs. CB identifies items, based on an analysis of their content, that are similar to items known to be of interest to the user. For example, a CB website recommendation service can work by analyzing the user's favorite web pages to generate a profile of commonly occurring terms, and then using this profile to find other web pages that include some or all of these terms.
The CB technique has several issues and limitations [15]-[17]. (i) CB methods have no mechanism to assess the quality of an item. Furthermore, CB methods generally require items to include some type of content that is amenable to feature extraction algorithms. As a result, the CB technique tends to be ill-suited for recommending products, movies, music titles, authors, restaurants, and other types of items with little or no useful and analyzable content. (ii) CB methods rarely reflect the current preferences of the user community. In a technique that recommends products to users, for example, there is no mechanism to favor items that are currently "hot sellers." Moreover, existing systems do not provide a mechanism to recognize that the user may be searching for a particular type or category of item.
CF [18], [19] is another common recommendation technique. In general, CF recommends items to the user based on the interests of a community of users, without any analysis of the item content. The idea of CF is to build a personal profile for each user from the ratings that the user gives to items. To recommend items to a user, the user's profile is first compared with other users' profiles to identify one or more similar users; the items rated highly by these similar users are then recommended to the user. A significant benefit of CF is that it overcomes the previously mentioned shortcomings of CB filtering.
The main issue in the above is how to measure user similarity. This problem inspired memory-based methods [20], which can be implemented as user-based [21] or item-based [22], [23]. User-based and item-based methods have similar mechanisms, but item-based methods are used more often because they perform better at scale and with lower rating density.
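As an illustration of the item-based variant, the following minimal sketch computes the cosine similarity between two items from the ratings users gave them. The toy `ratings` dictionary and the helper names are hypothetical and do not come from the cited works; cosine similarity is only one of several measures that memory-based methods can use.

```python
from math import sqrt

# Hypothetical toy data: user -> {item: rating}
ratings = {
    "u1": {"A": 5, "B": 4},
    "u2": {"A": 4, "B": 5, "C": 1},
    "u3": {"B": 2, "C": 5},
}

def item_vector(ratings, item):
    """Collect the ratings given to one item, keyed by user."""
    return {user: r[item] for user, r in ratings.items() if item in r}

def cosine(a, b):
    """Cosine similarity between two sparse rating vectors."""
    common = set(a) & set(b)
    num = sum(a[u] * b[u] for u in common)
    den = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

sim_ab = cosine(item_vector(ratings, "A"), item_vector(ratings, "B"))
sim_ac = cosine(item_vector(ratings, "A"), item_vector(ratings, "C"))
print(sim_ab > sim_ac)  # items A and B were rated alike by the same users
```

Because the similarities are computed between items rather than between users, they can be precomputed offline, which is one reason item-based methods tend to scale well.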
A hybrid approach combines CB and CF (user-based and item-based) filtering in an attempt to eliminate their flaws and provide more efficient results. It usually performs better than either filtering method alone. The hybrid approach combines CB and CF to address two significant problems: cold start [24] and sparsity [25]. The cold start problem occurs when there is not enough data about a new user, or not enough ratings for a new item, so it is difficult to make recommendations for that new user or to present the new item to a user. Sparsity occurs when users have not rated most of the items, so the ratings are sparse.
In our work, we face some issues in providing a recommendation service and associated methods for generating personalized items, since the recommendation is based on the user's interests without considering the user profile. This paper focuses solely on the user's interest and on how to recommend suitable items to each user. A benefit of this work is that recommended items are identified through lists of items similar to the desired item. As mentioned above, our work combines CB filtering and item-to-item CF, in the manner of Amazon [3]. Amazon devised an algorithm that looks at the items themselves: it analyzes the items purchased or rated by the user and matches them with similar items, using similarity metrics to compose a list of recommendations. That algorithm is called "item-based collaborative filtering." This approach is also appropriate and fast, especially for huge data sets. It was further developed in 2017 [26] to aggregate data about the user, so that the RS relies on both the data and the user's behavior in selecting items. It is still based on the analysis of the items, but it combines that analysis with the user's data and choices. Regarding related work, collaborative filtering is the most widely used technique in previous studies, as shown in the following paragraphs.
The authors of [27] used a collaborative filtering method to generate recommendations for various items using the ratings and comments available on Twitter. They also evaluated the reviews given by blipper (a review website) for four unique products using the CF method.
When video is the data used to find suitable items for the user, there are also research works that apply collaborative filtering to recommend products. For instance, the authors of [28] introduced an approach that uses item-to-item collaborative filtering to discover exciting and meaningful videos among large-scale video collections. This method runs on Qizmt, a .NET MapReduce framework. The RS in [29] relies on monitoring the video content the user watches, a customer carrier database, and a vector database of products. The idea is to identify an item related to a part of the video content that the user viewed, determine the product category associated with that item, and then analyze the characteristics of items similar to the identified item. It compares the customer value vectors with the product characteristic vectors and then shows the recommended product to the customer. Other approaches take user interactions into account to recommend the right products. For instance, the CF recommender system in [30] records user interactions and uses them to improve the recommendation. It is not limited to the items selected by the users; the proposed system also takes the category of the items into account.
Recommendation systems usually require a large amount of user data. Safeguarding the privacy of this information is an important aspect that must be taken into account. For instance, in [31], an arbitrable remote data auditing scheme is proposed. The scheme does not rely on a third-party auditor and targets the network storage-as-a-service paradigm. The authors designed a network storage service system based on blockchain, in which the user and the network storage service provider each generate the integrity metadata of the corresponding original data block. Both parties then reach a consensus on the matter by means of the blockchain.
Other approaches address specific problems of recommendation systems, such as scalability and cold start. For instance, the authors of [32] implement a user-based collaborative filtering algorithm on Hadoop, a distributed cloud computing platform, to solve the scalability problem of the collaborative filtering method. Besides, the authors of [33] propose a Keyword-Aware Service Recommendation method called KASR. They present a personalized service recommendation list, and keywords are used to indicate user preferences. A user-based collaborative filtering algorithm is adopted to generate the recommendations. They implemented KASR on Hadoop with real-world data sets to improve its scalability and efficiency in a big data environment. Furthermore, the authors of [34] proposed a novel approach based on item-based CF that uses BERT [35] to help understand the items, reveal the connections between them, and solve problems of traditional recommender systems such as cold start. The experiment was performed on a large-scale real data set in a complete cold start scenario, and the approach outperformed the popular Bi-LSTM model. It used the item title as content, along with the item token, to solve the cold start problem. The approach also further identifies the interests of the user.
Other approaches aim to recommend products in line with the user's interests without being affected by the problems mentioned above or by data sparsity. For instance, in [36], a product recommendation system is proposed in which an autoencoder based on a collaborative filtering method is employed. The experimental results show a very low Root Mean Squared Error (RMSE), indicating that the recommendations are in line with the users' interests and are not affected by the data sparsity problem, even though the datasets are very sparse.
In e-commerce, user data and purchasing behavior play an important role [37], [38]. However, in our scenario we are totally agnostic about customer behavior. The company Midiadia does not provide complete e-commerce solutions, but supplies enriched catalogues to e-commerce platforms. Consequently, Midiadia has no information about customers' interactions or habits, nor any kind of profiling. To the best of our knowledge, no other study provides a solution to this problem (recommending a similar product) that exclusively takes into account the product information: ingredients, size, packaging, health messages, allergens, etc. Our approach does not resort to customer data; it depends only on the product description, such as name, brand, ingredients, legal name, and size. Likewise, other data help to determine whether the product is healthy, such as sugars, fats, and carbohydrates, and to exclude all the contents that can cause allergies. Our proposal fills a notable gap for many major e-commerce players.

A. Representation Models
Regarding document representation, this section reviews the representation models related to the techniques used in this paper. We start with simple techniques such as Bag-of-Words and TF-IDF.
First, Bag-of-Words (BOW [39], [40]) is a basic, popular, and very straightforward approach among feature extraction methods. It is used to create document representations in Natural Language Processing (NLP) [41] and Information Retrieval (IR) [42]. The text is represented as a bag that contains many words: a feature set is formed from all the words of an instance, and the order of the words is ignored. In its simplest (binary) form, the method does not care how often a word appears; the only thing that matters is whether the word is in the word list. In its count form, it records the frequency of each word in the document. In either case, a feature generated by bag-of-words is an n-dimensional vector, where n is the number of words in the vocabulary of the input documents. Second, TF-IDF [43], short for term frequency-inverse document frequency, is a technique that can be used as a weighting factor not only in IR solutions but also in text mining and user modeling. This method, as in the count form of the bag-of-words model, counts how many times a word appears in a document. However, words that are repeated very often, such as stopwords (the, of, ...), are penalized by the inverse document frequency weighting: the more documents a word appears in, the less relevant it is. Therefore, a word that is both distinctive and frequent will be ranked highly if it appears in the query introduced by the user.
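The difference between raw counts and TF-IDF weights can be sketched in a few lines. The three toy product descriptions and the function names are hypothetical, and the weighting shown is the plain unsmoothed variant (term frequency times log(N/df)); libraries usually apply smoothed variants.

```python
import math
from collections import Counter

# Hypothetical toy corpus of product descriptions
docs = [
    "whole milk lactose free",
    "semi skimmed milk",
    "dark chocolate with hazelnuts",
]
tokenized = [d.split() for d in docs]
vocab = sorted({w for doc in tokenized for w in doc})

def count_bow(tokens):
    """Count-based bag-of-words: one position per vocabulary word."""
    counts = Counter(tokens)
    return [counts[w] for w in vocab]

def tfidf(tokens):
    """Plain TF-IDF: term frequency times log(N / document frequency)."""
    counts = Counter(tokens)
    n_docs = len(tokenized)
    vec = []
    for w in vocab:
        tf = counts[w] / len(tokens)
        df = sum(1 for doc in tokenized if w in doc)  # >= 1 by construction
        vec.append(tf * math.log(n_docs / df))
    return vec

weights = dict(zip(vocab, tfidf(tokenized[0])))
# "milk" occurs in two documents, "lactose" in one, so "lactose" weighs more
print(weights["milk"] < weights["lactose"])
```

In the first document both words occur once, so the count vector treats them equally, while TF-IDF down-weights "milk" because it appears in more documents.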
On the other hand, word embedding is a term used for the representation of words for text analysis [44]-[47]. It maps words to vectors of real numbers using a neural network, a probabilistic model, or dimensionality reduction on the word co-occurrence matrix. Word embeddings are also very useful in mitigating the curse of dimensionality, a recurring problem in artificial intelligence [48]. Without word embedding, the unique identifiers representing the words generate scattered data, isolated points in a vast sparse representation [49]. With word embedding, on the other hand, the space becomes much more limited in terms of dimensionality while carrying a much richer amount of semantic information [50]. With such numerical features, it is easier for a computer to perform mathematical operations such as matrix factorization and dot products, which are required by shallow and deep learning techniques.
Regarding word embedding, the main difficulty is that representing words as distinct symbols does not capture how their meanings relate to each other. Early attempts addressed this problem by clustering words based on the meaning of their endings and representing the words as high-dimensional sparse vectors. More recently, a model inspired by neural network language models was proposed, known as Word to Vector (word2vec) [51]. These embeddings are easy to work with, since the vectors can be manipulated by many algorithms, such as dimensionality reduction, clustering, classification, and similarity search.
Two models have been presented to produce such dense word2vec embeddings: the Continuous Bag of Words (CBOW) model [52] and the Skip-gram model [53], [54]. Each of the two models trains a network to predict neighboring words. Suppose that a sequence of tokens $(t_1, \ldots, t_n)$ is provided. The CBOW model first randomly initializes the vector of each word and then optimizes these initial guesses using a single-layer neural network that takes the context words as input and outputs the vector of the predicted word. The size of the network's hidden layer determines the size of the word vector. The Skip-gram model, conversely, uses the word to predict its context words. Whereas word2vec represents words, the goal of doc2vec is to create a numerical representation of a document, regardless of its length. Unlike words, however, documents do not come in logical structures. The authors of [55] took the word2vec model as a template and added a paragraph ID as an additional input to build doc2vec.
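To make the difference between the two training objectives concrete, the sketch below only generates the (input, target) training examples each model would see for a token sequence; it does not train a network. The function names and the `window` parameter are illustrative assumptions, not part of [52]-[54].

```python
def skipgram_pairs(tokens, window=1):
    """Skip-gram: each center word is used to predict each context word."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

def cbow_pairs(tokens, window=1):
    """CBOW: the surrounding context words are used to predict the center word."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        pairs.append((context, center))
    return pairs

tokens = ["whole", "milk", "chocolate"]
print(skipgram_pairs(tokens))
print(cbow_pairs(tokens))
```

Doc2vec extends this setup by feeding a per-document identifier alongside the words, so that a vector is learned for the document as well.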

III. DATASET
The data set used in this paper was provided by Midiadia, a Spanish company that converts the textual information on product packages into product categories and product attributes by combining automated natural language processing and manual annotation. The Midiadia DataSet (MDD-DS) is organized as a taxonomy whose three upper levels are called Category, Subcategory, and Variety. Every product in MDD-DS includes its taxonomy position (i.e., values for Category, Subcategory, and Variety) as well as a set of product attributes (e.g., name, ingredients, legal name, brand, product size, etc.), as shown in the extract of real data in Table I. We have also used these product components before in [56], [57] to automatically categorize the constantly changing products in the market, which is the first part of our investigation.
• 'European Article Number' (EAN) is an internationally recognized standard that describes the barcode and numbering system used in world trade to identify a specific retail product with a specific packaging and a specific manufacturer.
• 'Category', 'Subcategory', and 'Variety' form a hierarchy and can be displayed by a company as the organization levels of its catalog. Companies manufacture the products; each company has an identifying name, which is listed as the brand.
• In addition, there are some properties compatible with the EU regulation [58], for example, name, legal name, and ingredients, as indicated in Table II.
• 'Servings' is a number, determined from the amount of product, that indicates how many people the product is sufficient for.
In addition, Midiadia supported us with two versions of MDD-DS to implement the recommendation systems and cover all the company's requirements. The basic version, called MDD-DS1, contains all the above information plus some information related to the nutrition table, such as sugar and fat, and some messages shown on the product packaging, such as sugar-free or gluten-free. These messages depend on the components of each product, as shown in Table III. The extended version, named MDD-DS2, contains all the above information together with the Brand Type and Brand attributes, a price (which was assigned randomly), and more information about the nutrition table, such as carbohydrates (Carbs), dietary fiber (df), the percentages of saturated fat (sf) and good fat (gf), protein (pn), and salt (sa). It also contains allergens such as soy, fish, eggs, nuts, etc.; these characteristics, and how they are used in our research, are described in detail later, as shown in Table IV.
• 'Carbohydrates' are one of the three main food categories and a source of energy. They are basically sugars and starches that the body breaks down into glucose (a simple sugar that the body can use to nourish its cells).
• 'Dietary Fiber' is the part of plant-derived food that cannot be completely broken down by human digestive enzymes.

IV. METHODOLOGY OVERVIEW
Given our dataset, the proposed recommender system has no information about the user's history, so CF that relies on user data must be excluded. Instead, a hybrid item-based CF is designed for the specific scenario of finding products similar to a source product P, where similarity is defined according to, first, the Variety of the product in the MDD taxonomy and, second, other attributes of the product. The alternative to P will be a product in the same Variety that also meets other similarity requirements over the product attributes. Three similarity approaches have been defined: (i) Product Composition (PRO-COM), where similarity is scored according to product composition (ingredients, name, legal name, etc.); (ii) Package-based (PK-BD), where similarity is scored according to the size of the product chosen by the user; and (iii) Health-based (HTH-BD), where similarity is scored according to a health grade obtained from the product nutrition table. The recommendation methodology considers allergens separately from these three similarity approaches, as follows. In MDD-DS, several product attributes are related to allergens: nuts, egg, hazelnuts, fish, sulfates, peanuts, mollusks, lupine, gluten, mustard, soy, crustaceans, milk and its derivatives (including lactose), sunflower seeds, and sesame. Allergens are considered preconditions for suggesting an alternative product. That is, if the user chose a product that includes sugar, water, and nuts, the allergen precondition for the alternative products is that they may contain nuts but no other allergen. So, the alternative product may or may not contain nuts, but it should not contain other allergens. The proposed methodology is shown in Figure 1. A training set is defined to obtain the recommender model with the following steps: (A) MDD-DS is preprocessed; (B) for every product P, the dataset is filtered by allergen preconditions; (C) the three similarity scores are obtained (PRO-COM, PK-BD, and HTH-BD). Finally, at the bottom of the figure, the automated recommendation takes place when the user selects a product: the recommendation system recommends an alternative based on the three approaches. A survey is conducted to gather the users' opinions on the three approaches.
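The allergen precondition of step (B) can be read as a subset test: a candidate is admissible only if every allergen it contains is also present in the source product. The following minimal sketch assumes each product carries a set of allergen names; the dictionary layout and the function name are hypothetical.

```python
def allowed_alternatives(source, candidates):
    """Keep candidates whose allergens are a subset of the source's allergens."""
    return [c for c in candidates if c["allergens"] <= source["allergens"]]

# Hypothetical products: the source contains nuts only
source = {"name": "P", "allergens": {"nuts"}}
candidates = [
    {"name": "A", "allergens": {"nuts"}},          # same allergen: allowed
    {"name": "B", "allergens": set()},             # no allergens: allowed
    {"name": "C", "allergens": {"nuts", "milk"}},  # adds milk: rejected
    {"name": "D", "allergens": {"soy"}},           # different allergen: rejected
]
print([c["name"] for c in allowed_alternatives(source, candidates)])
```

With this reading, a nut-free source product automatically yields nut-free alternatives, since an empty allergen set only admits candidates without allergens.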
To implement the recommendation, we carried out collaborative filtering as a first model. Then we added more features and a neural network solution to improve our results. Figure 2 illustrates the strategy of this paper. In the first model, the dataset (MDD-DS1) is preprocessed and analyzed. The three approaches (PRO-COM, PK-BD, and HTH-BD) were then developed using collaborative filtering, and a survey was carried out to collect the users' opinions on the three approaches. In the second model, the approaches are redeveloped based on user feedback: we added more features and more filters, such as filtering by allergen features, and we added a neural network solution to improve our results. To this end, the company extended the dataset, called MDD-DS2, with additional features, and the data is again preprocessed. A neural network is trained on the products; each product is then extracted as a vector and compared with the rest of the products using similarity techniques to build the approaches (PRO-COM, HTH-BD, and PK-BD).
All three approaches take allergen features into account, which means that, as explained above, if the product is, for example, nut-free, the alternative products are nut-free too. The approaches were then sent by the company to an expert for evaluation, who indicated that the modifications meet the company's requirements. Hence, a questionnaire was published for users to evaluate the recommendation system after these modifications. The last step was a comparison of the users' evaluations.

V. RECOMMENDATION SYSTEM BASED ON ITEM-BASED COLLABORATIVE FILTERING (RS-CF)
This paper proposes a methodology to develop RS-CF for the retail sale of products. Three recommendation methods have been developed, each of which recommends alternative products to help the user obtain the product of interest. Our implementation takes the data described in Section III as the input for each approach. Alternative products are then recommended by each approach and presented to the user, who chooses the most suitable product and evaluates RS-CF. The modeling methodology consists of two main steps, as shown in Figure 3: (A) data preprocessing; and (B) building the RS-CF approach. The RS-CF was built in three ways: (i) the Product Composition (PRO-COM) approach, where similarity is scored according to product composition (ingredients, name, legal name, etc.); (ii) the Package-based (PK-BD) approach, where similarity is scored based on the PRO-COM result together with the size of the product chosen by the user; and (iii) the Health-Based (HTH-BD) approach, where similarity is scored according to the PRO-COM result, taking into account the allergen information along with a health grade obtained from the product nutrition table. To have the RS-CF approaches evaluated by users, we conducted a survey that includes many products and their similar products. We begin with the PRO-COM and PK-BD approaches, describing the preprocessing and the construction of the product matrix. The algorithm is the same for both approaches, except for the additional feature used by PK-BD.

A. Preprocessing
The PRO-COM and PK-BD approaches are the first two RS-CF approaches, and they use the first data release (MDD-DS1). MDD-DS1 comprises 29,236 products. The data was cleaned by removing the rows with empty values in the name and servings variables. The EAN column is then scanned for duplicates, which are removed. Next, we discard the varieties that contain fewer than four products; this minimum is required to implement the algorithm. After cleaning, the number of products is 20,371; these steps appear in Algorithm 1 as the Cleaning(MDD-DS1) step. The data is then preprocessed by extracting all the words from the attributes name, legal name, and ingredients. Consider a corpus C for each product p, C(p[Name], p[Legal Name], p[Ingredients]). That is, the three attributes are combined into a single text that describes the product, des(p). This description des(p) is obtained after cleaning the product (Clean_p) with the following steps: (i) parentheses are transformed into spaces; (ii) numbers, stopwords, punctuation, and extra spaces are removed; (iii) all letters are converted to lowercase; and (iv) duplicate strings are removed. Algorithm 1 shows all the preprocessing steps for PRO-COM and PK-BD.
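The cleaning steps (i) to (iv) can be sketched as follows. This is an illustrative reading, not the authors' Algorithm 1: the field names, the tiny stopword list, and the example product are hypothetical, and lowercasing is applied before stopword removal so that capitalized stopwords are also caught.

```python
import re
import string

# Hypothetical, deliberately tiny Spanish stopword list
STOPWORDS = {"de", "con", "y", "la", "el", "en"}

def describe(product):
    """Build des(p): the cleaned, de-duplicated word list of a product."""
    text = " ".join(product.get(k, "") for k in ("name", "legal_name", "ingredients"))
    text = re.sub(r"[()]", " ", text)   # (i) parentheses become spaces
    text = text.lower()                 # (iii) lowercase
    text = re.sub(r"\d+", " ", text)    # (ii) drop numbers
    # (ii) replace every punctuation character with a space
    text = text.translate(str.maketrans(string.punctuation, " " * len(string.punctuation)))
    seen, words = set(), []
    for w in text.split():              # split() also discards extra spaces
        if w not in STOPWORDS and w not in seen:  # (ii) stopwords, (iv) duplicates
            seen.add(w)
            words.append(w)
    return words

product = {
    "name": "Leche Entera",
    "legal_name": "Leche entera (UHT)",
    "ingredients": "Leche de vaca 100%",
}
print(describe(product))  # ['leche', 'entera', 'uht', 'vaca']
```

Keeping the first occurrence of each word preserves the original word order, which is not required by BOW but makes the cleaned description easier to inspect.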
Algorithm 1: RS-CF, PRO-COM and PK-BD preprocessing pseudocode.
Thus, the words are divided and a vector of words, product_words(p), is created for each product; an example is shown in Table V. The product matrix $X[m, n]$ is an $M \times N$ matrix, where each row represents a product p of MDD-DS1, so that M is the total number of products, and each column represents a token $t_i \in \vec{t}$. The next step is to use BOW to score the words of each product. The goal is to convert each free-text product description into a vector that we can use as an RS model input. Since we know that the vocabulary in all_products_words contains 10,707 words, we can use a fixed-length document representation of 10,707 positions, with one position in the vector to score each word. The simplest scoring method is to mark the presence of words as a boolean value: 0 for absent, 1 for present. Using the arbitrary order of the vocabulary all_products_words, we can loop through the product descriptions des(p) and convert each of them to a binary vector, as shown in Table VII.
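A minimal sketch of how such a binary product matrix could be assembled from the per-product word lists is given below. The function name and the two toy products are hypothetical; the vocabulary is sorted only to make the column order reproducible, since the text notes that the order is arbitrary.

```python
def build_product_matrix(product_words):
    """Return (vocabulary, binary matrix) with one row per product."""
    vocab = sorted({w for words in product_words for w in words})
    index = {w: j for j, w in enumerate(vocab)}
    matrix = []
    for words in product_words:
        row = [0] * len(vocab)
        for w in words:
            row[index[w]] = 1  # mark presence only, not frequency
        matrix.append(row)
    return vocab, matrix

# Hypothetical cleaned descriptions of two products
product_words = [["leche", "entera"], ["leche", "desnatada"]]
vocab, X = build_product_matrix(product_words)
print(vocab)  # ['desnatada', 'entera', 'leche']
print(X)      # [[0, 1, 1], [1, 0, 1]]
```

For a vocabulary of 10,707 words most entries of each row are zero, so a sparse representation would be a natural choice in practice.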

B. RS-CF: Product Composition Approach
Product composition (PRO-COM) is the main approach on which the recommendation system is built. PRO-COM obtains alternative products based on a similarity ratio computed from the product matrix, as follows. Let Z be a variety and let $\vec{d}$ be a |Z|-dimensional vector, $\vec{d} = (d_1, \dots, d_{|Z|})$, where $d_i = d(q_i, q_j)$ denotes the following distance between the products $q_i$ and $q_j$ ($q_i, q_j \in Z$).
The distance to a product p is calculated by taking the absolute value of the difference in each column of the product matrix (all_products_words) and then adding up all these differences, as shown in Equation 1. When a product $p \in Z$ is not available, RS-CF (PRO-COM) recommends an alternative product $pa \in Z$. If there is more than one pa value, a t-dimensional vector $\vec{pa} = (pa_1, \dots, pa_t)$ is created, t being the number of alternatives. The alternative products in $\vec{pa}$ are those with the lowest distance to the product p, as shown in Equation 2; an example of a distance vector $\vec{pa} = (pa_1, \dots, pa_t)$ is given in Table VIII.
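The distance and selection rule just described (sum of absolute column-wise differences, then the candidates at minimum distance) can be sketched as below. The function names and the toy variety are illustrative assumptions; only the rule itself comes from the text.

```python
def distance(x, y):
    """Sum of absolute column-wise differences between two product rows."""
    return sum(abs(a - b) for a, b in zip(x, y))

def alternatives(p, variety):
    """Return the ids in `variety` (other than p) at minimum distance to p.

    `variety` maps a product id to its binary row of the product matrix.
    """
    dists = {q: distance(variety[p], row) for q, row in variety.items() if q != p}
    best = min(dists.values())
    return [q for q, d in dists.items() if d == best], dists

# Toy variety Z with four products (hypothetical data).
variety = {
    "p": [1, 1, 0, 0],
    "a": [1, 1, 1, 0],
    "b": [1, 0, 0, 0],
    "c": [0, 0, 1, 1],
}
best, dists = alternatives("p", variety)
```

When several products tie at the minimum distance, all of them are returned, mirroring the t-dimensional vector of alternatives.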

C. RS-CF: Package-based Approach
The PK-BD approach offers alternative products with the product size in mind, so it adds a further condition using an additional feature called serving (size per person), while still taking the PRO-COM distance into account. The candidates are compared again with the unavailable product p, this time with regard to package size. For each product of the variety Z, a distance to the product p is calculated as the absolute value of the difference in package size. The alternative products pa are then those with the lowest distance to the product p. Let $\vec{s}$ be the |Z|-dimensional vector that contains the package sizes of the products in Z, $\vec{s} = (s_1, \dots, s_{|Z|})$. Let $ds(q_i, q_j)$ denote the following distance between the products $q_i$ and $q_j$ according to their package size, as shown in Equation 3.
With the PK-BD approach, if a product $p \in Z$ is not available, two steps are followed to obtain the alternative product $pa \in Z$: (i) first, the PRO-COM distance is taken into account by applying Equation 2; (ii) next, the package size distance is applied to the products in the vector $\vec{pa}|p = (pa_1|p, \dots, pa_t|p)$ in order to select the alternative product pa offered to the user, as shown in Equation 4. If there is more than one $pa|_l$ value, a u-dimensional vector $\vec{pa}|_l = (pa_1|_l, \dots, pa_u|_l)$ is created, u being the number of alternatives. An example of a matrix with the two distance vectors for the selected criteria is shown in Table IX.
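The two-step PK-BD selection can be sketched as a filter followed by a second filter. This is an illustration only: the function name, the dictionaries `rows` and `sizes`, and the toy values are assumptions, and the composition distance is the sum of absolute differences described for PRO-COM.

```python
def pk_bd_alternatives(p, rows, sizes):
    """Two-step selection: composition distance first, package size second.

    rows  : product id -> binary row of the product matrix
    sizes : product id -> package size (serving)
    """
    others = [q for q in rows if q != p]
    comp = {q: sum(abs(a - b) for a, b in zip(rows[p], rows[q])) for q in others}
    # Step (i): keep the candidates at minimum composition distance.
    best_comp = min(comp.values())
    candidates = [q for q in others if comp[q] == best_comp]
    # Step (ii): among them, keep those closest in package size.
    size_dist = {q: abs(sizes[p] - sizes[q]) for q in candidates}
    best_size = min(size_dist.values())
    return [q for q in candidates if size_dist[q] == best_size]

# Toy data (hypothetical): "a" and "b" tie on composition, "b" is closer in size.
rows = {"p": [1, 1, 0], "a": [1, 1, 1], "b": [1, 0, 0], "c": [0, 0, 1]}
sizes = {"p": 500, "a": 1000, "b": 450, "c": 500}
chosen = pk_bd_alternatives("p", rows, sizes)
```

Note that product "c" has exactly the same package size as p but is excluded in step (i), which reflects the priority of the composition criterion over the package criterion.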

D. RS-CF: Health-based Approach
The health-based approach (HTH-BD) is the most delicate one, since it recommends healthy products to users based on their choices. The most common nutrition table properties, fat and sugar, are used to help recommend healthy products. The data cleaning described in subsection V-A is reused, replacing serving with the additional properties fat and sugar: rows with blank values for the product name, sugar, or fat are eliminated. In the remaining products, the sugar values range between 1 and 1087 grams and the fat values between 1 and 937 grams. In addition, 13 columns named Messages, which provide allergen information, are considered. After cleaning, the data comprises 20,259 products and 24 features.
Regarding the Messages columns, after analyzing all the tags that indicate the absence of allergens, 50 different strings are obtained, as listed in Table X, and stored in a vector named withoutwords:

[without colorings, without preservatives, without additives, without transgenics, without gluten, without artificial colors, without trans fats, without artificial flavors, without fat, without molluscs, without lactose, without artificial preservatives, without sugar, without egg, without milk and its derivatives, without cholesterol, without added preservatives, without added sugar, without salt, without palm oil, without soy, without added salt, without nuts, without peanuts, without palm, without sesame, without peanuts, without sulfites, without mustard, without saturated fat, without alcohol, without calories, without caffeine, without sweeteners, without hydrogenated fats, without palm oil and fat, high protein, without added phosphates, without allergen, starch free, celery free, without artificial sweeteners, without fish, without crustaceans, without glutamate, without lupins, low fat, low in energy]

Sensitivity to allergens is taken into account here, in line with the European Union law (footnote 3) on the labeling of food products, which obliges companies to report certain allergens that may endanger the health of the customer. A law with similar objectives was previously approved in Spain in 2004 and amended in 2008 (footnote 4). Of the 50 strings obtained, 17 are relevant in terms of allergies, as can be seen in Table XI. After the necessary analysis and clarification, the information obtained from MDD-DS1 is suitable for developing the RS-CF HTH-BD algorithm.
TABLE XI: Allergen features in the withoutwords vector.
[without allergens, without gluten, without crustaceans, without egg, without fish, without peanuts, without peanuts, without soy, without milk and its derivatives, without lactose, without nuts, without celery, without mustard, without sesame, without sulphites, without lupins, without molluscs]

Aside from the data obtained from the Ingredients variable, the Messages columns associated with each product are also read in each iteration. For each product, the 13 Messages columns are handled as follows: (1) the 13 columns of the current product are obtained and the blank ones are discarded; (2) to remove information unrelated to allergens, values that do not begin with the string "without" are also discarded; (3) duplicate strings are removed and strings are converted to lowercase; (4) strings are split at each full stop followed by a space, and substrings preceded by a comma are removed; (5) incorrectly parsed characters (escape sequences such as \r and \n) are removed, together with some erroneous strings and full stops. The word vector is built from the resulting strings.
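The five handling steps for the Messages columns could look roughly like the sketch below. Several details are assumptions made for the example: the function name `without_features`, the input shape (a list of raw column values), the English prefix "without" (taken from the paper's English rendering of the tags), and the order in which the steps are applied, which is rearranged here so that the prefix check runs on each split fragment.

```python
def without_features(messages):
    """Extract 'without ...' tags from the Messages columns of one product.

    `messages` is the list of raw column values (hypothetical input shape).
    """
    features = []
    for raw in messages:
        if raw is None or not raw.strip():
            continue                              # (1) skip blank columns
        # (5) drop parsed control characters, (3) lowercase
        text = raw.replace("\r", " ").replace("\n", " ").lower()
        for part in text.split(". "):             # (4) split on ". "
            part = part.split(",")[0]             # (4) drop text after a comma
            part = part.strip(" .")               # (5) drop stray full stops
            if part.startswith("without"):        # (2) keep only "without" tags
                features.append(part)
    return list(dict.fromkeys(features))          # (3) remove duplicates

tags = without_features([
    "Without gluten. Without lactose, see label",
    "",
    None,
    "High protein",
    "without gluten\r\n",
])
```

The result for one product corresponds to one element of the withoutlist described in the text; the union of all such results gives the withoutwords vector.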
As in the PRO-COM and PK-BD approaches, the product_words list is generated, with the difference that here only the Ingredients column is considered. That is, it contains as many elements as there are distinct products in MDD-DS1 (20,259 in the HTH-BD approach). Each element of the list stores the vector of words obtained for one product after applying the Clean_p steps. The list is shown in Table XII. In addition, a list called withoutlist is created, which stores, for each product, the vector of healthy features obtained from the Messages columns. The vector withoutwords stores each distinct healthy feature once and has 50 elements. The list and the vector are shown in Table XIII and Table XIV, respectively. The entire preprocessing is shown in Algorithm 2. Note that a subset of 17 elements of the withoutwords vector is used to check for allergens in a product; the allergen data of a product is accessed by indexing withoutlist with the index of the product in MDD-DS1.

Algorithm 2
    Cleaning(MDD-DS1)
    product_words[] ← new_list(m)
    withoutlist ← new_list(m)
    withoutwords ← new_vector(0)
    for i ← 1 : m do
        …
    end for
    end procedure

Once the data has been processed for the health-based approach, the following notation is used. Let $\vec{g}$ be the withoutwords vector (50 elements), $\vec{g} = (g_1, \dots, g_{50})$. Let $\vec{a}$ be the subset of the withoutwords vector that covers allergens (17 elements); hence $\vec{a} \subset \vec{g}$, with $\vec{a} = (a_1, \dots, a_{17})$. Let $\vec{v}$ be the m-dimensional wordvectors list. Each element contains a vector $\vec{v}_i$; hence $\vec{v} = (\vec{v}_1, \dots, \vec{v}_m)$. Likewise, $\vec{v}_i = (v_i[1], \dots, v_i[dv_i])$, where $dv_i$ is the length of the vector contained in the i-th element of the list $\vec{v}$. Note that $\forall k \in [1, \dots, dv_i]$, $v_i[k]$ is a string. Let $\vec{vs}$ be the |Z|-dimensional subset of the m-dimensional wordvectors list $\vec{v}$ that corresponds to the elements of Z. Each element contains a vector $\vec{vs}_i$; hence $\vec{vs} \subset \vec{v}$ and $\vec{vs} = (\vec{vs}_1, \dots, \vec{vs}_{|Z|})$. Likewise, $\vec{vs}_i = (vs_i[1], \dots, vs_i[dvs_i])$, where $dvs_i$ is the length of the vector contained in the i-th element of the list $\vec{vs}$. Note that $\forall k \in [1, \dots, dvs_i]$, $vs_i[k]$ is a string. Let $\vec{n}_{pl}$ be a |Z|-dimensional vector, $\vec{n}_{pl} = (n_{pl_1}, \dots, n_{pl_{|Z|}})$, where $n_{pl_i} = dvs_{q_i}$. Each element denotes the processing level of a product.
Let $\vec{w}$ be the m-dimensional withoutlist list. Each element contains a vector $\vec{w}_i$; hence $\vec{w} = (\vec{w}_1, \dots, \vec{w}_m)$. Likewise, $\vec{w}_i = (w_i[1], \dots, w_i[dw_i])$, where $dw_i$ is the length of the vector contained in the i-th element of the list $\vec{w}$; for the products of Z, this length denotes the number of healthy features of a product. Let $\vec{c}$ be a |Z|-dimensional vector, $\vec{c} = (f_1 + s_1, \dots, f_{|Z|} + s_{|Z|})$, where $c_i = f_i + s_i$. It stores the fat and sugar features of the products; $f_i$ and $s_i$ denote, respectively, the fat and sugar quantities in grams of the product $q_i$. Let $da_i = da(q_i, q_j)$ denote the following similarity measure (taking allergens into account) of the product $q_i$ with respect to the product $q_j$, as shown in Equation 5, where the index ranges over $[1, \dots, dws_{q_i}]$.
When the product $p \in Z$ is unavailable, the alternative product $pa \in Z$ is obtained through the following steps:
(1) The first criterion is the similarity regarding allergens; the alternative product $pa|_a$ is selected according to that measure. If there is more than one $pa|_a$ value, a $u_a$-dimensional vector $\vec{pa}|_a = (pa_1|_a, \dots, pa_{u_a}|_a)$ is created, $u_a$ being the number of alternatives.
(2) Secondly, the sum of the sugar and fat quantities is used to select the alternative product $pa|_c$ among those in the vector $\vec{pa}|_a$. If there is more than one $pa|_c$ value, a $u_c$-dimensional vector $\vec{pa}|_c = (pa_1|_c, \dots, pa_{u_c}|_c)$ is created, $u_c$ being the number of alternatives.
(3) The next criterion, used to select the alternative product $pa|_h$ among those in the vector $\vec{pa}|_c$, is the number of healthy features. If there is more than one $pa|_h$ value, a $u_h$-dimensional vector $\vec{pa}|_h = (pa_1|_h, \dots, pa_{u_h}|_h)$ is created, $u_h$ being the number of alternatives.
(4) The last step selects the alternative product $pa|_{n_{pl}}$ from the vector $\vec{pa}|_h$ according to the processing level of the products. If there is more than one $pa|_{n_{pl}}$ value, a $u_{n_{pl}}$-dimensional vector $\vec{pa}|_{n_{pl}} = (pa_1|_{n_{pl}}, \dots, pa_{u_{n_{pl}}}|_{n_{pl}})$ is created, $u_{n_{pl}}$ being the number of alternatives.

In conclusion, Algorithm 3 first compares each product $q_i$ in the variety Z with the product p with regard to allergen-related features. The similar products are then ranked by the following features, in order: the sum of the fat and sugar amounts (increasing), the number of healthy features (decreasing), and the processing level (increasing). An example of a matrix whose columns are the vectors defining each criterion is shown in Table XV.

After building the three approaches, a user survey was conducted. Products and alternatives were presented according to each approach, and the results were then compiled and analyzed. We subsequently developed the approach further to improve the results and meet the company's requirements.
Algorithm 3 HTH-BD approach of RS-CF: Algorithm pseudocode
    …
    end for
    The products are sorted. The minus sign means the order is decreasing.
    end procedure
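The four-criterion ordering used by HTH-BD amounts to a lexicographic sort, which can be sketched with a tuple sort key. The sketch below makes explicit assumptions: the allergen similarity is taken to be the number of allergen tags shared with the unavailable product (the paper's Equation 5 may differ), the processing level is taken to be the number of ingredient words, and all names and data are hypothetical.

```python
def rank_hth_bd(p, products, allergen_tags):
    """Rank the other products of a variety for an unavailable product p.

    products      : id -> dict with keys 'without' (list of tags),
                    'fat', 'sugar' (grams) and 'ingredients' (list of words)
    allergen_tags : set of allergen-related tags (the 17-element subset)
    """
    target = set(products[p]["without"]) & allergen_tags

    def key(q):
        item = products[q]
        # ASSUMPTION: allergen similarity = number of shared allergen tags.
        allergen_sim = len(target & set(item["without"]))
        return (
            -allergen_sim,                 # more shared allergen tags first
            item["fat"] + item["sugar"],   # then less fat + sugar
            -len(item["without"]),         # then more healthy features
            len(item["ingredients"]),      # then lower processing level
        )

    return sorted((q for q in products if q != p), key=key)

products = {
    "p": {"without": ["without gluten"], "fat": 10, "sugar": 5,
          "ingredients": ["a", "b"]},
    "a": {"without": ["without gluten"], "fat": 8, "sugar": 4,
          "ingredients": ["a", "b", "c"]},
    "b": {"without": ["without gluten", "without palm oil"], "fat": 8,
          "sugar": 4, "ingredients": ["a"]},
    "c": {"without": [], "fat": 1, "sugar": 1, "ingredients": ["a"]},
}
ranking = rank_hth_bd("p", products, {"without gluten"})
```

The negated entries in the key play the same role as the minus sign mentioned in Algorithm 3: they turn an increasing sort into a decreasing one for that criterion.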

VI. RECOMMENDATION SYSTEM BASED ON NEURAL NETWORKS (RS-NN)
RS-CF is improved by refining the results and considering more conditions and filters: (1) Allergen properties are added as a precondition of the recommendation in the three approaches (PRO-COM, PK-BD, and HTH-BD). For example, if the product contains flour, eggs, water, nuts, and salt, the alternative product must either be allergen-free or contain at most the allergens eggs and nuts. (2) More conditions are considered in the three approaches, based on additional features such as brand type, brand attribute, and price. (3) More characteristics of the nutrition table, such as carbohydrates, dietary fiber, percentage of saturated fat, good fat, protein, and salt, are considered to improve the HTH-BD approach. (4) The approaches are rearranged: PRO-COM first, then HTH-BD, then PK-BD. In addition, to improve the results, we use a neural network-based model, Doc2Vec, to represent the data set and build a model that helps obtain alternative products. That is why we call this model a recommendation system based on neural networks (RS-NN).
After that, we use several similarity techniques (Cosine, Jaccard, Euclidean, and Manhattan) to obtain and sort similar products. Subsequently, we conduct a comparative study, based on the experts' results, to determine which technique is best for sorting similar products.

A. Preprocessing
To build the RS-NN approaches, we use additional features, such as the brand type and the brand attributes, together with 16 characteristics that cause allergies (nuts, egg, hazelnuts, fish, sulfites, peanuts, mollusks, lupine, gluten, mustard, soy, crustaceans, milk and its derivatives including lactose, sunflower seeds, and sesame). Midiadia therefore provided an extended version of the data set (MDD-DS2), which comprises 45,198 products and 368 features. The data cleaning, preprocessing, and approach-building phases were reused: rows with blank values in name, brand, brand type, brand attributes, variety, ingredients, or legal name were removed; duplicate rows with the same name and brand were eliminated; and, finally, rows with empty or duplicate EAN values were eliminated. The main idea of the recommendation system is to offer alternatives to a product within the same variety. Therefore, varieties with fewer products than the first quartile, which equals 15 products, are removed; Table XVI shows the products per variety after this removal for the PRO-COM approach (subsection VI-D). For a product $p \in Z$ of variety Z that is not available, the recommended alternative product satisfies $pa \in Z$. If a product p belongs to the specific variety called "other" ($p \in Z$ = "other"), its subcategory is used instead: let SC be a subcategory with $Z \in SC$; the remaining subcategories are removed. Likewise, let Ba be a brand attribute; for a product $p \in Ba$ that is not available, the recommended alternative product satisfies $pa \in Ba$. Finally, the products are filtered according to their allergen features af: all products that contain more or different allergen features than p are eliminated.
The product p and the alternative product pa are preprocessed by extracting all the words of the following attributes: name, brand, ingredients, legal name, and allergen features. Consider a corpus C of (Name, Brand, Ingredients, Legal Name, Allergen Features) that describes the product, des(p). This description is obtained after a three-step cleaning process: (1) a tokenize function converts the text to lowercase and splits it into a list of words, product_words(p); (2) Spanish stopwords (such as "and" and "or") are filtered out, and a number filter removes all numbers from the list; (3) a lemmatization step reduces each token to its lemma (footnote 5), and repeated words are then removed, as shown in Table XVII, step (3).

B. Product Representation
As explained in subsection II-A, Doc2Vec offers a simple way to convert a sequence of tokens into a fixed-size numerical vector; it builds on a neural network-based word representation method called Word2Vec. Given a sequence of training tokens $[t_1, t_2, \dots, t_{N-1}, t_N]$, the goal of Word2Vec is to maximize the average log probability [59]:

$$\frac{1}{N} \sum_{n=s}^{N-s} \log p(t_n \mid t_{n-s}, \dots, t_{n+s}),$$

where s is the size of the window used to preserve contextual information. The token $t_n$ can be predicted using a multiclass classifier such as the softmax function:

$$p(t_n \mid t_{n-s}, \dots, t_{n+s}) = \frac{e^{j_{t_n}}}{\sum_i e^{j_{t_i}}},$$

where each $j_{t_i}$ is the output value for token $t_i$ of a feed-forward neural network, calculated with

$$j = x + h\, f(t_{n-s}, \dots, t_{n+s}; R),$$

where x, h, f, and R are, respectively, the bias between the hidden and output layers, the weight matrix between the hidden and output layers, the mean or concatenation of the product tokens, and the word embedding matrix. Doc2Vec extends Word2Vec: in this case it learns a continuous vector for each product that preserves the semantic relationships between the different products [60]. As in Word2Vec, each token is represented by a d-dimensional continuous vector ($d \ll |v|$, where $|v|$ is the size of the vocabulary of des(p)). Furthermore, each product p is also represented by a continuous vector in the same space as the word vectors. In Doc2Vec, each product p is assigned a unique vector, represented by a column of matrix D, while each token in des(p) is assigned a unique vector, represented by a column of the word matrix T (the word embedding matrix R above). Therefore, the only change in the network formulation is to add D to Equation 12, as follows:

$$j = x + h\, f(t_{n-s}, \dots, t_{n+s}; R, D).$$

When the network is adequately trained, a distributed representation of each product p is obtained. The products were trained with the following three Doc2Vec settings: a vector size of 50 dimensions, $\vec{a} = (a_1, \dots, a_{50})$; 40 iterations over the training set; and a minimum word count of two, so that very infrequent words are ignored. Finally, we obtain a product matrix X[m, n], an M × N matrix in which each column represents a component of the product vector $\vec{a} = ([a_1, \dots, a_{50}], tag)$, where $tag \in pid$, and each row represents a product p of MDD-DS2, so that M is the total number of products, $\vec{p} = (p_1, \dots, p_m)$, as shown in Table XIX.

C. Similarity
The RS-NN approaches use the similarity techniques (Cosine, Jaccard, Euclidean, and Manhattan) to calculate the distance between the products $q_i$ and $q_j$. Let $d_i = d(q_i, q_j)$ denote the following similarity measure of the product $q_i$ with respect to the product $q_j$, which takes into account the variety Z, the brand attribute Ba, and the allergen features af, as shown in Equation 14.
When a product $p \in Z$, $p \in Ba$ is not available, the recommended alternative product $pa \in Z$, $pa \in Ba$ is obtained, taking the allergen features into account, as shown in Equation 15; the first equation applies to the output of Cosine and Jaccard, and the second one to Euclidean and Manhattan.
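For reference, the four measures named in this subsection can be written in plain Python as below. This is a generic sketch of the standard definitions, not the paper's Equation 14: Cosine and Jaccard are similarities (larger means closer), while Euclidean and Manhattan are distances (smaller means closer). Applying Jaccard to token sets rather than to the dense Doc2Vec vectors is an assumption made here, since the paper does not state how Jaccard was computed.

```python
import math

def cosine(u, v):
    """Cosine similarity between two numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def jaccard(tokens_a, tokens_b):
    """Jaccard similarity between two token collections (ASSUMPTION: sets)."""
    a, b = set(tokens_a), set(tokens_b)
    return len(a & b) / len(a | b) if a | b else 0.0

def euclidean(u, v):
    """Euclidean distance between two numeric vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def manhattan(u, v):
    """Manhattan distance between two numeric vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))
```

Because the first two grow with closeness and the last two shrink, candidate lists are sorted in descending order for Cosine and Jaccard and in ascending order for Euclidean and Manhattan.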

D. RS-NN: Product Composition Approach
In the product composition (PRO-COM) approach, similarity is scored according to the product matrix in order to offer alternative products. In addition to the distance $d(q_i, q_j)$, the alternatives are compared with the unavailable product with regard to brand, brand type, and price. For each product of the variety Z with brand attribute Ba, a distance to the product p is calculated using the similarity techniques of Equation 14.
In the PRO-COM approach, when a product $p \in Z$, $p \in Ba$ is not available, an alternative product $pa \in Z$, $pa \in Ba$ is sought. Let $\vec{b}$ be the |Z|-dimensional vector that contains the brands of the products in Z, $\vec{b} = (b_1, \dots, b_{|Z|})$. Likewise, let $\vec{bt}$ be the |Z|-dimensional vector that contains the brand types of the products in Z, $\vec{bt} = (bt_1, \dots, bt_{|Z|})$. In addition, let $\vec{PR}$ be the |Z|-dimensional vector that contains the prices of the products in Z, $\vec{PR} = (PR_1, \dots, PR_{|Z|})$. Checking whether the brand b and brand type bt of the alternative product pa match those of the product p yields three possibilities:
(1) The alternative product $q_j$ has the same values of both attributes (b, bt) as the product $q_i$.
(2) The alternative product $q_j$ shares the value of one of the attributes (b, bt) with the product $q_i$.
(3) The alternative product $q_j$ shares neither value of (b, bt) with the product $q_i$.
For the price, there are two options within each possibility: the price PR of the alternative product $q_j$ is higher than that of the product $q_i$, or vice versa. Let $CJ[dr_i] = CJ[dr(q_i, q_j)]$ for Cosine and Jaccard, and let $EM[dr_i] = EM[dr(q_i, q_j)]$ for Euclidean and Manhattan. These denote the following similarity measures of the product $q_i$ with respect to the product $q_j$, as shown in Equation 16.

$CJ[dr(q_i, q_j)]$
$(PR[q_j] / PR[q_i]) / d(q_i, q_j)$ (16)

After checking the possibilities for the brand and brand type (b, bt) of the alternative products pa and calculating the distances $CJ[dr(q_i, q_j)]$ and $EM[dr(q_i, q_j)]$, let $CJ[dm_i] = CJ[dm(q_i, q_j)]$ for Cosine and Jaccard, and let $EM[dm_i] = EM[dm(q_i, q_j)]$ for Euclidean and Manhattan. These denote the following similarity measures of the product $q_i$ with respect to the product $q_j$, as shown in Equation 17. That is, the distances $CJ[dr(q_i, q_j)]$ and $EM[dr(q_i, q_j)]$ are multiplied by a weight (100, 10, or 1) to help order the alternative products pa.
$$CJ[dm(q_i, q_j)] = \begin{cases} dr(q_i, q_j) \times 100, & \text{if } Pos(1) \\ dr(q_i, q_j) \times 10, & \text{if } Pos(2) \\ dr(q_i, q_j), & \text{if } Pos(3) \end{cases} \qquad EM[dm(q_i, q_j)] = \begin{cases} dr(q_i, q_j), & \text{if } Pos(1) \\ dr(q_i, q_j) \times 10, & \text{if } Pos(2) \\ dr(q_i, q_j) \times 100, & \text{if } Pos(3) \end{cases} \tag{17}$$

The distance is then applied to the products in order to select the alternative product offered to the user, as shown in Equation 18. If the similarity techniques (Cosine, Jaccard, Euclidean, and Manhattan) return more than one alternative product $pa|_b$, a y-dimensional vector $\vec{pa}|_b = (pa_1|_b, \dots, pa_y|_b)$ is created, y being the number of alternatives.
Finally, the extended PRO-COM works with three main characteristics: brand, brand type, and price. Once the vector $\vec{pa}|_b$ is obtained, the alternative products are ordered from closest to furthest.
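The brand-matching possibilities and the (100, 10, 1) weights of Equation 17 can be sketched as follows. The function names and the dictionary keys `brand` and `brand_type` are hypothetical, and the value `dr` is taken as given, since its exact definition (Equation 16) is not reproduced here. The reading that the weights push full brand matches to the front of the ordering (larger scores for similarities, smaller scores for distances) is an interpretation of the weight pattern.

```python
def match_possibility(p, q):
    """1 if brand and brand type both match, 2 if exactly one matches, else 3."""
    matches = (p["brand"] == q["brand"]) + (p["brand_type"] == q["brand_type"])
    return {2: 1, 1: 2, 0: 3}[matches]

def weighted_score(dr, possibility, is_similarity):
    """Apply the (100, 10, 1) weights of Equation 17.

    is_similarity=True  -> Cosine/Jaccard (CJ) weighting
    is_similarity=False -> Euclidean/Manhattan (EM) weighting
    """
    cj_weights = {1: 100, 2: 10, 3: 1}
    em_weights = {1: 1, 2: 10, 3: 100}
    weights = cj_weights if is_similarity else em_weights
    return dr * weights[possibility]

# Hypothetical products.
p = {"brand": "A", "brand_type": "own"}
same = {"brand": "A", "brand_type": "own"}
half = {"brand": "B", "brand_type": "own"}
none = {"brand": "B", "brand_type": "national"}
```

For instance, a Cosine value of 0.5 becomes 50.0 under possibility (1), while a Manhattan distance of 2 becomes 200 under possibility (3).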

E. RS-NN: Health-based Approach
The health-based (HTH-BD) approach builds on the result of the PRO-COM approach and adds an equation over the nutrition table features. HTH-BD relies on the main health-related characteristics found in the nutrition table: fat (f), including the percentage of saturated fat (sf) and the percentage of good fats (gf), $(sf, gf \in f)$; carbohydrates (Carbs), including dietary fiber (df) and sugars (s), $(df, s \in Carbs)$; and finally salt (sa) and protein (pn).
These eight characteristics play an important role in whether or not a product is healthy, so products without values for them are removed. As mentioned before, a product p belongs to a variety Z ($p \in Z$) and the brand attribute Ba includes the variety ($Z \subset Ba$); the HTH-BD approach recommends alternative products pa within the variety Z. The products are therefore analyzed per variety, and varieties that contain fewer than four products, which is the first quartile of the number of products per variety, are removed. After this analysis, the smallest variety contains 4 products, the largest 203, the median is 13 products, and the mean is about 24.3 products per variety, as shown in Table XVI.
After that, we check whether the nutrition table characteristics share the same unit of measurement (grams, %, etc.). All of them are measured in grams except the percentage of good fats gf and the dietary fiber df, which are measured as percentages. They are converted to grams [61], [62] using $gf = (gf/100) \times f$ and, for dietary fiber, $df = (df/100) \times Carbs$.
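The two unit conversions above are simple enough to state as code. The sketch below is illustrative only; the function names and the dictionary keys (`fat`, `carbs`, `good_fat_pct`, `fiber_pct`) are hypothetical and not taken from the data set.

```python
def percent_to_grams(percent, base_grams):
    """Convert a percentage of a base quantity (in grams) to grams."""
    return percent / 100 * base_grams

def normalise_nutrition(row):
    """Return a copy of `row` with good fats and dietary fiber in grams.

    `row` uses hypothetical keys: 'fat' and 'carbs' in grams,
    'good_fat_pct' and 'fiber_pct' as percentages.
    """
    out = dict(row)
    out["good_fat_g"] = percent_to_grams(row["good_fat_pct"], row["fat"])   # gf
    out["fiber_g"] = percent_to_grams(row["fiber_pct"], row["carbs"])       # df
    return out

converted = normalise_nutrition(
    {"fat": 20, "carbs": 50, "good_fat_pct": 25, "fiber_pct": 10}
)
```

In this toy row, 25% good fats of 20 g of fat and 10% fiber of 50 g of carbohydrates both correspond to 5 g.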
This approach relies on nutrition books and nutrition experts [63]-[66] to order the nutrition table features. The resulting order is: protein, good fats, dietary fiber, salt, sugars, carbohydrates, saturated fat, and finally fat. In our research, a weight was additionally assigned to each nutrition table feature to help order the alternative products.
Let $\vec{Nt}$ be a |Z|-dimensional vector, $\vec{Nt} = (\vec{h}_1, \dots, \vec{h}_{|Z|})$, where $Nt_i = \vec{h}_i$. Let $d_h(q_i, q_j)$ denote the following similarity measure, according to the nutrition table, of the product $q_i$ with respect to the product $q_j$ (Equation 19); the similarity is calculated on the output of PRO-COM (Equation 18).
When the product is unavailable, the alternative product $pa|_h$ is selected according to that measure; the lower the value, the healthier the alternative product is for the user. If there is more than one $pa|_h$ value, a $u_h$-dimensional vector $\vec{pa}|_h = (pa_1|_h, \dots, pa_{u_h}|_h)$ is created, $u_h$ being the number of alternatives.

F. RS-NN: Package-based Approach
The package-based (PK-BD) approach combines all the approaches, since it depends on both the PRO-COM and HTH-BD approaches: the algorithm works on the result of the HTH-BD approach. First, products lacking values for any of the three variables on which this approach depends (product size, unit of measure, and servings) are removed. Second, as mentioned above, the product p and the alternative products pa must belong to the same variety Z and the same brand attribute Ba, so the number of products per variety is analyzed, and varieties containing fewer products than the first quartile, whose value is 4, are removed. Therefore, in the PK-BD approach, as shown in Table XVI, the minimum number of products per variety is 4, the maximum is 203, the median is 13, and the mean is 24.3. The algorithm orders the alternative products pa according to the servings value Sg of the product p. Let $\vec{Sg}$ be the |Z|-dimensional vector that contains the servings of the products in Z, $\vec{Sg} = (Sg_1, \dots, Sg_{|Z|})$. The servings value of the product $q_i$ is compared with that of the alternative product $q_j$, and there are two possibilities: the servings value of $q_i$ is greater than that of $q_j$, or vice versa. Let $dse_i = dse(q_i, q_j)$ denote the following similarity measure of the product $q_i$ with respect to the product $q_j$, as shown in Equation 21.
in the order given by the various approaches of the RS. The MSE [70] was calculated for each approach, taking into account the following three groups:
• Group 1: All the questions answered by the users are considered.
• Group 2: The questions with untied answers and the questions in which only the top-2 choices are tied are considered.
• Group 3: Only the questions with untied answers are considered.
The formula used to calculate the MSE is the following:

$$MSE = \frac{1}{n} \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2,$$

where $\hat{Y}_i$ is the value of the answer chosen by the user and $Y_i$ is the value of the top-1 product, which is always 1. The value of $\hat{Y}_i$ is (1; 0.5; 0) if the user chose the first, second, or third product of the survey, respectively. The values are (1; 1; 0.5) if there is a tie between the top-2 products and (1; 1; 1) if all the products are tied. The results are shown in Table XX for the three groups. Accuracy (ACC) was also calculated (only for Group 3), as shown in Equation 24,
where n is the number of questions in Group 3 and $x_i$ is the answer chosen by the user. An answer counts as positive when the first or second product is chosen and as negative when the third product is chosen. The result is shown in Table XXI.
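The scoring scheme described above can be sketched as follows. The MSE uses the standard definition with $Y_i = 1$ and the per-choice values given in the text. The accuracy function is an assumption: since Equation 24 is not reproduced here, it is taken to be the share of positive answers (first or second product chosen) among the Group 3 questions. All names are hypothetical.

```python
# Value of the user's answer, indexed by chosen position (1st, 2nd, 3rd),
# for each tie situation described in the text.
SCORES = {
    "untied": (1.0, 0.5, 0.0),
    "top2_tied": (1.0, 1.0, 0.5),
    "all_tied": (1.0, 1.0, 1.0),
}

def mse(answers):
    """answers: list of (choice, tie_kind) pairs, with choice in {1, 2, 3}."""
    errors = [(1.0 - SCORES[kind][choice - 1]) ** 2 for choice, kind in answers]
    return sum(errors) / len(errors)

def accuracy(choices):
    """ASSUMPTION: share of answers that picked the first or second product."""
    return sum(1 for c in choices if c in (1, 2)) / len(choices)

group_answers = [(1, "untied"), (2, "untied"), (3, "untied"), (3, "top2_tied")]
group3_choices = [1, 2, 3, 1]
```

For the toy answers above, the squared errors are 0, 0.25, 1, and 0.25, so the MSE is 0.375, and three of the four Group 3 choices count as positive.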

B. RS-NN Expert Survey
The company provided experts to evaluate the three approaches of the recommendation system. Expert opinions are important for evaluating the recommendation system for several reasons, the most important being that the experts know both the products and their alternatives thoroughly, so they can readily judge whether or not the system recommends suitable alternative products. Four surveys per approach were sent to the experts, one for each of the techniques described in subsection VI-C (Cosine, Jaccard, Euclidean, and Manhattan similarity). In each survey, the expert had to answer three questions: (1) Would you select any of these 3 options (alternative products)? (Yes/No). (2) If yes, which one? (for example, 3). (3) Produce a ranking of the options (from the most similar product to the least similar product). Example: 3, 1, 2.
Figure 6 shows the survey results: how many times each alternative product (first, second, or third) was chosen for each technique and each approach. The results show that, for the first approach, 80% of the questions have suitable alternatives. The expert reported that the second approach recommends suitable alternative products for 90% of the questions, and that the third approach recommends suitable alternative products for 80%.

Fig. 6: The number of times the alternative products was chosen

Therefore, a user survey was created based on the opinions of the experts. Cosine similarity was chosen for all three approaches.

C. RS-NN User Survey
This survey was built from the experts' results. It is very similar to the first survey, but it is augmented with clear images so that users can quickly recognize the product and choose between the alternative products. The survey consists of three blocks: the first block covers the PRO-COM approach (subsection VI-D), the second block is dedicated to the HTH-BD approach (subsection VI-E), and the last block evaluates the PK-BD approach (subsection VI-F).
As shown in Table XXII, the survey results were again calculated using MSE, as in the first survey, after receiving 65 responses from users. The same groups as before were used in order to compare the results of the two surveys.

D. Evaluation and Discussion
Performing offline experiments by using a pre-collected data set to let users choose or rate items is a usual way to estimate the performance of recommender systems, such as prediction accuracy [71].In this case, the dataset is usually divided into (i) a training sample to build the model based on the user rating and (ii) a test sample to calculate the measurement parameters such as accuracy, precision, recall and f-score.Since our recommender system is uniquely based on the characteristics of the products and we are not considering the customer profile, this kind of offline experiment is not provided for our evaluation purposes.
However, we have decided to opt by a most direct evaluation based on the feedback from two important sectors: customers (users) and experts (workers in the food retail sector).For this, we created a large-scale experiment on a prototype through a user survey, that is, an online experiment.The results is the direct feedback and opinion of the performance of the recommender system according to the users' perspective.Consequently, the feedback obtained would depend on a variety of factors, such as the user's intent (for example: how specific are their information needs), the user's context (for example: what items are they already familiar with, in addition, how much they trust the system) and the interface which the recommendations are presented through.This is a more realistic scenario and it will give as strong evidences about the recommender system's results: that is, if the suggested product is on the user would buy instead of the required one or not, obtaining, therefore, a good value for the accuracy.
Since we do not have information about the customers (profiling, interactions, etc.), we worked on the whole data set without dividing it. To save time and optimize operational performance when recommending alternative products, we took the following steps. We filter and pre-process the data set with two methods (BOW and Doc2Vec) for each approach, based on the desired product characteristics (variety, size, allergen, etc.). In RS-CF, we compare the desired product with the rest of the data and order the alternatives by similarity ratio. In RS-NN, we build the model for the desired product using the neural network and then classify the alternatives by similarity ratio.
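The RS-CF step of comparing the desired product with the rest of the data and ordering the alternatives can be sketched as follows. This is a minimal illustration under stated assumptions: cosine similarity over bag-of-words counts stands in for the similarity ratio, and the product identifiers, tokens and function names are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_alternatives(target_id, products):
    """Order every other product by decreasing similarity to the target."""
    target = Counter(products[target_id])
    scores = [
        (pid, cosine(target, Counter(tokens)))
        for pid, tokens in products.items()
        if pid != target_id
    ]
    return sorted(scores, key=lambda s: s[1], reverse=True)


# Hypothetical pre-processed product descriptions (lists of tokens).
products = {
    "p1": ["olive", "oil", "extra", "virgin"],
    "p2": ["olive", "oil", "refined"],
    "p3": ["sunflower", "oil"],
    "p4": ["tomato", "sauce", "basil"],
}
ranking = rank_alternatives("p1", products)
```

In this toy example the alternatives for `p1` are ordered `p2`, `p3`, `p4`, since `p2` shares the most tokens with the desired product and `p4` shares none.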
The recommendation system was evaluated through two surveys, the RS-CF user survey and the RS-NN user survey, and their results were compared. Figure 7 shows the difference between the accuracy results of the two surveys for group 3. The RS-NN user survey performed better in all three approaches. Figure 8 shows the difference between the MSE results: the RS-CF user survey gives the best results for the PRO-COM approach in the first and second groups, whereas the RS-NN user survey is best for the third group. The RS-NN user survey is also best for both the PK-BD and HTH-BD approaches. These comparisons show that using the neural network-based representation completely alters the results, and that taking price and brand into account was something that users wanted. Using more nutrition table features also gives better results. Finally, the comparisons show that a PK-BD approach based on the HTH-BD approach is far better than relying solely on a PRO-COM approach.

Fig. 7: Comparative study using accuracy considering group 3.
Fig. 8: The MSE of the three approaches.

Finally, we evaluated the multi-criteria RS through a user survey, using MSE to calculate the average error of the responses from users across the three groups; this is the main evaluation of the users' responses. We also used accuracy to evaluate the responses of users in group 3 only, because for the other two groups it was approximately 100%.
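The two measures can be illustrated with a short sketch. The encoding of the survey responses is an assumption made only for this example: each pair holds the position the system assigned to an alternative and the position the respondent gave it. The exact formulas used in the surveys are those defined earlier in the paper (Equation 24 for accuracy).

```python
def mean_squared_error(expected, observed):
    """Average of the squared differences between paired values."""
    if not expected or len(expected) != len(observed):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((e - o) ** 2 for e, o in zip(expected, observed)) / len(expected)


def accuracy(expected, observed):
    """Fraction of responses that match the expected value exactly."""
    if not expected or len(expected) != len(observed):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(e == o for e, o in zip(expected, observed)) / len(expected)


# Hypothetical responses: position assigned by the system versus
# position chosen by the respondent for four alternatives.
expected = [1, 1, 2, 3]
observed = [1, 2, 2, 1]
mse = mean_squared_error(expected, observed)
acc = accuracy(expected, observed)
```

A lower MSE means the responses are closer to the system's ordering, while accuracy only counts exact agreement, which is why the two measures can rank approaches differently.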

Fig. 1: Description of the proposed model definition and evaluation.

Fig. 2: More details on the description of the proposed model definition and evaluation.

Fig. 4: The modeling methodology for RS-NN.

TABLE I: Extract of the MDD-DS

TABLE II: Product attributes in the dataset

TABLE III: Extract of the MDD-DS1

TABLE IV: Extract of the MDD-DS2
TABLE V: Examples of product_words for every p

29, 167 ['oil', 'parsley', 'lemon', ...]

obtain all_products_words, the unique tokens/words extracted from the corpus $C(p[\text{Name}], p[\text{Legal Name}], p[\text{Ingredients}])$, that is, the different meaningful tokens in the dataset after preprocessing. Therefore, all_products_words contains 10,707 unique tokens; an example is shown in Table VI. Let $\vec{t}$ be the $n$-dimensional vector obtained from all_products_words such that $\vec{t} = (t_1, \ldots, t_n)$ and $\forall k \in [1, \ldots, n]$, $t_k$ is a string $\in$ all_products_words, and $N = \dim(\text{all\_products\_words})$. The $N$ tokens form des(p), and the size of the count vectors in the product matrix $X$ is given by $M \times N$.
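The construction of all_products_words and of the product matrix $X$ can be sketched as follows. This is a minimal illustration, assuming that $M$ is the number of products and that each row of $X$ counts the occurrences of every token in one product; the helper names `build_vocabulary` and `count_matrix` and the sample tokens are hypothetical.

```python
def build_vocabulary(product_words):
    """Sorted list of the unique tokens over all products."""
    return sorted({token for words in product_words for token in words})


def count_matrix(product_words, vocabulary):
    """One row per product, one column per vocabulary token."""
    index = {token: k for k, token in enumerate(vocabulary)}
    matrix = []
    for words in product_words:
        row = [0] * len(vocabulary)
        for token in words:
            row[index[token]] += 1  # count each occurrence of the token
        matrix.append(row)
    return matrix


# Hypothetical product_words after preprocessing.
product_words = [
    ["oil", "parsley", "lemon"],
    ["oil", "garlic", "oil"],
]
all_products_words = build_vocabulary(product_words)
X = count_matrix(product_words, all_products_words)
M, N = len(X), len(all_products_words)
```

With the real dataset, $N$ would be 10,707, so in practice a sparse representation of $X$ would likely be preferable to the dense lists used here.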

TABLE VII: Example of a product matrix.

TABLE VIII: Example of a distance vector $\vec{p}_a = (p_{a1}, \ldots, p_{at})$ for PRO-COM.

TABLE IX: PK-BD approach: Example of a matrix with distance vectors as columns after sorting the products.

TABLE X: String values obtained from the Messages columns.

TABLE XII: Example of the list which contains the vectors of words belonging to each product considering the Ingredients column (product_words).

TABLE XIII: Example of the list which contains the vectors of the features included in the Messages columns belonging to each product (withoutlist).

TABLE XIV: Example of the vector which contains all the different features obtained from the Messages columns (withoutwords).

$\ldots, w_i[dw_i])$ where $dw_i$ is the length of the vector contained in the $i$-th element of the list $\vec{w}$. Note that $\forall k \in [1, \ldots, dw_i]$, $w_i[k]$ is a string. Let $\vec{ws}$ be a $|Z|$-dimensional subset of the $Z$ elements of the $m$-dimensional list $\vec{w}$ (withoutlist), $(\ldots, ws_{|Z|})$. Likewise, $\vec{ws}_i = (ws_i[1], \ldots, ws_i[dw_{s_i}])$ where $dw_{s_i}$ is the length of the vector contained in the $i$-th element of the list $\vec{ws}$. Note that $\forall k \in [1, \ldots, dw_{s_i}]$, $ws_i[k]$ is a string. Let $\vec{n}_h$ be a $|Z|$-dimensional vector. Here, $\vec{n}_h = (n_{h_1}, \ldots, n_{h_{|Z|}})$ where $n_{h_i} = dw_{sq_i}$

TABLE XV: HTH-BD: Example of a matrix with the considered criteria as columns after sorting the products.

TABLE XVI: Products per variety after eliminating the first quarter.

TABLE XVII: Examples of product_words for every p

TABLE XVIII: The first two des(p) by Tagged_Document.

TABLE XIX: The vector for the first product $\vec{a}(p_1)$.

TABLE XX: The MSE considering the three approaches as well as the different groups of products tested.

TABLE XXI: The accuracy considering group 3.

TABLE XXII: The MSE considering the three approaches and the three groups of products evaluated (second survey).

The accuracy (ACC) for the results of Group 3 was also calculated, as shown in Table XXIII, using Equation 24, as in the first survey.

TABLE XXIII: The accuracy of the user survey using ML (group 3).