Tourist Recommender Systems Based on Emotion Recognition—A Scientometric Review

Abstract: Recommendation systems have overcome the overload of irrelevant information by considering users’ preferences and emotional states in the fields of tourism, health, e-commerce, and entertainment. This article reviews the principal recommendation approach documents found in scientific databases (Elsevier’s Scopus and Clarivate Web of Science) through a scientometric analysis in ScientoPy. Research publications related to the recommenders of emotion-based tourism cover the last two decades. The review highlights the collection, processing, and feature extraction of data from sensors and wearables to detect emotions. The study proposes the thematic categories of recommendation systems, emotion recognition, wearable technology, and machine learning. This paper also presents the evolution, trend analysis, theoretical background, and algorithmic approaches used to implement recommenders. Finally, the discussion section provides guidelines for designing emotion-sensitive tourist recommenders.


Introduction
Nowadays, people face a wide range of information about service portfolios (for instance, books, videos, and tourist attractions) when choosing the items most relevant to their personal needs. However, the chosen service or product often does not produce the expected results. For this reason, Recommender Systems (RS) are valuable tools that provide items suited to the users' preferences and context. Emotion Recognition (ER) [1][2][3] and sentiment analysis [4][5][6] are vital contextual factors for improving user satisfaction and accuracy in tourist recommendations. The user's affective context has been inferred from social network reviews [7][8][9], and emotion detection based on physiological signals collected from wearable devices has been used to personalize the user's context [10][11][12].
In recent years, the development and use of wearable technology have increased [26][27][28]. In particular, CCS Insight predicted that by 2021 technology companies would produce around 185 million wearable devices (such as smartwatches, bracelets, cameras, audible devices, footwear, glasses, and jewelry) [29]. A wearable device is worn on the body and has computational capabilities to detect, process, store, and communicate data [30,31]. Wearables are also equipped with sensors to capture physiological data [17,32,33] and data about the user's environment [34,35]. Therefore, collecting and processing these data has become a significant technological challenge for improving the user experience through ER [10][11][12][36].

Materials and Methods
This section describes the bibliographic dataset collection, the preprocessing, and the review methodology applied to this review's bibliographic dataset.

Dataset Collection
Initially, a specialized search of scientific papers was performed on the Clarivate Web of Science (WoS) and Elsevier's Scopus platforms. These bibliographic databases index high-quality multidisciplinary research published in scientific journals of global impact and allowed the consolidation of a dataset for this study. The search string was "(((recommender OR recommendation) AND system) AND (tourist OR tourism OR emotion OR physiological OR affective OR wearable))". The first part of the string refers to recommender systems, and the second part covers the tourism domain and the recognition of emotions. The information was extracted from the bibliographic platforms on 15 July 2020. Filters were applied to the search string by subject (computer science, medicine, engineering, business, telecommunications, artificial intelligence, psychology, multidisciplinary, and tourism) and by year (2001 to 2020). A representative dataset of 1829 documents was obtained, 33.6% from WoS and the remainder from Scopus (see Table 1).
The bibliographic dataset was preprocessed with the ScientoPy tool [37]. Table 2 summarizes the preprocessing, in which duplicate documents were removed from the consolidated Scopus and WoS dataset. It also presents statistics for the bibliographic dataset filtered by document type (conference papers, articles, reviews, proceedings papers, and articles in press) and the duplicate records found by DOI match. In particular, the first column describes the input dataset. The second column specifies the number of published documents and the number of papers remaining after the duplicate filter. Finally, the third column shows the relative percentages before and after the filter. The bibliographic dataset is available in the repository (https://github.com/luzsantamariads4a/carswearable), so researchers interested in this knowledge domain can use this resource.
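As an illustration of the duplicate filter by DOI match, the following minimal sketch keeps the first record found for each DOI. It is not ScientoPy's actual code: the record fields (`source`, `doi`, `title`), the sample records, and the choice to keep records without a DOI are assumptions made for this example.

```python
def remove_duplicates_by_doi(records):
    """Keep the first record seen for each DOI; records without a DOI are kept."""
    seen = set()
    unique = []
    for record in records:
        # DOIs are case-insensitive, so compare a normalized form.
        doi = (record.get("doi") or "").strip().lower()
        if doi and doi in seen:
            continue  # duplicate of an earlier record
        if doi:
            seen.add(doi)
        unique.append(record)
    return unique


# Hypothetical records merged from both databases (WoS listed first).
records = [
    {"source": "WoS", "doi": "10.1000/abc", "title": "Paper A"},
    {"source": "Scopus", "doi": "10.1000/ABC", "title": "Paper A"},
    {"source": "Scopus", "doi": "10.1000/xyz", "title": "Paper B"},
    {"source": "Scopus", "doi": "", "title": "Paper C"},
]
unique = remove_duplicates_by_doi(records)
removed_pct = 100 * (len(records) - len(unique)) / len(records)
print(len(unique), removed_pct)
```

Because the first occurrence wins, the order in which the two exports are concatenated decides which database's copy of a duplicated paper survives.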

Review Methodology
The research field was systematically determined following the scientometric review methodology [38]:
• First, the subject of the review was searched in the Scopus and WoS databases. The search string was designed according to the research topic: recommendation systems in the tourism domain based on recognizing emotions from the physiological data of wearable devices.
• Second, the scientometric tool ScientoPy [37] was used to preprocess the files from these two bibliographic databases. In this way, several clusters were determined, and the categories related to the research topic were formed. The first 1000 author keywords were selected from this dataset of 1449 documents. Then, the most relevant author keywords from this list were analyzed to consolidate 16 categories (recommender system, tourism, emotion recognition, machine learning, social media, user modeling, collaborative filtering, mobile application, context, personalization, sentiment analysis, wearable, healthcare, ontology, affective computing, and physiological signal). The categories presented in the graphs cluster similar author keywords that belong to the same topic (such as plural/singular forms, acronyms, classes, or category types). For instance, the RS topic includes the keywords recommender system, recommendation system, recommendation, recommendation systems, recommendations, and others, and the deep learning topic includes the keywords convolutional neural networks, convolutional neural network, CNN, deep neural network, LSTM, and others.
• Third, bar graphs and parametric trend analysis graphs were constructed with the indicators Average Documents per Year (ADY) and Percentage of Documents in Last Years (PDLY) [37]. They highlight the rise of RS and tourism as transversal thematic axes.
Figure 2 shows the trend bar graph of the main categories; the orange bars highlight the documents published in the last four years in sentiment analysis, wearable devices, physiological signals, and the use of ML algorithms in ER. It also includes the PDLY value for 2016 to 2019.
Similarly, the trend analysis in Figure 3 uses the ADY and PDLY indicators to describe the behavior of the themes most strongly related to RS-based research. The graph on the left shows the evolution of the S-curve of each technology or category, calculated from the accumulated number of documents per year (logarithmic scale). It represents the initial evolution, the growth period, and the boom in the publication of documents related to the research topics. The parametric scatter graph on the right visualizes the growth of publications in recent years (2016 to 2019).
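The two indicators can be illustrated with a short sketch. It assumes the usual reading of the ScientoPy indicators: ADY is the mean number of documents per year over the recent period, and PDLY is the share of all documents that falls in that period. The function names and the yearly counts are hypothetical and do not come from the reviewed dataset.

```python
def ady(docs_per_year, start_year, end_year):
    """Average Documents per Year over the period [start_year, end_year]."""
    years = range(start_year, end_year + 1)
    return sum(docs_per_year.get(year, 0) for year in years) / len(years)


def pdly(docs_per_year, start_year, end_year):
    """Percentage of all documents published in [start_year, end_year]."""
    total = sum(docs_per_year.values())
    recent = sum(docs_per_year.get(y, 0) for y in range(start_year, end_year + 1))
    return 100.0 * recent / total if total else 0.0


# Hypothetical yearly document counts for one topic.
counts = {2014: 5, 2015: 5, 2016: 10, 2017: 20, 2018: 30, 2019: 30}
topic_ady = ady(counts, 2016, 2019)    # 90 documents over 4 years
topic_pdly = pdly(counts, 2016, 2019)  # 90 of 100 documents
print(topic_ady, topic_pdly)
```

A topic with a high PDLY has most of its publications concentrated in the recent period, which is the property used here to flag emerging themes.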

Recommender Systems
RSs are software tools and techniques that suggest items likely to be of interest to a particular user. The documents cited in this section are related to recommendations for tourism, videos, music, content-based filtering, and collaborative filtering (see Figure 4). Although the search spanned the last two decades, most of the papers cited in this section were published in the last five years and have the highest PDLY. The RS landscape has been diverse in developing research prototypes that integrate Web technologies, mobile computing, and social networks in tourism [39][40][41]. Furthermore, RS approaches have evolved with respect to the application, the business model, the user profile, the techniques, and the algorithms implemented.
The RS architecture integrates data collection, preprocessing, prediction models, and recommendation services [4,13,14]. The referenced papers were classified by stage according to the applicability and functionality of the recommendation process (see Table 3). The preprocessing stage extracts the relationships between the user, the item, and the contextual features, represented in a data model (vector, matrix, or tensor). The prediction stage then generates a list of relevant items calculated with similarity-based algorithms and recommendation models. Finally, the recommender exposes services related to the users' interests, such as listing the most novel items adapted to the users' demand.

Content-Based Filtering
Content-based filtering relies on a description of the detailed features of items that may be of particular interest to a user [42]. Based on the representation of these items, a user preference profile is built. An ML algorithm then compares the item features with the user's profile and generates the recommendation list. Item similarity is calculated from the attributes of the compared items. For example, in a music recommender, when a user gives a relaxing song a high rating, the system learns to suggest other songs associated with the same emotional state. The song features can comprise both structured data (song title, singer name, music genre, year of release, and emotional state) and unstructured data (user comments and song description).
Some studies have used the Cosine Similarity (CS) metric [6,43,44] to determine the similarity of items represented as vectors in an n-dimensional space (for example, a similarity matrix between songs and emotional states). In contrast, the Euclidean Distance (ED) [45][46][47][48] was used to measure the distance between the items and the user's profile. Content-based recommenders have emerged as an alternative for personalizing the multimedia, tourist, and entertainment content available on the Web. Emotions have aroused intense interest in the design of user preference models. For example, [49] studied the influence of affective metadata on image rating performance using the Support Vector Machine (SVM) algorithm, and [45] defined travel profiles with a multiple regression model that implicitly obtained the users' preferences from Point of Interest (POI) images.
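The two metrics can be sketched for the music example as follows. The three emotion features per song, the song names, and the user profile vector are hypothetical values chosen for illustration; they are not taken from the cited studies.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical emotion features per song: (relaxed, happy, sad).
songs = {
    "song_a": [0.9, 0.3, 0.1],
    "song_b": [0.8, 0.4, 0.0],
    "song_c": [0.1, 0.2, 0.9],
}
user_profile = [1.0, 0.3, 0.0]  # built from the songs the user rated highly

# Rank songs by decreasing cosine similarity to the profile.
ranked = sorted(
    songs, key=lambda s: cosine_similarity(user_profile, songs[s]), reverse=True
)
# Alternatively, pick the song at the smallest Euclidean distance.
nearest = min(songs, key=lambda s: euclidean_distance(user_profile, songs[s]))
print(ranked, nearest)
```

CS compares only the direction of the vectors, whereas ED is also sensitive to their magnitude, so the two metrics can rank items differently when feature scales vary.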
Due to the semantic ambiguity of unstructured data, Probabilistic Latent Semantic Analysis (PLSA) techniques have been proposed for POI image annotation and ontological representation of user profile data [50][51][52]. Also, [53,54] described a tourism approach based on social relationships and user preference profiles to calculate the similarity of POIs. A hybrid approach in [7] compared the Rocchio algorithm for customizing required queries in the classification of candidate POIs with the k-Nearest Neighbors (kNN) weighted classifier query builder.
Although content-based approaches have limitations for predicting novel items, they have datasets that enrich domain knowledge and avoid cold start problems [55].
To overcome the problems of prediction accuracy, some researchers have proposed hybrid approaches. In particular, [56] presented a framework of tourist mobile services that determines the similarity of the items to recommend from the semantic relationship between the agreement of words and the frequency of terms. The architecture of a content-based, semantic-aware RS [57] described the components from a computational perspective. It introduced a method for cleaning user profiles and overcame the magic barrier problem by detecting the semantic similarity between the item and the profile. It also used a filtering component to generate the recommendation list appropriate to the user's preferences.

Collaborative Filtering
Unlike content-based filtering, Collaborative Filtering (CF) automatically learns the relationships between items, extracts their features, and discovers new items of interest to users [55]. CF methods generate user-specific item recommendations based on rating patterns from multiple users who share similar preferences [42]. The data sources indicate the behaviors and interests that users have shown in the past concerning the products. These can be implicit (for example, tourist attraction reviews, review history, and search patterns) or explicit (for example, a rating scale from 1 to 5 to quantify liking for a tourist site). The ratings recorded by users are related to the dataset items and form a two-dimensional matrix. CF recommendation models calculate similarity weights between users and items [58].
User-Based Collaborative Filtering (UBCF), also known as neighborhood-based filtering, establishes a target user's neighborhood by analyzing historical behavior and preferences to find the users most similar to the target user and suggest the items they liked [8,15,16,59]. In comparison, Item-Based Collaborative Filtering (IBCF) predicts the rating of a new item by weighting the target user's ratings of other items according to their similarity to the new item [8,17]. The CF approaches used Pearson's Correlation Coefficient (PCC) [7,14,16,59,60,61] and CS [4,8,13,62] metrics to generate a list of recommended products of interest to the target user.
Faced with the cold start problem and the scarcity of user behavior data, recommenders have implemented data mining and affective computing techniques to obtain implicit information [4,14,16,63]. In the personalization of tourist attractions and multimedia content, hybrid CF models merged the emotions of user comments, contextual data, and explicit ratings available on online social networks [7,53,64,65]. Tourist destination recommenders used CF review extraction methods to refine user preferences and item reputation [4,54].
In contrast to memory-based CF algorithms, model-based approaches are categorized into factorization machines, matrix factorization, and ML algorithms. These models are scalable and handle sparse data [42,58]. The Factorization Machine (FM) is a general-purpose regression method that models the interaction between contextual variables [42]. The Stochastic Gradient Descent (SGD) algorithm with regularization hyperparameters was used to optimize recommender FMs that integrated the features of tourist attractions and affective factors [63,66,67]. Likewise, Matrix Factorization (MF) is a latent factor model represented in a three-dimensional rating cube indexed by users, items, and values of the contextual dimension [4,42,58,68,69].
Furthermore, the Singular Value Decomposition (SVD) algorithm transforms the original rating matrix $R$ (users $\times$ items) into a matrix of users with latent features $U$ (users $\times$ latent factors). Then, it takes the transpose of the original rating matrix $R^{T}$ (items $\times$ users) and generates a matrix of items with latent features $M$ (items $\times$ latent factors). Lastly, the prediction function for a specific user rating is given by $\hat{R} = U M^{T}$ [14,58,70]. The SVD++ algorithm is a variant of SVD that handles both implicit and explicit interactions [42,58]. One study [71] modified the SVD++ model by merging user sentiment and the temporal influence of tourist destinations in POI recommendation. Also, [72] used the weighting of the emotion label as a tensor value in the High-Order Singular Value Decomposition (HOSVD) method to account for preferences and interests in movie suggestions.
Recently, the research challenge has arisen of developing recommendation models with a contextualized approach to overcome users' limitations in terms of geographic coverage and social interaction [55]. Most recommender architectures are hybrid because they combine various approaches with IBCF and UBCF [73]. In particular, [74] proposed a tourist system that matches the user's location with the top-k recommendations through a linear distance for the contents and the CS for the relationship between user profiles. Also, [75] developed an approach that extracted information on users' preferences from a website, established the similarity of users, and generated tourist attraction recommendations with the Slope One algorithm. Considering the cold start problem and the scarcity of information for CF algorithms, [76] developed a deep neural network architecture based on an MF of latent features of project developers, their tasks, and their relationships.
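A minimal sketch of user-based prediction with the PCC is given below. It assumes one common formulation, in which the correlation is computed over co-rated items and the prediction adds the similarity-weighted, mean-centered neighbor ratings to the target user's mean. The user names, attractions, and ratings are hypothetical.

```python
import math


def pearson(ratings_u, ratings_v):
    """Pearson correlation over the items both users rated (0.0 if undefined)."""
    common = set(ratings_u) & set(ratings_v)
    if len(common) < 2:
        return 0.0
    mean_u = sum(ratings_u[i] for i in common) / len(common)
    mean_v = sum(ratings_v[i] for i in common) / len(common)
    num = sum((ratings_u[i] - mean_u) * (ratings_v[i] - mean_v) for i in common)
    den_u = math.sqrt(sum((ratings_u[i] - mean_u) ** 2 for i in common))
    den_v = math.sqrt(sum((ratings_v[i] - mean_v) ** 2 for i in common))
    if den_u == 0 or den_v == 0:
        return 0.0
    return num / (den_u * den_v)


def predict(ratings, target, item):
    """Target user's mean plus similarity-weighted neighbor deviations."""
    mean_target = sum(ratings[target].values()) / len(ratings[target])
    num = den = 0.0
    for other, other_ratings in ratings.items():
        if other == target or item not in other_ratings:
            continue
        sim = pearson(ratings[target], other_ratings)
        mean_other = sum(other_ratings.values()) / len(other_ratings)
        num += sim * (other_ratings[item] - mean_other)
        den += abs(sim)
    return mean_target if den == 0 else mean_target + num / den


# Hypothetical 1-to-5 ratings of tourist attractions.
ratings = {
    "ana": {"museum": 5, "beach": 3, "park": 4},
    "ben": {"museum": 4, "beach": 2, "park": 3, "market": 4},
    "eva": {"museum": 2, "beach": 5, "park": 1, "market": 2},
}
similarity = pearson(ratings["ana"], ratings["ben"])
predicted = predict(ratings, "ana", "market")
print(round(similarity, 3), round(predicted, 3))
```

In this example the negatively correlated neighbor also contributes: a below-average rating from a user with opposite tastes raises the prediction for the target user.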
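To illustrate how SGD with a regularization hyperparameter fits a latent factor model, the following sketch trains a plain two-dimensional matrix factorization. It is a simplified stand-in, not the factorization machines of the cited studies, and it ignores the contextual dimension. The ratings, the number of factors `k`, the learning rate `lr`, and the penalty `reg` are assumptions chosen for this example.

```python
import random


def train_mf(ratings, n_users, n_items, k=2, lr=0.01, reg=0.02, epochs=2000, seed=0):
    """Learn user factors P and item factors Q so that P[u] . Q[i] approximates r."""
    rng = random.Random(seed)
    P = [[rng.uniform(0.1, 1.0) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0.1, 1.0) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(P[u][f] * Q[i][f] for f in range(k))
            for f in range(k):
                p_uf, q_if = P[u][f], Q[i][f]
                # Gradient step on the squared error with an L2 penalty.
                P[u][f] += lr * (err * q_if - reg * p_uf)
                Q[i][f] += lr * (err * p_uf - reg * q_if)
    return P, Q


def predict(P, Q, u, i):
    """Predicted rating as the dot product of the latent vectors."""
    return sum(p * q for p, q in zip(P[u], Q[i]))


# Hypothetical observed (user, item, rating) triples; other cells are unknown.
ratings = [
    (0, 0, 5), (0, 1, 3), (0, 3, 1),
    (1, 0, 4), (1, 3, 1),
    (2, 0, 1), (2, 1, 1), (2, 3, 5),
    (3, 0, 1), (3, 3, 4),
    (4, 1, 1), (4, 2, 5), (4, 3, 4),
]
P, Q = train_mf(ratings, n_users=5, n_items=4)
rmse = (sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings) / len(ratings)) ** 0.5
print(round(rmse, 3), round(predict(P, Q, 1, 1), 2))
```

Only the observed cells drive the updates, which is why this family of models copes with sparse rating data; the unobserved cells are filled in afterwards with `predict`.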
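The weighted Slope One scheme mentioned above can be sketched in a few lines. This is a generic illustration rather than the implementation in [75]; the users, attractions, and ratings are hypothetical.

```python
def slope_one_predict(ratings, user, target):
    """Weighted Slope One: average rating differences, weighted by co-rating count."""
    num = 0.0
    den = 0
    for item, user_rating in ratings[user].items():
        if item == target:
            continue
        # Differences target - item over every user who rated both.
        diffs = [r[target] - r[item] for r in ratings.values() if target in r and item in r]
        if not diffs:
            continue
        deviation = sum(diffs) / len(diffs)
        num += (deviation + user_rating) * len(diffs)
        den += len(diffs)
    return num / den if den else None


# Hypothetical ratings of three attractions.
ratings = {
    "john": {"castle": 5, "museum": 3, "beach": 2},
    "mark": {"castle": 3, "museum": 4},
    "lucy": {"museum": 2, "beach": 5},
}
predicted = slope_one_predict(ratings, "lucy", "castle")
print(predicted)
```

The function returns `None` when no other user has co-rated the target with any item the user has rated, which is the cold start situation discussed in this section.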

Knowledge-Based
A knowledge-based (KB) recommender consolidates data on the user preferences, restrictions, and needs that are essential for item suggestions [77,78]. Knowledge-based systems satisfy user preferences using knowledge bases that associate item features with user requirements [79][80][81]. The KB recommender in [54] compared user requirements with candidate travel destinations by assigning a score to each dimension (location, tourist profile, type of attraction, and transportation costs). Then, it computed a weighted average to predict the rating.
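The scoring step can be sketched as a weighted average over the four dimensions. The weights, the per-dimension match scores, and the destination names below are hypothetical; [54] may define and normalize them differently.

```python
def weighted_score(scores, weights):
    """Weighted average of per-dimension match scores."""
    total_weight = sum(weights.values())
    return sum(scores[dim] * weight for dim, weight in weights.items()) / total_weight


# Hypothetical importance of each dimension for one user.
weights = {
    "location": 0.4,
    "tourist_profile": 0.3,
    "attraction_type": 0.2,
    "transport_cost": 0.1,
}
# Hypothetical match scores in [0, 1] between user requirements and destinations.
candidates = {
    "coastal_town": {
        "location": 0.9, "tourist_profile": 0.6,
        "attraction_type": 0.8, "transport_cost": 0.4,
    },
    "mountain_village": {
        "location": 0.5, "tourist_profile": 0.9,
        "attraction_type": 0.7, "transport_cost": 0.8,
    },
}
ranking = sorted(
    candidates, key=lambda name: weighted_score(candidates[name], weights), reverse=True
)
print(ranking)
```

Because the scores come from explicit requirements rather than past ratings, this step works for new users, at the cost of having to maintain the knowledge base that produces the match scores.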
Online social networks provide information related to the profile, location, and feelings of users for the construction of ontologies that have been used in the monitoring of emotional health [82]. However, the recommendation's performance depends on the knowledge base, and its implementation is costly due to the quality of the information [83].

Tourist Context
This section details the categories of recommenders in the tourism context. Although the search spanned the last two decades, most of the papers mentioned in this section were published in the last five years and have the highest PDLY. Travel planning and e-tourism documents are closely related to the emerging topics of Point of Interest (POI), the tourist trip design problem, travel, and smart tourism (see Figure 5).
In the tourism sector, experiences are the main product and directly impact the satisfaction of inbound tourists [84,85]. For this reason, stakeholders prepare the tourist destination so that visitors have positive experiences in the social and physical context [21,86,87]. An experience is inherently personal and can involve an individual on different rational, emotional, sensory, physical, and spiritual levels [88]. Smart tourism has transformed information services to support the design of personalized tourism experiences in a ubiquitous context [22,79,89]. Therefore, recommenders, as technological tools, provide valuable suggestions on tourist attractions tailored to personal preferences and restrictions.
In a smart tourism ecosystem [79,90], the sensing technology of wearable devices can be considered the enabling layer that supplies the context factors and user data. Meanwhile, the recommender displays the suggested contents about the tourist experience and is part of the facilitation layer. Mobile tourism [89,91,92,93,94] is an emerging field that combines the ubiquitous devices, technologies, and services necessary to provide well-being to tourists at the destination. The heterogeneous data extracted in a smart city favor the design of tourism behavior models based on digital patterns of travel routes [95]. Furthermore, [96] proposed a cultural heritage route recommender with a user theme similarity model and a mean-shift clustering algorithm for visitor location. Location-based tourism recommenders [18][19][20][64] have used the technological capabilities of mobile devices to inform users about points of interest (POIs) near their geographic position. In addition, [97] developed a recommender based on a clustering algorithm to discover behavior patterns in user preferences. It used the CS technique to extract the unvisited places from the profiles. Then, it displayed maps with the Cartesian coordinates of the most novel points of interest for users. Furthermore, [98] proposed a context-sensitive itinerary recommender based on a routing algorithm that used the user's social information and the popularity and distance of the POIs. Mobile communications and social media have allowed users to share ratings and experiences through POI comments based on their preferences [13,99,100,101].
The tourist trip design problem [19,44,102,103,104,105,106] involves planning tourist routes that meet the trip's expectations, the novelties at the destination, and the visitor's satisfaction. For instance, routing models based on metaheuristics made it possible to search for POIs located along the travel routes [107,108]. The recommender based on Dijkstra's algorithm [105] constructed short tourist trips within a feasible time frame according to the user's preferences and context. In addition, [109] proposed a POI recommender based on MF algorithms and an enriched cultural typology.
Trip planning of itineraries to tourist places has incorporated relevance to the user, location, and travel time between POIs [110][111][112]. Some travel recommendation methods [113][114][115] generated a list of POIs that matched the user's preferences obtained from geotagged photographs and comments on tourist experiences posted on social media. Hybrid location-based recommenders considered dynamic user interaction to suggest customized POIs using a swarm intelligence algorithm [59] and a hybrid selection scoring algorithm [116].
On the other hand, destination recommenders have guided tourists according to the purpose of the trip, adapting to their personal needs and preferences [106]. In recent years, travel destination recommenders have extracted user sentiment trends toward preferred items from social media and addressed data scarcity limitations [4,54,71]. In addition, [117] proposed a cultural and social recommender based on a heterogeneous, directed social graph with a CF algorithm. Other studies identified emerging destinations using intangible dimensions related to the destination's features, spatial coverage, and demand for tourist attractions [118][119][120][121].
Rural tourism has recently become a field that offers exciting challenges for recommending rural tourism experiences [122]. Some studies [66,123] proposed methods for extracting geographic features from rural tourism attractions. Besides, medical tourism recommenders have supported users with health care and medical care while traveling [124]. For instance, a health-conscious ubiquitous context approach used visitors' physiological sensor data [125], and a social trust-based approach developed an ontology to generate suggestions for medical tourism services [81].
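As a sketch of how Dijkstra's algorithm finds the shortest route between two POIs, the example below uses travel times in minutes as edge weights and then checks the result against a time budget. The POI graph, the travel times, and the 45-minute budget are hypothetical; the recommender in [105] additionally accounts for user preferences and context.

```python
import heapq


def shortest_route(graph, start, goal):
    """Dijkstra's algorithm: return (total_minutes, path) or None if unreachable."""
    queue = [(0, start, [start])]
    best = {start: 0}
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry; a cheaper route was already found
        for neighbor, minutes in graph.get(node, {}).items():
            new_cost = cost + minutes
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                heapq.heappush(queue, (new_cost, neighbor, path + [neighbor]))
    return None


# Hypothetical travel times in minutes between POIs.
graph = {
    "hotel": {"museum": 10, "park": 25},
    "museum": {"park": 10, "cathedral": 30},
    "park": {"cathedral": 10},
    "cathedral": {},
}
minutes, route = shortest_route(graph, "hotel", "cathedral")
fits_budget = minutes <= 45  # hypothetical time frame for the trip
print(minutes, route, fits_budget)
```

The algorithm requires non-negative edge weights, which holds for travel times and distances.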

Context-Aware
In recent years, contextual information has been significant in describing current user behavior, scenarios, and the application domain of mobile recommenders [89,126,127,128]. Contextual information can involve various contexts related to user features, technological resources, and physical conditions [42,58,129]. The first involves user interaction on social media, mood, experiences, and preferences. The second describes the communication and computing capabilities of the user's ubiquitous devices. The last one relies on sensors to measure the climate, the weather, and the location of the recommendation. Accordingly, some studies proposed a multi-contextual perspective of mobile tourism RS by integrating users' location with environmental, temporal, and social factors to generate more effective predictions [93,130,131,132,133].
Unlike traditional recommendation approaches, the Context-Aware Recommender System (CARS) adds contextual information to the multidimensional rating prediction function $User \times Item \times Context \rightarrow Rating$ [42]. The three CARS categories that incorporate the user's contextual information into a prediction model are pre-filtering, post-filtering, and contextual modeling [42,58]. In pre-filtering, preference data is selected according to the context before algorithms calculate predictions [22]. In post-filtering, context is used to filter recommendations once predictions have been calculated with a traditional approach [22]. In contrast, contextual models incorporate contextual data directly into the prediction model. Some studies [15,134] demonstrated better results in movie suggestions by incorporating emotional contextual dimensions [135] into context-sensitive algorithms (items, users, and User Interface, UI), Differential Context Relaxation (DCR), and Differential Context Weighting (DCW). Similarly, [16] used a hybrid CF approach based on mood, the fusion of preferences, and the ratings of users with similar interests. Meanwhile, [70] adopted multiclass classification algorithms (Decision Tree (DT), Random Forest (RF), and SVM) to predict interactive emotional states. Furthermore, studies of music CARS [8,17] have used CF approaches to extract emotional labels from songs associated with users' physiological states. They also implemented neural network models to represent users' music listening sequences [136].
Semantic Web techniques have enabled recommenders to add reasoning ability to context information. Ontologies semantically describe the concepts for modeling the features of user profiles, preferences, and items [80,81]. Personalized recommenders of tourist activities are based on ontologies built from various data sources (travel motivations, user opinions, geographic information, and ratings, among others) [137]. In particular, [61] proposed a travel RS based on the contextual information of emotions [138] extracted from social networks with semantic analysis techniques. Additionally, [139] proposed a hybrid cultural RS of personalized itineraries based on social network activities, linked open data, and the physical context. For this, it implemented a semantic-based matching algorithm for the user's profile.
POI recommenders have also implemented mining techniques to identify contextual user preferences in social media reviews. The approach in [116] generated the candidate POIs with the Adaptive KNN and Social Pertinent Trust Walker (SPTW) algorithms. Then, it displayed the recommendations with the Hybrid Selection Score (HSS) method. Another study [59] incorporated pre-filtering of user preferences and a proximity-based CF algorithm. On the other hand, [69] proposed the Largest Deviation technique to estimate the selective, parsimonious, and most relevant context of user preferences when rating POI items.
Compared to traditional RS frameworks, the majority of CARS research demonstrated better prediction performance when implementing sentiment analysis and sentiment mining techniques [140]. Besides, some studies described recommender architectures in various tourist settings. The work in [141] presented a POI itinerary recommender architecture sensitive to the user's physical and social context. It used graph-based semantic similarity algorithms for the extraction and filtering of the multimedia content of LinkedGeoData. Likewise, [142] designed a travel itinerary recommender based on dimension trees of contextual features, an inferential tourist guide engine, and a recommendation engine. Also, [143] proposed a recommender of cultural routes based on the content of geotagged photos, the temporal context, and the geographical location. For this, it used a topic model based on the PLSA of POIs and visitors. On the other hand, [144] designed a mobile system to detect sources of danger in the tourist destination. The system integrated a risk analysis component for technological, socio-political, and natural situations to generate recommendations for a safe trip.
The analysis of user behavior is highly relevant for constructing service frameworks and personalized applications in the tourism field. For example, [95] proposed an ontological framework for predicting temporal events based on tracking changes in tourist behavior. It used a data lake repository to store contextual information, implemented neural networks to classify the level of satisfaction from road trips, and grouped tourists into five clusters.
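The difference between pre-filtering and post-filtering can be sketched with a deliberately simple predictor. Here the mean rating per item stands in for the traditional recommendation algorithm, and the rating rows, the weather context, and the `suitable` mapping of items to contexts are hypothetical.

```python
from collections import defaultdict


def mean_ratings(rows):
    """Stand-in predictor: mean rating per item over (user, item, context, rating) rows."""
    totals = defaultdict(lambda: [0.0, 0])
    for _user, item, _context, rating in rows:
        totals[item][0] += rating
        totals[item][1] += 1
    return {item: total / count for item, (total, count) in totals.items()}


def prefilter_scores(rows, context):
    """Pre-filtering: keep only ratings given in the target context, then predict."""
    return mean_ratings([row for row in rows if row[2] == context])


def postfilter_scores(rows, context, suitable_contexts):
    """Post-filtering: predict on all data, then drop items unsuitable for the context."""
    scores = mean_ratings(rows)
    return {
        item: score
        for item, score in scores.items()
        if context in suitable_contexts.get(item, ())
    }


# Hypothetical (user, item, context, rating) observations.
rows = [
    ("u1", "beach", "sunny", 5),
    ("u2", "beach", "rainy", 1),
    ("u1", "museum", "rainy", 4),
    ("u2", "museum", "sunny", 3),
    ("u3", "museum", "rainy", 5),
]
suitable = {"beach": {"sunny"}, "museum": {"sunny", "rainy"}}

pre = prefilter_scores(rows, "rainy")
post = postfilter_scores(rows, "rainy", suitable)
print(pre, post)
```

Pre-filtering yields context-specific scores but trains on less data, while post-filtering uses all ratings and only prunes the final list; contextual modeling would instead pass the context to the predictor itself.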

Emotion-Based
Affective computational models are increasingly efficient in generating personalized recommendations by detecting the user's emotions. Understanding and predicting user behavior is vital to an affect-sensitive recommendation system. Emotions are closely related to people's physical features and are considered a relevant contextual factor in the recommendations [46,49,145,146]. Some studies [147,148] designed user models based on personality traits and emotional states. These models comprise a conceptual level composed of profile data, physiological measures, contextual data, and subjective user attributes. In contrast, the specific domain level defines the connection between emotional states and affective elicitation attributes that can influence the recommendation process.
The emotional information of users can be obtained with explicit and implicit methods and in a non-intrusive way. The study in [63] integrated a prediction model of users' long-term moods, which improved fashion recommendations compared to using short-term emotions. In [67], the authors presented an affect-sensitive recommendation system that infers the emotional features [135] of multimedia content. It used a cluster-based Latent Bias Model (LBM) to predict the probability that a user would click on images, taking into account emotional contexts, mobile behavior, and social closeness.
The exponential growth of content on online social networks has made it possible to identify users' affective features to improve recommendation quality. Emotional data is limited by the scarcity and noise of user reviews. However, emotional information extraction filters out negative posts, which is likely to increase prediction precision [83]. The work in [13] developed an affect-sensitive recommender with a lexicon of emotions [135] extracted from the comments of location-based social networks [102]; based on the emotional context, it generated a list of points of interest. In [14], a recommender based on implicit feedback data merged social information, ratings, and emotion, maximizing the probability of user selection behavior through hybrid features.
Some emotion-sensitive RS approaches use social information to prompt users to provide implicit feedback on an item's rating. The study in [73] presented an algorithm that extracts emotional information from the rating of a digital item in a social network. Then, it used the user satisfaction scale to generate a list of neighbors based on the similarity of emotions. In addition to product ratings, both textual emotion analysis that detects affective polarity and the extraction of labels (user preferences and intrinsic attributes of the product) favor user satisfaction when purchasing products [60,62,72]. In [149], the authors modeled emotional contagion and user satisfaction in a group recommender that suggests sequences of items, in which emotion decay and mood assimilation affect the satisfaction with future items.
Emotion-sensitive RS architectures improve the user experience by implementing services adapted to the current emotional state. The work in [150] developed a song recommender based on contextual data, current emotion, and musical preferences. It showed better prediction when incorporating emotions (happy, neutral, and sad) than recommenders based on content similarity and on feedback from electroencephalogram signals. In particular, [151] designed an emotion-sensitive platform to improve people's productivity in smart offices. It proposed a module that recognizes the emotional context by obtaining data from sensors (temperature and humidity), emotion detection (facial expression, voice, and text analysis), and information from the Internet. Then, semantic rules were used in the task automation module.

Sentiment Analysis-Based
Affective content on social networks is an indispensable source of data for determining users' points of view on a product or service. Emotion information can be extracted with sentiment analysis techniques to infer the user's emotional context [7][8][9]. Hybrid recommender approaches reduce the cold start problem by using data that users post on social media [4][5][6][152,153]. Opinion mining detects and extracts affective states subjectively expressed by users in reviews, texts, and documents shared on online social networks [77,82,154,155]. Preprocessing can use techniques such as tokenization and stemming, which remove irrelevant data and divide text reviews into small parts (tokens); the tokens are then classified by highest frequency into an emotional polarity (positive, negative, or neutral) [72,156].
Some studies have used emotion analysis to predict users' online product tastes and musical choices [14,157]. The Word2Vec and fastText techniques generated the corpus of word embeddings used to suggest smartwatches [158] and to concatenate specific information from the corpus of words grouped by sentiment [159]. The Term Frequency-Inverse Document Frequency (TF-IDF) technique was used to weight the review features of tourist destinations [4,43,61], measure the relevance of POI tags [54], and build the vector representation of social data [6,48]. Ref. [71] identified the polarity of text clauses and calculated the sentiment trend value of tourist destinations. Ref. [116] built a hybrid user preference algorithm based on a multi-criteria technique and an affective lexicon, and then analyzed reviews to determine the likely sentiment toward a new POI.
SR approaches based on data mining techniques take advantage of access to the large amounts of user comments shared on social media. Researchers highlight the relevance of incorporating rich text sources to discover emotional patterns using natural language processing techniques, opinion mining, and ML [140]. Ref. [160] proposed a music recommender structured around a content analyzer component, which labels an emotion from a thesaurus, and a user preference model. Also, [161] specified a framework for analyzing negative emotions disseminated on social networks. It then used a corpus for community detection of affective nodes defined by word co-occurrence frequency. Unlike previous techniques, [162] considered a multi-label toxic comment classification approach with the ML library of the Apache Spark framework. The results demonstrated better precision with word embeddings than with a bag of words.
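As an illustration of the tokenization and TF-IDF weighting mentioned above, the following minimal sketch weights the terms of a few review texts. It is not the implementation of any cited study: the function names, the simple alphabetic tokenizer, and the plain log(N/df) form of the inverse document frequency are assumptions chosen for brevity.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase a review and split it into alphabetic tokens (assumed tokenizer)."""
    cleaned = "".join(c.lower() if c.isalpha() else " " for c in text)
    return cleaned.split()


def tf_idf(docs):
    """Return one {term: weight} dictionary per document.

    tf  = term count / number of tokens in the document
    idf = log(N / df), where df is the number of documents containing the term
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: count each term once per document.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        weights.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return weights


# Hypothetical reviews of tourist attractions.
reviews = ["Great beach, great food", "Crowded beach", "Quiet museum"]
weights = tf_idf(reviews)
```

With this weighting, a term that appears in every review receives a weight of zero, so only terms that discriminate between destinations contribute to the vector representation.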

Evaluation of Recommender
The evaluation datasets were extracted from social networks and publicly shared databases. The data includes an overview of recommended items, user preferences, and historical reviews from visitors. Depending on the experimental design, the algorithms can implement cross-validation techniques. Initially, the item review dataset is split into a larger portion to train the recommender and a remaining portion to test the model's performance. Some studies used the k-fold Cross-Validation (CV) technique [4,11,12,50,63] to verify the precision of the compared methods on each fold. On the other hand, the Leave-One-Out Cross-Validation (LOOCV) technique [4] holds out one item per user, which ensures that the system is evaluated impartially on items that were left out of the training data.
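The two validation schemes can be sketched as follows. This is a simplified illustration under stated assumptions: the folds are contiguous (in practice the data would usually be shuffled first), the leave-one-out variant holds out the last item of each user, and the function names are hypothetical.

```python
def k_fold_indices(n_items, k):
    """Yield (train_indices, test_indices) for k roughly equal contiguous folds."""
    fold_sizes = [n_items // k + (1 if i < n_items % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_items) if i < start or i >= stop]
        yield train, test
        start = stop


def leave_one_out(user_items):
    """Hold out the last item of every user who has at least two items.

    user_items: {user: [item, ...]} in interaction order (assumed layout).
    Returns (train, test), where test maps each user to the held-out item.
    """
    train, test = {}, {}
    for user, items in user_items.items():
        if len(items) < 2:
            train[user] = list(items)  # too little data to hold anything out
            continue
        train[user] = list(items[:-1])
        test[user] = items[-1]
    return train, test


folds = list(k_fold_indices(10, 3))
train, test = leave_one_out({"u1": ["a", "b", "c"], "u2": ["d"]})
```

Each index appears in exactly one test fold, so every review is used once for testing and k-1 times for training.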
The challenge of providing high-quality recommendations involves using evaluation methods to extract value from the prediction from a technical and experimental point of view [42,58,83]. In general, the recommendation and affective detection models were assessed with accuracy metrics (MAE and RMSE) [59], decision support metrics calculated from the confusion matrix (precision, recall, and F1 score) [8,13,59,72,136,158], and rank-aware metrics (MRR and NDCG) [7]. [7,13,112,116]. Also, the Mean Average Precision (MAP) metric compares the generated recommendation list with the list of relevant recommendations for users [17,109]. Besides, it contains the preferred items associated with the current context of the user [8,13,116,136].
Table 3 compares some recommender implementations that involved emotional data in the personalization of music, movies, tourist attractions, and online products. First, it lists the recommendation approaches described previously (content-based filtering, CB; knowledge-based, KB; and collaborative filtering, CF). The data collection section lists the user model features and the datasets that provided the contextual factors of the recommendation process. Then, the algorithms of the context-aware recommender system approaches are specified (pre-filtering, PRE; post-filtering, POST; contextual modeling, CM; emotion-based, EM; and sentiment analysis, SA). Finally, the machine learning section synthesizes the algorithms, similarity metrics (Sim), validation (Valid), and performance evaluation results of the proposed recommendation models.
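The evaluation metrics named in this section can be sketched for a single user as follows. The sketch assumes binary relevance (an item is either relevant or not) and hypothetical function names; MRR and MAP are obtained by averaging the reciprocal rank and the average precision over all users.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error between true and predicted ratings."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted ratings."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def precision_recall_f1(recommended, relevant):
    """Decision support metrics for one recommendation list."""
    hits = len(set(recommended) & set(relevant))
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


def reciprocal_rank(ranked, relevant):
    """1 / position of the first relevant item (0 if none is found)."""
    for pos, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / pos
    return 0.0


def average_precision(ranked, relevant):
    """Mean of precision@k taken at each position k holding a relevant item."""
    hits, total = 0, 0.0
    for pos, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / pos
    return total / len(relevant) if relevant else 0.0


def ndcg(ranked, relevant, k):
    """Normalized discounted cumulative gain at k with binary gains."""
    dcg = sum(
        1.0 / math.log2(pos + 1)
        for pos, item in enumerate(ranked[:k], start=1)
        if item in relevant
    )
    ideal = sum(1.0 / math.log2(pos + 1) for pos in range(1, min(len(relevant), k) + 1))
    return dcg / ideal if ideal else 0.0


ranked = ["a", "b", "c", "d"]   # hypothetical recommendation list
relevant = {"b", "d"}           # hypothetical relevant items for the user
```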

Emotion Recognition
This section describes the diverse approaches supported by technology and emotional models to identify people's emotions. Although the search for the documents spanned the last two decades, most of the documents cited in this section have been published in the last five years and have the highest PDLY (see Figure 6). Also, ER based on physiological signals and brain activity (see Table 3) requires knowledge of different areas and the specification of a framework for analyzing and detecting emotional patterns [10][11][12][36]. Initially, the experimental design enables the collection of objective and subjective data from participants exposed to stimuli in a controlled environment. Next, preprocessing techniques are applied to reduce noise and artifacts in the physiological signals. Relevant features are then extracted with statistical and mathematical models, and ML algorithms are identified to detect the participants' emotional states. Finally, performance metrics are applied to validate and evaluate the prediction results. In particular, [167] provided recommendations related to affective detection using a multimodal human-computer interaction system [168,169]. These automated systems can recognize and interpret a person's emotional states through physical and physiological measures. Physical measures represent communicative signals such as facial expressions [46,170,171], speech [47,172], body gestures [47,173], and eye tracking when viewing interactive content [32,174]. In contrast, physiological measurements record bodily variations such as changes in temperature and increases in blood pressure [1][2][3]. Physiological information collected from wearable devices can be used as personalized multisensory emotional support in the user's context.

Emotion Models
Emotion is a conscious and subjective experience associated with moods, physiological changes, and behavioral responses [175]. Affective states can be classified with a categorical model, made up of basic emotions, or with a dimensional model, in which emotions are represented on a coordinate map. In the categorical model, human beings' basic emotions generate automatic and temporary reactions to stimuli in the environment, daily life events, physical activities, or personal memories [176]. Ekman [135] proposed six discrete categories of emotions (anger, disgust, fear, sadness, happiness, and surprise) associated with facial expressions. Emotions are related to physiological variations. For instance, fear increases heart rate and skin conductance compared to disgust [175]. Ref. [138] developed the wheel of eight emotions (anticipation, joy, trust, fear, surprise, sadness, disgust, and anger), which can combine into more complex emotions. Also, physiological measures are vital indicators for detecting the stress and emotions that a person feels [177,178].
The dimensional model conceptualizes emotions as continuous data in the two-dimensional core affect space of arousal and valence [11]. In the arousal dimension, the autonomic nervous system (ANS) regulates the physiological changes of the human body, and the sympathetic nervous system (SNS) responds to an emotional activation produced by a threatening or challenging situation [176]. Sympathetic activation increases electrodermal activity and the respiratory and heart rates associated with "fight or flight" reactions [179]. These responses lead to the suppression of systems that are not essential for immediate survival. In contrast, the parasympathetic nervous system (PNS) keeps the body in a state of relaxation by lowering the rate of these physiological measures. The valence dimension indicates the degree of pleasure or displeasure in response to emotional motivation [147].
Additionally, the multidimensional model incorporates arousal, valence, and dominance, the latter characterizing the emotional experience on a scale from low to high [163,180]. Essentially, Russell's circumplex model [181] has significantly influenced the studies proposed for ER (see Table 3). This model defines a two-dimensional circular structure that relates emotional states to measurements on the axes of arousal (low to high) and valence (low to high). Emotions in opposite quadrants of the circle are inversely correlated (HAHV quadrant: happy versus LALV quadrant: sad, and HALV quadrant: anger versus LAHV quadrant: calm) [172]. Emotion recommenders use the multidimensional model for statistical calculations of emotions, although its values are not easy to interpret. For this reason, the basic model provides a mapping from the emotions collected from users to the multidimensional model [172].
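The mapping between dimensional scores and the quadrant labels described above can be sketched as a simple threshold rule. The midpoint of 5 assumes ratings on a nine-point scale, and both the threshold and the function name are illustrative assumptions rather than values taken from the cited studies.

```python
# Example emotion per quadrant, as listed in the text.
QUADRANT_EMOTION = {
    "HAHV": "happy",
    "LALV": "sad",
    "HALV": "anger",
    "LAHV": "calm",
}


def quadrant(valence, arousal, midpoint=5.0):
    """Map a (valence, arousal) pair to a circumplex quadrant label.

    Scores at or above the midpoint count as high (assumed convention).
    """
    a = "HA" if arousal >= midpoint else "LA"
    v = "HV" if valence >= midpoint else "LV"
    return a + v


label = quadrant(valence=8, arousal=7)
emotion = QUADRANT_EMOTION[label]
```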

Emotion Measurements
Most of the studies used various stimuli to provoke emotional states in the participants. The methods described include viewing video clips [11,163,165,173][182][183][184] and images [27,49,169], listening to music [8,12,185,186], reading texts [6,32], and doing physical activities [177,187,188]. Emotions can be assessed through subjective and objective methods.
In the first method, people record subjective measurements with the Positive and Negative Affect Schedule (PANAS) and Self-Assessment Manikin (SAM) instruments [63,173,189]. During the process of eliciting emotional states, users perform a self-analysis of what they feel and rate each of the SAM parameters (arousal, valence, or dominance) on a nine-point scale. Meanwhile, PANAS uses two 10-item scales (rated from 1: not at all to 5: very much) to estimate positive affect on the vertical axis and negative affect on the horizontal axis. Furthermore, the valence and arousal dimensions are rotated 45 degrees with respect to these axes [190].
The second method uses sensors or wearable devices to measure physiological signals. For example, [12] defined a framework that recommends songs based on users' heart rate variability (HRV) and a music database classified into four categories according to the degree of arousal (0: extremely low HRV to 1: extremely high HRV) and the degree of valence (1: very negative to 5: very positive). Also, [47] used four domains of the emotional semantic space model (arousal, valence, sense of control, and ease of finding a goal) [181] to categorize the affective states of users interacting with a video game.
Alternatively, the consolidation of multimodal datasets has prompted the analysis of emotional stimuli with publicly available data such as the Database for Emotion Analysis Using Physiological Signals (DEAP) shared by Koelstra et al. [180], the International Affective Picture System (IAPS) published by Lang et al. [191], and the Nencki Affective Picture System (NAPS) proposed by Marchewka et al. [192]. In particular, DEAP [180] contains peripheral physiological signals, brain activity signals, levels of arousal, valence, and dominance, and the subjective ratings of the emotions perceived by 32 participants during video viewing. Based on DEAP, [184] proposed a music recommendation framework, and [163] developed an emotional model of users' behavior in an educational environment. IAPS [191] and NAPS [192], in turn, provide repositories of photographs with arousal and valence scores recorded with SAM.
In conclusion, these datasets have been used to design affective models for image classification [27,49] and to simulate quantifiable emotional stimuli to obtain physiological data [27].

Wearable Technology
Multiple applications rely on sensors and wearable devices to collect user data. This section focuses on the use of physiological sensors for ER. Although the search for the documents spanned the last two decades, most of the documents cited in this section have been published in the last five years and have the highest PDLY. Besides, Figure 7 shows the recent trend of wearable technology used in monitoring and tracking user activities. The convergence of wearable devices and the Internet of Things (IoT) has enormous potential as a source of data to provide personalized and contextualized services that operate on cloud computing, edge computing, and mobile computing platforms [26][27][28][193]. Big Data and multilayer modeling architectures validated the data collected from sensors using edge and cloud computing to make music information systems [185,194] and tourist attraction systems [22,94,195] more efficient in performance and storage capacity. Furthermore, the selective suggestion of smart devices was based on a trusted IoT edge computing system [196] and a corpus of reference phrases to recommend smartwatches to users [158,193].
Regarding framework design using wearables, [197] proposed a generic sensor framework for personalizing medical care based on household monitoring of physiological measurements. Each sensor used a Java component to store data records and manage access to the system. Also, [198] defined an IoT services framework with a semantic component for detecting falls and recognizing stress. Besides, it used a notifications component to generate statements resulting from health monitoring. On the other hand, a data model supported by wearable devices [199] identified the health-related physiological conditions in the context of tourism.

Devices
Wearable technology is an emerging trend that uses people's digital traces to provide contextualized and personalized information. The study of these digital life records has promoted the development of recommenders that positively affect people's lives [200,201], such as the suggestion of activities based on timeline sequences [34,202,203] and the sentiment analysis of registered users' reviews of health trackers [204]. Besides, other studies use physical activity and patient health history data to predict clinical diagnoses in healthcare [23][24][25][198][205][206][207][208].
The evolution of wearable and ubiquitous computing has enriched the construction of user models with data obtained from information systems, social networks, and the context of people's daily lives [195][209][210][211]. Wearable devices' sensors collect this individual data related to the user's behavior and physical and physiological states. Specifically, data modeling provides the knowledge of users that is essential for designing personalized services oriented toward well-being in health [212][213][214], the location of tourist activities [199], and travel by public transport [215].
On the other hand, wearable wristbands and smartphones have supported real-time monitoring of the user's activities with data obtained from the accelerometer, proximity sensor, skin temperature (TEMP), and calorie consumption [30]. Some studies identify physical activities to improve users' lifestyles [31,34,216]. Also of interest is the detection of emotional activation using biosensors [32,187,217,218] during high-stress computing tasks and for the personalization of musical preferences.
Finally, the integration of augmented reality into smart glasses [219] has favored the development of applications related to the personalization of real-time conversations [220], tourist activity guides [221], and specialized remote assistance [222,223].

Sensors
Wearable devices incorporate various sensors to collect and process data to monitor human activities and affective detection [166,184,205]. Some studies have developed wearable prototypes to measure physiological signals based on emotional elicitation [187,188], whose purpose is to improve the user experience [11,224] and provide personalized emotional support in the educational field [1,2].
Additionally, users' physical activities have been monitored with inertial locomotion sensors such as the accelerometer, gyroscope, and magnetometer, which collect data on people's movement [34,35,225]. Also, [30,31,216,226] used the data collected from inertial sensors to extract the features required for human activity recognition.

Physiological
Affective states and physiological data are closely related to the elicitations that people perceive in daily life [179]. The following describes the detection of emotional patterns extracted from the features of physiological signals:
• The ANS directs the physiological responses associated with emotional responses derived from stimuli from the external environment or the human body's reactions [11].
• Physiological indicators are monitored through various sensors that measure cardiac and electrodermal activity [176,179].
• The raw physiological data is processed by applying resampling and filters to reduce noise and detect the affective components in the signals captured within a time window [187].
• Manual or automatic feature extraction methods facilitate the detection of emotional states. Depending on the classifier's approach, statistical, frequency, and non-linear techniques can be used for the physiological segments [36,184].
Finally, the analysis of time-domain features shows the change of affective patterns in a temporal sequence, calculated with statistical parameters such as the mean, minimum (min), maximum (max), variance (var), Standard Deviation (SD), and median. Also, the Frequency Domain (FD) features are derived from the Fourier transform and the power spectral density [227].
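The time-domain features listed above can be computed per time window, as in the following sketch. The window size, the step, and the use of population (rather than sample) variance are assumptions made for illustration.

```python
import statistics


def sliding_windows(signal, size, step):
    """Yield consecutive windows of `size` samples, advancing by `step` samples."""
    for start in range(0, len(signal) - size + 1, step):
        yield signal[start:start + size]


def time_domain_features(window):
    """Statistical parameters of one window of a physiological signal."""
    return {
        "mean": statistics.fmean(window),
        "min": min(window),
        "max": max(window),
        "var": statistics.pvariance(window),  # population variance (assumed)
        "sd": statistics.pstdev(window),      # population standard deviation
        "median": statistics.median(window),
    }


# Hypothetical skin temperature samples.
signal = [33.1, 33.2, 33.4, 33.3, 33.6, 33.8]
features = [time_domain_features(w) for w in sliding_windows(signal, size=4, step=2)]
```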
Cardiac monitoring sensors capture the heart rate (HR) in beats per minute and record the inter-beat intervals (IBI) from which heart rate variability (HRV) is derived [17,32,228]. The analysis of emotions derives from feature extraction on the time series and the different rhythms of the Electrocardiogram (ECG) and Photoplethysmogram (PPG) signals. The ECG measures the electrical activation of the heart muscle, and the PPG measures arterial volume changes through the skin [217,229]. Among the time-domain parameters of the IBIs, the SD of the RR intervals (SDNN) was computed, and Levene's test and the t-test were applied to the data by gender [12]. Also, [17,27] used the Root Mean Square of the Successive Differences (RMSSD) and the percentage of adjacent RR intervals that differ by more than 50 milliseconds (pNN50). It should be noted that the detection of the R peaks yielded different features of the intervals between the peaks of the signals [36,166,187]. Regarding the spectral analysis of the HRV time series, [12] used the high-frequency (HF, 0.15-0.4 Hz), low-frequency (LF, 0.04-0.15 Hz), and very-low-frequency (VLF, 0.003-0.04 Hz) bands.
Also, Electrodermal Activity (EDA) or Galvanic Skin Response (GSR) signals measure the variations in the skin's electrical conductivity produced by the sweat glands. The features of Skin Conductance Response (SCR), Skin Conductance Level (SCL), and the detection of EDA peaks recorded the changes in people's affective states [32,33,188,230]. For the EDA and HR signals, [11] applied a moving window for feature extraction, principal component analysis (PCA), and feature selection that prioritized the weighting of the input variables (calculated with PCC, minimum redundancy maximum relevance, and joint mutual information).
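The time-domain HRV parameters named above (SDNN, RMSSD, and pNN50) can be sketched from a list of inter-beat intervals in milliseconds. The function name is hypothetical, and the use of the sample standard deviation for SDNN is an assumption; the cited studies may have used different conventions or preprocessing.

```python
import math
import statistics


def hrv_time_features(ibi_ms):
    """Compute basic time-domain HRV features from inter-beat intervals (ms)."""
    # Successive differences between adjacent intervals.
    diffs = [b - a for a, b in zip(ibi_ms, ibi_ms[1:])]
    return {
        # Mean heart rate in beats per minute (60,000 ms per minute).
        "mean_hr": 60000.0 / statistics.fmean(ibi_ms),
        # Standard deviation of the intervals (sample SD, assumed).
        "sdnn": statistics.stdev(ibi_ms),
        # Root mean square of the successive differences.
        "rmssd": math.sqrt(sum(d * d for d in diffs) / len(diffs)),
        # Percentage of adjacent intervals differing by more than 50 ms.
        "pnn50": 100.0 * sum(abs(d) > 50 for d in diffs) / len(diffs),
    }


# Hypothetical inter-beat intervals in milliseconds.
features = hrv_time_features([800, 860, 790, 810])
```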
In contrast, Electroencephalogram (EEG) signals measure electrical variability in the brain through the voltage fluctuations produced by ionic currents within neurons. The EEG features are computed in the delta (1-4 Hz), theta (4-7 Hz), alpha (8-13 Hz), beta (13-29 Hz), and gamma (30-47 Hz) frequency bands. The last three bands seem to differentiate the affective conditions better [10,11,231]. The original signals were pre-processed with downsampling techniques, and bandpass filters removed artifacts and noise from the EEG [189]. Statistical methods and the wavelet transform [163] supported the feature extraction process. Some studies found a strong relationship between the EEG and the categorization of music by emotions [163,231], and with the communication of emotional states transmitted by movements and sign language [173].
Table 4 consolidates some works on non-intrusive sensors for the emotion recognition of participants (pt) based on physiological signals and EEG features. The experimental designs focused on evaluating the performance of emotion estimators according to the measurement of emotions (Arousal A and Valence V), the stimuli used for the participants' affective elicitation, the subject (Sb), and the physiological responses collected with the sensors. Affective detection implies the adaptation of computational processes that enable the interpretation of users' emotions within a specific application context [169]. EDA and HR signals predicted arousal more accurately [11], while EEG signals were more effective for valence. Also, [36] presented representative results for predicting arousal with ECG signals and detecting valence with GSR signals. The comparison of the classification algorithms [12,163,173,184] made it possible to validate the emotional detection performance.
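As a rough illustration of how power can be attributed to the EEG frequency bands listed in this section, the sketch below uses a naive discrete Fourier transform on a short synthetic signal. It is only a didactic example under stated assumptions: the half-open band limits, the power normalization, and the function name are choices made here, and real pipelines would typically use an FFT library with windowing and artifact removal.

```python
import cmath
import math

# Band limits in Hz as listed in the text; treated as half-open [low, high).
EEG_BANDS = {
    "delta": (1, 4),
    "theta": (4, 7),
    "alpha": (8, 13),
    "beta": (13, 29),
    "gamma": (30, 47),
}


def band_powers(signal, fs):
    """Sum the DFT power of `signal` (sampled at `fs` Hz) within each EEG band."""
    n = len(signal)
    powers = dict.fromkeys(EEG_BANDS, 0.0)
    for k in range(1, n // 2):
        freq = k * fs / n  # frequency of DFT bin k
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(signal)
        )
        power = abs(coeff) ** 2 / n
        for name, (low, high) in EEG_BANDS.items():
            if low <= freq < high:
                powers[name] += power
    return powers


# Synthetic one-second signal: a 10 Hz sine sampled at 128 Hz.
fs = 128
signal = [math.sin(2 * math.pi * 10 * i / fs) for i in range(fs)]
powers = band_powers(signal, fs)
dominant = max(powers, key=powers.get)
```

For this synthetic input, the power concentrates in the alpha band, since 10 Hz lies between 8 and 13 Hz.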

Machine Learning
This section outlines the ML techniques and algorithms used to implement SR (see Table 3) and ER (see Table 4). Although the search for the documents spanned the last two decades, most of the documents cited in this section have been published in the last five years and have the highest PDLY. Figure 8 highlights the implementation of deep learning approaches for sentiment analysis and ER as an emerging topic.

Classification
Section 3 describes research related to SR approaches to tourism. The recommendation models used ML algorithms to validate and compare the performance of classifiers of emotions, tourist attractions, and multimedia content (see Table 3). Some studies used kNN and SVM algorithms for POI classification [7,53,116] and image classification [46,49,67]. Moreover, [232] proposed an approach based on a decision tree (DT) that uses the users' predictions and historical interests to generate movie recommendations. Other studies used Linear Regression (LR) and neural network algorithms to classify trip profiles [45] and road trips [95].
The integration of wearable technology with ML approaches is being used to identify patterns that support personalized clinical diagnoses for health care systems [229] through kNN [205] and DT [206,233,234] algorithms. Physical activity recommenders that support users' lifestyles were based on SVM and Random Forest (RF) [30,235,236], kNN [225], and LR [31] algorithms. Some affective recognition studies based on data collected from sensors used decision rule and DT classifiers for music recommendation [12,185,188]. In particular, the analysis of physiological signals [180] with Naïve Bayes (NB), RF, and SVM techniques was used in emotional detection [32,70,163,173,184].
On the other hand, a multimodal approach for collecting affective responses (facial movements, speech, and interactive activities in a video game) demonstrated greater efficiency when multiple sensors were used in SVM and DT emotion classifiers [47]. The direct measurement of physiological signals elicited by visual stimuli made it possible to design an emotional state estimator based on Artificial Neural Networks (ANN), namely the Multilayer Perceptron (MLP) and the Generalized Regression Neural Network (GRNN) [11]. Meanwhile, [166] extracted peak features from physiological signals (ECG and PPG) and estimated blood pressure with ANN, SVM, and Least Absolute Shrinkage and Selection Operator (LASSO) regression models. Regarding affective recognition in video analysis, [165] used SVM algorithms to classify the hybrid input features and support vector regression for arousal detection.

Clustering
SR approaches have implemented clustering algorithms as an alternative to overcome data sparsity problems and reduce the response time of predictions [55]. Grouping users based on features extracted from social network datasets has made it possible to detect the relationships between user interests, affective states, and the similarity of POIs. For example, the k-means and k-modes methods customized the grouping of users with common profiles and sentiments [64,67,112]. Besides, the fuzzy c-means algorithm used demographic and preference data to construct the behavior profile of user activities [116]. A hierarchical clustering algorithm grouped the geotagged images of tourist destinations based on the Haversine Distance (HD) [50]. Another travel SR study [95] presented a tourist clustering based on preferred attractions, travel expenses, route features, ratings, and reviews of tourist sites. To do this, it defined a neural network model to simplify user parameters on a two-dimensional map.
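The Haversine distance used to compare geotagged locations can be sketched as follows. The mean Earth radius of 6371 km and the function name are assumptions made for illustration; the coordinates in the example are hypothetical.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in kilometers between two (latitude, longitude) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    # Clamp to 1.0 to guard against floating-point rounding before asin.
    return 2 * radius_km * math.asin(math.sqrt(min(1.0, a)))


# Distance between two hypothetical geotagged photos.
d = haversine_km(2.4448, -76.6147, 4.7110, -74.0721)
```

A distance matrix built with this function can serve as the input to a hierarchical clustering of geotagged images.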

Deep Learning
Recent studies used Deep Learning Networks (DLN) to construct recommendation models for automatic notifications, content classification, and pattern recognition [65,68,155]. DLNs differ from ANNs in the interconnection of multiple layers that apply various weights and activation functions between the inputs and outputs of the hidden layers. The deep architecture allows forward and backward propagation, with weights adjusted during feature learning and detection. Loss functions are used in classification or regression tasks to determine the difference between the labels predicted by the DLN and the actual labels in the dataset. Unlike classical ML, DLN models use unstructured data, reduce computational costs, and scale in performance in proportion to the amount of data [237]. Considering the cold start problem and the data sparsity of CF algorithms, [76] developed a recommender based on a DLN and a matrix factorization (MF) of latent features to manage software projects.
Convolutional Neural Networks (CNN) are DLNs used to identify patterns in segments of input data, operating in one, two, or three dimensions. Unlike classical ML approaches, CNNs use filters to extract features automatically and pooling layers to reduce complexity and overfitting [112,186]. The classification into specific classes is performed by Fully Connected Layers (FCL). In particular, [226] used a CNN to extract features from georeferenced images used to recognize human activities. Also, [36] proposed combining CNN and FCL models to extract affective features from physiological signals (ECG and GSR), surpassing the precision of traditional techniques. These models (CNN and FCL) also extracted affective features from multimedia text [156] and discriminative features from optical flow images [165]. A hybrid CNN model [159] used one-hot vectors to predict sentiment polarity.
The Long Short-Term Memory (LSTM) approach is a variant of the Recurrent Neural Network (RNN) that overcomes gradient problems by remembering long-term sequential data; its structure includes input, output, and forget gates regulated by the sigmoid function [237]. Specifically, [164] integrated the 3DCNN and LSTM algorithms to extract spatio-temporal features in gesture detection. Some affective semantic analysis studies used the CNN and LSTM RNN algorithms to classify emotions from movie comments [9] and to detect stress in psychological phrases [82]. Additionally, [48] proposed the CNN and LSTM algorithms to extract the contextual features of sentences about tourist attractions. In [158], the CNN and RNN techniques were used to predict the phrases related to users' perception of, and intention to recommend, smart wearable devices.
Lately, the integration of ML algorithms and chatbots has shown enormous potential for recommending tourist destinations. Ref. [238] designed a POI recommendation architecture based on decision trees to establish the profile of a social network user from the history of visits. Besides, [239] proposed an LSTM RNN model to detect users' interests based on their history of preferences. The query chatbot provided travel options based on the detected profile.
Most studies showed better performance in emotion-sensitive SR when using DLN algorithms. Ref. [150] customized an emotion-sensitive recommender using a deep CNN (DCNN) to classify songs in a dataset based on user profile and history of preferences. It defined latent features and musical relationships with the Weighted Feature Extraction (WFE) algorithm based on a weighted MF.
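The role of the input, output, and forget gates can be illustrated with a single LSTM step on scalar values. This is a didactic sketch of the standard LSTM cell equations, not the architecture of any cited study; the parameter layout and names are assumptions.

```python
import math


def sigmoid(x):
    """Logistic function that squashes a value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, params):
    """One step of a scalar LSTM cell.

    params maps each of "forget", "input", "output", and "candidate" to a
    tuple (w_x, w_h, b) of input weight, recurrent weight, and bias
    (hypothetical layout chosen for readability).
    """
    def pre_activation(name):
        w_x, w_h, b = params[name]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre_activation("forget"))       # how much old memory to keep
    i = sigmoid(pre_activation("input"))        # how much new content to write
    o = sigmoid(pre_activation("output"))       # how much memory to expose
    g = math.tanh(pre_activation("candidate"))  # candidate memory content
    c = f * c_prev + i * g                      # updated cell state
    h = o * math.tanh(c)                        # updated hidden state
    return h, c


# With all parameters at zero, every gate is half open.
zero_params = {name: (0.0, 0.0, 0.0) for name in ("forget", "input", "output", "candidate")}
h, c = lstm_step(x=1.0, h_prev=0.0, c_prev=2.0, params=zero_params)
```

Because the cell state is updated additively and scaled by the forget gate, information can persist across many steps, which is what allows the model to retain long-term sequential data.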

Clusters Mapping
This section analyzes the co-occurrence mapping to identify the themes related to SR and the cross-cutting axes of tourism. For this purpose, the dataset preprocessed with ScientoPy (which unifies Scopus and WoS) was used to generate a network map with the VOSviewer tool [240]. Initially, the author keyword co-occurrence map was created by setting up a thesaurus file to merge equivalent terms for the technologies and algorithms used to implement emotion-based recommenders. Also, 35 words unrelated to the theme were filtered out. The network map formed five clusters from 52 selected keywords, merged based on the total strength values of the co-occurrence links. The merged network characterizes the development of the thematic areas over time (2000 to 2019), showing the most meaningful traces of the related research documents (see Figure 9). Each point represents a node in the network, and the lines connecting the nodes are co-occurrence links. The five clusters are consistent with the thematic categories considered in the preceding sections.

• The first red cluster focuses on implementing machine learning algorithms to recognize emotions based on physiological data from wearable devices [11,12,32,36,184,187] and social networks' affective data [15,16,46,49,63]. The emerging IoT topic encourages collecting large datasets analyzed in big data architectures that support smart tourism applications [22,94,195] and health care recommenders [24,25,198,206,208].
• The third purple cluster emphasizes sentiment analysis using data mining algorithms to process and extract contextual features in social network datasets [43,77,82,154,155]. The emerging topic of deep learning applied to the recommendation based on emotions [9,36,82,156,159] is highlighted.
Figure 9. Co-occurrence network mapping of author keywords related to SR, tourism, ER, and ML. It displays five color clusters made up of nodes identified by labels. The grouping of relevant documents defines the nodes' size and the lines between the nodes.

Discussion
The key challenges identified in the evolution, trends, and co-relationships of SR in the domain of emotion-based tourism are described below. This paper provides an overview of the background, algorithmic approaches, data models, and emerging technologies involved in sentiment analysis and affective recognition. These guidelines for the design of tourist recommenders with affective contextual information are aimed at the academic and scientific community.
First, addressing the emotional context improves the user experience and the accuracy of travel recommenders. Section 3 analyzes the predominant SR architecture approaches, platforms, and components [4,13,14,50]. Table 3 chronologically summarizes some studies on CARS with data sources, user models, algorithms, similarity metrics, and performance evaluation. It shows that most of the works used sentiment analysis techniques to extract the emotional context from the users' comments posted on social networks.
However, the collection of physiological data with wearable devices for emotion recognition in tourism has been little explored. Also, rural tourism emerges as an area of interest in planning personalized trips that account for geographical, emotional, and environmental factors. Additionally, wearable, IoT, and Big Data technologies are emerging in smart tourism to implement recommenders of positive and satisfactory tourism experiences [22,93,94].
Second, Section 4 on emotion recognition describes the framework for analyzing physiological signals, detecting affect, and validating the classifier's results. The relationship between physiological changes and emotional models was evidenced, with emphasis on Russell's circumplex model [181], in particular the measurement of the arousal and valence dimensions in response to short-term stimulus elicitation in a controlled laboratory environment. Additionally, Table 4 chronologically summarizes some studies with the experimental design of the collection of affective data, extraction of features from physiological signals, and prediction algorithms.
As a result, the detection of arousal achieved similar or better accuracy than the detection of valence [11,36,184]. However, in the tourism domain, emotions are considered a relevant contextual factor in the satisfaction with a recommendation. For this reason, the challenge is to propose well-defined experimental designs to obtain physiological data and measurements of emotions in everyday life.
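As an illustration of the two dimensions discussed above, the following minimal sketch maps a (valence, arousal) pair to a quadrant of the circumplex plane. The function name, the neutral threshold, and the assumption that ratings are centred on zero are illustrative choices, not taken from the reviewed studies; the example emotions in the comments follow the usual placement in Russell's model.

```python
def circumplex_quadrant(valence, arousal, neutral=0.0):
    """Return the quadrant of the valence-arousal plane for one rating.

    Both inputs are assumed to be centred so that `neutral` is the midpoint
    (for example, ratings rescaled to [-1, 1]).
    """
    high_arousal = arousal >= neutral
    positive = valence >= neutral
    if positive and high_arousal:
        return "high arousal / positive valence"  # e.g., excited
    if not positive and high_arousal:
        return "high arousal / negative valence"  # e.g., distressed
    if not positive:
        return "low arousal / negative valence"   # e.g., sad
    return "low arousal / positive valence"       # e.g., calm


# Hypothetical rating of a tourist during a visit.
label = circumplex_quadrant(valence=0.6, arousal=0.8)
```

A classifier that predicts arousal and valence separately, as in the studies above, can be combined through such a mapping to obtain a discrete affective label.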
Third, wearable technology and IoT environments have supported the infrastructure for data collection to personalize healthcare services [24,25], music recommendations [185,194], and suggestions of e-commerce products [158,196]. In particular, Section 5 relates recent studies of emotion recognition based on data from physiological sensors (see Table 4), recognition of human activity using inertial sensors [30,226], and augmented reality applications supported on smart glasses [221,222].
Besides, the investigations evidenced the correlation between emotions and data from the physiological sensors of the Empatica E4 wristband (EDA and HR) [32], the Gear Live smartwatch (HRV) [12], and electrodes (ECG, GSR, and EDA) [11,36]. Hence, in the tourism domain, the integration of wearable sensors could improve the recommenders' prediction by defining a user model with various contextual factors.
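To make the sensor measures mentioned above more concrete, the sketch below computes three common time-domain features (mean HR, SDNN, and RMSSD) from a window of RR intervals. It is a minimal example under stated assumptions: the input is a clean list of RR intervals in milliseconds, the function name is hypothetical, and no artifact removal is performed, which real wearable data would require.

```python
import math
import statistics


def hrv_features(rr_ms):
    """Compute basic time-domain features from RR intervals in milliseconds."""
    if len(rr_ms) < 3:
        raise ValueError("need at least three RR intervals")
    # Differences between successive RR intervals.
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    mean_rr = statistics.fmean(rr_ms)
    return {
        # Mean heart rate derived from the mean RR interval.
        "mean_hr_bpm": 60000.0 / mean_rr,
        # SDNN: sample standard deviation of the RR intervals.
        "sdnn_ms": statistics.stdev(rr_ms),
        # RMSSD: root mean square of the successive differences.
        "rmssd_ms": math.sqrt(statistics.fmean([d * d for d in diffs])),
    }


# Hypothetical window of RR intervals (illustrative values only).
features = hrv_features([800, 810, 790, 800])
```

Feature vectors of this kind, computed per time window, are the typical input of the classical classifiers discussed in the next paragraph.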
Fourth, the ML approaches referenced in Section 3 and Section 6 were organized into classification, clustering, and Deep Learning Network (DLN) algorithms. First, most of the classification studies described in Table 4 used classical ML algorithms based on engineered feature extraction (KNN, SVM, and RF). Classification algorithms were also applied in the personalization of clinical diagnoses [205,233], physical activities [225,235], and multimodal approaches for affective prediction (MLP, ANN, and GRNN) [47,166].
Also, the studies in Table 3 implemented classical ML algorithms to classify candidate films, images, travel profiles, and POIs. Second, the clustering algorithms (k-means, k-modes, and Fuzzy-C-means) made it possible to design the users' preference models (see Table 3). Third, unlike previous algorithms, DLNs lower computational costs and require large datasets. Recent studies (see Table 3 and Table 4) used CNN to extract affective features from physiological data [36], detect human activities in images [226], and analyze sentiment in comments on tourist attractions [82,159]. Consequently, the challenge arises to propose deep learning approaches to extract emotional pattern features from online social media datasets and multimodal physiological signals to improve the quality of tourist recommendation services.
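As a minimal sketch of how a clustering algorithm can group users into preference models, the following example runs k-means on two-dimensional preference vectors. The data, the meaning of the two dimensions (interest in nature and in culture), and the fixed starting centroids are assumptions made for illustration; the reviewed studies may differ in features, initialization, and distance measure.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Run Lloyd's k-means from the given starting centroids."""
    centroids = [tuple(c) for c in centroids]
    assignment = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignment = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for j, old in enumerate(centroids):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                updated.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                updated.append(old)  # keep an empty cluster's centroid in place
        if updated == centroids:
            break  # converged: centroids no longer move
        centroids = updated
    return assignment, centroids


# Hypothetical user preference vectors: (interest in nature, interest in culture).
users = [(0.9, 0.1), (0.8, 0.2), (0.1, 0.9), (0.2, 0.8)]
labels, centers = kmeans(users, centroids=[users[0], users[2]])
```

Each resulting centroid can be read as a prototype preference profile, and a new user can be assigned to the nearest prototype before candidate POIs are ranked.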
Finally, future trends in recommendation platforms are oriented towards collaborative environments to support accessible tourism [241,242] and POI recommenders based on the gathering of contextual data from the users' lifelogs [200,201]. Besides, developers could propose real-time recommendation approaches that are more efficient and that solve data scarcity problems using cloud computing, edge computing, big data, and IoT platforms [26,27,95,196,243].

Conclusions
This paper presents a review of the literature related to emotion-sensitive SR in the tourism domain. The analysis showed several heterogeneous data sources drawn from wearable devices, IoT, and social networks. The definition of user profiles contains explicit and implicit information collected from daily life records about emotional states, physiological signal measurements, geographical location, and tourist attraction reviews. This definition could be applied in the design of behavior models and recommendations according to the user's preferences, based on recognizing emotions.
The scientometric review focused on analyzing technological research on users' emotions in the framework of tourist recommenders. It considered the architectures proposed in SR investigations that develop efficient approaches to processing, data storage, and access to services in mobile or cloud computing environments. In tourism, the need to develop personalized and innovative applications that suggest travel experiences to users is highlighted. User emotions are closely related to satisfaction with a recommendation. Therefore, the research challenge arises from integrating data from IoT sensors, wearable devices, and smartphones (heart rate, EDA, and affective states) into the recommendation process.
Based on the analysis of the research works listed in Table 3, the following findings were identified that should be taken into account in the construction of emotion-aware SR:

• User models are the starting point of research approaches and, based on contextual data, recommendation services are defined in various application domains. User models have evolved by delving into daily life data obtained from ubiquitous devices. Although physiological measures have already been used for health care in medical tourism, user models have not yet been enriched with data recorded from wearable devices to design personalized services according to the tourist's affective state.

• The tourist information sources come mainly from user reviews on social networks and openly available datasets. There is a limitation in using other sources to discover contextual patterns that enrich the data models. Furthermore, restricted access to heterogeneous information on tourist behavior directly impacts the performance of the ML models.

• Approaches based on user emotions increased the predictive capacity of recommendation models by fusing contextual features and sentiment analysis. Also, the polarity of emotions, POI ratings, and contextual factors are used to infer behavior from user preferences. In most studies, affective states were taken into account as implicit feedback for the recommendation process.

Table 4 consolidates some research on emotion recognition with data from wearable devices that is useful in designing SR frameworks in the tourism domain. Affective sensing systems extract emotional patterns from non-intrusive sensor signals associated with heart activity, electrodermal activity, and brain variability. The research opportunity arises to deepen the relationship between affective states, physiological changes, and emotional models. The experiments carried out with physiological datasets reported better results in predicting emotions with deep learning algorithms.
There are research gaps in developing secure tourism recommenders with models that detect sources of danger to mitigate tourists' risks at the destination. Another gap is the implementation of interest detection algorithms for travel planning using chatbot applications and deep learning techniques. The recommenders require algorithms to determine affective similarity and detect the emotions resulting from tourist preferences, as well as the construction of user profiles based on multimodal approaches that allow the extraction of emotional features from speech analysis, physiological measurements, and facial recognition.

Funding: The research project was funded by the Departamento Administrativo de Ciencia, Tecnología e Innovación (733-2015), and by the Universidad Santo Tomás Seccional Tunja. This paper was funded by the Universidad del Cauca (501100005682).

Conflicts of Interest: The authors declare no conflict of interest.