Service-Aware Interactive Presentation of Items for Decision-Making

: Current information exploration models present low-level features or technical aspects related to the paradigm used to generate results. While this may increase transparency, it does not help the user form a personal opinion about items because it does not describe the overall experience with them. In order to address this issue, we propose the INTERactivE viSualizaTion model (INTEREST) that supports the exploration and analysis of search results by means of a graphical representation of consumer feedback aimed at making the user aware of the service properties in all the stages of fruition, focusing on the data that is most relevant to her/him. INTEREST is based on the Service Journey Maps for the design and description of user experience with services. We applied it to the home booking domain by developing the Apartment Monitoring application that supports overviewing and analyzing online reviews about rented homes. In a user study, we compared the decision-making support provided by our application with that of a baseline model that enables a temporal ﬁltering of consumer feedback. We found out that Apartment Monitoring outperforms the baseline in user experience, user awareness of item properties, and user control during the interaction with the system. In particular, according to the participants of the study, Apartment Monitoring describes the expectations about the homes and it supports their selection in a more effective way than the baseline. These ﬁndings encourage moving from a low-level description of item properties to a service-oriented one in order to improve users’ decision-making capabilities.


Introduction
The classical ranked list of results presented by search engines poorly supports content interpretation because it cannot explain why the returned items are relevant to the submitted query, or to some of its terms, and how results are related to each other; see in [1].In order to mitigate this issue, and to increase the acceptance of results, recent recommender systems [2] adopt presentation models aimed at supporting transparency [3] and trust [4,5] by enriching the suggestions they generate with a description of the degree, or of the type of matching between users and items.For instance, RSR [6] shows the topic relations between user and proposed items, e.g., by research theme.Moreover, hybrid recommender systems propose analytic visualizations that show how strongly the embedded engines support the suggested items [7][8][9].Furthermore, to support exploratory search [10], other works promote the development of faceted search interfaces that empower the user to control the information filtering process by guiding her/him in the selection of item features, e.g., see the seminal works by Hearst [11,12] and recent ones such as in [13,14].
We claim that current information presentation models offer limited support the exploration of online catalogs, because of the following.
• Describing the rationale behind the suggestions generated by a system can enhance its transparency but it does not necessarily provide the user with the information (s)he needs to decide whether the proposed items are good or bad for her/him.This type of explanation has been traditionally applied in diagnostic expert systems [15] to substantiate their inferences, as a trust measure to help the user assess the validity of the reached conclusions.Moreover, it has been promoted to improve interactive systems in [16].However, when selecting items, users might adopt multiple evaluation criteria [17,18] which might differ from those applied by the recommender system.Therefore, explaining why an item is suggested is not sufficient to support people in decision-making.• Faceted search interfaces return items having the exact features or aspects specified by the user, e.g., the restaurants that offer outdoor seating or which serve good food.However, these interfaces poorly address evaluation dimensions that depend on the aggregation of multiple properties, e.g., product quality.• Decision-making cannot be restricted to information filtering because the experience with items can involve different stages of interaction with the provider, from their search to their delivery/fruition, all of which impact on satisfaction.Moreover, specifically concerning experience goods [19], which have to be used in order to be evaluated, previous consumers' opinions are a key type of data to be considered, see, e.g., in [20].
Indeed, the data about items available in online catalogs and services can be exploited to enhance user awareness and decision-making because it complements metadata with a rich source of information about consumer experience represented by item reviews.Thus, in order to address the limitations of the above listed approaches, we aim at developing a novel information presentation model that provides the user with a holistic perspective on search results, based on a direct analysis of previous consumers' experience rather than on an internal elaboration of low-level details representing the viewpoint of the system.Different from existing models, which attempt to explain why items fit the user, we aim at enabling her/him to make a first-hand opinion about the proposed solutions by efficiently inspecting consumer feedback, in an interactive way.In other words, we propose to treat results generation and presentation as separate concerns.This approach has two main advantages: on the one hand, it enhances the system transparency by making information presentation independent of its inferences, which might be error-prone.On the other hand, it supports the development of agnostic presentation models that can be applied on top of different information filtering techniques in order to enrich them with an efficient presentation of their results.Both aspects contribute to improve information exploration support tools with a better presentation of results.
The INTERactivE viSualizaTion (INTEREST) model, described in this paper, implements the previously discussed points.INTEREST provides a holistic, visual summary of consumer feedback based on interactive charts that support the analysis of reviews and their inspection at different temporal and content granularity levels to satisfy individual information needs.Building on the large availability of reviews in online catalogs of products and services (e.g., Amazon [21] and Yelp [22]), and on the existing literature on service design and evaluation, our model provides a synthesis of the satisfaction with items under multiple evaluation dimensions associated with consumer experience.Specifically, INTEREST is based on one of the most used conceptual models for the design and description of user experience, the "Service Journey Maps" [23,24], which support the definition of high-level aspects of a service associated with different stages of fruition by users.In our model we define the evaluation dimensions as follows.
• First, we model the user experience in stages, e.g., considering online product sales, the experience starts with searching for goods on the web site of the retailer and ends with after sales assistance.• Then, starting from the above stages, we identify a set of evaluation dimensions for item selection.
• Finally, we extract the sentiment of online reviews with respect to the identified dimensions in order to automatically build a holistic synthesis of consumer feelings towards items.
The interactive charts generated by INTEREST summarize previous consumers' opinions about the service evaluation dimensions.Moreover, they support a direct access to relevant reviews by highlighting the evaluated dimensions for easy identification.This enhances user awareness by supporting the formation of a personal opinion about items.In fact, the user can overview the search results focusing on the dimensions of interest, and (s)he can go in depth if (s)he is interested in specific data, being helped to identify the relevant text to be read.The definition of service evaluation dimensions for item presentation is a novel aspect of our approach: while some existing works provide diagrams to visually summarize low-level properties of items (see, e.g., in [25]), they cannot describe items in the holistic perspective given by service design.Notice also that, as INTEREST works on a set of items without making any assumptions about how they have been retrieved, it can be applied to explore the results of a recommendation algorithm, as well as of a faceted exploration system.This is a very important aspect because it makes it possible to exploit our model as a complement to any existing information filtering technology.
INTEREST can be instantiated on different domains, after having analyzed the underlying services and evaluation dimensions.In our work we applied INTEREST to home booking and we tested it on a dataset of reviews gathered from Airbnb [26].The resulting application, named Apartment Monitoring, enables the user to search for homes and to analyze reviews by focusing on different dimensions of the overall service experience, such as the appreciation of the host by her/his guests and the in-apartment experience.We designed a Service Journey Map describing the stages of a typical home renting experience from the viewpoint of the guest.Then, for each stage, we identified the evaluation criteria to be modeled in the application starting from those found in the literature about home and hotel booking [27][28][29].Finally, we detected a set of relevant evaluation dimensions with respect to the service stages.
We carried out a user study to investigate the User Experience (UX) with Apartment Monitoring by comparing it to a baseline system that enables the user to filter reviews by time frame.The experimental results show that the INTEREST model underlying our application outperforms the baseline presentation of information in all UX aspects, including the perception of awareness and control during data exploration for decision-making.In summary, this paper brings the following scientific contributions.
• A novel methodology to design interactive information presentation models supporting a holistic evaluation of items from a service-oriented point of view.• An interactive visual model (INTEREST) to evaluate search results with respect to evaluation dimensions concerning all the phases of service fruition, and at different temporal granularity levels.This is aimed at helping the user quickly understand whether items are suitable for her/him on the basis of existing online consumer feedback.The remainder of this paper is organized as follows.Section 2 positions our work within the related one.Section 3 presents the INTEREST model and the Apartment Monitoring application.Section 4 describes the validation methodology we applied.Section 5 describes the experimental results.Section 6 discusses these results, outlines the limitations of our work and presents our future research.Finally, Section 7 concludes the paper.

Background on Service Journey Maps
The Service Journey Maps (SJM) [23,30] model is one of the most largely used models in service design and innovation because they provide a powerful tool for a holistic approach to the design and development of products and services by focusing on the way customers experience them.A SJM is a visual description of the user experience in using a physical or virtual service.The graphic visualization follows a temporal line from the start point (e.g., enter website or enter shop) to the end one (e.g., on site or online customer care) in order to highlight and understand the various stages a person gets in touch with to benefit from the service; these stages are named touchpoints.SJMs are included as a visual model in the ServiceML language [24] that has been applied to industry case studies; moreover, they have been proposed to improve the design of online Reputation Management Systems (reputation describes a general "expectation about an agent's behavior based on information about or observations of its past behavior" [31].This concept has a global perspective, and Mui et al. [32] describe it as the "perception that an agent creates through past actions about its intentions and norms".According to Misztal [33], reputation "helps us to manage the complexity of social life by singling out trustworthy people-in whose interest it is to meet promises").Starting from the intuition that service design and reputation system modeling share many demands (to represent a variety of entities involved in the related domains), aims (to improve services and businesses with the aim of meeting user needs), and problems (they all involve stakeholders having very different backgrounds and skills), Capecchi and Bettini et al. [34,35] propose tools for service modeling that support designers in capturing high-level reputation management requirements and their implementation.Specifically, they show how consumer feedback can be directly matched to roles and partners involved in business/service creation to detect responsibilities and to plan appropriate actions.

Information Exploration Support Models
Several graphical models for the presentation of search results go beyond the traditional ranked list to make relations within results explicit and to support visual thinking.Some works extend the ranked list with graphical annotations; see Table 1 for an overview.For instance, HotMap [36] adopts color coding to highlight the terms of the search query in each retrieved item.Moreover, Concept Highlighter [36] applies color coding to describe the degree of match between each item and a set of concepts extracted by interpreting the query.Furthermore, Exploration Wall [37] provides streams of topically related results and prompts suggestions to help information search on mobile devices.Other models abandon the ranked list in favor of 2D representations to visualize the relevance of results and some types of relations between them [38].For instance, VIBE [39] presents data on a 2D plane using proximity to denote content similarity with respect to points in the map that denote particularly interesting keywords, topics, or other.Cartograph [40] leverages thematic cartography to visually represent semantic relations among non-geographic data and OnToMap [14] supports facet-based focusing of geographic maps based on multiple dimensions of selection.Some works visualize semantic relations among clusters of results, e.g., FacetAtlas [41] links topics in a 3D diagram supporting the representation of multidimensional relations among them, and SolarMap [42] combines topic-based document clustering with a radial representation of facets to support two-level, topic-based document filtering.
Even though we focus on the visualization of information about individual search results, we are concerned about the presentation of possibly large data sets because, in online catalogs, items can receive a significant amount of feedback.In order to enhance the efficiency of this task, our model supports filtering reviews with respect to the defined evaluation dimensions and we apply color coding to relate the words appearing in the reviews to such dimensions.Moreover, we annotate each review with smilies indicating whether it has negative, neutral, or positive sentiment so that the user can quickly interpret it; in this way, (s)he can survey consumer feedback without inspecting it in detail.

Explanation of Recommender Systems Suggestions
Recent recommender systems help users analyze the results they generate with the aim of enhancing transparency [3] and trust [4,5].
Table 1 summarizes the works that we overview.Considering collaborative recommenders, Herlocker et al. [44] base explanations on the conceptual model that the use has of the recommendation process: they find out that grouping neighbors' ratings, and presenting the past performance of a system, enhance the acceptance of its suggestions.Moreover, Pu and Chen [43] propose explanations organized by grouping items on the basis of their trade-off properties to help decision-making.
Explanations can be based on the applied recommendation paradigm.For instance, content-based recommender systems [45] can motivate suggestions by highlighting the content that matches the user's interests.Moreover, feature-based recommenders [46] and multicriteria ones [47][48][49] separately evaluate the utility of item features; therefore, they can use this information to present the suggestions.Furthermore, graph-based recommenders such as RSR [6] and RippleNet [50] use the chains of relations connecting the user and the suggested items as explanations, while [51] exploits chains connecting users and items in the Linked Open Data cloud to justify recommendations regardless of how they have been generated.
Hybrid recommender systems are more complex to explain because they integrate heterogeneous inference techniques.In that case, the presentation of results typically aims at helping the user shed light on the influence of the embedded engines on item suggestion.For instance, MyMovieFinder [53] adopts a ranked-list visualization and, by clicking on items, the user can see the recommendation criteria they meet.TalkExplorer [8] graphically represents multiple dimensions of items relevance within bidimensional graphs.Analytic models such as Venn diagrams are also proposed to overview the suggestions [56]; SetFusion [54] exploits this type of diagram, combined with color bars, to visualize items and to show how strongly the embedded recommenders support them.RelevanceTuner [7] directly visualizes the degree of support received by the items of the ranked list as stackable bars, and IntersectionExplorer [55] uses the UpSet matrix [72] to show the number of common suggestions provided by the recommenders in a grid-layout.Finally, HyPER [9] integrates heterogeneous recommenders by means of probabilistic logic rules also used to explain the results.
We pursue a direct interaction with results to avoid complex analytic visualizations that might challenge non-technical users and people having low visual spatial abilities [73,74].For this purpose, we invest in improving the efficiency of analysis of individual items, assuming that the user has picked them, e.g., from a ranked list.In this perspective, we offer a simple, interactive model to (i) visualize a synthesis of consumer sentiment about the evaluation dimensions selected by the user and (ii) provide direct access to the reviews supporting that sentiment.In this way the user can easily identify the comments relevant to her/his interests and form a "first-hand" opinion about items.Our work is related to Millecamp et al.'s one [25] because both models present item dimensions by means of charts.However, we provide holistic information about items, based on high-level service evaluation dimensions; differently, in the other work the diagrams support the visualization of low-level item properties such as music tracks' tempo and acousticness.Similar to the work in [51,75,76], our model is agnostic with respect to how items are filtered.However, rather than justifying recommendations by means of a synthesis of reviews, we aim at empowering the user to efficiently explore the existing feedback to make a personal opinion about items.This has a triple advantage: (i) it supports user awareness of item properties; (ii) it does not infringe the transparency of the recommender system; and (iii) it is not affected by issues related to personal characteristics, which in some domains impact on the user's confidence in the explanations (s)he receives [77].
Review-based recommender systems [57] extract item features and aspects from online user feedback to build user and item models for personalized recommendations [59][60][61][62], or to build user-item chains of relations for graph-based recommendation as in TriRank [52].While most works strictly focus on improving Top-N recommendation (see, e.g., in [58,[63][64][65]), McAuley and Leskovec [71] introduce the idea of matching relevant item features extracted from reviews to latent factors in order to support explanation in Matrix Factorization.In the same line of research, Muhammad et al. [66] and O'Mahony and Smyth [68] mine bigram and single-noun phrases and map them to items metadata to address general features in explanations.Moreover, Ni and McAuley [69] and Lu et al. [70] extract positive and negative opinions about item features to integrate recommendation and explanation of results into a single task.Furthermore, Chen and Wang [67] classify extracted features using metadata to group them in explanation.Differently, SearchLens [13] enables the user to define search "lenses" that specify the desired features of items; it returns result lists enriched with a specification of the degree of matching between items and search criteria.
Different from the above works, which focus on specific features (e.g., outdoor seating) or aspects (e.g., taste of food), INTEREST extracts consumer sentiment about evaluation dimensions describing the whole user experience with items, from their selection to their fruition.This is possible thanks to the exploitation of Service Journey Maps that describe the stages of service underlying item selection.Moreover, our model offers an abstract representation of consumer feedback based on interactive charts that enhance data analysis and interpretation, as well as a direct access to relevant reviews.

Techniques for Analyzing Review Content
Several works identify aspects of items from reviews to understand which features emerge from consumer feedback.For this purpose they employ rather different methods.For instance, Alam et al. [78] propose a multi-grain topic sentiment extractor of semantic aspects by extending topic models such as LDA [79].Moreover, Tang et al. [80] jointly extract multi-grain aspects and opinions by modeling aspects, opinions, sentiment, and granularities simultaneously within a probabilistic approach.On a more specific perspective, Xu et al. [81] apply LSA [82] to identify key attributes driving customer satisfaction and dissatisfaction toward hotel product and service attributes.Furthermore, Xiong and Litman [83] apply Supervised LDA [79] to extract relevant topics from high-quality reviews for summarization purposes.Korfiatis et al. [84] apply Structural Topic Models [85] to identify evaluation aspects in the reviews written by airline passengers and combine those aspects with rating data to improve the understanding of service quality.Finally, Chang et al. [86] propose an integrated framework supporting the visual analysis of hotel ratings and reviews.
At the current stage, we do not apply these techniques to extract clusters of co-occurring keywords from reviews because we exploit the elements that compose the service experience underlying the application domain to identify the high-level dimensions for item presentation and evaluation.For instance, as far as home booking is concerned, we consider the interaction with the website, check-in, check-out, etc., as suggested by the Service Journey Maps that model the home booking experience from the viewpoint of a guest.

Materials and Methods
We propose the INTEractivE viSualizaTion model supporting decision-making to help the user in selecting the preferred items out of a list of options (e.g., a recommendation list L) by exploiting a service design framework for the identification of evaluation dimensions.Given an item i ∈ L in the user's focus of attention and the reviews about it posted by consumers, our model enables the user to do the following.
• Choose any subset of the set of evaluation dimensions D = {d 1 , . . ., d v } derived from the underlying service model to assess the suitability of item i.The dimensions of D describe previous consumer experience with items from the stage of searching for it online to its fruition.• Select a time interval for filtering the reviews to be considered.This supports item evaluation in specific contextual conditions, e.g., starting from the most recent reviews, or from those posted within a particular time frame.
The choices that the user makes determine the set of reviews to be analyzed.On this basis, INTEREST generates a visual, interactive representation of the item aimed at summarizing its properties.Moreover, it supports a direct inspection of the relevant reviews, which are graphically annotated to show the level of satisfaction they convey, and to highlight the words concerning the evaluation dimensions they deal with, see Section 3.4.INTEREST can be configured to work on different evaluation dimensions, as long as they are identified within the application domain of interest.The following sections describe the concepts, methodologies, and techniques underlying our model.

Specification of the Dimensions of Item Evaluation
In order to define the evaluation dimensions to be used for presenting items we apply the Service Journey Maps (SJM) [24], which describe the typical flow of user activities that a participant, service provider, or other stakeholder perform in a service execution; see the upper boxed portion of Figure 1.In the map, each activity is encapsulated in a touch-point (stage) represented as a gray circle.An actor can either initiate a touch-point or be involved in it through some inputs or outputs.The former case is represented by a plain open arrow, such as the one connecting customer/user to Stage 1; the latter one is represented by means of dashed lines.The arrows connecting touch points specify their order within the overall workflow.The level of touch point details depends on the nature of the services being modeled as well as on the association between the service stages and the user experience.
The set D of evaluation dimensions describing user experience with items includes high-level concepts {d 1 , . . ., d v } that are retrieved from the stages of service fruition.In general, we might have a one-to-one correspondence between evaluation dimensions and stages.However, for finer-grained reasoning, we might want to identify multiple dimensions relevant to a single stage; moreover, a single dimension could be related to different stages.In order to comply with these requirements, we keep stages and evaluation dimensions as separate concepts and we model the associations between them as [M, N] relations; see the lower portion of Figure 1.
Notice that reviews rarely mention evaluation dimensions explicitly: most comments refer to item features that have to be mapped to a dimension to support their identification.For instance, the host appreciation in the home booking domain might be described by making reference to her/his family and hospitality or to other aspects of the interaction such as her/his helpfulness.In order to identify the sentences that deal with these dimensions, a thesaurus T i of keywords has to be built for each evaluation dimension d i .The thesauri can be defined using previous studies of the service domain.Notice that the keywords of the thesauri must be lemmatized to have a single, base format for each of them; for this purpose, we use the spaCy Python library [87].As described in Section 4.2 for home booking, the identification of evaluation dimensions {d 1 , . . ., d v } starting from the stages of a Service Journey Map is a possibly iterative process and it is strictly intertwined with thesaurus construction: 1. We start with a one-to-one association between stages and evaluation dimensions.2. We build a first version of a thesaurus T i for each identified dimension d i .3. We analyze each of the defined thesauri and we detect: • Dimensions that need a finer-grained representation because the associated keywords refer to topics describing service aspects that deserve to be promoted to dimensions.For instance, the "Stay in apartment" stage can be associated to distinct dimensions to separately evaluate the internal environment of the home and its surroundings.• Keywords related to aspects that are relevant to more than one stage, such as the interaction with the host: these aspects can be promoted to evaluation dimensions associated with multiple stages.
It can be noticed that the performance of roles in delivering the service along stages can have an important impact on the construction of thesauri and, consequently, on the definition of the evaluation dimensions to be modeled.For instance, in a Service Journey Map describing the experience of customers in a restaurant, the waiter's behavior and attitudes highly influence the experience in stages such as "order meal" and "consume meal".

Review Analysis
Given the thesauri {T d 1 , . . ., T d v } storing the lemmatized keywords, each review is processed to identify the evaluation dimensions it addresses and the emerging sentiment.Figure 2 depicts the pipeline of activities concerning the analysis of an individual review.It can be noticed that, while items have to be dynamically evaluated on the basis of the selections made by the user (e.g., time frame), reviews can be analyzed as soon as they are posted online.We thus permanently store this type of information into a database for subsequent usage.Even though the review analysis pipeline is rather standard, the following sections describe it for completeness.

Language Detection
• Filter on English reviews.• Python library: langdetect.

Binding
• Match of lemmatized noun phrases to thesauri.

Sentiment Analysis
• Sentiment analysis of reviews and of their sentences.• Python library: TextBlob.

Language Detection
The language detection step is needed to select the linguistic resources to be applied for content analysis.We identify the language of a review r by using the Python langdetect library [88]; currently, we focus on the reviews written in English and we filter out the other ones.

Linguistic Analysis
In this step of the pipeline we extract and lemmatize the noun phrases occurring in r.Specifically, we group them by sentence to obtain a structured representation of r that supports its subsequent analysis: each sentence s contained in r is thus associated with a set {lnp 1 , . . ., lnp m } of lemmatized noun phrases representing the noun phrases of s as lemmas.We carry out both noun phrase extraction and lemmatization using the spaCy Python library.

Binding Review Sentences to Evaluation Dimensions
In order to support the retrieval of reviews that are relevant to a specific evaluation dimension d ∈ D, and to summarize consumer satisfaction, the sentences of r have to be associated to the dimensions they deal with.For this purpose, we analyze each sentence s of r and we match its lemmatized noun phrases {lnp 1 , . . ., lnp m } to the thesauri {T d 1 , . . ., T d v } defined in Section 3.1.A sentence s matches a thesaurus T if it contains at least one noun phrase lnp ∈ {lnp 1 , . . ., lnp m } that includes a lemmatized keyword of T .Having identified the thesauri matched by s, we bind s and we index it under the corresponding dimensions for quick retrieval at information presentation time.
When this step of the pipeline is completed, each sentence s of r is associated to a list of lemmatized noun phrases {lnp 1 , . . ., lnp m } and to a set of referred dimensions {d 1 , . . ., d k } as shown in Listing 1.
Listing 1: Output of the review analysis pipeline....

Sentiment Analysis
The sentiment analysis step is aimed at identifying the positive, neutral, or negative polarity emerging from review text.Combined with the identification of the dimensions referenced by the sentences of the review r, this makes it possible to estimate r's author evaluation of item i ∈ L. We perform this analysis at two granularity levels: • Sentiment of the review: this is aimed at extracting the reviewer's overall sentiment about i, balancing the possibly different opinions that emerge from the individual sentences included in r.For instance, the reviewer might be happy about certain aspects of item i and unhappy about other ones, conveying a neutral overall evaluation in r.We compute the sentiment of r as the polarity of its text by using the TextBlob Python library [89].This library leverages the Pattern library [90] that takes into account the individual word scores from SentiWordNet [91] and uses heuristics for negation to compute the overall polarity of a text.• Sentiment of sentences by evaluation dimension: this is aimed at extracting the sentiment of the reviewer concerning the considered evaluation dimension.For each sentence s of r, for each dimension d ∈ D addressed in s, the sentiment of s for d is computed as the polarity of s using the TextBlob library on the text of s.
As the polarity is defined in the [−1, 1] interval we normalize it to a level of satisfaction in [0, 100] in order to be presented as a percentage in the user interface of the information visualization system.
Listing 1 shows the output of the review analysis pipeline.Each review r is represented by means of a data structure reporting its identifier (id) and satisfaction level (satisfaction, denoted as satis f action r in the following).As previously described, for each sentence s of the review, the data structure reports its lemmatized noun phrases, referred dimensions, and the satisfaction level of s (sat).

Item Evaluation
The INTEREST model enables the user to select a time frame and a set of evaluation dimensions that determine a set of relevant reviews R = {r 1 , . . ., r w } to be considered.Given this set, our model aggregates review data to summarize consumer feedback about the item i ∈ L. Specifically, the level of satisfaction with i for each evaluation dimension d ∈ D (denoted as sat di ) is computed as the mean satisfaction level of the sentences S di that are included in the reviews of R and are indexed under d: Our model also supports the generation of an overall satisfaction level about item i by computing the mean satisfaction of the reviews belonging to R: All satisfaction levels take values in [0, 100].

Data Visualization
Our model can present consumer feedback at different granularity levels to flexibly support the user in the assessment of item suitability.Specifically, given i ∈ L: • The structured review representation generated by the review analysis pipeline makes it possible to generate dynamic charts that show the overall satisfaction level about i, as well as the satisfaction about specific evaluation dimensions in D.
• The indexing of review sentences under specific dimensions of D supports a direct and efficient access to the reviews that address the evaluation criteria selected by the user.• The computation of the satisfaction level of individual reviews makes it possible to visually annotate them for fast interpretation.• By exploiting the thesauri, the words of the reviews that make reference to the various evaluation dimensions can be identified and highlighted.
Figure 3 shows a portion of the user interface of the Apartment Monitoring application, which we use to simplify the presentation of the visual model we propose.• The left panel is organized as follows.

-
At the top, there is the menu for selecting the item to be evaluated out of the list L proposed to the user, and the link to view the home on the Airbnb web site.

-
At the bottom, a graphical widget supports the selection of the time frame of analysis.

-
In the middle, a component includes a checkbox for each evaluation dimension that the user can choose to explore the item.Each dimension is associated with the mean level of satisfaction derived from the whole set of reviews that belong to the selected time frame.For example, the visualized item has 71% level of satisfaction regarding the host appreciation.
• The right panel of the user interface shows the detailed information about the item: -A histogram visually represents evaluation dimensions by breaking the time frame selected by the user into sub-intervals to overview the temporal distribution of consumer satisfaction.Each bar of the histogram shows the level of satisfaction concerning the associated dimension within its own time interval.The exact level can be visualized by placing the mouse over the bar.

-
Below the chart there is the list of reviews used for the analysis.These reviews depend on the chosen time frame and on the dimensions selected using the checkboxes.The reviews posted in the same time interval which do not address those dimensions are not shown.In each review, a scale of smilies displays its satisfaction level; moreover, the words that correspond to the selected dimensions are highlighted using color coding.
The bars of the histogram are interactive: if the user clicks on any of them, the visualized set of reviews is refreshed to only include the relevant ones and the highlighted words are updated to maintain the consistency with her/his selections.Thus, the user can dynamically filter reviews, e.g., (s)he might investigate why the in-apartment experience received a low evaluation in May and in September 2019.

Validation Methodology
We validate our model on the home booking domain with particular reference to the information available in Airbnb [26], which is increasingly used by people to find homes for their holidays, as well as for working purposes, and thus provides consumer data referring to a rich spectrum of evaluation contexts.Similar to other services such as Booking [92], and as a measure to strengthen the robustness of the underlying Reputation Management System [93], Airbnb enables each consumer to upload at most one review for each renting contract after having left the apartment.This guarantees that online feedback is based on truly experienced services.

Dataset
The experiments are based on a portion of a large, public dataset of reviews collected in the city of Florence (IT) by the Airbnb home booking service (the dataset is periodically updated and can be downloaded from http://insideairbnb.com/get-the-data.html)[26].The dataset stores information about hosts, homes (denoted as "listings"), and their neighborhood; moreover, it stores the textual feedback uploaded by visitors.The following information about reviews is provided.
• listing_id: numerical identifier of the home evaluated in the review.
• id: numerical identifier of the review.
• date: timestamp of the review.
• reviewer_id: numerical identifier of the author of the review.
• reviewer_name: name of the author of the review.
• comments: review text in Natural Language.
Even though the Airbnb website makes it possible to rate the homes in a 5-point Likert scale, the dataset does not report such ratings.
We omit the descriptive statistics of the whole dataset because for our study we focused on a small number of apartments, corresponding to a result list proposed to the user.Within this list, we asked participants to evaluate four specific homes, the same for all people, with the purpose of exposing users to the same amount of data to be analyzed, see Section 4.3.The selected homes have similar size and facilities to be comparable from the viewpoint of basic requirements; e.g., number of rooms.Moreover, each apartment has approximately 15 reviews and the overall amount of text from reviews presented during the evaluation of a home is approximately 50 lines.

Evaluation Dimensions for the Home Booking Domain
In order to identify the evaluation dimensions, we first designed a Service Journey Map (SJM) describing the home booking experience from the viewpoint of the guest.As in this work we focus on home renter experience, we only model the customer and apartment owner roles.Specifically, we ignore the actors involved in other service stages, such as the visit to the website.
With the first SJM we identified four service stages: Visit website, Check-in, Stay in apartment, and Check-out.We associated a separate evaluation dimension to each stage and we built the corresponding thesaurus by using previous studies that describe the topics addressed in the reviews collected by home booking services [27] or by similar ones, such as hotel booking ones [28,29].However, after a first analysis we found out the following.
• Check-in and check-out are usually related in reviews and they are associated to the same keywords, which appear in both thesauri.• Stay in apartment has a rather large number of keywords.Moreover, in their comments, reviewers frequently separate the aspects related to the apartment interiors (furniture, comfort, services) from those concerning its surroundings, e.g., geographic position, available public transportation, shops, and presence of noise.• The interaction with the host and her/his properties represent a relevant evaluation dimension crossing all the service stages.
Consequently, we revised the original evaluation dimensions and we obtained those represented in Figure 4, i.e., Host appreciation, Search on website, Check-in/Check-out, In-apartment experience, and Surroundings.Table 2 shows a few sample keywords for each thesaurus and Table A1 in the Appendix A reports the full list.We remind the reader that these (lemmatized) keywords are used to index review sentences with respect to the corresponding evaluation dimensions and to highlight review words in the user interface of Apartment Monitoring.

Study Design
We conducted a user study using Apartment Monitoring to test the support of the INTEREST model to item evaluation.We assume that a previous filtering phase provides the user with a list L of items that satisfy her/his basic requirements (e.g., arrival and leaving dates and number of guests), and that (s)he interacts with the system to overview the properties of the filtered items, leveraging online reviews to evaluate them.Therefore, starting from a list L of results, the experiment investigates the level of support provided by our application to make an informed item selection decision.
As discussed in Section 2, existing information exploration and recommender systems focus on rather specific item features and aspects, or they attempt to explain the recommendation paradigm applied to generate the results.Therefore, they cannot provide the user with a holistic, service-oriented presentation of items, which is a main objective of our work.Moreover, the systems generating item overviews have rather complex user interfaces that can overload users.We thus decided to compare our approach to models that leverage review data to provide a direct impression of what consumers think about items.In our case, it was natural to consider home booking services, and Airbnb in particular.We thus analyzed the following.
• The INTEREST model in its Apartment Monitoring implementation.This model empowers the user to evaluate items by means of (i) interactive charts that summarize consumer feedback, (ii) visual annotations of reviews that highlight (in synch with the charts) the evaluation dimensions of the experience, and (iii) a temporal selection of reviews.• A Baseline model that shows the textual reviews as in most booking and e-commerce platforms.
To build a strong baseline, we included in this model the date picker supporting the selection of the time frame of interest for the selection of the reviews to be inspected.
We asked each participant of the study to imagine that (s)he had to plan a holiday in the city of Florence and to select an apartment for renting.We explained that, using the two systems, (s)he should have analyzed the properties of the host, home, etc., based on the task to be solved, in order to select the apartment that (s)he would have liked to rent.
We aimed at separately evaluating INTEREST and Baseline but we wanted to minimize the learning effect on participants.Therefore, we formed the result list L by selecting four homes that were comparable from the viewpoint of a search query, and which also had a similar amount of consumer feedback in terms of lines of text.We investigated User Experience (UX) with the applications in two learning tasks, each one using a different apartment: • Task1: question answering using the functions provided by INTEREST, i.e., interactive charts, temporal and dimension-dependent review selection, and word highlighting.• Task2: question answering using the basic list of reviews (Baseline) with temporal filter.
The study was a within-subjects design one.We considered each treatment condition as an independent variable and every participant received the two treatments.We counterbalanced the order of tasks to minimize the impact of result biases and the effects of practice and fatigue.People joined the user study on a voluntary basis, without any compensation, and they gave their informed consent to participate in the study.The consent form described the research purposes, procedure, duration, confidentiality of data, sharing of results, right to refuse or withdraw, and contact persons.The user study took place live, in video calls with shared screen due to COVID-19 pandemic, i.e., we did not use any platform such as Amazon Mechanical Turk or similar to recruit participants or to manage the tasks of the study.

The Experiment
One person at a time participated in the study, which lasted approximately 30 min.We did not impose any time restrictions in solving the tasks and we allowed her/him to take as much time as (s)he needed in order to comply with diverse backgrounds and levels of confidence with technology.Then, we asked her/him to answer a pre-test questionnaire designed to assess demographic information, cultural background, and familiarity with booking and e-commerce platforms.
For each task of the study the participant used either the Apartment Monitoring implementation of INTEREST, or the Baseline application, in the context of the organization of a holiday in Florence: within the task, we asked her/him to look at the proposed homes and to answer two questions that required either scoring (thumb up or down) an evaluation dimension of an apartment, or identifying the content corresponding to the dimension within the available reviews.For each home, all participants answered the same questions, which were based on the following templates.
• Give a thumb up/thumb down evaluation of dimension of home provided by host in time f rame.
For instance, "Give a thumb up/thumb down evaluation of the surroundings of Toscanella apartment provided by host Francesco during the last year".• List the characteristics of dimension of home provided by host in time f rame.
For example, "List the characteristics of the host of Il Podestà apartment provided by host Max during the last six months." While the participant carried out a task, the experimenter took notes about her/his voice comments using the think aloud protocol [94]; we did not put any time restrictions on question answering.For the study we focused on User Experience (UX).
1.After the completion of each task, the participant filled in a post-task questionnaire to evaluate the model (s)he had used.We selected the Italian version of the UEQ questionnaire [95] that supports a quick assessment of a comprehensive impression of user experience covering perceived ergonomic quality, perceived hedonic quality, and perceived attractiveness of a software product.However, as UEQ does not cover user awareness and control, we extended it with three items aimed at investigating these aspects.For this purpose we took inspiration from the ResQue questionnaire for recommender systems [96].
Participants answered each item of our questionnaire by selecting a rating in a 7-point Likert scale.
In UEQ, questions are proposed as bipolar items, e.g., [annoying 1 2 3 4 5 6 7 enjoyable].Moreover, in order to check user attention, half of the items start with the positive term (e.g., "good" versus "bad") while the other ones start with the negative term (e.g., "annoying" versus "enjoyable") in randomized order.In order to support a uniform measurement of scales in the analysis of results, the ratings provided by users are mapped from −3 (fully agree with the negative term) to +3 (fully agree with the positive one).Questions correspond to individual UX aspects and belong to six UEQ factors that describe broader user experience aspects ("Attractiveness", "Perspicuity", "Novelty", "Stimulation", "Dependability", and "Efficiency"), plus the "user awareness and control" that we added.Table 3 shows the set of bipolar items of our questionnaire, grouped by factor, and displays the items we added in italics; for the specific ordering of questions see Figure 5. 2. After the completion of the tasks the participant filled in a post-test questionnaire aimed at capturing her/his overall experience and at comparing Apartment Monitoring to Baseline.In this case, (s)he had to select the model that best matched the questions reported in Table 4.These questions include an open one (Notes) to provide feedback for improving Apartment Monitoring.

Awareness and control
the system is able to describe renting experience/the system is unable to describe renting experience I am aware of the properties of the home/I am not aware of the properties of the home the system supports the selection of the home/the system does not support the selection of the home   3 for color mappings.

Demographic Data and Background of Participants
For the user study we recruited 38 participants (44.74% women; 55.26% men).Their age is between 18 to 54 years, with a mean value of 30.29.They are part of the university staff (researchers and professors) and students, as well as people working in the industry.In the pre-test questionnaire, we analyzed their background and familiarity with technology: 55.26% of participants have a scientific background, 13.16% a technical one, 26.32% humanities and linguistics, and 5.26% economics and law.Regarding the education level, 39.47% attended the high school, 44.74% university, 13.16% have a Ph.D, and 2.63% attended middle school.Regarding familiarity with technology, 39.47% of participants qualified themselves as experts in using technology, 42.11% as intermediate, and 18.42% as beginners.Moreover, 47.37% of people declared that they use e-commerce platforms or online booking services monthly, 28.95% about twice a year, 15.79% weekly, and 7.89% daily.

User Experience: Post-Task Questionnaire Results
Figure 5 shows the evaluation results concerning user experience with Baseline and the INTEREST model underlying Apartment Monitoring.The figure provides detailed results for each item of the post-task questionnaire.As these aspects are measured in the [−3, 3] scale, we consider as positive the values ≥0.8 and negative those lower than −0.8; the other ones are neutral.
• The Baseline model received some positive values related to the following factors.Perspicuity (easy to learn/easy), Dependability (secure), and Awareness and control (awareness of the properties of the home).However, it definitely has negative values concerning Novelty (dull, conventional, usual, conservative) and Stimulation (boring, not interesting, motivating).Furthermore, it has moderately negative values of Efficiency (slow, impractical, cluttered) and Attractiveness (unattractive); the other user awareness aspects are neutral.Notice that, even in the cases in which Baseline received a neutral or positive evaluation, the difference between its value and the one obtained by INTEREST is large and our model achieves the best results.
Table 5 shows the aggregated values obtained by averaging participants' answers by UX factor (Attractiveness, etc.).These results are particularly interesting because they describe the overall perception of ergonomic and hedonic qualities, attractiveness, and awareness and control, which are the main UX evaluation factors we consider.In the table, colored arrows help data interpretation: → denotes a neutral value, ↓ a negative one, and ↑ a positive one.We compared these results by means of a paired T-Test and the differences are significant (p < 0.01).From these results, it can be seen that Baseline was unsatisfactory as far as Efficiency, Stimulation, and Novelty are concerned, while it received neutral evaluations in the other factors.Differently, INTEREST obtained positive values in all UX factors, showing its superiority.Specifically, our model excelled in participants' perception of its Efficiency, Perspicuity, and Awareness and control, all of which got values very near to 2 o higher than that.The lowest value concerns Stimulation, which is well above 1.6, i.e., positive.In order to provide a more analytic view of participants' experience with the two systems we converted the UX results presented in Table 5 to the [0, 100] scale and we computed the relative difference between the obtained values.The third column of Table 6 shows these differences.It can be seen that INTEREST obtained dramatically higher results in what concerns Attractiveness, Efficiency, Stimulation, and Novelty, where it outperformed Baseline by more than 100%.Moreover, it outperformed Baseline in Perspicuity, Dependability, and Awareness and control by approximately 40-45%.While all such results are positive, those concerning Efficiency (+145%) show that participants perceived the user interface of Apartment Monitoring as much more supportive to information search than the Baseline one.Moreover, we are interested in the results concerning Dependability (+42.19%) and Awareness and control (+46.89%), which tell us that, with respect to a direct analysis of reviews, the mediation offered by the interactive charts and the evaluation dimensions employed in the INTEREST model highly improve user experience.The results of the post-test questionnaire (whose questions are reported in Table 4) are consistent with those of the post-task ones.Participants agreed that Apartment Monitoring made them save effort when they were solving the tasks of the experiment; moreover, they were more satisfied with this application than with the Baseline one.A large proportion (81.58%) of people said that Apartment Monitoring was easy to use, but some users agreed that a certain initial effort is needed to learn its functionalities.Almost all (97.37%) of the study participants stated that they would recommend our application to a friend and 100% said that they would use it again in the future to explore and analyze homes for renting.
In the open comments participants declared that keywords highlighting in reviews supports readability and that it helps in quickly finding the information related to evaluation dimensions in the visualized list of reviews.However, some people said that the initial impact with several colors creates a bit of confusion.

Observed Participants' Behavior and Collected Feedback
During the experiment, participants were asked to use the think aloud protocol [94]; thus, we were able to collect their comments, feedback, and suggestions about the systems they used.
Concerning the Baseline application, people observed that the list of reviews often includes too much text and this makes users skip some comments or only read the shortest ones.Moreover, it is difficult to overview the reviews and to find the needed information.In general, the Baseline application made users feel overloaded and confused, which is in line with the results obtained in the post-task and post-test questionnaires.
Regarding the Apartment Monitoring implementation of our INTEREST model, participants perceived some initial learning effort to get acquainted with the tools offered by its user interface.However they said that, after having been familiarized with it, the system is more useful and efficient in finding information than the Baseline one.We noticed that some users looked at the reviews corresponding to the bars of the histogram with the lowest satisfaction levels in order to go in-depth with the analysis of the poorest aspects of evaluation of the apartments.Moreover, some users first analyzed the histogram to check the general trend of the evaluation dimensions along time; then, in a second step, they read the reviews to gather more precise information about the home.Conversely, some users directly read the information corresponding to the highlighted keywords in the reviews, overlooking the chart.The latter behavior can be explained by the fact that people are more familiar with the list of reviews that is typical in booking and e-commerce platforms, than with diagrams.
Participants also gave some suggestions for future work.Specifically, they said that it would be useful to have a summary of the home reporting, e.g., the total number of reviews and its overall rating.In this way, users would be able to filter out the apartments that are below their own standards; for instance, users said that if a home had few reviews or a bad evaluation, they would not consider it at all.People also suggested a function to filter reviews by keyword in order to quickly find the comments dealing with specific topics.Finally, some participants felt confused about the initial impact of the set of colors used to highlight the words in the reviews.They suggested to have this functionality on demand, which is particularly interesting because it supports the idea of adapting explanations and visualization to individual needs, proposed in works such as [25].

Discussion
The positive results of the user study show that the INTEREST visualization model underlying Apartment Monitoring efficaciously supports the selection of items within a set of search results by empowering users to analyze item properties from a service-oriented perspective.Our model supports the presentation of item data by means of a user interface which the participants of the study considered as attractive, novel, stimulating, perspicuous, and dependable.From the viewpoint of user awareness and control, participants confirmed that the system was able to describe the experience to be expected from the homes, that they were aware of the properties of the homes, and that the system supported them in their selection.Thus, people felt in control during information exploration and decision-making.We believe that participants' perception of the Apartment Monitoring application strongly depended on two key elements: first, users were empowered to directly access the reviews dealing with the evaluation dimensions they were interested in; second, data interpretation was supported by the graphical visualization of dimension values and of relevant keywords from the reviews.In fact, the focus on specific evaluation dimensions and the direct access to reviews, which are graphically annotated to highlight their cues, support the overview of the available items and a quick acquisition of a first-hand opinion about them.The high perception of the system efficiency in the post-task questionnaires and the observations of people's behavior during the experiment (either using histograms first and then focusing on the selected reviews, or directly exploiting keyword highlights in the reviews to search for relevant information) support our hypothesis.
The lesson we take is that, while automated evaluation techniques (e.g., those developed in the recommender systems research) are crucial to select plausibly likable items for the individual user out of the plethora of available ones, her/his trust in these techniques can be enhanced by supporting a quick access and a deep inspection of explicit human judgments describing previous consumers' experience.This function enables the user to build her/his own personal opinion by focusing on the evaluation criteria (s)he is most interested in and by finding direct evidence of them in the reviews, with the help of the system.In this perspective, it is however important to develop an item evaluation model that mirrors as closely as possible the experience dimensions that consumers care about.In this respect, service design models such as the Service Journey Maps we used are key building blocks to be applied.We believe that the human perspective on item evaluation is a primary element of trust in the system suggestions because it reduces the mediation role of numerical ratings and it brings the user closer to the experience of the people who previously chose the same items, without using any other proxies to represent their judgments.This leads to promoting a coupled development of enhanced information filtering models and interactive information visualization ones complementing each other for a better user acceptance of both.
Even though the results of our study are very encouraging, they might be affected by limitations that we aim at addressing in our future work.First, the number of users involved in the experiment is limited and we plan to extend it by recruiting a greater number of more differentiated people.In this respect, we value the live contact with participants; thus, we will continue with one-to-one interactions, even though this is time-consuming, and we will not use any automated crowdsourcing platforms to recruit and manage users.Second, we tested our model within a single application domain, i.e., home booking; in our future work we will instantiate INTEREST in a different domain, e.g., Amazon products sales, and we will carry out a new user study to evaluate our model in that scenario.Another limitation of our work is the fact that we implemented the INTEREST model by means of a single version of the user interface, i.e., the one based on histograms.Other types of charts might be evaluated to check their impact on user experience; e.g., radars, value charts, and so forth.Moreover, other presentation models, e.g., focused on item properties instead of service-oriented evaluation dimensions, might be developed to compare our approach to feature-based ones (see, e.g., in [25]), with the aim of understanding whether people prefer to visualize high-level item properties, specific aspects, or a combination of the two in a more articulated user interface.
In our future work we will enrich INTEREST with functions suggested by the participants of the user study, e.g., the search by keywords within consumer reviews and the summary of the home properties.Moreover, we will extend our model to capture service characteristics in a finer-grained way and from the perspective of multiple stakeholders.Conceptual models and tools for service design take a holistic view of services [23] to represent the variety of entities involved in the related domains.As discussed by Jacobucci and Calder, "in more complex consumer experience [...] there may be literally hundreds of small elements of experience the consumer notices" [97].The aim of the Service Journey Maps is to model those features that compose the value delivered to and perceived by consumers in order to improve services and businesses to meet the needs of their users.While in the present work we used the SJMs to enhance decision-making in item selection, the same approach can be used to support the service provider in the analysis of consumer feedback at different temporal and content granularities.This kind of information can point to providers parts of the service/business that consumers perceive as bad or good; in turn, it helps improve the organization output in line with customer needs.For this purpose we plan to extend INTEREST to the service provider perspective by using Service Blueprints [98], a richer model than SJM.Service Blueprints go several steps deeper in service description by combining the customer's experience with all employee actions, support processes, and physical resources that may or may not be visible to her/him.

Conclusions
We described the INTERactivE viSualizaTion (INTEREST) information presentation model, which supports the exploration of item properties from a holistic, service-oriented point of view.This model is based on the intuition that, when choosing an item out of a list of results, the user should be aware of its properties by considering the overall service to be used (from item selection to delivery and after sales assistance), as well as previous consumers feedback.INTEREST is based on the Service Journey Maps for the design and description of user experience with services.These maps support the identification of high-level evaluation dimensions that are used in our model to provide a visual summary of item properties, tightly connected to the item reviews used to evaluate the items.We designed INTEREST as an agnostic information model that ignores the algorithm applied to generate the search results; as such, it can be connected to any information filtering system to present its results in a transparent way.We applied INTEREST to the home booking domain by developing the Apartment Monitoring application that supports the overview and analysis of online reviews about rented homes.A user study involving 38 participants has shown that this application outperforms a temporal presentation of consumer feedback in user experience, user awareness, and user control.This encourages the exploitation of our model to present search results provided by information filtering systems.

Figure 2 .
Figure 2. Pipeline of activities for the analysis of an individual review.

Figure 3 .
Figure 3. Apartment Monitoring: visualization of consumer satisfaction about a home regarding the host appreciation, in-apartment experience, and surroundings, during the last year.

Figure 4 .
Figure 4.The upper portion of this figure shows the Service Journey Map related to the home booking experience.Each stage of the map is connected to the associated evaluation dimensions.Evaluation dimensions are colored as in the user interface of the Apartment Monitoring application.

Figure 5 .
Figure 5. Post-task questionnaire results with items sorted as in the administered survey.The histogram shows the comparison of Baseline and INTEREST models on individual UX aspects.The colors of each bar represents the UX factor to which the bipolar item belongs, see Table3for color mappings.
• A prototype system (Apartment Monitoring) obtained by instantiating INTEREST in the home booking domain.• Validation results of our model within a user study with real users.

Table 1 .
Overview of the support to the analysis of results offered in the information exploration and recommender systems (RS) areas.

Table 2 .
Evaluation dimensions and sample lemmatized keywords for the thesauri of the home booking domain.TableA1in the Appendix A shows the complete thesauri.

Table 3 .
Post-test questionnaire: UX aspects grouped by factor.We consider the 6 UEQ factors plus an extra one (in italics) that describes user awareness and control.The last three items are translated from the Italian language.

Table 4 .
Post-test questionnaire translated from the Italian language.able to describe renting experience/the system… I am aware of the properties of the home/I am not aware of… the system supports the selection of the home/the system… Table A2 in the Appendix A shows detailed numeric values.• INTEREST, in its Apartment Monitoring implementation, received positive values in UX aspects, with a slightly weaker evaluation of Dependability (predictable) with respect to the other values, see Table A3 in the Appendix A for details.

Table 5 .
Post-task questionnaire results: comparison of Baseline and INTEREST on UX factors.The second and third columns show the mean aggregate rating of each factor.Values are obtained by averaging the ratings given by participants to individual UX aspects: → denotes a neutral value, ↓ a negative one and ↑ a positive one.Significance is encoded as (*) p < 0.01.

Table 6 .
Post-task questionnaire results in a [0, 100] scale with relative difference between Baseline and INTEREST on UX factors.The colored arrows have the same meaning as in Table5.

Table A3 .
Post-task questionnaire results for Apartment Monitoring; we use the same notation as in TableA2.