The pervasiveness of computers allows them to communicate with humans anytime and anywhere. As a consequence, humans are often surrounded by a huge amount of information deriving from automatic computation. Two major issues arise from this scenario: information overload and the comprehensibility of the results. In many situations, multimedia communication can help people to overcome these problems. For instance, infographics have shown their applicability in a number of contexts [1], but it has been shown that graphics are useful only for trained users [2]. As a consequence, computers can communicate effectively with untrained users only by using the most sophisticated technology that nature gave to human beings, that is, natural language.
Indeed, a number of studies have shown that natural language is more comprehensible than graphics in some specific technical communications. For instance, in the medical domain, an experiment with human evaluation showed that it was possible to automatically generate helpful natural language summaries for physicians and nurses from an electronic patient record system in a neonatal intensive-care unit [3]. A similar experiment in an intensive-care unit confirmed the utility of natural language descriptions for clinicians [5]. In a different domain, namely weather forecasts, the automatic generation of messages concerning uncertain weather data can improve the performance of users participating in a simulation experiment [6
The use of natural language is the standard modality of human–computer interaction in the case of virtual assistants. Virtual assistants can play an important role in fostering virtuous behaviour in many human activities by providing a stimulus when needed, which Fogg called kairos [7]. For example, when people go to a restaurant and are presented with a menu, they might not know how to make a correct decision according to their diet. A virtual dietitian, that is, a virtual assistant in the diet domain, might prove useful by providing three facilities. First, it enhances the users’ ability to recognize healthy dishes by exploiting reasoning mechanisms. Second, it provides a persuasive stimulus at the right time, i.e., when users must choose the dishes to eat. Third, it helps users in foreseeing the consequences of a diet transgression [8
The great importance of persuasive technologies for health and wellness is shown by the number of studies on this specific topic; e.g., the survey [9] analysed 85 works published between 2000 and 2015. In particular, the eating domain represents a major application of persuasive technologies: in [9], twenty-five percent of the analysed works regarded the eating domain.
This paper addresses persuasive natural language message generation (NLG) in the domain of dietary regimens. We describe the implementation of the NLG module of the diet management system called MADiMan (Multimedia Application for Diet Management) [10]. The goal of MADiMan is to build a complete computer infrastructure for helping people to follow a healthy diet.
In previous work, we discussed the general architecture of the MADiMan system [10] and the capacity of the reasoning module with a number of simulation experiments with virtual agents based on hospital menus [12] and on Mediterranean menus [13]. The research questions that we explore in this paper concern the NLG module of MADiMan, which was partially described in some preliminary work [15]. In particular, in this paper, we investigate the design, the implementation and the evaluation of (1) the optimal linguistic shaping in the message generation and (2) the persuasive power of the messages that should guide users toward an optimal dietary choice. We answer both of these research questions by describing the NLG algorithms and by designing and performing two simulation experiments with humans. In order to evaluate the system, we used quantitative methods based both on questionnaires and on logged data analysis. In this regard, it is worth noting that a recent survey on persuasive technologies [9
] reports that “the most commonly used approach for collecting quantitative data was questionnaire/survey”, while only a few studies (
) used logged data analysis.
The paper is structured as follows. In Section 2, we discuss the related work. In Section 3, we give a description of the MADiMan system and, specifically, a detailed description of the NLG module. In particular, in Section 3.1, we introduce the MADiMan project, and in Section 3.2, we introduce the reasoning framework based on simple temporal problems (STPs). In Section 3.3, we illustrate in detail the NLG module of the MADiMan project. In particular, Section 3.3.1 regards the data interpretation and content selection process that converts the numeric output of the reasoner into a symbolic form; Section 3.3.2 regards the design of the messages produced by the realization engine; Section 3.3.3 illustrates two algorithms that we use for message aggregation. In Section 4, we discuss the experimental setting used for the simulation experiments with humans. In particular, in Section 4.1, we describe the first experiment, designed to quantitatively evaluate message appeal by varying two specific linguistic features, i.e., the aggregation strategy and the lexical choice procedure. Both of these features influence the compactness and the variety of the messages. In Section 4.2, we describe a second experiment, primarily designed to quantitatively evaluate the persuasive power of the messages. For both experiments, we show the results of a human-based evaluation performed with questionnaires. Moreover, for the second experiment, we also provide a measure of the textual messages’ persuasive power obtained by logging the behaviour of the users in reaction to the messages. Finally, Section 5 closes the paper with some concluding remarks and future work.
A preliminary version of this paper has been published in the proceedings of the International Conference on Natural Language Generation 2018 (INLG) [16
2. Related Work
In the literature, there is an increasing number of projects concerning the application of NLG for supporting people in adopting a virtuous behaviour, e.g., [17]. For instance, in the pioneering work [17], a tailored email was automatically generated and sent with the aim of helping people quit smoking. The basic idea was to characterize a user on the basis of the answers given to a form and to personalize the email by using this information. The experimentation, performed with control groups, showed that tailoring did not give any added value to the persuasive power of the messages, and, in the authors’ opinion, many factors in the design of the NLG system could have caused this result.
Many works tackled the application of NLG for presenting the results of automated reasoning, e.g., [22]. Furthermore, many theoretical works on the design of persuasive textual and multimedia messages have been recently proposed. We can split these studies into two classes. The first class is composed of works dealing with persuasion from an empirical point of view, by using strategies and methods typical of psychology and interaction design [7], while the second class is composed of works dealing with persuasion from a theoretical point of view, exploiting strategies and methods common in cognitive science [26]. For instance, in [25], Cialdini identified six characteristic human patterns: (1) reciprocity: people feel obligated to return a favour, (2) scarcity: people will value scarce products, (3) authority: people value the opinion of experts, (4) consistency: people do as they said they would, (5) consensus: people do as other people do and (6) liking: people say yes to people they like. A different perspective on persuasion was considered in [27], where low-level linguistic strategies, e.g., the use of adverbs, were considered.
Guerini et al.’s persuasive strategies taxonomy [28] proposes a classification that starts from belief-desire-intention (BDI) modelling of the agent involved. In this way, the authors were able to decompose and organize the various components that play a role in persuasion actions.
], the authors presented a framework for designing and evaluating persuasive systems, i.e., the persuasive systems design. The authors defined the development of a persuasive system as a three-step process: (1) understanding the persuasion process, (2) analysing the persuasion context and (3) designing the system qualities. Moreover, they proposed twenty-eight design guidelines for designing persuasive systems. Many of the guidelines proposed in [29] have been followed by the MADiMan system. In particular, the reduction effort principle is followed since MADiMan simplifies the analysis of a meal; the social role principle is followed since MADiMan impersonates a dietitian; the expertise and verifiability principles are followed since the MADiMan message is based on numerical computation.
MADiMan carries out numerical computation integrating food energy values with diet requirements and then presents the result of such computation in natural language. A key point is the composition of information regarding the different macronutrients. Thus, we could classify the MADiMan NLG module as a data-to-text system, which can be seen as a sort of transformation channel that converts numeric information into textual information [30]. Reference [31] is a recent survey of data-to-text technology with a focus on the healthcare setting. The authors considered several kinds of data sources and gave a categorized overview of NLG applications for transforming these data sources into text. MADiMan particularly fits the “data-to-text for patient engagement” category, that is, empowering users in making their own choices on diet. Indeed, the idea of presenting a message conveying information about the appropriateness of a specific meal allows for “…informing patients adequately about their health status and treatment options for building up trust between patient and doctor” [31
Four recent data-to-text systems, similar to MADiMan in motivations and techniques, were described in [20
], the authors presented “a motivational platform for supporting the monitoring of users’ behaviours and for persuading them to follow healthy lifestyles”, which is a flexible computational architecture to merge general principles about persuasion strategies with specific domain knowledge in order to guide users toward a healthy behaviour. The authors provided a case study based on food intake and physical activity that implemented an ontological reasoner that checked the fulfilment of a number of logical constraints. Moreover, reference [32] adopted a template-based generation of tailored messages based on a rich taxonomy of persuasion strategies. In contrast with [32], the data managed by MADiMan are essentially numeric and the reasoning module is based on numeric constraints rather than on ontologies. Moreover, MADiMan adopts a linguistically sound NLG pipeline rather than linear templates.
], a weekly report regarding a user’s car driving style was generated by using telematic data collected from an accelerometer and a GPS receiver. Similar to MADiMan, (i) some persuasive strategies inspired by captology (computer as persuasive technology [7]) were used and (ii) a complete linguistically sound NLG pipeline was applied; moreover, (iii) the SimpleNLG realizer was used. However, in contrast to MADiMan, where the domain concerns wellness and the messages are generated for each meal of the week in a simulation context, a report was generated weekly, and the experimentation was conducted in a real-world scenario for five weeks.
], the authors reported the results of a quantitative and qualitative evaluation of How was School Today?, an NLG system for helping the verbal communication of children with complex communication needs. The system was able to structure, verbalize and pronounce a personal story based on a number of records concerning the daily activity of a person. They used a very simple NLG pipeline based on a fixed schema for sentence planning and the use of SimpleNLG for realization (cf. Section 3.3.2). In particular, reference [33] proposed three different clustering strategies for selecting and merging the daily events in the story. The three strategies were based on the time, the location and the voice recordings of the events. They evaluated the strategies by asking the parents and grandparents of the children that took part in the experimentation to complete a questionnaire. The experimentation showed that the voice-recording clustering was perceived as the preferred strategy for event clustering. Note that, similar to the experimentation presented in this paper (see Section 4), reference [33] focused on the NLG engineering problem of aggregation. However, in contrast to our approach, which focuses on the syntactic aggregation of sentences, reference [33] used clustering for the semantic aggregation of the events.
], a customized linguistic report on household energy consumption was generated bimonthly with the aim of helping people to save energy. By using theories from the NLG field, from the computational theory of perceptions and from fuzzy sets, the authors provided a detailed description of a case study based on real databases and attitudinal and physical taxonomies. In particular, they used taxonomies to personalize a number of report templates, which generated linguistic suggestions to improve the daily energy consumption. Therefore, in contrast to MADiMan, the persuasion strategy was based on user tailoring rather than on general theories of linguistic persuasion.
In this section, we describe two experiments, designed and performed with human users, to evaluate the MADiMan system, paying specific attention to the NLG module. The main goal of these experiments was to evaluate the usefulness and the persuasive power of the generated natural language messages in communicating the output of the reasoning to users, by considering the possible linguistic shapes of the messages. To this aim, we designed a diet simulation. It is worth noting that, while an evaluation of the real efficacy of the persuasive power of automatically generated messages should follow the scientific standards of medical research [17], as addressed by some researchers in the human-computer interaction field [54], non-medical trials can also give important feedback, particularly when the design is in the early stages or when new technologies have to be evaluated.
To create a realistic experimentation, we designed and realized a mobile app called CheckYourMeal!. CheckYourMeal! is not available yet as a commercial app since it is still under development and used for research purposes only. CheckYourMeal! provides many standard functionalities of quantified-self apps, such as registration of username/password, login and insertion of personal and anthropometric data. Note that the underlying user model is based on anthropometric data, which are gender, age, weight and physical activity level. Indeed, these data are necessary to compute the energetic requirement (by using the Schofield formula; cf. [12]) that is used to compute the DRVs (cf. Section 3.2). Other data, such as food allergies, intolerance and preferences, are collected, but not actually used in the current version of the app.
The main goal of the CheckYourMeal! app is to help users in managing their diets. The week is scheduled as 21 slots to fill, with three meals per day from Monday to Sunday. For each slot of the week, the user is presented with a range of possible menus and decides which menu to eat. Then, the user is provided with feedback about the compatibility of a specific menu both in graphical and textual forms. In Figure 8, we show a screenshot of such feedback. The graphical feedback is provided by (i) a pie chart showing the energetic contents in the three macronutrients and (ii) three histograms showing their ideal values for that specific slot of the week. The textual feedback is provided by two automatically generated sentences containing the overall evaluation and the macronutrients’ evaluation, respectively. The CheckYourMeal! interface, as well as the NLG module, can use both Italian and English, but we used only Italian for all the experiments described in this paper.
We asked the users to interact with CheckYourMeal! by considering a simulation context. The users should imagine eating for a period of time in a restaurant, and for each meal, they have to choose what to eat among the menus proposed by the app. Moreover, we asked the users to engage with the CheckYourMeal! app for at least 15 minutes of their real time, choosing the menus of two weeks of simulated time. In other words, the users had to choose each day their breakfast, lunch and supper for a total of 42 choices.
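The slot structure of the simulation can be sketched in a few lines. This is only an illustration of the counting described above, not the CheckYourMeal! implementation; the names `DAYS`, `MEALS` and `build_slots` are hypothetical.

```python
from itertools import product

# Hypothetical names; the text only states three meals per day, Monday to Sunday.
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
MEALS = ["breakfast", "lunch", "supper"]


def build_slots(weeks):
    """Enumerate every (week, day, meal) slot for the given number of simulated weeks."""
    return list(product(range(1, weeks + 1), DAYS, MEALS))


one_week = build_slots(1)
two_weeks = build_slots(2)
print(len(one_week), len(two_weeks))  # 21 slots per week, 42 choices over two weeks
```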
In the simulation, the menus were randomly generated by considering the recipes of the Gedeone database, which is a collection of recipes annotated with their nutritional contents [13]. The Gedeone database is a relational database (realized with PostgreSQL) containing the recipes originally stored in the Gedeone website (http://www.gedeone-e-coop.it), and it consists of 500 traditional Mediterranean recipes. We decided to use this specific recipe book for a number of reasons: (1) it is in an electronic form suitable for a structured representation; (2) it contains the ingredients and also the composition in terms of macro-/micro-nutrients; (3) recipes are described in terms of simple atomic preparation steps; (4) it contains a number of interesting metadata such as difficulty, required time and cooking methods.
For instance, Gedeone contains the following metadata for four servings of carbonara spaghetti:
ingredients: 320 grams of spaghetti, 100 grams of bacon, 1 egg, 4 tablespoons of Parmesan cheese, 1 tablespoon of extra virgin olive oil, 4 teaspoons of salt;
macronutrients: 17.9 grams of proteins, 26.6 grams of lipids, 66.6 grams of carbohydrates;
cooking methods: frying, cooking in boiling water.
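To relate these gram values to the energy shares shown in the pie chart, the macronutrient amounts can be converted to energy. The sketch below assumes the general Atwater factors (4 kcal/g for proteins and carbohydrates, 9 kcal/g for lipids); the text does not state which conversion MADiMan applies, and the function name `energy_shares` is hypothetical. The shares do not depend on whether the gram values refer to one serving or to the whole recipe.

```python
# Assumption: general Atwater factors; not confirmed as the conversion used by MADiMan.
KCAL_PER_GRAM = {"proteins": 4.0, "lipids": 9.0, "carbohydrates": 4.0}


def energy_shares(grams):
    """Return (kcal per macronutrient, total kcal, share of total energy per macronutrient)."""
    kcal = {name: grams[name] * factor for name, factor in KCAL_PER_GRAM.items()}
    total = sum(kcal.values())
    shares = {name: value / total for name, value in kcal.items()}
    return kcal, total, shares


# Macronutrient values of the carbonara example above.
carbonara = {"proteins": 17.9, "lipids": 26.6, "carbohydrates": 66.6}
kcal, total, shares = energy_shares(carbonara)
for name in KCAL_PER_GRAM:
    print(f"{name}: {kcal[name]:.1f} kcal ({shares[name]:.1%} of energy)")
```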
Notice that the recipe analysis and database population were done offline with respect to the experimentation.
We built a complete menu by considering as a template the composition of a traditional Italian meal (lunch and supper). This is a simpler form of the concept of menu pattern used in [56], where the generation of random menus was modelled by using common sense about the contextual constraints concerning the various kinds of foods and their use. The menu template we used in the experiments is composed of a first course (primo, e.g., soup, pasta, pizza), a second course (secondo, e.g., meat, eggs, fish, cheese), a side dish (contorno, e.g., vegetables), a dessert, fruit and bread. We also added a typical Italian breakfast (e.g., coffee, tea, bread, jam, butter, milk, biscuits) to the Gedeone database. Note that during the experiments, users were not required to have all courses in a meal.
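Template-based random menu generation of this kind can be sketched as follows. This is a minimal illustration under stated assumptions: the recipe names and the `RECIPES` dictionary are hypothetical stand-ins for database queries, and the actual selection procedure used in the experiments is not specified beyond being random.

```python
import random

# Courses of the menu template described above.
MENU_TEMPLATE = ["first course", "second course", "side dish", "dessert", "fruit", "bread"]

# Hypothetical recipe pool; in the experiments the recipes came from the Gedeone database.
RECIPES = {
    "first course": ["carbonara spaghetti", "vegetable soup", "pizza"],
    "second course": ["grilled fish", "omelette", "roast chicken"],
    "side dish": ["mixed salad", "grilled vegetables"],
    "dessert": ["tiramisu", "panna cotta"],
    "fruit": ["apple", "orange"],
    "bread": ["white bread", "wholemeal bread"],
}


def random_menu(recipes, rng):
    """Pick one recipe uniformly at random for each course of the template."""
    return {course: rng.choice(recipes[course]) for course in MENU_TEMPLATE}


rng = random.Random(42)  # fixed seed only to make the sketch reproducible
menu = random_menu(RECIPES, rng)
for course, recipe in menu.items():
    print(f"{course}: {recipe}")
```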
In a preliminary phase, we conducted a small pilot study with three participants to guarantee that the tasks of the experimentation were understandable and that they could be performed in a reasonable amount of time. The results of this pilot study are not further considered in the following.
4.1. Experiment 1
In this section, we report the hypotheses, materials and methods and results regarding the first experiment with CheckYourMeal!. In this experiment, involving people who were not experts in the diet domain, we wanted to have a first quantitative measure about the utility of the system and of the appeal of the produced messages.
In Experiment 1, we tested two hypotheses. The first hypothesis is related to the usefulness of graphics and natural language messages to communicate the output of the reasoning to users. The second hypothesis is related to the possible linguistic shapes of the messages. In particular, the second hypothesis focuses on the appeal of the messages by varying the aggregation strategies (see Section 3.3.3
Hypothesis 1: Graphics and text messages are both perceived as useful for making the right choice.
Hypothesis 2: The set+VP aggregation strategy (violet version) is perceived as better by users than the all+VP aggregation strategy (blue version).
4.1.2. Materials and Methods
Twenty users participated in Experiment 1, eight females and twelve males. The users were students and researchers in computer science who accepted a personal invitation to participate in the experiment without rewards. Eight users were between 18 and 40 years old and twelve were over 40 years old. All the participants were Italian native speakers, and fifteen of them had no experience with apps regarding diets before this experimentation.
We prepared an instruction sheet with a description of the simulation context and of the main objectives of the experiment. We explained the basic mechanism underlying the reasoner (i.e., diet transgressions and compensation, persuasion). We also explicitly informed the users that we wanted to compare two different versions of the message generator, the blue version and the violet version, giving no other information about the specific qualities that we wanted to test. The blue version consisted of the all-VP aggregation strategy, and the violet version consisted of the set+VP aggregation strategy. We believe that with this briefing the users could pay more attention to the linguistic aspects of the textual feedback. We also asked the users to try a feature called variable lexicon (see Section 3.3.4). We explicitly informed the users that this feature was not an experimental goal. Users played with the app for one simulated week using the blue version and for another simulated week using the violet version, with the starting version randomized. Finally, users were asked to answer a questionnaire composed of 24 questions: 8 were multiple-choice questions regarding personal data; 4 were Likert questions regarding the app in general and the lexicon; 9 were Likert questions regarding the blue and violet versions of the messages in the app; finally, 3 were open questions regarding suggestions for possible improvements of the app, the perceived feeling and the lexicon. For all Likert questions, we used a Likert scale from 1 (indicated as I totally disagree) to 5 (indicated as I totally agree).
We had two questions in the questionnaire regarding usefulness, which were (translated from the original Italian questions):
GU (Graphics’ usefulness): The graphics on macronutrients are useful to make the right choice.
TU (Messages’ usefulness): The text messages on macronutrients are useful to make the right choice.
Note that we decided to separately evaluate the usefulness of graphics and text because we think that these factors are not necessarily dependent on each other.
For testing Hypothesis 2, we compared four specific properties of the messages, which were boringness, usefulness, easiness and perceived persuasiveness. The questions were (translated from the original Italian questions):
QB (Perceived boringness): The text messages in the blue version are more boring than the text messages in the violet version.
QE (Perceived easiness): The text messages in the blue version are easier to understand than the text messages in the violet version.
QU (Perceived usefulness): The text messages in the blue version are more useful than the text messages in the violet version in order to make the best choice.
QP (Perceived persuasiveness): The text messages in the blue version are more persuasive than the text messages in the violet version.
The first hypothesis of Experiment 1 concerned the utility of graphics and text messages as perceived by the users. In Table 2, we report the distribution of the answers for all the Likert questions on the form. In Figure 9, we graphically represent the distribution for questions GU and TU. For GU, the mean (here and in the following, we consider the points in the Likert scale as equidistant) was
and the standard deviation
; for TU, the mean was
and the standard deviation
. We tested significance by t-tests considering whether the mean answer had a numeric value
for questions GU and TU (thus indicating that users deemed the textual messages and the graphics useful), and we obtained the two-tailed p
for GU and TU, respectively. Thus, we can conclude that most users think that both graphics and messages are useful; moreover, the distribution of the answers suggests that they have a preference for textual messages.
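The kind of test used here, a one-sample t-test of Likert answers against the neutral midpoint of the scale, can be sketched as follows. The answers below are hypothetical and are not the data of Experiment 1; the function name `one_sample_t` is also an illustrative choice. The sketch computes only the t statistic and the degrees of freedom.

```python
import math
import statistics


def one_sample_t(scores, midpoint=3.0):
    """Return (mean, sample SD, t statistic, degrees of freedom) against the scale midpoint."""
    n = len(scores)
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)  # sample standard deviation (n - 1 in the denominator)
    t = (mean - midpoint) / (sd / math.sqrt(n))
    return mean, sd, t, n - 1


# Hypothetical Likert answers (1 = I totally disagree, 5 = I totally agree).
answers = [4, 5, 3, 4, 4, 5, 2, 4, 3, 5]
mean, sd, t, df = one_sample_t(answers)
print(f"mean={mean:.2f} SD={sd:.2f} t={t:.2f} df={df}")
```

The two-tailed p-value is then obtained from Student's t distribution with `df` degrees of freedom, for instance with a statistics library such as SciPy (`scipy.stats.ttest_1samp`).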
The results concerning Hypothesis 2 are reported in Figure 10, and in Figure 11, we report the distribution of the answers to the four questions, QB, QE, QU and QP. The figures show a quite clear preference for the violet version, which pursues the set+VP aggregation strategy, with respect to the blue version, which pursues the all+VP aggregation strategy.
In other words, for all four properties, which are boringness (mean = , SD = ), easiness (mean = , SD = ), usefulness (mean = , SD = ) and persuasiveness (mean = , SD = ), the shorter messages generated by the set+VP aggregation strategy were preferred with respect to the longer messages generated by the all-VP aggregation strategy. Indeed, we tested the statistical significance of the preference for the violet version with respect to the blue one. We tested significance by t-tests considering whether the mean answer had a numeric value <3 (the users leaned toward the violet version) for questions QE, QU and QP and >3 (the users leaned toward the blue version) for question QB. We obtained the two-tailed p-values , , and for QB, QE, QU and QP, respectively; thus, it is possible to state that the users prefer the violet version over the blue version in a statistically significant way as regards the QB, QE, QU and QP questions.
4.2. Experiment 2
In this section, we report the hypotheses, materials and methods and results regarding the second experiment with CheckYourMeal!. In this experiment, involving people who were experts in the diet domain, we wanted to confirm some hypotheses in a less controlled setting and to have a quantitative measure of the persuasiveness of the produced messages.
In Experiment 2, we tested two hypotheses as well. The first hypothesis was again related to the linguistic appeal of the blue/violet versions of the messages. The second hypothesis was related to the measure of the persuasive power of both message versions.
Hypothesis 3: The violet version is preferred by users also in a less controlled environment.
Hypothesis 4: Both the blue and violet versions of the messages have a measurable persuasive effect on the users.
4.2.2. Materials and Methods
In order to have an evaluation more oriented toward the real usage of the system, we conducted Experiment 2 by changing some experimental parameters. First, in order to have an ecological validation [57] of the system, we conducted Experiment 2 in a noisy and less controlled environment, which is more similar to a real context of use of the app. Second, we increased the number of users by conducting the experiment on 39 users, 24 females and 15 males. Moreover, we chose users that were familiar with the diet domain, i.e., students and teachers of the degree course in dietetics of the University of Turin, Italy. Thirty-seven users were between 18 and 40 years old, and two were over 40 years old. All the participants but two were Italian native speakers (however, all users were fluent in Italian), and 14 of them had no experience with apps regarding diets before this experimentation. The users accepted a personal invitation to participate in the experiment without rewards.
In order to test Hypothesis 3, we modified the questionnaire of Experiment 1 by substituting the Likert scale questions on the blue/violet versions of the app, with three multiple-choice questions (
; cf. Appendix A
), presenting to the users three sample messages both in the blue and the violet version, and we asked them which version they preferred. In this way, we could be sure that the users correctly identified the blue and violet versions when answering the form questions despite the noisy non-controlled environment where we conducted the experiment.
Hypothesis 4 is related to the persuasion capability of the CheckYourMeal! app by measuring the rate of suggestions accepted by the users. Indeed, in Experiment 1, QP asked users to express their feelings about the persuasiveness of the violet version with respect to the blue version. In Experiment 2, we wanted to objectively measure the effect of the messages by assessing the effective behaviour of the users. With this aim, we logged the actual actions of the users in the app, i.e., whether they chose to eat a menu after visualizing a message.
We assumed that the persuasive power of the system could be formalized by considering the decisions of the users in choosing the menus. In fact, the users were presented with a list of the possible menus for a specific meal slot (see Figure 8), and then they could decide to visualize the details concerning a specific menu. At this point, the app presented them with the messages described above concerning the compatibility of the specific menu with the diet. After reading the messages, the users could decide either to confirm the menu or to discard it and “backtrack” to the menu list to choose another menu.
Thus, to quantify the persuasive power of the system, we recorded two values:
, the fraction of times that the users, after a positive feedback message (e.g., “This menu is a great choice...”), did choose the menu and
, the fraction of times that the users, after a negative feedback message (e.g., “This menu is not good ...”), did not choose the menu.
With these values, we measured both the positive persuasive power, by which the users followed the positive recommendation and decided to choose the menu, and the negative persuasive power, by which the users followed the negative recommendation, changed their mind and decided to not choose the menu.
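The two fractions defined above can be computed directly from the interaction log. The following sketch assumes a hypothetical log format (one entry per visualized message, with its polarity and whether the menu was then confirmed); the class `LogEntry`, the function `persuasive_powers` and the sample log are illustrative and do not reproduce the actual CheckYourMeal! log structure or data.

```python
from dataclasses import dataclass


@dataclass
class LogEntry:
    feedback: str    # "positive" or "negative" (assumed encoding of the message polarity)
    confirmed: bool  # True if the user confirmed the menu after reading the message


def persuasive_powers(log):
    """Return (positive power, negative power): fractions of messages the users followed."""
    positive = [entry for entry in log if entry.feedback == "positive"]
    negative = [entry for entry in log if entry.feedback == "negative"]
    # A positive message is followed when the menu is chosen,
    # a negative message is followed when the menu is discarded.
    followed_positive = sum(entry.confirmed for entry in positive)
    followed_negative = sum(not entry.confirmed for entry in negative)
    positive_power = followed_positive / len(positive) if positive else None
    negative_power = followed_negative / len(negative) if negative else None
    return positive_power, negative_power


# Hypothetical log: three positive messages (two followed), two negative (one followed).
log = [
    LogEntry("positive", True),
    LogEntry("positive", True),
    LogEntry("positive", False),
    LogEntry("negative", False),
    LogEntry("negative", True),
]
positive_power, negative_power = persuasive_powers(log)
print(positive_power, negative_power)
```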
In Table 3, we report the distribution of the preferences of the 39 users with respect to the violet and blue version. We tested significance with Pearson’s chi-squared test (with Yates’s correction for continuity), and we obtained p
s. These data from Experiment 2 confirmed the results of Experiment 1, that is a clear preference of the users for the violet version.
In Table 4
, we report, for each version of the app, the value of
for the blue version and
for the violet version) and the value of
for the blue version and
for the violet version). Note that the number of positive messages is almost twice the number of negative messages. This fact means that users often visualize menus that are more compatible with their diets; this could be explained as a consequence of the app interface that sorts the menus in a scrollable list where, at the top, there are the menus most compatible with a user’s diet considering the previous meals and the user’s data and, at the bottom, the menus that are less compatible (cf. Figure 8
). We also determined the overall persuasive power as the micro-average between positive and negative persuasive power (
for the blue version and
for the violet version).
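Because positive messages were almost twice as frequent as negative ones, a micro-average weights the positive persuasive power more heavily than a plain mean of the two fractions would. The sketch below illustrates the difference with hypothetical counts (not the values of Table 4); the function names are illustrative.

```python
def micro_average(followed_positive, total_positive, followed_negative, total_negative):
    """Pool all messages: followed suggestions over all suggestions."""
    return (followed_positive + followed_negative) / (total_positive + total_negative)


def macro_average(followed_positive, total_positive, followed_negative, total_negative):
    """Unweighted mean of the two fractions, shown only for comparison."""
    return (followed_positive / total_positive + followed_negative / total_negative) / 2


# Hypothetical counts: 60 of 80 positive messages followed, 20 of 40 negative ones.
counts = (60, 80, 20, 40)
micro = micro_average(*counts)
macro = macro_average(*counts)
print(f"micro={micro:.3f} macro={macro:.3f}")
```

With these hypothetical counts, the micro-average lies closer to the positive fraction (0.75) than the macro-average does, since two thirds of the messages are positive.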
By assuming that a naive baseline for the persuasive power could be a random guess, that is
values, we tested the statistical significance of the results in Table 4
by applying the standard Pearson chi-squared test. We obtained significance at
for the positive persuasive power both for the blue and violet versions. In contrast, for the negative persuasive power, we did not obtain significance for either the blue or the violet version. Moreover, by considering the average value of the persuasive powers, we obtained significance at
for the blue version and at
for the violet version. We can conclude that the messages had a statistically significant effect in encouraging a good choice, but not in discouraging a bad choice.
The experimentation had three major themes, which were (i) the utility of the graphics and of the textual messages, (ii) the linguistic appeal of the text message as a function of the aggregation strategy and, finally, (iii) the persuasive power of the textual messages.
The results on Hypothesis 1 showed that users consider both graphics and texts useful for managing their diets. The users’ agreement values reported in Table 2
and Figure 9
for GU (graphics utility) and TU (textual utility) statistically confirm that the users appreciate both graphics and text. Moreover, they clearly show a preference for text messages. Both results confirm the effectiveness of multimedia applications in human-machine communication and, once again, point to the key role of natural language in human comprehension. Therefore, in real applications, graphics and texts can be used together or independently to communicate specific information on diet.
The results on Hypotheses 2 and 3 showed that the shorter (violet) version of the messages was preferred with respect to the longer (blue) version. The users’ numeric preferences for the shorter version, as reported in Table 2
and Figure 10
for boringness, usefulness and easiness, have practical and theoretical importance. Indeed, on the one hand, these results give hints for developing better natural language interfaces, and, on the other hand, they confirm the previous experiments in the domains of healthcare [45
] and education [46
]. The preference for the violet version was confirmed in Experiment 2 with a different group of users and with a different way of measuring the preference (Table 3).
To gain a deeper view of the linguistic appeal of the messages in Experiment 1, we analysed, as a post-hoc hypothesis, the results of the Likert-scale question concerning lexicon variability. Indeed, we asked users to answer a supplementary question about the words in the messages, which was not a core topic of the experimentation: “The ‘variable lexicon’ option makes the use of the app more enjoyable” (QV, 1 = I totally disagree and 5 = I totally agree). In Figure 12
, we report the distribution of the answers for QV (mean =
, SD =
). Although the distribution of the answers seems to indicate a preference for random lexical variations (the p
-value for >3 is
), a dedicated experiment is necessary to validate this result.
With respect to the theme of the persuasive power, we conducted the two experiments with two distinct goals and with two distinct measures, respectively. In the first experiment, we were interested in the influence of the aggregation strategy on the persuasive power and asked users to give their subjective score on persuasive power. We note that the results in Table 4
and Figure 11
show that the aggregation strategy plays a role in the persuasive power. Indeed, both the violet and blue versions of the app have a measurable persuasive effect but, in contrast to the human judgement expressed in the questionnaire, the blue version seems to have a greater objective persuasive power than the violet version when computed from users’ behaviour (
). Moreover, a point emerging from the results of the second experiment is the difference between the positive and the negative persuasive power (Table 4
). A possible explanation of this point can be found in the free comments section of the questionnaire: some comments pointed out that the repetition of the predicate, typical of the all-VP aggregation strategy used by the blue version, gives a judgemental or blaming attitude to the virtual dietitian. Therefore, an intriguing speculation is that the two aggregation strategies have an appeal depending on the polarity of the messages. This speculation is in agreement with the claim that “changing a previous attitude is harder than originating or reinforcing an attitude” [29
], and should be investigated in future research.
5. Conclusions and Future Work
In this paper, we described the main features of a data-to-text generator in the diet management domain. We described the main components of MADiMan and detailed the design and the implementation of the NLG module. To the best of our knowledge, the MADiMan system presents many aspects of novelty with respect to commercial diet apps both in the reasoning module and in the NLG module. In the reasoning module, the numerical representation with STPs of diet and food allows for flexibility in the diet management. In the NLG module, the use of a linguistically-sound NLG architecture consisting of the document planner, sentence planner and realizer modules allows for a simple customization of the messages. Indeed, we exploited such a property by designing and implementing two distinct aggregation strategies that drive the compactness of the messages. Finally, we described the details of a human-based simulation for the evaluation of the NLG module by using the CheckYourMeal! app. We conducted two experiments with two distinct groups of people.
Experiment 1 involved 20 users (students and researchers in computer science). After a simulated use of the system, the users answered a questionnaire on the usefulness of the graphics and text messages and on the boringness, usefulness, easiness and persuasiveness of the app, comparing the two aggregation strategies. The analysis of the results showed that users appreciated both the textual and the graphical presentation of information regarding the diet. Moreover, with respect to the perceived properties, the experimental results showed that users prefer the more compact messages obtained with a complex aggregation strategy over longer messages.
In Experiment 2, which involved 39 users (professional dietitians and students of dietetics), the experimental results obtained with questionnaires confirmed the appeal of more compact messages. Moreover, in order to quantify the persuasive power of the system, we collected logged data of users’ behaviour to measure the users’ acceptance rate of the generated textual messages. The results suggested that, in diet management, (1) the longer messages had slightly more persuasive power than the shorter messages, and (2) both versions of the messages were more persuasive in encouraging a positive behaviour than in discouraging a negative behaviour.
In future work, we intend to replicate the experimentation on a larger number of users. In particular, we intend to evaluate the system with respect to the variability of the lexicon, which in this work has been only superficially investigated. Another point that we intend to test in future versions of the system is the possibility of introducing some form of syntactic variation, such as the use of the active or passive form of verbs. Indeed, in contrast to lexical variation, this kind of sentence variation is related to the topic/focus and the rhetorical structure of the message, and so should primarily be considered in the document planning phase.
A possible idea for future work for improving the persuasive power is to enrich the messages with sentences describing the consequences of a bad choice. Indeed, the reasoning system of MADiMan can quantify the restrictions in future meals that allow users to still achieve their dietary goals despite a violation [12
]. Therefore, the NLG module could generate simple messages describing these restrictions, e.g., “... but tomorrow you cannot eat the cake”.
Another research question that we intend to address regards the explainability of the answers. Indeed, we can exploit the greater comprehensibility of natural language with respect to infographics (cf. [2
]) for explaining the evolution of the diet constraints during the week. For tackling such an issue, we intend to exploit different sources of information. First, we intend to use the information regarding the past meals that the users have eaten during the week. Second, we want to use external information from domain ontologies on food. Indeed, the information on the food domain (e.g., [58
]) could be used to discover and communicate connections among meals. Third, we want to augment the reasoning and generation system with the concept of the Mediterranean diet [14
], which relies on more qualitative constraints that must be combined with the quantitative constraints on the macronutrients. Finally, we want to enrich the explanation with a suggestion on the best dishes to eat in the future meals.
In the current version of MADiMan, we assume a prior formalization of recipes in terms of quantitative measures of ingredients. We believe that this assumption is consistent with the recent trend of many big restaurant chains that allow customers to download the precise nutritional values of their dishes with the aim of improving their customer retention. Moreover, as future work, MADiMan could be coupled with a specific computer vision module that determines ingredients and nutritional information from a picture of a dish taken with a smartphone (see, e.g., [60
Finally, in future versions of the system, we intend to investigate the possibility of accounting for constraints arising from allergies and similar medical conditions. In a recent work [14
], we proposed to exploit an ontological modelling of ingredients and recipes that can play an important role in the extension of MADiMan for allergies. For instance, if a user is allergic to legumes, the system can exclude “pasta with beans” by recovering the information that (1) “pasta with beans” contains beans and (2) beans are legumes.
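The two-step inference in this example (a dish contains an ingredient, and the ingredient belongs to a category) can be sketched as follows. This is only a toy illustration under stated assumptions: the dictionaries stand in for an ontology, and all names (`CONTAINS`, `IS_A`, `categories_of`, `is_allowed`, `filter_menu`) and entries are hypothetical rather than part of MADiMan.

```python
# Hypothetical toy knowledge base; a real system would query a food ontology.
CONTAINS = {
    "pasta with beans": {"pasta", "beans"},
    "grilled chicken": {"chicken", "olive oil"},
}
IS_A = {
    "beans": {"legumes"},
    "lentils": {"legumes"},
    "pasta": {"cereal products"},
}


def categories_of(ingredient):
    """Return the ingredient plus all of its ancestors in the is-a hierarchy."""
    seen = set()
    stack = [ingredient]
    while stack:
        item = stack.pop()
        if item in seen:
            continue
        seen.add(item)
        stack.extend(IS_A.get(item, ()))
    return seen


def is_allowed(dish, allergens):
    """A dish is excluded if any ingredient falls under an allergen category.

    Dishes missing from CONTAINS are treated as allowed in this sketch;
    a real system would more safely reject or flag them.
    """
    blocked = set(allergens)
    for ingredient in CONTAINS.get(dish, ()):
        if categories_of(ingredient) & blocked:
            return False
    return True


def filter_menu(dishes, allergens):
    """Keep only the dishes compatible with the user's allergies."""
    return [dish for dish in dishes if is_allowed(dish, allergens)]


menu = ["pasta with beans", "grilled chicken"]
print(filter_menu(menu, {"legumes"}))
```

Walking the is-a hierarchy transitively means that an allergy declared at a general level (e.g., legumes) also covers more specific ingredients added to the knowledge base later.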