Construction and Evaluation of QOL Specialized Dictionary SqolDic Using Vocabulary Meaning and QOL Scale

Abstract: Agents that build interactive relationships with people can provide appropriate support and generate behaviors by accurately grasping the state of the person. This study focuses on the quality of life (QOL), which can be assessed multidimensionally, and aims to estimate QOL scores in the process of human interaction. Although vision-based estimation has been the main method for QOL estimation, we proposed a new text-based estimation method. We created a QOL-specific dictionary called SqolDic, which is based on large-scale Japanese textual data. To evaluate the effectiveness of SqolDic, we implemented a system that outputs the time-series variation of a user’s conversation content and the QOL scores based on it. In an experiment for estimating the content of user conversations based on a QOL scale by inputting data from actual human conversations, we achieved a maximum estimation accuracy of 91.2%. Additionally, in an experiment to estimate QOL score variability, we successfully estimated the mental health state and one of the QOL scales with a smaller distribution of error than that in previous studies. The experimental results demonstrated the effectiveness of our system in estimating conversation content and QOL scores as well as the effectiveness of our newly proposed QOL dictionary.


Introduction
Building long-term relationships between humans and agents is one of the major subjects in human-agent interaction (HAI). However, it is not as easy to build long-term relationships in HAIs as it is among humans.
In the field of welfare for the elderly, it has been reported that continuous interaction with robots makes the elderly more active and improves users' communication, self-care, and social life [1,2]. Therefore, interactive robots for the elderly are expected to become widely used as an effective means of improving the well-being of the elderly. As Japan has already entered a super-aging societal phase, its elderly welfare facilities are no exception to this trend, and they are increasingly promoting the use of ICT. One of the trends in the welfare of the elderly is the introduction of communication robots such as Paro [3] and Palro [4]. However, while the number of adoptions has been increasing, some elderly welfare facilities have not been able to establish a sustainable relationship with the robots. This is thought to be because the relationship between humans and robots lacks development over time, and humans enter the burnout phase, in which they lose interest in agents [5]. Other studies point out further causes, such as low intelligence, little variety in the robot's reactions, and the lack of established methods to keep people interested in the robot [6,7], and design robots or agents that keep users interested and build relationships with them [8][9][10]. As a solution to this problem, we intend to create a friendly and continuous relationship between a person and an agent by letting the agent generate personalized behaviors for the user.
It has been proposed that emotional properties be given to agents' speech, and it has been shown that human beings can correctly perceive the given emotions [11]. However, to achieve affable and empathetic communication for users, it is necessary for the agents to grasp the user's state in real time and to generate appropriate words and actions based on the estimated state. Many studies have implemented systems that deal with human state estimation [12,13]. Research on emotion estimation, in particular, has been ongoing for a long time. However, emotion recognition only shows human emotions at a specific time, and more profound information about a person's physical, mental, and social aspects cannot be extracted. The quality of life (QOL), a concept that originated in a study featured in a clinical evaluation by Karnofsky in the late 1940s [14], is a multidimensional indicator of the human condition, which includes not only physical but also mental and social aspects. In this study, we refer to the QOL index of the SF-36v2 health survey [15]. According to this index, QOL consists of eight subscales: Physical Functioning (PF), Role Physical (RP), Bodily Pain (BP), General Health (GH), Vitality (VT), Social Functioning (SF), Role Emotional (RE), and Mental Health (MH). In the conventional method, scores on these eight QOL scales were calculated through interviews in which the respondent answers the SF-36v2, a multiple-choice questionnaire. This indicator has been shown to be effective in estimating depression with respect to the mental health QOL scale [16]. However, it has been pointed out that these methods impose a burden on those whose QOL is being measured [5]. Additionally, this question-and-answer format may create a hierarchical relationship between the questioner and the respondents.
Furthermore, because only the QOL at the time of measurement can be measured in one examination, repeated examinations are required for the continuous measurement of QOL, which is inefficient and cannot be used to immediately track changes in QOL scores.
In this study, we hypothesized that it is possible to estimate the QOL by extracting information regarding QOL from users' conversations, and proposed a text-based QOL estimation method for human communication processes. By understanding which scales of QOL are related to what users are talking about from textual information and by analyzing the features extracted from the text, it is possible to estimate the score of the relevant QOL scale. Therefore, we created a QOL dictionary called SqolDic, which is based on large-scale Japanese textual data, and implemented a system to output QOL scores based on the dictionary. We demonstrated the effectiveness of the system through verification using data collected through interpersonal experiments.

QOL Estimation in HAI
QOL estimation differs from emotion recognition in that it can evaluate a person's state from multiple perspectives. Owing to the growing interest in QOL, many studies in the academic community have dealt with it, but most of them consider the impact of their proposed method on QOL [17,18]. In the field of HAI, research aimed at estimating QOL itself through interaction with agents first emerged in 2018 [19]. In that study, an end-to-end estimation model was proposed to estimate QOL scores from facial expressions and prosodic information. In April 2020, it was suggested that gaze patterns would be effective for QOL estimation [20], and in September 2020, a multi-feature vector learning-based QOL score estimation system that integrates gaze patterns with visual information on head movements and facial expressions using OpenPose [21], one of the widely used methods for estimating human posture and recognizing facial expressions, was proposed and shown to be effective [5].

Text-Based Estimation of Human States
With respect to human state estimation in HAI, SimSensei [36] is another representative example of an interactive agent aimed at interactive state estimation using multimodal information. Although there is a growing trend to integrate interactions with communicative robots and agents into state estimation, the constraint of constant response generation patterns is one of the problems we need to solve. However, these related studies suggest that agents that build interactive relationships with people can also achieve more accurate state estimation by integrating information from multiple sensors. Therefore, we should establish an estimation method for QOL estimation systems based on not only visual information but also any modal information.
Text-based emotion estimation systems have been studied extensively because text information is significantly useful for human state estimation. Shivhare and Khethawat [37] proposed a method for emotion recognition from textual information by analyzing the intensity of each word extracted through morphological analysis, and Shaheen [38] proposed a method for emotion recognition by combining compositional analysis with syntactic and semantic analysis. For a text-based estimation of the mental state, Ren [39] proposed a system for predicting suicidal behavior by estimating changes in the mental state based on textual data from daily blogs. Furthermore, because real-time feelings are converted into text data on Twitter, a prediction model that considers social context and topical context has been constructed, and it is now possible to infer an individual's feelings from the content of tweets [29]. These studies can be combined with audio-based and visual-based depression estimation systems [40,41] to implement more robust estimation systems.
Additionally, Uchida [42] demonstrated that people tend to disclose more negative things to agents than to people. Therefore, it is possible that agents may receive additional information that would not be disclosed to a person. As a result, an agent might understand users by more accurately estimating their state based on their speech. Furthermore, it may be possible to generate personalized behaviors based on a more fully informed estimation. If a text-based estimation method is established, multimodal QOL estimation and a more accurate estimation of the human condition could become possible.
As for the agent's utterances, research on empathetic response generation has been carried out [43], and it has been emphasized in studies on counseling situations [44,45]. One of the contributions of our study is that the integration of a QOL grasping system with such a response generation system will enable agents to generate appropriate behaviors based on an accurate and pluralistic understanding of people's states.

Methods
To realize the QOL estimation and the generation of appropriate responses from the estimation results, it is necessary for an agent to extract and analyze useful information for QOL estimation from the user's text data during communication with the user. In this study, we created a QOL dictionary called SqolDic, and for its evaluation, we implemented a system for estimating the variability of QOL scores by focusing on the text information in the users' natural dialogues. The flow of our study is shown in Figure 1.
First, we created a dictionary called SqolDic, which is dedicated to QOL. Next, based on the dictionary, we implemented a text content classification system that outputs the strength of association between the user's utterances and eight QOL scales. We then implemented a system that automatically answers the questions in the QOL questionnaire SF-36v2 based on the positive-negative (PN) score, which is one of the features extracted from the text, after which we output eight QOL score variations via an existing scoring algorithm. Figure 2 shows the entire QOL score estimation system. Finally, we evaluated the effectiveness of the proposed system through validation using actual conversation data.
Figure caption: Text-based QOL estimation during communication. SqolDic is a QOL dictionary that we created based on large-scale textual data. The PN score is a real number between −1 and 1 that indicates whether the content of a given sentence is positive or negative.

QOL Dictionary SqolDic
We developed a specialized dictionary called SqolDic for QOL content classification and score estimation from users' speech. No dictionary specialized for QOL exists other than the one proposed in this study. In this section, we describe the procedure for creating SqolDic using Word2vec [46] and MeCab [47], a morphological analysis system. Word2vec, which was proposed by Tomas Mikolov, is a neural network-based method for vectorizing words. The Japanese full-text data from Wikipedia were used as the dataset to create SqolDic; the data downloaded as of October 21, 2019, were used for all subsequent operations.
SqolDic was created in three steps:
1. Create a vector model of the meanings of all the lexical terms.
2. Extract the QOL-focused terms.
3. Accumulate the words with a close vector distance to the QOL-focused terms.
As a first step, we created a data model using Gensim for the full text of the Wikipedia data, which had been morphologically analyzed using MeCab. With the creation of this data model, a feature vector is assigned to each word. In the remaining steps, starting from the words contained in the SF-36v2 QOL questionnaire and the words contained in the description of SF-36v2 on the SF-36v2 seller's website, we searched for other words and phrases strongly related to each of those words. Table 1 shows a part of the database that was created by extracting the words that form the basis of QOL. Each line is in the form "phrase: the number of the scale to which the phrase relates." The operations mentioned above resulted in SqolDic, a dictionary specializing in QOL, which is a collection of words and phrases related to each of the QOL scales.
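The third step can be illustrated with a minimal sketch. It assumes that word vectors are already available (in the study they come from the Word2vec model trained with Gensim) and replaces them with a few hand-made toy vectors. The function name `expand_seed_terms`, the similarity threshold of 0.8, and the use of scale abbreviations instead of scale numbers are all hypothetical choices made for illustration, not details taken from the study.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def expand_seed_terms(vectors, seeds, threshold=0.8):
    """Assign each non-seed word to the scale of its most similar seed term,
    provided the similarity reaches the (assumed) threshold."""
    dictionary = dict(seeds)
    for word, vec in vectors.items():
        if word in dictionary:
            continue
        best_scale, best_sim = None, threshold
        for seed, scale in seeds.items():
            if seed not in vectors:
                continue
            sim = cosine(vec, vectors[seed])
            if sim >= best_sim:
                best_scale, best_sim = scale, sim
        if best_scale is not None:
            dictionary[word] = best_scale
    return dictionary

# Toy 3-dimensional vectors standing in for a trained Word2vec model.
TOY_VECTORS = {
    "friend": (0.9, 0.1, 0.0),
    "companion": (0.85, 0.15, 0.05),
    "pain": (0.0, 0.1, 0.95),
    "ache": (0.05, 0.05, 0.9),
    "tax": (0.1, 0.9, 0.1),
}
# Seed terms: QOL-focused words mapped to a scale (SF, BP).
SEEDS = {"friend": "SF", "pain": "BP"}

sqoldic_sketch = expand_seed_terms(TOY_VECTORS, SEEDS)
print(sqoldic_sketch)
```

In this toy example, "companion" and "ache" are accumulated under the scales of their nearest seed terms, whereas "tax" is left out because it is not close to any seed.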

The Strength of the Connection between the Text and the Respective QOL Scale
In this section, we describe a system that classifies the content of the text data based on SqolDic and outputs the closeness of the relationship between the text data and the respective QOL scale. The following sentence is used as an example: "I went shopping with my friends today and had a great time." Because "friend" and "shopping" are words included in the social functioning QOL scale, and "fun" (expressed here as "had a great time") is a word included in the mental health QOL scale, this sentence can be judged as most related to the social functioning scale and second most related to the mental health scale. Accordingly, the input sentences are morphologically analyzed, and then, after referring to SqolDic, the strength of each scale is output. However, it is not practical to extract and understand the meaning of each utterance in isolation because the content of human conversation changes from moment to moment: each sentence we utter is a component of a context, and its content changes continuously. In our experiment, if a person spoke about something related to a particular QOL scale until just before a particular statement, the content of that statement was prompted by the content of the previous statement, and there was always a connection between the two. Therefore, in our study, instead of focusing only on the target speech, we sought to output the strength of the connection between the text and the respective QOL scale by taking the previous speech into account.
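The dictionary lookup for a single utterance can be sketched as follows. This is only an illustration under several assumptions: the tiny dictionary `TOY_SQOLDIC` is invented, whitespace splitting stands in for the morphological analysis performed with MeCab, and normalizing the hit counts so that they sum to 1 is our own choice rather than a detail given in the text.

```python
from collections import Counter

SCALES = ("PF", "RP", "BP", "GH", "VT", "SF", "RE", "MH")

# Invented mini-dictionary in the spirit of SqolDic: word -> QOL scale.
TOY_SQOLDIC = {
    "friend": "SF",
    "friends": "SF",
    "shopping": "SF",
    "fun": "MH",
    "great": "MH",
}

def scale_relevance(tokens, dictionary):
    """Count dictionary hits per scale and normalize them to sum to 1
    (all zeros if no token is found in the dictionary)."""
    counts = Counter(dictionary[t] for t in tokens if t in dictionary)
    total = sum(counts.values())
    return {s: (counts[s] / total if total else 0.0) for s in SCALES}

# Whitespace tokenization is a stand-in for morphological analysis.
tokens = "i went shopping with my friends today and had a great time".split()
relevance = scale_relevance(tokens, TOY_SQOLDIC)
ranked = sorted(SCALES, key=lambda s: relevance[s], reverse=True)
print(ranked[:2], relevance["SF"], relevance["MH"])
```

For the example sentence, the social functioning scale ranks first and the mental health scale second, which mirrors the judgment described above.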
As shown in Figure 3, we calculate the QOL relevance of the nth utterance, Utterance(n) (U(n)), through a weighted calculation using the past three utterances. Each weight was set to decrease as the distance from the target utterance increased. With the above settings, the QOL relevance of the nth utterance, QSL(n), is expressed as Equation (1).
(1)
Figure 3. Relationship between the nth utterance and the previous ones.
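One form consistent with this description is a weighted sum QSL(n) = w0·U(n) + w1·U(n−1) + w2·U(n−2) + w3·U(n−3) with w0 > w1 > w2 > w3. The sketch below implements that form as an assumption. The weight values (0.4, 0.3, 0.2, 0.1), the inclusion of the target utterance itself, and the renormalization when fewer than three previous utterances exist are hypothetical and are not taken from Equation (1).

```python
def contextual_relevance(history, weights=(0.4, 0.3, 0.2, 0.1)):
    """Weighted QOL relevance of the latest utterance.

    history: list of per-utterance relevance dicts, oldest first.
    weights: assumed weights for U(n), U(n-1), U(n-2), U(n-3); they
             decrease with distance from the target utterance.
    """
    if not history:
        raise ValueError("history must contain at least one utterance")
    recent = history[::-1][:len(weights)]   # newest first
    used = weights[:len(recent)]
    norm = sum(used)                         # renormalize for short histories
    return {
        scale: sum(w * u[scale] for w, u in zip(used, recent)) / norm
        for scale in recent[0]
    }

# Two utterances about social functioning followed by one about mental health.
history = [
    {"SF": 1.0, "MH": 0.0},
    {"SF": 1.0, "MH": 0.0},
    {"SF": 0.0, "MH": 1.0},
]
qsl = contextual_relevance(history)
print(qsl)
```

With these assumed weights, the latest utterance alone points to mental health, but the contextual relevance still favors social functioning because the two preceding utterances carry a combined weight larger than that of the target utterance.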

Estimation of Real-Time QOL Score Fluctuations
We implemented a system that answers the QOL questionnaire automatically by referring to the contents of the text and outputs the score. As an example, after determining that the target text is most relevant to the mental health QOL scale, we can assume that if the text is positive, the person's mental health is high, and if the content is negative, the person's mental health is low. Based on this hypothesis, we used the PN score of the text to output QOL scores. The PN score is a real number between −1 and 1 that indicates whether the content of a given sentence is positive or negative. The closer the PN score is to 1, the more positive the content, and the closer it is to −1, the more negative. In sentiment analysis, PN values are generally obtained by morphologically analyzing the target document and looking up each word in a list of words and their semantic orientations. Here, we used a list of semantic orientations of words created using a spin model [48]. With the system described above, when a single sentence is entered along with its context, the one of the eight QOL scales most relevant to the content is identified and output. At the same time, the PN score, which is a feature of the text, is calculated as shown in Figure 4. Through these operations, we implemented a system that automatically answers the questions on the QOL scale. The QOL questionnaire SF-36v2 was used in this study, and most of its questions ask the respondent to select one option from 1 to 5. Therefore, in converting PN scores (from −1 to 1) into questionnaire options (from 1 to 5), we referred to the correspondence listed in Table 2.
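The system we have described so far realizes automatic responses to the questionnaire on the corresponding QOL scale. We update the QOL score for each statement by outputting the QOL scores via the QOL scoring algorithm.
Table 2. Correspondence between the PN score x and the questionnaire options (recoverable entry: x from 0.6 to 1.0 corresponds to option 5).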
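The conversion from text to a questionnaire option can be sketched as follows. The sketch rests on assumptions: the PN score is taken as the mean polarity of the words found in a polarity list, the toy polarity values are invented, and the five equal-width bins with edges at −0.6, −0.2, 0.2, and 0.6 are a guess at the correspondence in Table 2 rather than a reproduction of it. The subsequent SF-36v2 scoring algorithm is not reproduced here.

```python
def pn_score(tokens, polarity):
    """Mean semantic orientation of the tokens found in the polarity list.
    Returns 0.0 (neutral) when no token is found; this is an assumption."""
    values = [polarity[t] for t in tokens if t in polarity]
    return sum(values) / len(values) if values else 0.0

def pn_to_option(pn, edges=(-0.6, -0.2, 0.2, 0.6)):
    """Map a PN score in [-1, 1] to a questionnaire option from 1 to 5,
    using assumed equal-width bins (a score on an edge goes to the upper bin)."""
    if not -1.0 <= pn <= 1.0:
        raise ValueError("PN score must lie between -1 and 1")
    option = 1
    for edge in edges:
        if pn >= edge:
            option += 1
    return option

# Invented polarity values standing in for the semantic orientation list.
TOY_POLARITY = {"great": 0.9, "tired": -0.7, "pain": -0.9}

positive_pn = pn_score("had a great time".split(), TOY_POLARITY)
negative_pn = pn_score(["tired", "pain"], TOY_POLARITY)
print(positive_pn, pn_to_option(positive_pn))
print(negative_pn, pn_to_option(negative_pn))
```

A positive utterance is therefore answered with a high option and a negative one with a low option on the scale that the utterance was judged most relevant to.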

Interpersonal Experiment
To evaluate the implemented system, we conducted an interpersonal experiment to collect conversational text data to be input into the system. Four university students (age range: 20-23) participated in this experiment. Participants first received an explanation of the experiment, after which they reflected on the past week and answered the SF-36v2 QOL questionnaire. They then had a dialogue with two people regarding the events of the past week, such as what impressed them and what they were thinking about. The subjects were not informed in advance about the estimation of QOL from their speech. After the dialogue, verbatim transcripts were distributed to the participants, who labeled each utterance with the QOL scale that they considered most relevant to it. We asked the participants to write all the applicable scales for an utterance if they believed that it related to more than one QOL scale. Conversely, they were allowed to write nothing if no QOL scale corresponded to the content of the utterance. Figure 5 shows the trend of the relevance of the QOL scales output from one subject's speech data. The horizontal axis corresponds to the utterances, numbered from the beginning of the dialogue, and the vertical axis represents the relevance of each QOL scale to the content of the utterance. To calculate the accuracy of the output QOL relevance, we checked the agreement between the QOL scale output by the system as the most strongly related and the QOL scale with which the subjects labeled each sentence. For one of the subjects, the QOL scale that our proposed system judged most relevant matched the subject's own label for 91.2% of that subject's utterances. The accuracy calculated by analyzing the textual data of all subjects is listed in Table 3.
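The agreement check can be sketched as below. Two points are assumptions of ours, because the text does not specify them: an utterance labeled with several scales counts as a match if the system's top scale is any one of them, and utterances that the participant left unlabeled are excluded from the denominator. The function name `top_scale_accuracy` and the sample data are hypothetical.

```python
def top_scale_accuracy(predicted, labeled):
    """Fraction of labeled utterances whose system-predicted top scale
    appears among the participant's labels.

    predicted: list of scale names, one per utterance.
    labeled:   list of sets of scale names (empty set = no label given).
    Returns None when no utterance carries a label.
    """
    scored = [(p, labels) for p, labels in zip(predicted, labeled) if labels]
    if not scored:
        return None
    hits = sum(1 for p, labels in scored if p in labels)
    return hits / len(scored)

# Invented example: four utterances, one of them unlabeled.
predicted = ["SF", "MH", "BP", "SF"]
labeled = [{"SF"}, {"MH", "VT"}, set(), {"PF"}]
accuracy = top_scale_accuracy(predicted, labeled)
print(accuracy)
```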
Figure 6 shows the results of each of the eight QOL scale scores for each utterance from one of the subjects, to observe the real-time trends of each QOL scale score. For the analysis, we determined the average of the scores on each QOL scale derived from all the subjects' speech data. To evaluate the accuracy of the estimation of QOL scores from the text data, a graph showing the distribution of the error of each estimated score relative to the actual score is shown in Figure 7.  The horizontal axis represents the eight QOL scales, and the vertical axis represents the distribution of the error (absolute value) between the estimated and actual scores. The distribution of error in the estimation of the score of the physical functioning scale, which is on the leftmost side of the graph, is remarkably large. A possible solution to this problem is changing the mapping between the PN score, which is a feature obtained from the text data, and the questionnaire response options. As a result of updating the mapping, the distribution of the estimation error is shown using a graph in Figure 8.

Evaluation of QOL Estimation Based on SqolDic
In this study, we applied parameter adjustments to the physical functioning QOL scale to reduce estimation errors. We compared the result to the chance level to verify that parameter adjustments can improve the estimation. Figure 9 compares, for the physical functioning scale score, the distribution of the error between the actual scores and the results of the proposed method with the distribution of the error between the actual scores and the scores that would be output if all the responses to the questionnaire were the middle choices, which serves as the chance level. The results of a paired t-test showed a significant reduction in the error, indicating that the estimation accuracy of the proposed method was improved. Furthermore, although previous studies have shown that estimating mental health QOL scores is the most difficult, a comparison of the results of the previous study (median: 11, first quartile: 5.6, third quartile: 26) with those of the QOL estimation system based on SqolDic (median: 14, first quartile: 11, third quartile: 17) showed that our proposed method reduced the spread of the error (interquartile range of 6 versus 20.4), although its median error was slightly higher.
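The two quantities used in this evaluation, the quartiles of the absolute error and the paired t statistic, can be computed as in the following sketch. All numbers are invented for illustration and are not the study's data; the choice of the "inclusive" quartile method is an assumption, and the sketch stops at the t statistic without computing a p-value.

```python
import math
import statistics

def error_summary(estimated, actual):
    """Quartiles and interquartile range of the absolute estimation error."""
    errors = [abs(e - a) for e, a in zip(estimated, actual)]
    q1, median, q3 = statistics.quantiles(errors, n=4, method="inclusive")
    return {"q1": q1, "median": median, "q3": q3, "iqr": q3 - q1}

def paired_t_statistic(errors_a, errors_b):
    """t statistic of a paired t-test on two matched lists of errors
    (negative when errors_a are smaller on average)."""
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / standard_error

# Invented scores on a 0-100 scale.
summary = error_summary([50, 60, 70, 80, 90], [55, 50, 70, 60, 65])

# Invented paired errors: proposed method versus the middle-choice baseline.
proposed_errors = [4, 6, 5, 7]
baseline_errors = [8, 9, 9, 12]
t_value = paired_t_statistic(proposed_errors, baseline_errors)
print(summary, t_value)
```

A clearly negative t statistic in this arrangement would indicate that the proposed method's errors are smaller than the baseline's for the same subjects; judging significance additionally requires the t distribution with n − 1 degrees of freedom.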

Discussion
In this study, we created SqolDic, which is a dictionary specific to QOL. To evaluate its effectiveness, we conducted two experiments, QOL content classification and QOL score estimation, using textual data collected from an interpersonal experiment.
In this study, a maximum accuracy of 91.2% was obtained for the content classification based on QOL. The realization of content classification means that we can understand the intentions of our conversation partners. This may help in estimating a person's state in the process of communication. Although we developed a system for the content classification of text information specifically for QOL estimation, the mapping between text contents and the eight scales that constitute QOL can be used for purposes other than QOL estimation. For example, we can estimate what topics a person is interested in and what topics he or she wants to talk about based on the content of the conversation, and this can be used as a response to help the dialogue agent decide what to talk about next.
The estimation of QOL scores based on textual data was also realized by using the correspondence between the PN scores and the questionnaire answers. In particular, the distribution of error in mental health score estimation was smaller than that in a previous study [19], which used only the time-series data of facial expressions extracted from video data. Therefore, text information may be as useful as facial expressions for QOL estimation. With respect to the physical functioning scale, we also updated the mapping of PN scores to reduce the error. Additionally, a comparison with the case where the score was output without using the proposed method showed that the estimation error can be reduced by adjusting the parameters. We believe that the error can be reduced further by learning this correspondence from data.
These results demonstrated the effectiveness of the proposed system in both QOL content classification and QOL score estimation. Therefore, it was shown that the proposed system can be used to extract high-dimensional information regarding QOL from textual data.
Because the effectiveness of the QOL content classification and the QOL score estimation was shown, we can say that the validity of the underlying QOL dictionary called SqolDic was also shown. The reason for this result is that we used a large amount of textual data, and we were able to record contextually relevant words and phrases, which allows us to cover a wide range of topics. From the experimental results, we conclude that SqolDic can be applied to QOL estimation based on text information.
One of the limitations of our study is the number of subjects. In this study, real data were used to verify the effectiveness of SqolDic, the QOL content classification system, and the QOL score estimation system. In our study, the number of subjects did not affect the QOL content classification system because the system was based on SqolDic, which we created. As for the QOL score estimation system, we improved the accuracy of the estimation of the physical functioning scale in a form suitable for the participant data. Therefore, the model for estimating scores on the physical functioning scale is specific to the textual data of these subjects. Additionally, although the speakers in the collected dialogue database did not explicitly answer the questions in the QOL questionnaire during the dialogue, they were pre-selected as users who were aware that they were participating in a study on QOL through their completion of a QOL questionnaire. Therefore, it is necessary to validate the results in everyday conversations where people are not aware of any tasks. Based on the above discussion, the next task is to verify the generalization performance of the proposed method. To achieve this, it is necessary to collect a wide range of textual data and train appropriate parameters based on that data. If we can obtain appropriate correspondence for each scale, we can further improve the accuracy. Additionally, the accuracy of the system can be further improved by combining it with the QOL estimation system based on prosodic and visual information used in previous studies.

Conclusions
In this study, we created SqolDic as a QOL dictionary that is based on large-scale Japanese textual data. We also proposed a system that outputs the conversation content and QOL scores based on the QOL by extracting features from the user's conversation using SqolDic and demonstrated its effectiveness. Our future studies will involve improving the accuracy of QOL estimation by implementing a multimodal learning estimation model that integrates all the visual, auditory, and textual information that can be collected during the interaction with the user. This method solves the problems of the inefficiency of conventional QOL measurement methods and the difficulty of establishing a continuous relationship between humans and robots due to the uniformity of the relationship. It also enables robots to use their own functions to communicate more successfully with users. Estimating the user's QOL during daily interactions with the robot is expected to be a new application of robots for dynamic and efficient QOL measurement. In addition, HAI based on QOL estimation will enable intelligent robots to estimate the state and generate user-specific behaviors, and provide a clue to the question of how to improve the quality of interaction and build dynamic, better-informed relationships between humans and agents.

Informed Consent Statement:
Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.