Healthcare
  • Article
  • Open Access

11 September 2024

Estimation of the Cognitive Functioning of the Elderly by AI Agents: A Comparative Analysis of the Effects of the Psychological Burden of Intervention

1 Simulation of Complex Systems Laboratory, Department of Human and Engineered Environmental Studies, Graduate School of Frontier Sciences, The University of Tokyo, Tokyo 277-8563, Japan
2 AI-UX Design Research Institution, Advanced Institute of Industrial Technology, 10-40 Higashi-Oi 1-Chome, Shinagawa, Tokyo 140-0011, Japan
3 Institute of Gerontology (IOG), The University of Tokyo, Tokyo 113-8656, Japan
4 Institute for Future Initiatives (IFI), The University of Tokyo, Tokyo 113-0033, Japan

Abstract

In recent years, a growing number of studies have used conversational data from spontaneous speech to estimate cognitive function in older people. The conversation partners in such assessments were once limited to physicians and licensed psychologists, but conversations with fully automated AI agents are now possible. However, it has not yet been clarified how conversational communication with older people differs when the examiner is a human rather than an AI agent. This study explored the psychological burden experienced by elderly participants during cognitive function assessments, comparing interactions with human and AI conversational partners. Thirty-four participants, averaging 78.71 years of age, were evaluated using the Mini-Mental State Examination (MMSE), the Visual Analogue Scale (VAS), and the State-Trait Anxiety Inventory (STAI). The objective was to assess the psychological impact of different conversational formats on the participants. The results indicated that mental strain, as measured by VAS and STAI scores, was significantly higher during the MMSE sessions than during the other conversational interactions (p < 0.01). Notably, there was no significant difference in mental burden between conversations with humans and with AI agents, suggesting that AI-based systems could be as effective as human interaction in cognitive assessments.

1. Introduction

Life expectancy is increasing globally. According to the World Health Organization (WHO) World Health Statistics 2023, the global average life expectancy for men and women was 73.3 years in 2019 [1]. Advances in medical technology and improvements in lifestyle raised the global average life expectancy from 67.9 years in the early 1990s to 73.4 years in 2021, an increase of 5.5 years over roughly three decades. By 2048, life expectancy is expected to reach 77.0 years globally and 82.4 years in the Western Pacific region, which includes Japan.
The rise in global life expectancy has also created new challenges for aging societies. The gap between life expectancy and healthy life expectancy is a more pressing issue in aging societies, as the increase in healthy life expectancy has lagged behind the rise in life expectancy. The WHO notes that while life expectancy has increased due to reduced mortality, it does not equate to a decrease in the years spent living with disability. In Japan, the Ministry of Health, Labour, and Welfare (MHLW) stresses the need to extend healthy life expectancy, which is the duration during which people can maintain essential functions for social life and live without daily restrictions due to health issues [2]. Among age-related disease measures, efforts to combat cognitive impairment have garnered particular attention. Around 50 million people worldwide suffer from cognitive impairment, and this number is projected to rise to approximately 152 million by 2050 [3]. Cognitive impairment leads to a loss of independence, an increased caregiving burden, and heightened economic costs. In Japan, the social cost of cognitive impairment was significant, amounting to 14.263 trillion yen in 2014 [4]. Early detection and treatment promotion will optimize the use of limited financial resources.
According to the “Guidelines for the Diagnosis and Treatment of Dementia” (Japan Neurological Society) [5], cognitive dysfunction is defined as “a generalized decline in intellectual functions once acquired due to acquired brain dysfunction that interferes with social and daily life, occurring in the absence of disturbances in consciousness”. Intellectual functions include memory, orientation, language, recognition, calculation, thinking, motivation, and judgment, which are crucial for planning and performing daily activities such as cleaning, doing laundry, eating, and going out. As a result, cognitive dysfunction may cause difficulties in daily planning and activities, posing serious challenges to daily life [6,7]. Impaired executive function due to cognitive dysfunction may hinder self-care, potentially compromising physical health. Moreover, certain diseases, such as diabetes, are linked to cognitive dysfunction [8,9]. Additionally, cognitive dysfunction may prolong hospitalization [10,11]. The condition brings diverse effects, including reduced patient quality of life, longer hospital stays, higher mortality, and elevated social costs. Effective treatments for cognitive dysfunction have not been established. However, depending on the cause and condition, appropriate support and treatment can delay disease progression. Thus, early detection and intervention are crucial.
Many cognitive function tests used to diagnose cognitive dysfunction raise issues of testing time, invasiveness, psychological burden on patients, and the cost of the necessary equipment and facilities. CT and MRI can be performed only in large hospitals with such facilities and impose time, cost, and invasiveness burdens on patients. In neuropsychological testing, while some tests can be completed in a relatively short time, they must be administered by a clinical psychologist with specialized knowledge, and the results may be affected by the patient’s physical condition. The HDS-R (Hasegawa’s Dementia Scale-Revised) [12] and FAST (Functional Assessment Staging of Alzheimer’s Disease) [13] are examples of cognitive function assessment scales. However, the HDS-R requires careful interpretation because differences in social background, such as the years of education and living environments of the elderly, may have emerged between past and present cohorts, potentially affecting test performance [14]. Additionally, FAST is not an interactive test but an assessment based on interviews with family members and caregivers. The MMSE [15], by contrast, has been used across a variety of patients, including cancer patients and stroke survivors [16,17,18], and is well accepted by medical professionals working with patients with cognitive impairment; it has a 70% recognition rate in the field of geriatric psychiatry in Japan [19]. Although MMSE scores have often been reported to be influenced by the examinees’ years of education [20], those studies were all conducted outside of Japan. Current reports both in Japan and abroad indicate that MMSE scores are not significantly influenced by the patient’s years of education [14,21,22].
In cognitive function testing, it is essential to reduce the burden on the elderly and perform the tests at a low cost. The use of AI in clinical settings to support the elderly is actively studied and discussed today, and its usefulness and challenges are being explored.
In this study, we reviewed previous work on the early detection of cognitive impairment, including an overview of cognitive testing methods, and summarized prior studies using virtual agents. We then analyzed the psychological burden on the elderly when the Mini-Mental State Examination (MMSE) was administered to 34 elderly subjects by either a human examiner or an AI agent.
This study may provide a new perspective on the problems that many conventional cognitive function tests for diagnosing cognitive dysfunction have regarding the testing time, invasiveness, patient psychological burden, and cost of necessary equipment and facilities.

3. Methods

In this study, ECG measurements were taken in three situations: during the MMSE, during a daily conversation with a human, and during a daily conversation with an AI. The experiment was conducted at the Yagawa Day Service Center and the Kunitachi Silver Human Resource Center in Kunitachi City, Tokyo. Nineteen participants were initially enrolled at each facility, and either a human or an AI acted as the questioner. The impact of the different dialogue partners on the psychological burden of the participants was compared.

3.1. Participants

A total of 34 elderly persons (12 men and 22 women) were included in the study; all were community-dwelling elderly persons not residing in institutions. Of the 34, 17 were day service users and 17 belonged to the Silver Human Resource Center in the same area. To ensure that the results were not affected by dialect, we selected residents of Tokyo, Japan. Participation was limited to those without any condition, such as hearing loss, that would affect listening or speech in general conversation. Initially, 19 participants were enrolled in each group, but 2 participants from each group died or were transferred to other hospitals during the study period and could not be followed up. All participants consented to the purpose of the study; for elderly participants with cognitive decline, their families were also informed and gave consent.

3.2. Obtained Data

In this study, several rating scales and measures were used to assess differences in the impact on cognitive function and mental strain among older adults. The Mini-Mental State Examination (MMSE) [15] was used to measure cognitive function. This test takes 6–10 min to complete, consists of 11 items, and assesses orientation, memory, calculation, language, and visuospatial abilities for a total score of 30 points; a score of 23 or less indicates a suspicion of dementia, and a score of 27 or less indicates mild cognitive impairment (MCI).
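As a minimal illustration of the cutoffs just described, a screening helper might look as follows. This is a hypothetical function, not part of the study’s materials; the cutoffs are screening aids from the text above, not diagnostic rules.

```python
def classify_mmse(score: int) -> str:
    """Map a total MMSE score (0-30) to the screening categories
    described above: <=23 suggests dementia, <=27 suggests MCI.
    Hypothetical helper; cutoffs are screening aids, not diagnoses."""
    if not 0 <= score <= 30:
        raise ValueError("MMSE total score must be between 0 and 30")
    if score <= 23:
        return "suspected dementia"
    if score <= 27:
        return "suspected mild cognitive impairment (MCI)"
    return "within normal range"
```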
In Japan, a Japanese version (MMSE-J) has been created by several translators and is frequently used as an auxiliary tool for the diagnosis of cognitive impairment. In this study, we decided to adopt the internationally used MMSE and used the Sugimoto translation of the MMSE-J [17].
The State-Trait Anxiety Inventory (STAI) [56] was used to measure the participants’ anxiety levels, and a translated version of the new STAI [57] was used. Participants respond to each question on a four-point scale ranging from “not at all” to “very much”. The test item measures two aspects: state anxiety, which assesses transient situational reactions to how one is feeling at the moment, and trait anxiety, which assesses the tendency to react based on general emotions. The Visual Analogue Scale (VAS) [58] was used to measure the degree of mental burden in each task; the VAS is a visual rating scale and requires participants to mark the intensity of the pain or burden that they experience. For example, for pain, participants mark on a 10 cm line, with the left end representing “no pain at all” (0) and the right end representing “most pain” (100).
To consider the impact of depression on conversation, the Geriatric Depression Scale (GDS-15) [59] was used; the GDS-15 consists of 15 questions to be answered with “yes” or “no”, with a score of 5 or higher indicating a depressive tendency and a score of 10 or higher indicating depression. In addition, to collect and examine physical parameters, changes in heart rate and pulse were recorded using a Holter electrocardiograph. This allows for the analysis of levels of excitement and stress. In this study, participants wore a DigitalWalk-700 Holter electrocardiograph to collect ECG data.

3.3. Display Environment for AI Agents

Several modules were used to implement the virtual agent, an avatar with AI dialogue capabilities. Because the author conducted the interpersonal conversations, the AI avatar was designed to resemble the author’s appearance (Figure 1), using VRoid Studio (https://vroid.com) [60] for modeling. This minimized the effect of the avatar’s appearance on the test results, since an avatar that looked markedly different from the person the participants expected to speak with could have influenced the results through individual differences in impression.
Figure 1. (a) the appearance of the author who conducted the interpersonal conversation and (b) the appearance of the AI agent, modeled based on the author’s appearance.
To display the designed model in a browser, we used @pixiv/three-vrm [61], an open-source library from pixiv for loading and displaying VRM, a format for humanoid 3D avatar model data, in the browser via three.js. The AI agent was displayed on the screen of a MacBook Pro. Additionally, the character’s background incorporated the room of the facility where the dialogue was taking place, creating an environment in which the user felt as if they were conversing with a real person.
For generating the AI agent’s speech in daily conversations, we used the Koeiro API [62]. We designed the AI agent’s mouth to move in sync with the speech using lip-sync functionality. For recognizing the user’s responses during the conversation, we employed the Web Speech API (Speech Recognition) [63]. Since speech recognition can be difficult when the timing of the agent’s speech overlaps with the participant’s, we implemented a system where the participant wears a pin microphone, and the system only recognizes speech while a button is pressed.
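The push-to-talk gating described above — recognizing speech only while the button is held, so the agent’s own voice is never transcribed — can be sketched as a small state machine. This is a hypothetical stand-in for the browser-side Web Speech API wiring, written in Python purely for illustration.

```python
class PushToTalkGate:
    """Sketch of button-gated recognition: audio chunks are kept only
    while the button is held; everything else (including the agent's
    own speech) is discarded before it reaches the recognizer."""

    def __init__(self):
        self.pressed = False
        self.buffer = []

    def press(self):
        """Participant presses the button: start a fresh capture."""
        self.pressed = True
        self.buffer = []

    def release(self):
        """Button released: stop capturing and hand over the chunks."""
        self.pressed = False
        return list(self.buffer)

    def feed(self, chunk):
        """Incoming audio chunk; kept only while the button is held."""
        if self.pressed:
            self.buffer.append(chunk)
```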

3.4. Protocols in Conversation Design

The conversation design protocol was structured using a semi-structured interview technique with fixed question content. In semi-structured interviews, the examiner asks questions while adjusting their responses based on the user’s answers. If the user does not receive any reaction to their answers, there is a risk that the quality of their responses may decrease, or they may disengage from the conversation. Therefore, the conversation protocol was designed so that the AI agent always responds to the user’s answer with a related response before moving on to the next question. The implementation of dialogues, including responses corresponding to the user’s answers, was carried out using the ChatGPT API [64] (Figure 2 and Figure 3).
Figure 2. Protocol of daily conversation system for cognitive function estimation by AI agents.
Figure 3. Configuration diagram showing the entire system.
The user was seated in front of a PC displaying the AI agent and equipped with a pin microphone for speech recognition. Additionally, the design called for the conversation to begin with the user wearing a Holter electrocardiograph.
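The protocol above — always acknowledging the user’s answer with a related response before posing the next fixed question — can be sketched as a simple loop. The helper names below (`get_answer`, `generate_reaction`) are hypothetical stand-ins for the system’s speech recognition and its ChatGPT API call, not the study’s actual code.

```python
def run_interview(questions, get_answer, generate_reaction):
    """Sketch of the semi-structured interview protocol: ask each fixed
    question in order, listen to the answer, and always produce a
    related reaction before moving on. `get_answer` and
    `generate_reaction` are injected callables (hypothetical stand-ins
    for speech recognition and the LLM response generator)."""
    transcript = []
    for question in questions:
        answer = get_answer(question)
        # Always react to the answer before the next question, so the
        # user never goes unacknowledged and stays engaged.
        reaction = generate_reaction(question, answer)
        transcript.append((question, answer, reaction))
    return transcript
```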

4. Evaluation

4.1. Implementation Procedure

In this study, an experimental approach was used to evaluate the psychological burden of tests on participants and its effect on their cognitive function. The purpose of the experiment was to visualize the psychological burden through changes in participants’ self-administered psychological tests and electrocardiograms under different interactive environments. This study was conducted with the approval of the ethical review boards of the University of Tokyo and the Tokyo Metropolitan Institute of Technology.

4.2. Hearing Items in Daily Conversation

A key previous study on the estimation of cognitive functions is that of Igarashi et al. [65], in which generalized interview items were developed to estimate cognitive function through daily conversation. Drawing on intake interviews conducted by psychologists at hospitals, the questionnaire items were designed to ascertain patients’ abilities related to family history, physical condition, interests and concerns, ways of spending the day, memory, time-registration, and place-registration; as validation, practical supervision was provided by five licensed psychologists working at the hospitals. These items were refined into a total of 30 questions used as the interview items for the present study, grouped into six categories: time-registration, place-registration, family history, how they spend their day, physical condition, and interests/concerns (Table 1).
Table 1. Daily conversation used in interpersonal and agent conversations, consisting of 30 questions in six areas.

4.3. Data Acquisition Flow

Each participant underwent three tests: a cognitive function test using the Mini-Mental State Examination (MMSE), a daily conversation with a human, and a daily conversation via AI. First, the STAI (State-Trait Anxiety Inventory) was used to measure the participants’ current and usual levels of anxiety. Then, the VAS (Visual Analogue Scale) was used to assess the participants’ self-evaluation of “how much mental strain they are currently feeling”.
Before each session, participants were instructed to rest with their eyes open for 5 min so that their heart rate stabilized. Participants then engaged in a daily conversation for 30 min, after which the mental strain during the session was again assessed using the STAI and VAS. Next, a 15 min cognitive function test using the MMSE was administered, followed once more by the STAI and VAS. To prevent question order in the daily conversation from affecting response performance, all questions were asked in a fixed order. The interaction tests with the AI agents were conducted at least one month after the in-person interaction tests.
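Under the timings stated above (5 min rest, 30 min conversation, 15 min MMSE, with the Holter electrocardiograph worn throughout), one session’s flow can be sketched as data. The step names are illustrative, not taken from the study materials, and the questionnaire steps themselves take additional, unstated time.

```python
# One session's data-acquisition flow, as described above.
# Only the explicitly stated durations (in minutes) are timed;
# STAI/VAS administration time is not stated in the text.
SESSION_FLOW = [
    ("STAI + VAS (baseline)", 0),
    ("rest with eyes open", 5),
    ("daily conversation (human or AI agent)", 30),
    ("STAI + VAS (post-conversation)", 0),
    ("MMSE", 15),
    ("STAI + VAS (post-MMSE)", 0),
]

def total_timed_minutes(flow):
    """Sum of the explicitly timed portions of a session."""
    return sum(minutes for _, minutes in flow)
```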

5. Results

A total of 34 elderly persons (12 males and 22 females) were included in the analysis: 17 day service users and 17 members of the Silver Human Resource Center in the same area (recruitment, dropout, and consent details are given in Section 3.1).
The mean age was 78.71 years (SD = 6.77). The mean MMSE (Mini-Mental State Examination) score was 21.09 (SD = 8.26), and the mean GDS (Geriatric Depression Scale) score was 2.48 (SD = 2.45). The mean heart rate was 75.3 bpm (SD = 10.44) after the daily conversation with a human, 74.8 bpm (SD = 11.41) after the test with the AI agent, and 74.8 bpm (SD = 11.13) after the MMSE.
The mean Visual Analogue Scale (VAS) score at the beginning of the experiment was 5.50 (SD = 10.56), the mean VAS score after the daily conversation with a human was 7.15 (SD = 12.32), and the mean VAS score after the test with the AI agent was 9.76 (SD = 14.42). The mean VAS score after the MMSE was 42.88 (SD = 29.13). The mean current state anxiety score measured by the STAI (State-Trait Anxiety Inventory) was 29.41 (SD = 5.45), the mean STAI score after the daily conversation with a human was 28.26 (SD = 5.98), and the mean score after implementation with the AI agent was 28.85 (SD = 5.71). The mean STAI score after MMSE implementation was 48.85 (SD = 13.85) (Table 2, Figure 4).
Table 2. Comparison of differences in means by VAS and STAI.
Figure 4. Comparison of differences in means by VAS and STAI. **: Significant differences were found at the 1% level.
As a supplementary note, it should be mentioned that during the acquisition of ECG data, there were instances where participants with cognitive decline made contact with the ECG equipment during the conversational experiments. The equipment used was a Holter electrocardiograph, prioritizing the minimization of the burden on the participants due to its lightweight and portable nature. However, it should be noted that while this device reduces the burden on participants, it may result in slightly less stable data compared to standard equipment. To avoid psychological effects caused by an unfamiliar environment, the study was conducted in familiar surroundings rather than requiring participants to visit a hospital. This approach was achieved by using portable equipment, allowing the study to take place in a room within a familiar facility.

6. Analysis

6.1. Analysis of VAS

The Visual Analogue Scale (VAS) was used to assess the participants’ mental burden. Participants marked a point on a line, with the left end representing “no burden at all” (0) and the right end representing “maximum burden” (100), corresponding to their current mental burden. This allowed the burden experienced in each session to be visualized and quantified. A Friedman test was conducted in Microsoft Excel (Version 2404) on the VAS scores of the 34 participants in the pre-interview condition, after the in-person conversation, after the conversation with the AI agent, and after the MMSE. The p-value was 1.5 × 10⁻¹¹, indicating that the VAS scores changed with the test administered. Post-hoc pairwise comparisons were then conducted with Bonferroni adjustment for multiple comparisons. Significant differences at the p < 0.01 level were observed between the post-MMSE condition and each of the other three conditions: pre-interview, post-interpersonal conversation, and post-conversation with the AI agent.
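The analysis above (performed by the authors in Excel) can be reproduced in outline with SciPy: an omnibus Friedman test across the four repeated conditions, followed by Bonferroni-adjusted pairwise post-hoc tests. The data below are synthetic, chosen only to mimic the reported pattern of a much higher post-MMSE burden; the paper does not state which pairwise test was used, so the Wilcoxon signed-rank test is assumed here as a common choice for paired data.

```python
import numpy as np
from scipy.stats import friedmanchisquare, wilcoxon

rng = np.random.default_rng(0)
n = 34  # number of participants in the study

# Synthetic VAS scores, one value per participant per condition,
# shaped to mimic the reported pattern (post-MMSE much higher).
conditions = {
    "baseline": rng.normal(6, 5, n),    # pre-interview
    "human":    rng.normal(7, 6, n),    # after human conversation
    "agent":    rng.normal(10, 7, n),   # after AI-agent conversation
    "mmse":     rng.normal(43, 15, n),  # after the MMSE
}

# Omnibus test: did VAS change across the four repeated conditions?
stat, p = friedmanchisquare(*conditions.values())

# Post-hoc pairwise comparisons with Bonferroni adjustment.
names = list(conditions)
n_comparisons = len(names) * (len(names) - 1) // 2  # C(4,2) = 6
posthoc = {}
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        _, p_pair = wilcoxon(conditions[names[i]], conditions[names[j]])
        posthoc[f"{names[i]} vs {names[j]}"] = min(1.0, p_pair * n_comparisons)
```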

6.2. Analysis of the STAI

The State-Trait Anxiety Inventory (STAI) is designed with a correlation of about 0.27 between trait anxiety and state anxiety. As with the VAS, a Friedman test was conducted in Microsoft Excel (Version 2404) on the momentary state-anxiety scores. The p-value was 2.02 × 10⁻¹⁰, indicating that the STAI scores changed with test administration. Post-hoc pairwise comparisons with Bonferroni adjustment showed significant differences at the p < 0.01 level only between the post-MMSE condition and each of the pre-interview, post-interpersonal conversation, and post-AI conversation conditions. Correlation coefficients were calculated for state anxiety at the beginning of the experiment, after the in-person conversation, after the conversation with the AI agent, and after the MMSE (Table 3).
Table 3. Comparison of correlation coefficients between state anxiety before the experiment, after in-person conversation, after AI conversation, and after MMSE implementation.
Scores decreased slightly after the in-person conversation and after the conversation with the AI agent, whereas the mean increased after the MMSE. There was a very weak correlation between pre-experimental state anxiety and state anxiety after the in-person and AI-agent conversations, respectively (r = 0.25, r = 0.28). No correlation was found between pre-experimental state anxiety and post-MMSE state anxiety (r = 0.08) (Figure 5 and Figure 6). To examine whether the state-anxiety score after the MMSE was correlated with the MMSE score, the correlation coefficient was tested, revealing a weak negative correlation (r = −0.41) (Figure 7).
Figure 5. Correlation between current state anxiety and post-interpersonal conversation state anxiety (n = 34). The X-axis represents current state anxiety, while the Y-axis represents state anxiety after interpersonal conversation. Each point indicates individual participant data, and the dashed line shows the correlation between the two variables. The upward slope of the dashed trendline suggests that higher current state anxiety is associated with higher state anxiety after interpersonal conversation.
Figure 6. Correlation between current state anxiety and state anxiety after AI conversation (n = 34). The X-axis represents current state anxiety, while the Y-axis represents state anxiety after AI conversation. Each point shows individual participant data, and the dashed line indicates the correlation between the two variables. The upward trendline suggests that higher current state anxiety is associated with higher state anxiety after AI conversation.
Figure 7. Correlation between post-MMSE state anxiety and MMSE scores (n = 34). The X-axis represents state anxiety after MMSE, while the Y-axis represents MMSE score. Each point shows individual participant data, and the dashed line indicates the correlation between the two variables. The downward trendline suggests that higher state anxiety after MMSE is associated with lower MMSE scores. The consistent direction of the points also indicates a potential negative correlation.

6.3. Analysis of ECG

In this study, we prioritized minimizing the burden on participants and used a Holter electrocardiograph. Bradycardia is generally defined as a resting heart rate of fewer than 60 beats per minute in adults [66], and some studies use a threshold of 50 beats per minute, considering the decrease in heart rate due to aging [67]. In Japan, the guidelines of the Japanese Circulation Society state that a heart rate of 40 beats per minute or less may warrant consideration of a pacemaker [68]. Based on these criteria, data indicating a heart rate of fewer than 40 beats per minute were considered outliers. The analysis was conducted using raw data with outliers excluded, and the average values for each session were analyzed.
To examine whether there is a correlation between heart rate during the MMSE and MMSE scores, we tested the correlation coefficient and found a weak correlation (r = 0.34), as shown in Figure 8.
Figure 8. Correlation between heart rate during the MMSE and MMSE scores (n = 34). The X-axis represents the mean heart rate during the MMSE, while the Y-axis represents the MMSE score. Each point shows individual participant data, and the dashed line indicates the correlation between the two variables. The upward trendline suggests that a higher heart rate during the MMSE is associated with higher MMSE scores.
There was also little correlation between the ECG data during the interpersonal conversation and the VAS and STAI scores after the interpersonal conversation (VAS r = 0.24, STAI r = 0.21) (Table 4). Similarly, there was little correlation between the ECG data during the conversation with the AI and the VAS and STAI scores after the conversation with the AI (VAS r = 0.18, STAI r = −0.07) (Table 5).
Table 4. Comparison of correlation coefficients between ECG data during interpersonal conversation and VAS and STAI after interpersonal conversation.
Table 5. Comparison of correlation coefficients between ECG data during AI conversation and VAS and STAI after AI conversation.
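The ECG handling described in this section — excluding heart-rate samples below 40 bpm as outliers, averaging each session, and testing Pearson correlations — can be sketched in plain Python. The functions are illustrative helpers, not the study’s actual pipeline, and the numbers in the usage below are made up.

```python
import math

def session_mean_hr(samples, lower=40):
    """Mean heart rate for one session after dropping samples below
    `lower` bpm, the outlier threshold adopted in this study."""
    kept = [s for s in samples if s >= lower]
    return sum(kept) / len(kept)

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```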

6.4. Relationship between Personality Traits and Each Assessment Item

The Japanese version of the Ten Item Personality Inventory (TIPI-J) [69] was used to measure the participants’ personality traits. The correlation coefficients between extraversion and VAS (r = −0.06) and STAI (r = −0.32) after conversing with a human showed no significant correlations. Similarly, to examine whether extraversion was correlated with VAS and STAI after the AI conversation, we tested the correlation coefficients and found no correlation between extraversion and either VAS (r = −0.21) or STAI (r = −0.28) after AI conversation. Next, to examine whether there was a correlation between extraversion measured by the TIPI-J and trait anxiety measured by the STAI, we tested the correlation coefficient and found a weak negative correlation (r = −0.39) (Figure 9).
Figure 9. Correlation between extraversion and trait anxiety (n = 34). The X-axis represents extraversion, while the Y-axis represents trait anxiety. Each point shows individual participant data, and the dashed line indicates the correlation between the two variables. The downward trendline suggests that higher extraversion is associated with lower trait anxiety.

7. Discussion

7.1. VAS and STAI Scores

Analyses of the VAS and STAI scores revealed that the mental burden of the MMSE was significantly higher than that of the other sessions, whereas no significant differences were found between the baseline condition and the in-person and AI conversations. These results suggest that cognitive function estimation based on spontaneous speech, as in previous studies, is preferable to formal evaluation tests such as the MMSE for reducing the psychological burden on participants.
In communication with chatbots that provide emotional support, whether the message sender is human or a chatbot has been found to be a significant factor, with studies indicating that messages from humans are more effective [70]. However, in this study, it was demonstrated that the psychological burden experienced by participants was similar whether they were interacting with the AI agent system developed for this research or conversing with real humans. In the future, it may be possible to easily check cognitive function in the homes of elderly individuals without visiting facilities.

7.2. Analysis of STAI

In order to examine whether there is a correlation between trait anxiety and current state anxiety, state anxiety after an interpersonal conversation, state anxiety after an AI conversation, or state anxiety after MMSE administration, the correlation coefficient was tested. A weak correlation was found between trait anxiety scores and current state anxiety scores (r = 0.47). A very weak correlation was also found between trait anxiety and scores after an interpersonal conversation (r = 0.35), and between trait anxiety and scores after an AI conversation (r = 0.36). No correlation was found between trait anxiety and post-MMSE scores (r = 0.00).
To examine whether there was a correlation between current state anxiety and state anxiety after the in-person conversation, after the AI conversation, and after the MMSE, we tested the correlation coefficients. We found a very weak correlation between current state anxiety and scores after the in-person conversation (r = 0.25), as well as between current state anxiety and scores after the AI conversation (r = 0.28). No correlation was found between current state anxiety and post-MMSE scores (r = 0.08).
A high level of current state anxiety or trait anxiety was associated with a similarly high level of the other. At the same time, neither was correlated with post-MMSE state anxiety, suggesting that neither state anxiety in the pre-MMSE situation nor anxiety as a relatively stable personality trait affects STAI scores after the MMSE. On the other hand, since both trait anxiety and current state anxiety showed very weak correlations with state-anxiety scores after the in-person and AI conversations, anxiety as a personality trait and state anxiety before the sessions may influence the degree of anxiety caused by the in-person and AI conversations.
In addition, since there was a correlation between trait anxiety and current state anxiety, it is likely that when anxiety as a personality trait is high, situational anxiety is also high. Given this, an approach that reduces state anxiety before the MMSE may not necessarily reduce trait anxiety. A correlation coefficient test between the state-anxiety score after the MMSE and the MMSE score revealed a weak negative correlation (r = −0.41), suggesting that the higher the post-MMSE state anxiety, the lower the MMSE score. This can also be interpreted as individuals with lower cognitive function tending to feel a greater psychological burden from being tested for cognitive function.

7.3. Personality Traits

To examine whether extraversion is correlated with trait anxiety, we tested the significance of the correlation coefficient and found a weak negative correlation (r = −0.39): the lower the level of extraversion, the higher the level of trait anxiety. Extraversion has been described as an outward orientation of interests that favors socializing with a wide range of people and engaging in cheerful, fluent, and witty conversation [69]. There was no correlation between extraversion and either VAS (r = −0.06) or STAI (r = −0.32) after the in-person conversation, nor between extraversion and VAS (r = −0.21) or STAI (r = −0.28) after the AI conversation. Mou et al. (2017) pointed out that when interacting with AI, human users exhibit lower levels of openness, agreeableness, extraversion, conscientiousness, and self-disclosure than in conversations with humans, indicating different personality traits and communication attributes in human–AI interactions [51]. This study, however, suggests that in everyday conversational situations the expression of extraversion may not depend on whether the conversational partner is a human or an AI.

7.4. Future Work

Further research is needed to determine whether the comparisons that did not reach significance in this analysis also hold for age groups other than the elderly. The content of elderly participants' responses may vary depending on whether the questioner is an AI agent or a live person (e.g., in the length and qualitative aspects of the responses). Further analysis of the transcribed text data is necessary to gain insights into the trends and characteristics of the responses.
In this study, no significant differences were found in the MMSE and STAI results. Therefore, it will be necessary in the future to conduct more detailed analyses, such as analyzing voice data obtained from speech and conducting group work after the experiment to gather feedback.
AI agents have the potential to assess cognitive function and levels of the Instrumental Activities of Daily Living (IADL) through dialogue with elderly individuals using natural language processing technologies. IADL refers to the activities necessary for an individual to live independently, and it is particularly essential for measuring the level of independence in daily life among the elderly. This method could provide a non-invasive and efficient alternative to traditional face-to-face assessments, especially in situations where visits are difficult.
Furthermore, there is potential for designing personalized care plans for each elderly individual based on data collected by AI agents. The International Classification of Functioning, Disability, and Health (ICF) provides a comprehensive framework for evaluating an individual’s health status and the associated levels of disability and social participation. It is expected to play a crucial role in assessing the overall picture of a person’s abilities and disabilities, especially in cognitive function evaluations for the elderly. Such an approach could not only improve the quality of life for elderly individuals who need support to continue living at home but also reduce the burden on caregivers and healthcare professionals.

8. Limitations

The dialogue engine used in this study sometimes stopped expanding on certain words, such as "war", when it deemed them inappropriate, and could not provide appropriate responses to certain topics, such as news about a murder case. Future work should improve the engine so that it can continue the dialogue even when such words are included.
In this study, the subjects were limited to elderly persons living in the community, and the conversation style was semi-structured interviews. Different results could be obtained with the elderly residing in nursing homes or hospitals or with a different conversational style. In Japan, it is difficult to select elderly people with cognitive decline as research subjects, and the sample size was small for this experiment. However, verification by prototype is necessary under such circumstances, and therein lies the contribution of this study. In the future, a follow-up study with a larger sample size is needed, using this study as a starting point.

9. Conclusions

This study investigated the psychological burden of various conversational formats, including interaction with an AI agent, on elderly participants during cognitive function assessment. Thirty-four participants (12 males and 22 females) with a mean age of 78.71 years were evaluated using the Mini-Mental State Examination (MMSE), the Visual Analogue Scale (VAS), and the State-Trait Anxiety Inventory (STAI). The study aimed to compare the psychological impact of conversational interaction (with both humans and AI agents) and cognitive testing on the participants. The results showed that mental strain, as measured by VAS and STAI scores, was significantly higher during the MMSE sessions compared to the other conversational formats (p < 0.01).
Interestingly, there was no significant difference in the mental burden between conversations with humans and AI agents. This suggests that AI-based conversational systems may be as effective as human interaction in cognitive assessments, potentially providing a less burdensome alternative for the elderly in various settings, including home environments. This finding underscores the importance of considering the psychological burden of cognitive assessment and highlights the potential of AI to reduce this burden.

Author Contributions

Conceptualization, T.I.; methodology, T.I.; software, T.I.; validation, T.I.; data curation, T.I.; writing—original draft preparation, T.I.; writing—review and editing, T.I.; visualization, T.I.; supervision, K.I., K.N. and Y.C.; project administration, K.I., K.N. and Y.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

This study was conducted with the approval of the ethical review board of the Tokyo Metropolitan Institute of Technology (code: 23009, 19 October 2023).

Data Availability Statement

In accordance with the requirements of the ethical review and our university's protocols for storing and sharing data, our data, which include information on dementia patients, will be disclosed upon reasonable request.

Acknowledgments

This study was conducted with the cooperation of the Silver Human Resources Center. We would like to thank all the study participants. All individuals listed in this section agree to this statement.

Conflicts of Interest

Author “Kunio Nitta” was employed by the company “Tsukushikai Medical Corporation”. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as potential conflicts of interest.

References

  1. World Health Organization (Ed.) World Health Statistics 2023; World Health Organization: Geneva, Switzerland, 2023. [Google Scholar]
  2. Tokyo Metropolitan Government Health Promotion Plan 21, 3rd ed.; Bureau of Health and Medical Care, Tokyo Metropolitan Government: Tokyo, Japan, 2023.
  3. Patterson, C. World Alzheimer Report 2018; Alzheimer’s Disease International (ADI): London, UK, 2018. [Google Scholar]
  4. Sado, M. A Study on the Economic Impact of Dementia in Japan. FY 2014 General and Collaborative Research Report, FY 2014 Health and Labour Sciences Research Grant (Comprehensive Research Project on Dementia Measures), Keio University Stress Research Center. Available online: https://mhlw-grants.niph.go.jp/project/24159 (accessed on 26 April 2024).
  5. The Japan Neurological Society. Guidelines for the Treatment of Dementia; The Japan Neurological Society: Tokyo, Japan, 2010. [Google Scholar]
  6. Voss, S.E.; Roger, A.B. Executive function: The core feature of dementia? Dement. Geriatr. Cogn. Disord. 2004, 18, 207–216. [Google Scholar] [CrossRef] [PubMed]
  7. Stopford, C.L.; Thompson, J.C.; Neary, D.; Richardson, A.M.; Snowden, J.S. Working memory, attention, and executive function in Alzheimer’s disease and frontotemporal dementia. Cortex 2012, 48, 429–446. [Google Scholar] [CrossRef] [PubMed]
  8. Tran, D.; Baxter, J.; Hamman, R.F.; Grigsby, J. Impairment of executive cognitive control in type 2 diabetes, and its effects on health-related behavior and use of health services. J. Behav. Med. 2014, 37, 414–422. [Google Scholar] [CrossRef] [PubMed]
  9. Liao, K.M.; Lin, T.C.; Li, C.Y.; Yang, Y.H.K. Dementia increases severe sepsis and mortality in hospitalized patients with chronic obstructive pulmonary disease. Medicine 2015, 94, e967. [Google Scholar] [CrossRef] [PubMed]
  10. Wancata, J.; Windhaber, J.; Krautgartner, M.; Alexandrowicz, R. The consequences of non-cognitive symptoms of dementia in medical hospital departments. Int. J. Psychiatry Med. 2003, 33, 257–271. [Google Scholar] [CrossRef]
  11. Saravay, S.M.; Kaplowitz, M.; Kurek, J.; Zeman, D.; Pollack, S.; Novik, S.; Knowlton, S.; Brendel, M.; Hoffman, L. How do delirium and dementia increase length of stay of elderly general medical inpatients? Psychosomatics 2004, 45, 235–242. [Google Scholar] [CrossRef]
  12. Kato, S. Creation of the Revised Hasegawa Simple Intelligence Evaluation Scale (HDS-R). J. Geriatr. Psychiatry 1991, 2, 1339–1347. [Google Scholar]
  13. Sclan, S.G.; Barry, R. Functional assessment staging (FAST) in Alzheimer’s disease: Reliability, validity, and ordinality. Int. Psychogeriatr. 1992, 4, 55–69. [Google Scholar] [CrossRef]
  14. Takiura, T. Dementia Screening Tests. Hiroshima Shudo Univ. J. Humanit. 2007, 48, 347–379. [Google Scholar]
  15. Ylikoski, R.; Erkinjuntti, T.; Sulkava, R.; Juva, K.; Tilvis, R.; Valvanne, J. Correction for age, education and other demographic variables in the use of the Mini Mental State Examination in Finland. Acta Neurol. Scand. 1992, 85, 391–396. [Google Scholar] [CrossRef]
  16. Arevalo-Rodriguez, I.; Smailagic, N.; Roqué i Figuls, M.; Ciapponi, A.; Sanchez-Perez, E.; Giannakou, A.; Pedraza, O.L.; Cosp, X.B.; Cullum, S. Mini-Mental State Examination (MMSE) for the detection of Alzheimer’s disease and other dementias in people with mild cognitive impairment (MCI). Cochrane Database Syst. Rev. 2015, 3. [Google Scholar]
  17. Mystakidou, K.; Tsilika, E.; Parpa, E.; Galanos, A.; Vlahos, L. Brief cognitive assessment of cancer patients: Evaluation of the Mini-Mental State Examination (MMSE) psychometric properties. Psycho-Oncol. J. Psychol. Soc. Behav. Dimens. Cancer 2007, 16, 352–357. [Google Scholar] [CrossRef] [PubMed]
  18. Bour, A.; Rasquin, S.; Boreas, A.; Limburg, M.; Verhey, F. How predictive is the MMSE for cognitive performance after stroke? J. Neurol. 2010, 257, 630–637. [Google Scholar] [CrossRef]
  19. Toba, K. Comprehensive Functional Assessment of Dementia. Geriatr. Med. 2002, 40, 111–116. [Google Scholar]
  20. Piccinin, A.M.; Muniz-Terrera, G.; Clouston, S.; Reynolds, C.A.; Thorvaldsson, V.; Deary, I.J.; Deeg, D.J.; Johansson, B.; Mackinnon, A.; Spiro, A., III. Coordinated analysis of age, sex, and education effects on change in MMSE scores. J. Gerontol. Ser. B Psychol. Sci. Soc. Sci. 2013, 68, 374–390. [Google Scholar] [CrossRef]
  21. Harada, H.; Notoya, A.; Nakanishi, M.; Fujiwara, N.; Inoue, K. Measurement Values of Neuropsychological Tests in Healthy Elderly People: The Influence of Age and Years of Education. High. Brain Funct. Res. (Former. Aphasia Res.) 2006, 26, 16–24. [Google Scholar]
  22. Sugishita, M.; Koshizuka, Y.; Sudou, S.; Sugishita, K. The Validity and Reliability of the Japanese Version of the Mini-Mental State Examination (MMSE-J) with the original procedure of the Attention and Calculation Task (2001). Cogn. Neurosci. 2018, 20, 91–110. [Google Scholar]
  23. Hurd, M.D.; Martorell, P.; Delavande, A.; Mullen, K.J.; Langa, K.M. Monetary costs of dementia in the United States. N. Engl. J. Med. 2013, 368, 1326–1334. [Google Scholar] [CrossRef]
  24. Connolly, S.; Gillespie, P.; O’Shea, E.; Cahill, S.; Pierce, M. Estimating the economic and social costs of dementia in Ireland. Dementia 2014, 13, 5–22. [Google Scholar] [CrossRef]
  25. Wimo, A.; Jönsson, L.; Bond, J.; Prince, M.; Winblad, B.; International, A.D. The worldwide economic impact of dementia 2010. Alzheimer’s Dement. 2013, 9, 1–11.e3. [Google Scholar] [CrossRef]
  26. Burlá, C.; Guilhermina, R.; Rui, N. Alzheimer, dementia and the living will: A proposal. Med. Health Care Philos. 2014, 17, 389–395. [Google Scholar] [CrossRef] [PubMed]
  27. Milne, A. Dementia screening and early diagnosis: The case for and against. Health Risk Soc. 2010, 12, 65–76. [Google Scholar] [CrossRef]
  28. Jagust, W. Positron emission tomography and magnetic resonance imaging in the diagnosis and prediction of dementia. Alzheimer’s Dement. 2006, 2, 36–42. [Google Scholar] [CrossRef] [PubMed]
  29. Kawano, K. The Utility of the Clock Drawing Test (CDT) in Clinical Dementia Practice. J. Biomed. Fuzzy Syst. 2004, 6, 69–79. [Google Scholar]
  30. Miller, L.A.; Daniels, R.A. Psychological Assessment and Testing. Int. Handb. Psychol. Learn. Teach. 2020, 1–34. [Google Scholar]
  31. American Psychological Association. APA GUIDELINES for Psychological Assessment and Evaluation; American Psychological Association: Washington, DC, USA, 2020. [Google Scholar]
  32. Kawatani, M. Neuropsychological Tests. Med. Exam. 2017, 66, 11–21. [Google Scholar]
  33. Tiberti, C.; Sabe, L.; Kuzis, G.; Cuerva, A.G.; Leiguarda, R.; Starkstein, S.E. Prevalence and correlates of the catastrophic reaction in Alzheimer’s disease. Neurology 1998, 50, 546–548. [Google Scholar] [CrossRef]
  34. Mehrabian, A.; Susan, R.F. Inference of attitudes from nonverbal communication in two channels. J. Consult. Psychol. 1967, 31, 248. [Google Scholar] [CrossRef]
  35. Morris, J.C. The Clinical Dementia Rating (CDR) current version and scoring rules. Neurology 1993, 43, 2412. [Google Scholar] [CrossRef]
  36. Kawaguchi, H.; Sato, S. A Study on the Evaluation of Cognitive Ability of Demented Elderly by Others. Care Behav. Sci. Elder. 2002, 8, 37–45. (In Japanese) [Google Scholar]
  37. Laranjo, L.; Dunn, A.G.; Tong, H.L.; Kocaballi, A.B.; Chen, J.; Bashir, R.; Surian, D.; Gallego, B.; Magrabi, F.; Lau, A.Y.; et al. Conversational agents in healthcare: A systematic review. J. Am. Med. Inform. Assoc. 2018, 25, 1248–1258. [Google Scholar] [CrossRef] [PubMed]
  38. Luxton, D.D. (Ed.) Artificial Intelligence in Behavioral and Mental Health Care; Academic Press: Cambridge, MA, USA, 2015. [Google Scholar]
  39. Walker, G.; Morris, L.A.; Christensen, H.; Mirheidari, B.; Reuber, M.; Blackburn, D.J. Characterising spoken responses to an intelligent virtual agent by persons with mild cognitive impairment. Clin. Linguist. Phon. 2021, 35, 237–252. [Google Scholar] [CrossRef] [PubMed]
  40. Yoshii, K.; Kimura, D.; Kosugi, A.; Shinkawa, K.; Takase, T.; Kobayashi, M.; Yamada, Y.; Nemoto, M.; Watanabe, R.; Ota, M. Screening of mild cognitive impairment through conversations with humanoid robots: Exploratory pilot study. JMIR Form. Res. 2023, 7, e42792. [Google Scholar] [CrossRef]
  41. Liang, X.; Batsis, J.A.; Zhu, Y.; Driesse, T.M.; Roth, R.M.; Kotz, D.; MacWhinney, B. Evaluating Voice-Assistant Commands for Dementia Detection. Comput. Speech Lang. 2022, 72, 101297. [Google Scholar] [CrossRef]
  42. Agbavor, F.; Liang, H. Predicting Dementia from Spontaneous Speech Using Large Language Models. PLoS Digit. Health 2022, 1, e0000168. [Google Scholar] [CrossRef]
  43. Hamrick, P.; Sanborn, V.; Ostrand, R.; Gunstad, J. Lexical Speech Features of Spontaneous Speech in Older Persons with and without Cognitive Impairment: Reliability Analysis. JMIR Aging 2023, 6, e46483. [Google Scholar] [CrossRef] [PubMed]
  44. Tang, F.; Chen, J.; Dodge, H.H.; Zhou, J. The Joint Effects of Acoustic and Linguistic Markers for Early Identification of Mild Cognitive Impairment. Front. Digit. Health 2022, 3, 702772. [Google Scholar] [CrossRef]
  45. Wolters, M.K.; Fiona, K.; Jonathan, K. Designing a spoken dialogue interface to an intelligent cognitive assistant for people with dementia. Health Inform. J. 2016, 22, 854–866. [Google Scholar] [CrossRef] [PubMed]
  46. NTT Communications. Started Free Trial of Brain Health Check Toll-Free Dial. Available online: https://www.ntt.com/about-us/press-releases/news/article/2022/0921.html (accessed on 26 April 2024).
  47. Hikino, J.; Nakano, Y.; Yasuda, K. Communication Support for Dementia Patients Using a Conversation Agent. In Proceedings of the 73rd National Convention of Information Processing Society of Japan, Tokyo, Japan, 2–4 March 2011; pp. 195–196. [Google Scholar]
  48. NEC Platforms. PaPeRo i. Available online: https://www.necplatforms.co.jp/solution/papero_i/index.html (accessed on 26 April 2024).
  49. Igarashi, T.; Nihei, M.; Inoue, T.; Sugawara, I.; Kamata, M. Eliciting a User’s Preferences by the Self-Disclosure of Socially Assistive Robots in Local Households of Older Adults to Facilitate Verbal Human–Robot Interaction. Int. J. Environ. Res. Public Health 2022, 19, 11319. [Google Scholar] [CrossRef]
  50. Kobayashi, T.; Miyazaki, T.; Arai, K. Study on a System for Detecting Signs of Dementia Using a Social Media Mediating Robot. J. Inst. Electron. Inf. Commun. Eng. D 2022, 105, 533–545. [Google Scholar]
  51. Mou, Y.; Xu, K. The media inequality: Comparing the initial human-human and human-AI social interactions. Comput. Hum. Behav. 2017, 72, 432–440. [Google Scholar] [CrossRef]
  52. Hewitt, T.; Ian, B. A case study of user communication styles with customer service agents versus intelligent virtual agents. In Proceedings of the 21st Annual Meeting of the Special Interest Group on Discourse and Dialogue, Virtual, 1–3 July 2020. [Google Scholar]
  53. Diederich, S.; Brendel, A.B.; Morana, S.; Kolbe, L. On the design of and interaction with conversational agents: An organizing and assessing review of human-computer interaction research. J. Assoc. Inf. Syst. 2022, 23, 96–138. [Google Scholar] [CrossRef]
  54. Bickmore, T.W.; Mitchell, S.E.; Jack, B.W.; Paasche-Orlow, M.K.; Pfeifer, L.M.; O’Donnell, J. Response to a relational agent by hospital patients with depressive symptoms. Interact. Comput. 2010, 22, 289–298. [Google Scholar] [CrossRef] [PubMed]
  55. Silkej, E. Linguistic Differences in Real Conversations: Human to Human vs Human to Chatbot; University of Gothenburg: Gothenburg, Sweden, 2020. [Google Scholar]
  56. Spielberger, C.D.; Gonzalez-Reigosa, F.; Martinez-Urrutia, A.; Natalicio, L.F.; Natalicio, D.S. The State-Trait Anxiety Inventory. Rev. Interam. Psicol./Interam. J. Psychol. 1971, 5, 4. [Google Scholar]
  57. Hida, N.; Fukuhara, T.; Iwawaki, M.; Soga, Y.; Spielberger, C.D. New Edition STAI Manual—State-Trait Anxiety Inventory-Form JYZ; Hidano, T., Fukuhara, M., Iwawaki, M., Soga, S., Spielberger, C.D., Eds.; Practical Education Publishing: Tokyo, Japan, 2000. [Google Scholar]
  58. Huskisson, E.C. Measurement of pain. Lancet 1974, 304, 1127–1131. [Google Scholar] [CrossRef]
  59. Yesavage, J.A. Geriatric Depression Scale. Psychopharmacol. Bull. 1988, 24, 709–711. [Google Scholar] [PubMed]
  60. 3D Character Creation Software VRoid Studio. Available online: https://vroid.com/studio (accessed on 26 April 2024).
  61. GitHub Pixiv Three-Vrm. Available online: https://github.com/pixiv/three-vrm (accessed on 26 April 2024).
  62. AI Service for Generating Voice and Facial Motion Koemotion. Available online: https://rinna.co.jp/news/2023/06/20230612.html (accessed on 26 April 2024).
  63. Web Speech API—MDN Web Docs. Available online: https://developer.mozilla.org/ja/docs/Web/API/Web_Speech_API (accessed on 26 April 2024).
  64. Introducing ChatGPT and Whisper APIs. Available online: https://openai.com/index/introducing-chatgpt-and-whisper-apis/ (accessed on 26 April 2024).
  65. Igarashi, T.; Umeda-Kameyama, Y.; Kojima, T.; Akishita, M.; Nihei, M. Assessment of Adjunct Cognitive Functioning through Intake Interviews Integrated with Natural Language Processing Models. Front. Med. 2023, 10, 1145314. [Google Scholar] [CrossRef]
  66. Sidhu, S.; Joseph, E.M. Evaluating and managing bradycardia. Trends Cardiovasc. Med. 2020, 30, 265–272. [Google Scholar] [CrossRef]
  67. Rehorn, M.; Albert, Y.S. Bradyarrhythmias. Manag. Card. Arrhythm. 2020, 3, 205–224. [Google Scholar]
  68. General Incorporated Association Japanese Circulation Society. Non-Pharmacological Treatment Guidelines for Arrhythmias; The Japanese Society of Cardiology: Tokyo, Japan, 2018. [Google Scholar]
  69. Oshio, S.; Abe, S. Attempt to Create the Japanese Version of the Ten Item Personality Inventory (TIPI-J). Personal. Res. 2012, 21, 40–52. [Google Scholar]
  70. Medeiros, L.; Tibor, B.; Charlotte, G. Can a chatbot comfort humans? Studying the impact of a supportive chatbot on users’ self-perceived stress. IEEE Trans. Hum.-Mach. Syst. 2021, 52, 343–353. [Google Scholar] [CrossRef]