1. Introduction
Thailand became an aging society in 2001, when people over 65 made up about 7% of the country’s population. By 2050, this share is expected to reach 35.8%, i.e., ~20 million people [1]. In 2021, people older than 65 accounted for 12.4% of Thailand’s population. Prevalence studies have found mild cognitive impairment (MCI) in ~20% of the elderly [2,3,4]. This figure is alarming to healthcare professionals because MCI, a cognitive change in people over 65 years of age, can progress to Alzheimer’s disease (AD) or dementia [5]. Early detection of MCI allows older adults to adjust their lifestyle, which may alleviate the decline in brain function [4]. However, diagnosing MCI can be time-consuming and costly because it requires several clinical procedures. Information and communication technology can help clinicians overcome these limitations.
The Montreal cognitive assessment (MoCA) is a prominent screening tool for cognitive impairment [6,7,8,9,10]. MoCA screens for MCI by testing patients’ performance across several cognitive functions. However, MoCA has some limitations. First, the original paper-and-pencil MoCA requires an expert to administer the assessment to each participant. Second, it cannot be used with people who have visual or motor impairments. Third, the results are recorded only on paper, which makes further analysis difficult.
One way to mitigate these limitations is to analyze verbal fluency (VF). There are two categories of VF: semantic VF (SVF) and phonemic VF (PVF). Many studies have successfully detected MCI using VF [11,12,13,14,15,16]. In an SVF task, patients are asked to name words from a given category (e.g., fruits, animals). In the PVF task of MoCA, patients are asked to say words beginning with a specific letter, such as “F”, within 1 min. The PVF score is the total number of correct words. A decline in VF, i.e., a low score, indicates frontal lobe dysfunction, which is related to the symptoms of MCI [17]. In Thai PVF, the number of generated words differs substantially between people with MCI and healthy controls (HC) [18]. Several studies have proposed ways to extract features from PVF for MCI detection; these are reviewed in the related work.
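To make the PVF scoring rule concrete, the sketch below counts correct responses under a simplified assumption: a response is correct if it begins with the target letter and has not already been said. The function name `score_pvf` is hypothetical. Standard MoCA administration also excludes items such as proper nouns and numbers, which this sketch does not check.

```python
from typing import List


def score_pvf(responses: List[str], letter: str) -> int:
    """Count correct PVF responses (simplified, hypothetical scoring rule).

    A response counts if it starts with the target letter and is not a
    repetition. Proper nouns, numbers, and variants of the same word are
    NOT filtered out here.
    """
    target = letter.lower()
    seen = set()
    correct = 0
    for word in responses:
        w = word.strip().lower()  # Thai has no case; lower() leaves it unchanged
        if not w.startswith(target):
            continue  # intrusion: wrong initial letter
        if w in seen:
            continue  # repetition: already counted once
        seen.add(w)
        correct += 1
    return correct


print(score_pvf(["fish", "fan", "Fish", "table", "fork", "fan"], "F"))  # 3
```

Because the check is a plain prefix match, the same function runs on Thai strings, but it inherits all the orthographic issues that make Thai PVF harder to analyze.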
Although the analyses above perform well for English, they cannot be applied directly to Thai. The main reason is that Thai has different grammatical rules and structures from English [19], which poses several problems: (1) phonemic clustering, because Thai requires the subcategories to be rearranged; (2) homophones, because several different spellings in Thai produce the same sound but have different meanings (e.g., “กรรณ” /kan/, “กัน” /kan/); (3) compound words, because prefixes such as “การ” /kaan/ or “กระ” /krà/ change the type or meaning of a word (e.g., “การบ้าน” /kaanˑbâan/, “การเรียน” /kaanˑrian/, “กระโดด” /kràˑdòot/, and “กระรอก” /kràˑrɔ̂ɔk/); (4) tone, which complicates speech recognition because words that differ only in tone have different meanings (e.g., “ก่อน” /kɔ̀ɔn/, “ก้อน” /kɔ̂ɔn/); and (5) consonant clusters, i.e., pairs of consonants such as “กล” /kl/, “กร” /kr/, and “กว” /kw/ that are pronounced as a distinct onset (e.g., “กล้าม” /klâam/, “กราบ” /kràap/, “กวาด” /kwàat/). New methods are therefore needed to address these problems. Because we are familiar with these linguistic characteristics, we draw on our Thai language proficiency to address them.
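One way to see why spelling alone is insufficient is to work on phonemic transcriptions instead. The sketch below assumes a hypothetical hand-made lexicon mapping Thai spellings to the transcriptions given above; it is not the authors' algorithm. It detects homophones (identical transcriptions), tone-only pairs (identical once tone diacritics are removed), and two-consonant onsets. Working on transcriptions avoids, for example, treating “กรรณ” as a /kr/ cluster just because it is spelled with “กร”.

```python
import unicodedata
from typing import Dict, List

# Hypothetical lexicon: Thai spelling -> phonemic transcription
# (transcriptions copied from the examples in the text).
LEXICON: Dict[str, str] = {
    "กรรณ": "kan",
    "กัน": "kan",
    "ก่อน": "kɔ̀ɔn",
    "ก้อน": "kɔ̂ɔn",
    "กล้าม": "klâam",
    "กราบ": "kràap",
    "กวาด": "kwàat",
}

# Assumed subset of two-consonant onsets, taken from the examples above.
CLUSTER_ONSETS = ("kl", "kr", "kw")


def strip_tone(transcription: str) -> str:
    """Drop tone diacritics by removing Unicode combining marks after NFD."""
    decomposed = unicodedata.normalize("NFD", transcription)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def onset(transcription: str) -> str:
    """Return the initial consonant cluster if it is a known one, else the first consonant."""
    if transcription[:2] in CLUSTER_ONSETS:
        return transcription[:2]
    return transcription[:1]


def homophones(lexicon: Dict[str, str]) -> List[List[str]]:
    """Group spellings whose transcriptions are identical, tone included."""
    groups: Dict[str, List[str]] = {}
    for spelling, ipa in lexicon.items():
        groups.setdefault(unicodedata.normalize("NFC", ipa), []).append(spelling)
    return [group for group in groups.values() if len(group) > 1]


print(homophones(LEXICON))                          # [['กรรณ', 'กัน']]
print(strip_tone("kɔ̀ɔn") == strip_tone("kɔ̂ɔn"))     # True: a tone-only pair
print(onset("klâam"), onset(LEXICON["กรรณ"]))       # kl k
```

A real system would need a full grapheme-to-phoneme step to produce the transcriptions; the lexicon here only stands in for that step.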
In this study, we focused on detecting MCI using Thai PVF data from the digital MoCA [10], whose validity and reliability were assessed using Spearman’s rank-order correlation coefficients and Cronbach’s alpha [20]. To overcome the language barriers, we drew on our Thai language proficiency to develop novel phonemic clustering and switching algorithms. Furthermore, we proposed a novel method that combines various feature types with feature selection based on the chi-square test. This approach achieved promising results in detecting MCI from Thai PVF data and highlighted the most important features for further research.
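Chi-square feature selection ranks each feature by how far its per-class totals deviate from what class frequencies alone would predict. The sketch below is a pure-Python version of the statistic computed by scikit-learn's `feature_selection.chi2` (observed = per-class sum of a non-negative feature, expected = class frequency times the feature's overall sum). The feature matrix, labels, and function names are hypothetical illustrations; the exact procedure and number of features the authors selected are not reproduced here, and p-values (chi-square distribution with classes minus one degrees of freedom) are omitted.

```python
from typing import List, Sequence


def chi2_scores(X: Sequence[Sequence[float]], y: Sequence[int]) -> List[float]:
    """Chi-square statistic per feature; features must be non-negative."""
    if any(v < 0 for row in X for v in row):
        raise ValueError("chi-square scoring requires non-negative features")
    classes = sorted(set(y))
    n = len(y)
    scores = []
    for j in range(len(X[0])):
        total = sum(row[j] for row in X)
        if total == 0:
            scores.append(0.0)  # an all-zero feature carries no information
            continue
        stat = 0.0
        for c in classes:
            observed = sum(row[j] for row, label in zip(X, y) if label == c)
            expected = total * sum(1 for label in y if label == c) / n
            stat += (observed - expected) ** 2 / expected
        scores.append(stat)
    return scores


def select_k_best(X, y, k: int) -> List[int]:
    """Indices of the k features with the highest chi-square score."""
    scores = chi2_scores(X, y)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


# Hypothetical rows: [average silence (s), number of switches, constant feature]
X = [[3.0, 2, 1.0], [2.5, 3, 1.0], [1.0, 6, 1.0], [0.8, 7, 1.0]]
y = [1, 1, 0, 0]  # 1 = MCI, 0 = HC (synthetic labels)
print(select_k_best(X, y, 2))  # [1, 0]
```

The constant third column scores exactly zero, which is why such a filter discards features that do not vary with the class label.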
2. Related Work
VF tasks are widely used in neuropsychological assessment because they are brief and easy to administer. Participants are asked to name as many words as possible in 1 min under a given condition. In SVF, participants name items from a category, such as animals or fruits. In PVF, participants produce words beginning with a specific letter, such as F or P. Several studies have analyzed performance within VF tasks in more detail to identify the processes affected by cognitive impairment.
Troyer et al. [21] introduced two essential components of VF: clustering, the grouping of words within semantic or phonemic subcategories, and switching, the ability to transition between clusters. Ryan et al. [22] compared cognitive decline between experienced and novice boxers and defined clusters in VF using a phoneme similarity score. They showed that the number of fights was significantly related to shifting ability. Mueller et al. [23] investigated the correlation between PVF and SVF using data from the Wisconsin Registry for Alzheimer’s Prevention. They showed that people with amnestic MCI had lower scores than the control group. Clustering reflects participants’ tendency to produce words within the same category, whereas switching refers to participants’ conscious decision to shift from one category to another [24].
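Troyer-style phonemic clustering and switching can be sketched as follows. This simplified version uses only one of Troyer et al.'s phonemic rules (consecutive words sharing their first two letters). The original rules also count rhymes, homonyms, and words differing only by a vowel sound, and the Thai guidelines developed in this study are not reproduced. Following Troyer's convention, cluster size is counted from the second word, and every transition between clusters (single words included) is a switch. Function names are hypothetical.

```python
from typing import List


def phonemic_clusters(words: List[str]) -> List[List[str]]:
    """Group consecutive words that share their first two letters (simplified rule)."""
    clusters: List[List[str]] = []
    for word in words:
        if clusters and clusters[-1][-1].lower()[:2] == word.lower()[:2]:
            clusters[-1].append(word)  # continue the current cluster
        else:
            clusters.append([word])  # start a new cluster (possibly a single word)
    return clusters


def count_switches(clusters: List[List[str]]) -> int:
    """Transitions between adjacent clusters, counting single words as clusters."""
    return max(len(clusters) - 1, 0)


def cluster_sizes(clusters: List[List[str]]) -> List[int]:
    """Cluster size counted from the second word, so a single word has size 0."""
    return [len(cluster) - 1 for cluster in clusters]


words = ["fan", "fat", "fig", "fish", "fox", "frog", "free"]
clusters = phonemic_clusters(words)
print(clusters)                  # [['fan', 'fat'], ['fig', 'fish'], ['fox'], ['frog', 'free']]
print(count_switches(clusters))  # 3
print(cluster_sizes(clusters))   # [1, 1, 0, 1]
```

Applied to raw Thai spellings, the first-two-letter rule runs into the homophone, prefix, and consonant cluster problems described in the Introduction, which is why a language-specific rule set is needed.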
Word similarity measures have also been used to detect cognitive impairment. Levenshtein [25] introduced the Levenshtein distance (LD), which measures word similarity as an edit distance: the minimum number of single-character operations (insertions, deletions, and substitutions) required to transform one word into another. Orthographic similarity, calculated by comparing the letters of words, is commonly used in psycholinguistics and is related to lexical access in word memory [24,25,26]. Semantic similarity is based on word meaning; it affects letter fluency performance, for example through the degradation of nonverbal conceptual information [27]. Lindsay et al. [28] proposed alternative similarity metrics for clustering and switching (e.g., LD, weighted phonetic similarity, weighting by position in words, and semantic distance between words) and assessed them with a two-fold evaluation. They showed that weighted phonemic edit distance gave the best results for PVF assessment. Further, similarity-based features have been reported to improve model accuracy by 29% for PVF [29].
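The LD definition above corresponds to a standard dynamic-programming recurrence, sketched below with a single rolling row. The length-normalized similarity is a common convention added here for illustration, not necessarily the one used in the cited studies, and the phoneme-weighted variant from Lindsay et al. is not implemented. The Thai example shows that, on spellings, a tone-mark difference counts as one substitution.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a to b[:j]
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def normalized_similarity(a: str, b: str) -> float:
    """1 - LD / length of the longer word (hypothetical normalization)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


print(levenshtein("kitten", "sitting"))  # 3
print(levenshtein("ก่อน", "ก้อน"))        # 1: only the tone mark differs
```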
Spontaneous speech parameters are sensitive indicators of cognitive impairment. Hoffmann et al. [30] examined four temporal parameters of spontaneous speech produced by native Hungarian speakers: the hesitation ratio, articulation rate, speech tempo, and grammatical errors. They showed that the hesitation ratio was the best parameter for identifying AD. However, measuring these parameters manually is time-consuming. Tóth et al. [31] therefore automated this laborious feature extraction using automatic speech recognition (ASR). Their method, which could serve as a screening tool for MCI, yielded an F1-score of 85.3%. Silence-based features (e.g., silent segments, filled pauses, and silence duration) combined with machine learning have yielded an F1-score of 78.8% for detecting MCI [15]. Recently, Campbell et al. [32] proposed an algorithm that analyzes the temporal patterns of silence in VF tasks using the “AcceXible” and “ADReSS” databases. Their results showed that silence-based features achieved the best accuracy in the VF tasks. Several studies within the same scope have indicated that silence-based features are useful biomarkers for detecting cognitive impairment [13].
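Silence-based features can be computed once each spoken word has start and end times. The sketch below assumes word-level timestamps (in seconds) are already available, for example from ASR alignment or manual annotation, and derives the average silence between consecutive words, the total silence, and the number of long pauses. The 0.5 s long-pause threshold and the feature names are hypothetical choices for illustration.

```python
from statistics import mean
from typing import Dict, List, Tuple


def silence_features(segments: List[Tuple[float, float]],
                     long_pause: float = 0.5) -> Dict[str, float]:
    """Silence features from ordered (start, end) word timestamps in seconds.

    long_pause is a hypothetical threshold for counting long pauses.
    """
    # Gap between the end of one word and the start of the next; overlaps clip to 0.
    gaps = [max(nxt[0] - cur[1], 0.0) for cur, nxt in zip(segments, segments[1:])]
    if not gaps:
        return {"avg_silence": 0.0, "total_silence": 0.0, "n_long_pauses": 0}
    return {
        "avg_silence": mean(gaps),
        "total_silence": sum(gaps),
        "n_long_pauses": sum(1 for g in gaps if g >= long_pause),
    }


words = [(0.0, 0.5), (1.5, 2.0), (2.2, 3.0)]  # synthetic timestamps
print(silence_features(words))  # avg ~0.6 s, total ~1.2 s, 1 long pause
```

Because PVF is limited to 1 min, a larger average gap leaves less time for words, so these features are expected to correlate with the raw PVF score.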
In summary, the features above (silence-based features, similarity-based features, and clustering) are related to cognitive decline in MCI. These features differ in their ability to discriminate MCI from normal cognition. We see an opportunity to integrate them with state-of-the-art machine learning techniques in a MoCA application for clinical benefit. However, some of these features may be unsuitable for Thai, which we investigate in this study.
5. Discussion
The goals of the present study were to detect MCI using data from the Thai PVF task and to develop clustering guidelines for feature extraction from Thai PVF. State-of-the-art machine learning techniques combined with optimized feature extraction produced acceptable results for MCI detection (Table 4, Table 5 and Table 6).
5.1. Feature Analysis
Our findings provide three pieces of evidence consistent with previous research. First, the silence-based feature has high predictive value for MCI detection [30]. The average silence between words is ranked highest by SHAP value. Longer silences may reflect impaired lexical access and word-finding difficulties. Participants with MCI tend to pause longer before saying the next word, and because the PVF task is limited to 1 min, longer silences directly reduce the number of generated words. Figure 9 shows that the MCI and HC boxes for the average silence between words are almost symmetric, while their medians suggest that the two groups likely differ.

Second, switching has high predictive value, but clustering does not (Figure 5 and Figure 7). This finding agrees with the original research [21], which found that switching is more important than clustering for optimal PVF performance, whereas switching and clustering are equally important for SVF. Switching involves the transition between clusters; more broadly, it may reflect the ability to initiate a search for a new strategy or subcategory. Participants with MCI appear to switch less than HC. Figure 9 shows that the median of the MCI box lies almost outside the HC box, suggesting that the two groups differ.

Third, similarity-based features appear to have little predictive value: they were ranked near the bottom in terms of feature importance (Figure 5, Figure 6 and Figure 7). Among them, semantic similarity, which relates to producing varied vocabulary, had the best p-value in the chi-square test. Figure 9 shows that the MCI box is widely dispersed and that the median of the HC box lies within the MCI box, indicating that this feature is inappropriate for MCI detection. These results agree with a previous study in which semantic similarity and LD yielded worse silhouette coefficients than Troyer’s method [28].
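The box-plot reasoning above (whether one group's median falls outside the other group's interquartile box) can be expressed as a small check. This is a visual heuristic sketched for illustration, not a significance test and not the authors' analysis; the function name and data are hypothetical. It uses Python's `statistics.quantiles`, whose default "exclusive" method may place quartiles slightly differently from a given plotting library.

```python
from statistics import median, quantiles
from typing import Sequence


def median_outside_box(group: Sequence[float], reference: Sequence[float]) -> bool:
    """True if the median of `group` lies outside the reference group's Q1..Q3 box."""
    q1, _, q3 = quantiles(reference, n=4)  # quartiles of the reference group
    m = median(group)
    return m < q1 or m > q3


hc = [1, 2, 3, 4, 5]                          # synthetic values
print(median_outside_box([6, 7, 8, 9, 10], hc))  # True: medians well separated
print(median_outside_box([2, 3, 4], hc))         # False: median inside the HC box
```

A formal comparison would still need a statistical test, such as the chi-square test used for feature selection in this study.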
5.2. Classification Analysis
We compared three classifiers chosen for their different algorithmic foundations and strengths. SVM performs well on high-dimensional data, and its kernel function can be customized to map the data into a suitable feature space. RF fits several decision tree classifiers on various subsamples of a dataset and averages their outputs to improve predictive accuracy [36]. XGBoost is an optimized, distributed gradient boosting library designed to be highly efficient, flexible, and portable [42].
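A comparison of the three classifier families could be set up as below with scikit-learn and cross-validated AUC. This is a sketch under several assumptions: the data are synthetic and imbalanced (generated with `make_classification`, not the study's data), `GradientBoostingClassifier` stands in for XGBoost so the example runs without extra packages (`xgboost.XGBClassifier` could be swapped in if installed), and the hyperparameters are illustrative rather than the tuned values used in the study.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic, imbalanced stand-in data: about 70% class 0, 30% class 1.
X, y = make_classification(n_samples=120, n_features=10, n_informative=4,
                           weights=[0.7, 0.3], random_state=0)

models = {
    # Scale features before the RBF kernel; class_weight offsets the imbalance.
    "SVM": make_pipeline(StandardScaler(),
                         SVC(kernel="rbf", class_weight="balanced")),
    # Bagged decision trees whose predictions are averaged.
    "RF": RandomForestClassifier(n_estimators=200, class_weight="balanced",
                                 random_state=0),
    # Gradient boosting as a stand-in for XGBoost.
    "GBoost": GradientBoostingClassifier(random_state=0),
}

# Stratified folds keep the class ratio similar in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
results = {name: cross_val_score(model, X, y, cv=cv, scoring="roc_auc").mean()
           for name, model in models.items()}

for name, auc in results.items():
    print(f"{name}: mean AUC = {auc:.3f}")
```

Stratified folds and class weighting are reasonable defaults for a small, unbalanced dataset like the one described in the limitations, but they are assumptions of this sketch.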
SVM was the best of the three classifiers. Furthermore, results improved slightly as the number of significant features used for classification increased (Table 4, Table 5 and Table 6), which agrees with a previous study [15]. We also fine-tuned the hyperparameters of each classifier. Based on these results, we suggest choosing each classifier for the task at which it excels. SVM is suitable for general use because it achieved the highest AUC, a threshold-free evaluation metric. RF performs stably as the number of features increases, with an AUC between 0.617 and 0.683. XGBoost performs similarly to RF, with an AUC between 0.617 and 0.671. In terms of training and fine-tuning time, XGBoost is the fastest of the three classifiers.
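AUC is threshold-free because it depends only on how the classifier ranks positive and negative samples, not on any decision cut-off. The sketch below computes it directly as the probability that a randomly chosen positive (e.g., MCI) sample receives a higher score than a randomly chosen negative (HC) sample, counting ties as one half. The scores are synthetic. Any order-preserving transformation of the scores, such as squaring non-negative values, leaves the AUC unchanged.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC via pairwise comparison of positive and negative scores."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0  # positive ranked above negative
            elif p == q:
                wins += 0.5  # tie counts as half
    return wins / (len(pos) * len(neg))


scores = [0.9, 0.8, 0.4, 0.3]  # synthetic classifier outputs
labels = [1, 0, 1, 0]
print(auc(scores, labels))                    # 0.75
print(auc([s ** 2 for s in scores], labels))  # 0.75: ranking unchanged
```

The pairwise loop is quadratic in the sample size, which is acceptable for small datasets; library implementations compute the same quantity by sorting.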
5.3. Limitations and Future Work
Our proposed phonemic clustering and switching guidelines benefit MCI detection for native Thai speakers. They fill the gap created by differences in language characteristics between Thai and English. Our algorithms are also simple and do not require high computing power, making them suitable for mobile and other small devices. Accordingly, we believe these guidelines will support cost-effective, automated MCI detection.
However, this study has some limitations. First, our data were obtained from only one Thai PVF task. Other Thai VF assessments (fruit categories, animal categories, and other letters, such as /s/ “ส”) have not yet been investigated. Second, the dataset is small and unbalanced. Because we collected the data during the coronavirus outbreak, lockdown policies limited the number of participants we could recruit. Finally, high-accuracy ASR for PVF is needed to process large amounts of data. Several speech-to-text solutions perform well in typical situations, e.g., transcribing long sentences. However, when applied to PVF audio clips, they produced unacceptable results. One possible reason is that PVF responses lack the contextual cues that help a recognizer predict the next word. In addition, PVF responses consist of many short utterances, making it difficult to determine whether differences are phonemic or tonal. Moreover, in Thai, tone changes word meaning. Therefore, the more accurate the speech-to-text solution, the more data we can process.
For future research, we developed the digital MoCA to collect useful information during testing. We plan to use data from other tasks (backward digit span, serial sevens, and memory test) obtained from the digital MoCA. We believe that selecting significant features across these tasks will improve the detection of MCI and related diseases (dementia and AD). We also plan to use the Thai speech-to-text solution focused on PVF [10] to make the process fully automated.