The Development of a Multi-Modality Emotion Recognition Test Presented via a Mobile Application

Emotion recognition ability underlies interpersonal communication and can aid the detection of brain alterations. Existing tools for assessing emotion recognition ability mostly use a single modality, rely on a paper-and-pencil format, and include only Western stimuli. However, modality and cultural factors greatly influence emotion recognition ability. We aimed to develop a multi-modality emotion recognition mobile application (MMER app). A total of 169 healthy adults were recruited as participants. The MMER app's materials were extracted from a published database, and tablets were used as the interface. Rasch analysis, factor analysis, and related psychometric analyses were performed. Cronbach's alpha was 0.94, and the test–retest reliability was 0.85. Factor analyses identified three factors. In addition, an adjusted score formula was provided for clinical use. The MMER app has good psychometric properties, and its further possible applications and investigations are discussed.


Emotion Recognition
Emotion recognition refers to the identification of emotional information conveyed in various ways, including through the face and prosody [1]. Different receiving channels affect emotion recognition accuracy. Emotion recognition ability is essential for human social interaction and considerably impacts everyday life [2]. When individuals interact with others, they need to receive and recognize emotional information from various sources, such as verbal (e.g., words and semantics) or non-verbal (e.g., tones, facial emotions, and body movements) stimulation. Studies have shown that emotion recognition ability is an essential factor in developing the ability to "understand others' intentions" [3]. In the past few years, research on emotion recognition has grown tremendously and attracted the interest of scholars in various fields.
Emotion recognition ability develops during childhood [4] and changes with age. Nevertheless, the relationship between emotion recognition ability and aging remains inconclusive. Previous studies have shown that emotion recognition ability declines with age [5]: older adults find it more challenging than young people to identify anger [6], fear, and sadness [7]. Moreover, emotion recognition is essential for detecting brain alterations or early signs of psychological/neurological disorders. Evidence has shown that patients with neurodegenerative diseases (e.g., Alzheimer's disease [8] and Parkinson's disease [9]) or psychiatric disorders (e.g., autism spectrum disorder and schizophrenia [10]) have impairments in facial emotion recognition ability. Furthermore, alexithymia has been shown to be a transdiagnostic construct [11], and detecting difficulties in emotion recognition could facilitate the diagnosis of psychiatric disorders. Changes in emotion recognition ability may reflect alterations in the corresponding brain areas. The literature indicates that some brain areas (e.g., the striatum, fusiform gyrus, superior temporal gyrus, amygdala, orbitofrontal cortex, basal ganglia, somatosensory cortex, and insula) might be responsible for this ability [12][13][14].
Based on empirical evidence, emotion recognition ability may be a good indicator for detecting subtle changes in the brain and for facilitating the diagnosis and management of psychiatric diseases. Therefore, an assessment tool that measures emotion recognition ability is essential and has clinical utility.

Single-Modal Emotion Recognition Tests
Most previous recognition tests have been based on Ekman's facial emotion database [15,16]; however, most current emotion recognition tests only focus on four basic emotions (i.e., happy, sad, angry, and surprised/frightened) [17]. Unfortunately, only a few Eastern facial stimuli were included in this database. Although facial expression is believed to be a universal language for emotion [18], studies have found that ethnicity and cultural factors must be considered [19,20]. Culture may affect how individuals express their feelings; Westerners use the whole face to express emotions, while Easterners use the upper half of the face [20]. Jack and colleagues (2012) [21] found that Asians have difficulty recognizing European facial emotions. Current studies have revealed a culture-specific decoding strategy applied in various ethnicities and cultures [20].
A few single-modal emotion recognition tests using Eastern materials (shown in Table 1) have been developed in the past, including the Japanese and Caucasian Brief Affect Recognition Test (JACBART) [22] and the Chinese Facial Emotion Recognition Database (CFERD) [23]. In the JACBART, respondents rate the degree of each of seven facial emotions (i.e., anger, contempt, disgust, fear, happiness, sadness, and surprise) for a single facial stimulus. The CFERD is a three-dimensional color facial emotion recognition task, which involves matching each facial expression with the corresponding emotional word (happiness, disgust, fear, anger, sadness, surprise, or neutral). These tests address culture-specific issues but do not consider the interaction between facial emotion and other modalities. Therefore, a multi-modal emotion recognition test with Eastern materials is needed.

Multi-Modal Emotion Recognition Tests
A few emotion recognition tests have examined dual or multi-modal emotion recognition (shown in Table 2). The most widely known are the Florida Affect Battery (FAB) [24] and the Diagnostic Analysis Nonverbal Accuracy Scale (DANVA) [17]. These are commonly used dual-modality emotion recognition tests; however, they have limitations. These include the lack of core emotions (i.e., the FAB lacks disgust and fear, and the DANVA lacks disgust and surprise), the use of only Caucasian faces as stimuli, and the use of only female faces and voices. Fear and disgust are crucial emotions for people's survival and social interactions. If one cannot recognize another's fear, it may affect the individual's understanding of a crisis and of messages from others. Disgust is believed to be related to specific human needs (e.g., hunger and morality) [25], and it prompts protective responses, such as nausea and vomiting after eating expired food; it can also be elicited by sexual violence. Thus, the recognition of these two emotions is crucial for an individual's life and interpersonal relationships. Although the FAB has been translated and validated in the Chinese population [26], it still uses Caucasian faces as facial stimuli. The Chinese version of the FAB changed the voice stimuli to Chinese pronunciation; however, pairing Caucasian faces with Chinese voices may confuse participants.
The Awareness of Social Inference Test Emotion Evaluation Test [27], the Multimodal Emotion Recognition Test (MERT) [28], the Geneva Emotion Recognition Test (GERT) and its short form (GERT-S) [29,30], and the Emotion Recognition Assessment in Multiple Modalities test [31] were also developed to examine multi-modal emotion recognition; however, unlike the FAB, they do not comprise multiple subtests and consist only of matching tasks involving different types of stimuli and emotional words. Moreover, most of these tests examine more than ten types of emotions, and too many emotion choices may interfere with recognition. The stimuli for the above multi-modal emotion recognition tests were all Caucasian faces, and the tests have not been validated for Eastern populations.
The development of emotion recognition tools has progressed from the single-channel format [22,23,32,33] to the exploration of emotion recognition ability in multiple channels [17,24,26–31]. In addition, rather than focusing only on basic emotions, as in the past, recently developed tools cover a more comprehensive range of emotions [28][29][30][31]. Another unmet need is that most tools use stimulus material from Western countries or ethnicities, and few tests have been developed for Eastern populations. Furthermore, traditional paper-and-pencil tests have practical limitations. Using readily available electronic devices as a medium could enhance the accessibility and applicability of such tools. However, to the best of our knowledge, none of the tools for assessing emotion recognition ability reviewed above is delivered through a user-friendly medium (e.g., a mobile application).

Aim
We aimed to develop the Multi-Modality Emotion Recognition mobile application (MMER app), a multi-modal emotion recognition test. Chinese faces and prosody were used as stimuli. Moreover, a comprehensive set of emotions (i.e., neutral, happy, sad, angry, disgust, fear, and surprise) was included. Furthermore, our test was designed as a device application and administered on a tablet because mobile devices are popular and handy. We used the tablet as an interface to calculate and output the scores, which improves the convenience and applicability of the test.

Participants
We recruited 169 healthy adults (demographic characteristics are presented in Table 3) from community activity centers and our college. Participants suspected to have dementia, based on a Mini-Mental State Examination (MMSE) score below 24, were excluded from the study [34]. In addition, individuals with a history of psychiatric illness, substance abuse, severe systemic disease, or traumatic brain injury were excluded. All participants provided informed consent before participation; the ethical standards followed the 1964 Declaration of Helsinki. The Institutional Review Board of National Cheng Kung University Hospital (approval number: A-ER-107-425) approved the study protocol.
Note to Table 3. Abbreviations: MMER app, Multi-Modalities Emotion Recognition Mobile Application; SD, standard deviation. a: sum of subtests 1, 3, 4, and 5; b: sum of subtests 8 and 9; +: standard deviation.

Item Generation of the MMER App
We generated the items and format of the MMER app through a literature review and reference to other tests (e.g., the FAB). Facial and prosodic emotions were chosen as the materials for the MMER app. The materials were obtained from the Emotional Speech Database in Taiwan and from the Taiwan Corpora of Chinese Emotion and Relevant Psychophysiological Database [35][36][37]. Those datasets included many Eastern faces and tones. We randomly selected 2272 Chinese face pictures and 188 sound segment stimuli in 7 emotions to establish the MMER app. The pretest version of the MMER app included 9 subtests (5 facial tests, 2 prosodic tests, and 2 cross-modal tests), with a total of 325 items.

The Subtests of the MMER App
We referred to the format of the FAB to generate the subtests. Subtest 1 was a facial feature discrimination test. Subtests 2-5 were facial-related emotion recognition tests, subtests 6 and 7 were prosodic-related emotion recognition tests, and subtests 8 and 9 were facial-prosodic, cross-modal emotion recognition tests. We wrote the items into the app and used the tablet to collect the participants' responses. Subtest 1 was a "facial feature discrimination test" with 24 items, including front, side, and two-thirds views of the face. Two faces were presented on the tablet at a time, and the participants were asked to determine whether the two faces were the same person. Subtest 2 was a "facial emotion discrimination test" with 35 items. Two faces were presented on the tablet at a time, and the participants were asked to determine whether the emotions of the two faces were the same. Subtest 3 was a "face-word matched test" and contained 42 items, including front, side, and two-thirds views of the face. One face and seven emotion terms (i.e., neutral, happy, sad, angry, disgust, fear, and surprise) were displayed on the tablet, and the participants were asked to choose the emotion term that best fit the target face. Subtest 4 was a "word-face matched test" and contained 14 items. One emotion term and seven faces were shown on the tablet, and the participants were asked to choose the face that best fit the target emotion term. Subtest 5 was a "face-face emotion matched test" containing 28 items. Six faces were shown on the tablet (along with the target face), and the participants were asked to select the face whose emotion best resembled that of the target face.
Subtest 6 was a "prosodic emotion discrimination test". Two emotional sentences were played via the tablet, and the participants were asked to determine whether the two sentences displayed the same emotion. Subtest 7 was a "prosodic-word matched test" with 28 items. One sentence and seven emotion terms were presented, and participants were asked to choose the emotion term that best represented the target sentence's emotion.
Subtest 8 was a "prosodic-face matched test" and contained 35 items. One sentence and four faces were presented on the tablet, and participants were asked to select one face whose emotion was the most suitable for the target sentence's emotion. Subtest 9 was a "face-prosodic matched test" and contained 42 items. One face and four sentences were presented, and participants were asked to select one sentence whose emotion was the most suitable for the target face's emotion.

The Pretest Stage
Seven participants were recruited for the pretest to examine reaction times and item accuracy and to identify items needing modification. In addition, we collected the participants' comments after they completed the test to improve the quality of the MMER app. After the pretest, we kept the items with an accuracy above 50%, increased the number of practice items (subtests 3-5 and subtest 7), and added feedback to the practice section. We modified the items with an accuracy below 50% by excluding stimuli that conveyed conflicting emotions (e.g., an actor portrayed sadness, but most participants rated the stimulus as scared). Subtest 9 originally offered four prosodic options per item; however, because each prosodic stimulus could be played only twice and performance was therefore more likely to be affected by the participant's cognitive function (e.g., poor memory may cause them to forget the sound of a previous option), the number of options was reduced to three. In the analysis of the previous version of the MMER app, subtest 2 had unacceptable internal consistency (Cronbach's alpha = 0.46) and test-retest reliability (0.28), and subtest 6 had poor internal consistency (Cronbach's alpha = 0.56) and questionable test-retest reliability (0.67). Thus, we deleted these two subtests. The Rasch analysis results were also used to modify the MMER app: participant ability was compared with item difficulty, and items whose difficulty fell below participant ability were deleted.
Specifically, items whose difficulty was below the minimum level of participant performance were eliminated. Nevertheless, if more than half of the items for an emotion would have been removed, half of the items for that emotion were retained. As the purpose of the MMER app was to assess emotion recognition ability, items for all seven types of emotion were needed, even when an emotion was easy for most of the population. The retained items were selected by difficulty: the most difficult items were kept until half of the items for that emotion were retained.
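The retention rule described above can be sketched in a few lines of Python. This is only an illustrative reading of the rule, not the procedure actually used: the function name `select_items`, the tuple layout, the logit values, and the choice to round "half" upward are all assumptions introduced for the example.

```python
from collections import defaultdict
from math import ceil

def select_items(items, min_ability):
    """Illustrative item filter (hypothetical helper, not from the paper).

    items: list of (item_id, emotion, difficulty) tuples, with difficulty
           on the same Rasch scale as participant ability.
    min_ability: lowest participant ability estimate in the sample.
    """
    by_emotion = defaultdict(list)
    for item in items:
        by_emotion[item[1]].append(item)

    kept = []
    for emotion, group in by_emotion.items():
        # Drop items that are easier than the least able participant.
        survivors = [it for it in group if it[2] >= min_ability]
        half = ceil(len(group) / 2)  # assumption: "half" is rounded up
        if len(survivors) < half:
            # More than half would be lost: keep the hardest half instead.
            survivors = sorted(group, key=lambda it: it[2], reverse=True)[:half]
        kept.extend(survivors)
    return kept

# Made-up difficulties for two emotions with four items each.
demo = [
    ("a", "fear", -3.0), ("b", "fear", -2.5), ("c", "fear", -2.0), ("d", "fear", 0.5),
    ("e", "happy", -3.0), ("f", "happy", -2.8), ("g", "happy", -2.6), ("h", "happy", -2.4),
]
print(sorted(it[0] for it in select_items(demo, min_ability=-1.0)))
```

In this toy input every "happy" item is easier than the least able participant, yet two of them survive, which mirrors the stated aim of keeping all seven emotions represented.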

The MMER App
The MMER app included 7 subtests and 198 items (100 items of facial tests, 25 items of prosodic tests, and 73 items of cross-modal tests), with a scoring method of 1 point per question (total score of 198 points).

Measurement
The MMER app was displayed on a 10.1-inch tablet to evaluate the participants' emotion recognition ability (Figure 1). We used the MMSE [34] to exclude participants with suspected dementia. The MMSE has a total score of 30 and is often used to assess an individual's general cognitive function. Those with scores below 24 were considered likely to have dementia. The Reading the Mind in the Eyes Test (RMET) [38] was applied to establish criterion-related validity. The RMET is often used to assess individuals' judgment of others' feelings or emotions through the expressions around their eyes. This test presents a series of black-and-white photos expressing emotions and asks participants to choose the adjective that best fits the emotion expressed by the people in the photos.

Data Analysis
Descriptive statistics were calculated for total scores, accuracy in each subtest, and the demographic characteristics of all participants. Cronbach's alpha was used to examine the internal consistency of the MMER app. Twenty-eight participants were invited to complete the test again three to four months after the first visit, and Pearson's correlation was used to estimate test-retest reliability. The RMET [38] was applied to establish criterion-related validity. Confirmatory factor analysis was conducted to examine the factorial structure of the MMER app. A one-factor model (model 1) was tested first, as the MMER app was designed to measure a single construct, "emotion recognition". Multi-dimensional models were then tested to examine the relationships between factors: two-factor models (models 2 and 3) and three-factor models (models 4 and 5), the latter reflecting the three theoretical concepts (facial recognition, facial emotion recognition, and prosody emotion recognition). Models 2 and 4 were oblique, and models 3 and 5 were orthogonal. Oblique and orthogonal rotations transform the factors obtained from factor analysis so that large factor loadings are maximized and small factor loadings are minimized, which makes the factors easier to interpret. The major difference is that factors may be correlated under oblique rotation, whereas the correlation between factors is fixed at zero under orthogonal rotation.
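For readers unfamiliar with the two reliability statistics named above, the following minimal Python sketch shows how Cronbach's alpha and a Pearson test-retest correlation are computed from raw scores. The function names and the toy data are assumptions made for illustration; the statistical software actually used in the study is not stated in the text.

```python
from math import sqrt
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a participants x items matrix of item scores."""
    k = len(scores[0])  # number of items
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

def pearson_r(x, y):
    """Pearson correlation between two equally long score lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Toy data: 4 participants x 3 items scored 0/1 (not study data).
toy_items = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
alpha = cronbach_alpha(toy_items)

# Toy first-visit and retest totals for the same participants.
first_visit = [150, 160, 170, 180]
retest = [152, 158, 173, 181]
r_retest = pearson_r(first_visit, retest)
```

With one point per item, each row of the matrix is simply a participant's vector of correct (1) and incorrect (0) responses, so the same function applies to a single subtest or to the whole test.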

Performance
The mean and standard deviation of the demographic characteristics and performance of the participants are presented in Table 3. The correct score and accuracy of the MMER app for the seven types of emotion are shown in Table 4.
Note to Table 4. Abbreviations: please see Table 3. a: sum of subtests 3, 4, and 5; b: sum of subtests 8 and 9.

Reliability
The internal consistency of the MMER app was excellent (Cronbach's alpha = 0.94) with good test-retest reliability (0.87).

Confirmatory Factor Analysis
The results for the different models are presented in Table 5 and Figures 2-4. The fit indices showed that the orthogonal models fit the data better. The results of Models 2 and 4 are similar. Moreover, the structure of Model 4 corresponds to the theoretical structure of the MMER app. The factor loadings of Model 4 are shown in Table 6.

Criterion-Related Validity
Pearson's correlation indicated a strong positive association between the RMET and the total score of the MMER app (r = 0.53, p < 0.001). Moreover, all subtests were significantly correlated with the RMET. Subtest 3 ("face-word matched test") showed a strong positive association with the RMET (r = 0.54, p < 0.001). In addition, the RMET had a moderate positive association with subtest 4 ("word-face matched test") (r = 0.43, p <

Multiple Stepwise Regression for Application
Correlation analysis showed that the demographic variables of the sample were related to MMER app performance; for instance, age (r = −0.77, p < 0.001) and education (r = 0.41, p < 0.001) were correlated with the total score. Multiple stepwise backward regression was therefore conducted to derive a formula for adjusting the MMER app score. Age, gender, and education together explained 65.67% of the variance in MMER app performance. The adjustment formula was as follows:
Adjusted MMER app score = MMER app total score − 4.007 × (gender − 0.33) − 0.750 × (age − 48.28) + 1.645 × (education − 13.73),
where gender was defined as male = 1 and female = 0.
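As a worked illustration, the adjustment formula can be written as a small Python function. The coefficients, signs, and centering constants are copied exactly as they are printed in the text; the function name, the argument names, and the assumption that education is measured in years are introduced here for the example only.

```python
def adjusted_mmer_score(total_score, gender, age, education):
    """Adjusted MMER app score, using the formula as printed in the text.

    gender: 1 for male, 0 for female (coding stated in the text).
    age: age in years.
    education: assumed to be years of education (unit not stated explicitly).
    """
    return (
        total_score
        - 4.007 * (gender - 0.33)
        - 0.750 * (age - 48.28)
        + 1.645 * (education - 13.73)
    )

# A participant at the centering values receives no adjustment.
at_means = adjusted_mmer_score(150, gender=0.33, age=48.28, education=13.73)

# Hypothetical participant: female, 60 years old, 12 years of education.
example = adjusted_mmer_score(140, gender=0, age=60, education=12)
```

The constants 0.33, 48.28, and 13.73 act as centering values, so the adjustment is zero for a participant whose demographic values equal them.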

Discussion
Emotion recognition is vital for social interaction and intimacy, and the evaluation of emotion recognition ability is necessary for detecting brain dysfunction and for developing rehabilitation programs. In the current study, we addressed the limitations of previous tests and used the advantages of the tablet to develop a reliable and effective emotion recognition test, the MMER app. The MMER app contains 7 subtests with 198 items. The stimuli used in the MMER app are all Eastern stimuli (both sexes), and the MMER app can measure multi-modal emotion recognition ability (i.e., face and prosody). The MMER app has good psychometric properties and takes only 20 minutes to complete. To the best of our knowledge, the MMER app is the first test suitable for the Chinese population that measures emotion recognition ability via various modalities (i.e., visual and auditory).
The total accuracy of the MMER app was 69%. Subtest 1 had the highest accuracy (93%); healthy people generally perform best on this subtest. The face-related subtests (76%) had the second-highest accuracy, followed by the face-prosody subtests (61%). The prosody-related subtest (44%) had the lowest accuracy. These accuracies are similar to previous findings, which reported facial emotion recognition accuracy between 65 and 78% and prosodic emotion recognition accuracy between 52 and 65% [39,40]. Among the emotions, anger had the highest accuracy, followed by happiness; fear had the lowest accuracy, followed by disgust. These findings are consistent with previous research [35,37] and indicate that people are most consistent in evaluating happiness and anger and least consistent in assessing fear and disgust. The highest accuracies of the face-related and prosodic-related subtests were for happiness and anger, respectively. In addition, the lowest accuracy of the face-related subtests was for fear. For the prosody-related subtest, the lowest accuracy was for disgust, which is consistent with the results of Scherer et al. [40].
The MMER app has high internal consistency and adequate test-retest reliability. Moreover, the RMET, used as the criterion for validity, is closely related to the MMER app. Among the subtests, the face-related subtests, which like the RMET use facial pictures as stimulus materials, are significantly correlated with the RMET scores. Previous studies have shown that these two abilities are highly related [41], and lesion studies have also found that emotion recognition and mind-reading abilities partially share the same neural mechanisms [13]. Thus, we believe that the MMER app has well-established criterion-related validity. In addition, the factor analyses confirm that the MMER app has three factors: facial recognition, facial emotion recognition, and prosody emotion recognition. The findings of our factor analyses show that the MMER app is a multi-modality tool for measuring emotion recognition ability. Moreover, each of the seven subtests may reflect a different psychological mechanism of emotion recognition owing to the different presentation methods. Further investigations that apply this test in clinical practice are encouraged, focusing on the relationship between different types of emotion recognition deficits and related brain pathologies, which may assist in disease detection and rehabilitation.
The MMER app discriminates among healthy participants to a certain degree, unlike previous tests (e.g., the FAB and DANVA), which show a ceiling effect. Studies suggest that demographic variables, such as age, gender [42], and education level, are crucial in emotion recognition ability [43]. The score adjustment derived from multiple stepwise regression accounts for these demographic variables and could be used to establish norms in the future. An individual's emotion recognition ability can then be determined by looking up the adjusted score in a percentile comparison table. If the adjusted MMER app score is below the common standard (5th percentile), the individual's emotion recognition ability is considered impaired. To the best of our knowledge, this is the first multi-modality emotion recognition test delivered as an app. The MMER app can improve the accessibility and clinical efficiency of assessment.
This study has several limitations. First, there are no established Eastern emotion recognition tests to compare with and refer to, which was also the motivation for this study. Although the MMER app is significantly related to the RMET, and criterion-related validity was established, the RMET cannot serve as a perfect criterion, especially for the voice-emotion subtests. Second, the cultural divergence among Eastern countries is considerable. Third, the intensity of emotional stimuli was not considered in this study. Emotional intensity may affect the ability to recognize emotions and needs further investigation. Fourth, the accuracy of the prosody-related subtests was low. Previous studies have found that prosody-related emotion recognition is easily affected by other factors (e.g., semantic meaning) [31]. Our findings also confirmed that prosody-related emotion recognition accuracy is lower than face-related emotion recognition accuracy. Although the accuracy of these subtests was poor, their other psychometric properties were acceptable to good. Finally, only healthy participants were recruited because there is evidence that psychiatric disorders may impair emotion recognition. The MMER app could potentially serve as a screening test or a standard emotion recognition test within Eastern cultures. Thus, other populations should be recruited in future studies to validate our findings. Future prospective studies should also refine the MMER app with respect to sample variation, multicultural comparison within Eastern cultures, and item selection.
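The 5th-percentile decision rule described above could be sketched as follows. The percentile comparison table itself is not reproduced in the text, so the normative scores below are made up, and both the function names and the nearest-rank percentile definition are assumptions made for this example.

```python
from math import ceil

def percentile_cutoff(norm_scores, pct=5):
    """Nearest-rank percentile of a list of normative adjusted scores."""
    ordered = sorted(norm_scores)
    rank = max(1, ceil(pct / 100 * len(ordered)))  # 1-based rank
    return ordered[rank - 1]

def is_below_standard(adjusted_score, norm_scores, pct=5):
    """True when the adjusted score falls below the percentile cutoff."""
    return adjusted_score < percentile_cutoff(norm_scores, pct)

# Made-up normative sample of 20 adjusted scores (not study data).
norms = list(range(101, 121))
cutoff = percentile_cutoff(norms)          # lowest of 20 scores under nearest rank
flagged = is_below_standard(100, norms)    # below the cutoff
not_flagged = is_below_standard(101, norms)
```

Other percentile definitions (for example, linear interpolation) would give slightly different cutoffs in small samples, so the definition should be fixed before any normative table is built.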

Conclusions
The MMER app has well-established psychometric properties and provides an integrated Eastern version of an emotion recognition test that covers multiple modalities and a comprehensive set of emotions, with no sex bias in the stimuli. Moreover, we offered a formula to generate an adjusted score, which can be compared against a percentile scale to determine whether an individual's emotion recognition ability is impaired. Further research is needed to recruit other populations (e.g., clinical cases) to cross-validate the MMER app. This test also has the potential to be used in future clinical practice.