Search Results (4)

Search Parameters:
Keywords = KoELECTRA

23 pages, 3353 KiB  
Article
HyFusER: Hybrid Multimodal Transformer for Emotion Recognition Using Dual Cross Modal Attention
by Moung-Ho Yi, Keun-Chang Kwak and Ju-Hyun Shin
Appl. Sci. 2025, 15(3), 1053; https://doi.org/10.3390/app15031053 - 21 Jan 2025
Cited by 7 | Viewed by 2051
Abstract
Emotion recognition is becoming increasingly important for accurately understanding and responding to user emotions, driven by the rapid proliferation of non-face-to-face environments and advancements in conversational AI technologies. Existing studies on multimodal emotion recognition, which utilize text and speech, have aimed to improve performance by integrating the information from both modalities. However, these approaches have faced limitations such as restricted information exchange and the omission of critical cues. To address these challenges, this study proposes a Hybrid Multimodal Transformer, which combines Intermediate Layer Fusion and Last Fusion. Text features are extracted using KoELECTRA, while speech features are extracted using HuBERT. These features are processed through a transformer encoder, and Dual Cross Modal Attention is applied to enhance the interaction between text and speech. Finally, the predicted results from each modality are aggregated using an average ensemble method to recognize the final emotion. The experimental results indicate that the proposed model achieves superior emotion recognition performance compared to existing models, demonstrating significant progress in improving both the accuracy and reliability of emotion recognition. In the future, incorporating additional modalities, such as facial expression recognition, is expected to further strengthen multimodal emotion recognition capabilities and open new possibilities for application across diverse fields.
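As a rough illustration of the dual cross-modal attention and average-ensemble steps described in the abstract, here is a minimal PyTorch sketch; the dimensions, head count, and module layout are assumptions for illustration, not the authors' published configuration:

```python
import torch.nn as nn

class DualCrossModalAttention(nn.Module):
    """Minimal sketch of dual cross-modal attention: each modality queries
    the other. Hidden size and head count are assumptions, not the paper's."""
    def __init__(self, dim=768, heads=8):
        super().__init__()
        # Text queries attend over speech keys/values, and vice versa.
        self.text_to_speech = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.speech_to_text = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, text_feats, speech_feats):
        # text_feats: (batch, text_len, dim); speech_feats: (batch, speech_len, dim)
        text_enriched, _ = self.text_to_speech(text_feats, speech_feats, speech_feats)
        speech_enriched, _ = self.speech_to_text(speech_feats, text_feats, text_feats)
        return text_enriched, speech_enriched

def average_ensemble(text_logits, speech_logits):
    # Final emotion = average of the two modality-specific predictions;
    # averaging class probabilities is one plausible reading of the method.
    return (text_logits.softmax(-1) + speech_logits.softmax(-1)) / 2
```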

15 pages, 2085 KiB  
Article
KoHMT: A Multimodal Emotion Recognition Model Integrating KoELECTRA, HuBERT with Multimodal Transformer
by Moung-Ho Yi, Keun-Chang Kwak and Ju-Hyun Shin
Electronics 2024, 13(23), 4674; https://doi.org/10.3390/electronics13234674 - 27 Nov 2024
Cited by 4 | Viewed by 1751
Abstract
With the advancement of human-computer interaction, the role of emotion recognition has become increasingly significant. Emotion recognition technology provides practical benefits across various industries, including user experience enhancement, education, and organizational productivity. For instance, in educational settings, it enables real-time understanding of students’ emotional states, facilitating tailored feedback. In workplaces, monitoring employees’ emotions can contribute to improved job performance and satisfaction. Recently, emotion recognition has also gained attention in media applications such as automated movie dubbing, where it enhances the naturalness of dubbed performances by synchronizing emotional expression in both audio and visuals. Consequently, multimodal emotion recognition research, which integrates text, speech, and video data, has gained momentum in diverse fields. In this study, we propose an emotion recognition approach that combines text and speech data, specifically incorporating the characteristics of the Korean language. For text data, we utilize KoELECTRA to generate embeddings, and for speech data, we extract features using HuBERT embeddings. The proposed multimodal transformer model processes text and speech data independently, subsequently learning interactions between the two modalities through a Cross-Modal Attention mechanism. This approach effectively combines complementary information from text and speech, enhancing the accuracy of emotion recognition. Our experimental results demonstrate that the proposed model surpasses single-modality models, achieving a high accuracy of 77.01% and an F1-Score of 0.7703 in emotion classification. This study contributes to the advancement of emotion recognition technology by integrating diverse language and modality data, suggesting the potential for further improvements through the inclusion of additional modalities in future work.
(This article belongs to the Special Issue Application of Data Mining in Social Media)
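For context, feature extraction along the lines described in the abstract can be sketched with the Hugging Face transformers library; the checkpoint names below are well-known public releases and are assumptions, not necessarily the exact models the authors used:

```python
import torch
from transformers import AutoTokenizer, AutoModel, AutoFeatureExtractor

# Assumed public checkpoints for KoELECTRA and HuBERT.
text_model = AutoModel.from_pretrained("monologg/koelectra-base-v3-discriminator")
tokenizer = AutoTokenizer.from_pretrained("monologg/koelectra-base-v3-discriminator")
speech_model = AutoModel.from_pretrained("facebook/hubert-base-ls960")
extractor = AutoFeatureExtractor.from_pretrained("facebook/hubert-base-ls960")

def embed_text(sentence):
    # Token-level KoELECTRA embeddings for the text modality.
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        return text_model(**inputs).last_hidden_state  # (1, tokens, 768)

def embed_speech(waveform, sr=16000):
    # Frame-level HuBERT embeddings for the speech modality.
    inputs = extractor(waveform, sampling_rate=sr, return_tensors="pt")
    with torch.no_grad():
        return speech_model(**inputs).last_hidden_state  # (1, frames, 768)
```

The two embedding sequences would then feed the modality-specific transformer branches and the Cross-Modal Attention stage described above.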

15 pages, 1771 KiB  
Article
Fake Sentence Detection Based on Transfer Learning: Applying to Korean COVID-19 Fake News
by Jeong-Wook Lee and Jae-Hoon Kim
Appl. Sci. 2022, 12(13), 6402; https://doi.org/10.3390/app12136402 - 23 Jun 2022
Cited by 12 | Viewed by 3594
Abstract
With the increasing number of social media users in recent years, news in various fields, such as politics and economics, has become easily accessible. However, most news spread through social networks, including Twitter, Facebook, and Instagram, has unknown sources and therefore a significant impact on news consumers. Fake news on COVID-19, which affects the global population, propagates quickly and causes social disorder. Much research is therefore being conducted on detecting COVID-19 fake news, but it faces a shortage of datasets. To alleviate this problem, we built a dataset of COVID-19 fake news from fact-checking websites in Korea and propose a deep learning model for detecting fake news on COVID-19 using this dataset. The proposed model is pre-trained on large-scale data and then performs transfer learning through a BiLSTM model. Moreover, we propose a method for initializing the hidden and cell states of the BiLSTM model with the [CLS] token instead of a zero vector. In experiments, the proposed model achieved an accuracy of 78.8%, an 8% improvement over the linear baseline model, showing that transfer learning can be useful when only a small amount of data is available. Using the [CLS] token, which encodes sentence-level information, as the initial state of the BiLSTM contributes to this performance improvement.
(This article belongs to the Special Issue Application of Machine Learning in Text Mining)
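A minimal sketch of the paper's key idea, seeding the BiLSTM's hidden and cell states with the encoder's [CLS] vector rather than zeros, might look as follows; the hidden size, the tanh projection, and the classification head are illustrative assumptions:

```python
import torch
import torch.nn as nn

class ClsInitBiLSTM(nn.Module):
    """Sketch: initialize BiLSTM hidden/cell states from the [CLS] embedding
    instead of zero vectors. Sizes and the projection are assumptions."""
    def __init__(self, enc_dim=768, hidden=256, num_classes=2):
        super().__init__()
        self.to_state = nn.Linear(enc_dim, hidden)
        self.bilstm = nn.LSTM(enc_dim, hidden, batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden, num_classes)

    def forward(self, token_embs, cls_emb):
        # token_embs: (batch, seq, enc_dim); cls_emb: (batch, enc_dim)
        state = torch.tanh(self.to_state(cls_emb))   # project [CLS] to state size
        state = state.unsqueeze(0).repeat(2, 1, 1)   # one copy per LSTM direction
        out, _ = self.bilstm(token_embs, (state, state.clone()))
        return self.classifier(out[:, -1])           # logits: real vs. fake
```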

11 pages, 1578 KiB  
Article
K-EPIC: Entity-Perceived Context Representation in Korean Relation Extraction
by Yuna Hur, Suhyune Son, Midan Shim, Jungwoo Lim and Heuiseok Lim
Appl. Sci. 2021, 11(23), 11472; https://doi.org/10.3390/app112311472 - 3 Dec 2021
Cited by 9 | Viewed by 3117
Abstract
Relation Extraction (RE) aims to predict the correct relation between two entities in a given sentence. To obtain the proper relation, it is essential to comprehend the precise meaning of the two entities as well as the context of the sentence. In contrast to RE research in English, Korean RE studies that focus on entities while preserving Korean linguistic properties are rare. Therefore, we propose K-EPIC (Entity-Perceived Context representation in Korean) to enhance the capability to understand the meaning of entities while considering the linguistic characteristics of Korean. We present experimental results on the BERT-Ko-RE and KLUE-RE datasets with four different types of K-EPIC methods, utilizing entity position tokens. To compare the ability of Korean pre-trained language models to understand entities and context, we analyze HanBERT, KLUE-BERT, KoBERT, KorBERT, KoELECTRA, and multilingual BERT (mBERT). The experimental results demonstrate that the F1 score increases significantly with K-EPIC and that language models trained on Korean corpora outperform the baseline.
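The entity position tokens mentioned above can be illustrated with a small helper that wraps each entity span in marker tokens; the marker strings [E1], [/E1], [E2], [/E2] are hypothetical placeholders, since the abstract does not specify the paper's exact marker scheme:

```python
def mark_entities(tokens, e1_span, e2_span):
    """Wrap each entity span in (hypothetical) position-marker tokens so the
    encoder can perceive entity boundaries. Spans are (start, end), inclusive."""
    (s1, t1), (s2, t2) = e1_span, e2_span
    out = []
    for i, tok in enumerate(tokens):
        if i == s1: out.append("[E1]")
        if i == s2: out.append("[E2]")
        out.append(tok)
        if i == t1: out.append("[/E1]")
        if i == t2: out.append("[/E2]")
    return out

# Usage: mark the subject and object entities of a tokenized sentence.
print(mark_entities(["유나는", "고려대학교에", "다닌다"], (0, 0), (1, 1)))
# ['[E1]', '유나는', '[/E1]', '[E2]', '고려대학교에', '[/E2]', '다닌다']
```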
