Search Results (31)

Search Parameters:
Keywords = facial sentiment

33 pages, 10634 KB  
Article
Examining the Nature and Dimensions of Artificial Intelligence Incidents: A Machine Learning Text Analytics Approach
by Wullianallur Raghupathi, Jie Ren and Tanush Kulkarni
AppliedMath 2026, 6(1), 11; https://doi.org/10.3390/appliedmath6010011 - 9 Jan 2026
Abstract
As artificial intelligence systems proliferate across critical societal domains, understanding the nature, patterns, and evolution of AI-related harms has become essential for effective governance. Despite growing incident repositories, systematic computational analysis of AI incident discourse remains limited, with prior research constrained by small samples, single-method approaches, and absence of temporal analysis spanning major capability advances. This study addresses these gaps through a comprehensive multi-method text analysis of 3494 AI incident records from the OECD AI Policy Observatory, spanning January 2014 through October 2024. Six complementary analytical approaches were applied: Latent Dirichlet Allocation (LDA) and Non-negative Matrix Factorization (NMF) topic modeling to discover thematic structures; K-Means and BERTopic clustering for pattern identification; VADER sentiment analysis for emotional framing assessment; and LIWC psycholinguistic profiling for cognitive and communicative dimension analysis. Cross-method comparison quantified categorization robustness across all four clustering and topic modeling approaches. Key findings reveal dramatic temporal shifts and systematic risk patterns. Incident reporting increased 4.6-fold following ChatGPT’s (5.2) November 2022 release (from 12.0 to 95.9 monthly incidents), accompanied by vocabulary transformation from embodied AI terminology (facial recognition, autonomous vehicles) toward generative AI discourse (ChatGPT, hallucination, jailbreak). Six robust thematic categories emerged consistently across methods: autonomous vehicles (84–89% cross-method alignment), facial recognition (66–68%), deepfakes, ChatGPT/generative AI, social media platforms, and algorithmic bias. Risk concentration is pronounced: 49.7% of incidents fall within two harm categories (system safety 29.1%, physical harms 20.6%); private sector actors account for 70.3%; and 48% occur in the United States. Sentiment analysis reveals physical safety incidents receive notably negative framing (autonomous vehicles: −0.077; child safety: −0.326), while policy and generative AI coverage trend positive (+0.586 to +0.633). These findings have direct governance implications. The thematic concentration supports sector-specific regulatory frameworks—mandatory audit trails for hiring algorithms, simulation testing for autonomous vehicles, transparency requirements for recommender systems, accuracy standards for facial recognition, and output labeling for generative AI. Cross-method validation demonstrates which incident categories are robust enough for standardized regulatory classification versus those requiring context-dependent treatment. The rapid emergence of generative AI incidents underscores the need for governance mechanisms responsive to capability advances within months rather than years. Full article
(This article belongs to the Section Computational and Numerical Mathematics)
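
The study's pipeline combines topic modeling and lexicon-based sentiment scoring. The sketch below, assuming a small placeholder list of incident descriptions rather than the OECD AI Policy Observatory corpus, shows how two of the named components (LDA topics and VADER compound scores) are typically wired up with scikit-learn and vaderSentiment; it is illustrative only, not the authors' six-method pipeline.

```python
# Minimal sketch: LDA topic discovery + VADER compound sentiment on incident texts.
# `incidents` is a placeholder list, not the OECD dataset.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

incidents = [
    "Autonomous vehicle failed to brake, injuring a pedestrian",
    "Chatbot hallucinated legal citations in a court filing",
    "Facial recognition system misidentified a suspect",
]

# Topic modeling: bag-of-words -> LDA with a small number of topics.
vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(incidents)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
terms = vec.get_feature_names_out()
for k, comp in enumerate(lda.components_):
    top = [terms[i] for i in comp.argsort()[-5:][::-1]]
    print(f"topic {k}: {top}")

# Sentiment framing: VADER compound score per incident (-1 negative .. +1 positive).
sia = SentimentIntensityAnalyzer()
for text in incidents:
    print(round(sia.polarity_scores(text)["compound"], 3), text[:40])
```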

17 pages, 4146 KB  
Article
Sentiment Analysis of Meme Images Using Deep Neural Network Based on Keypoint Representation
by Endah Asmawati, Ahmad Saikhu and Daniel O. Siahaan
Informatics 2025, 12(4), 118; https://doi.org/10.3390/informatics12040118 - 28 Oct 2025
Viewed by 1427
Abstract
Meme image sentiment analysis is the task of examining public opinion based on meme images posted on social media. In various fields, stakeholders often need to quickly and accurately determine the sentiment of memes from large amounts of available data. Innovation in image pre-processing is therefore needed to improve classification performance, especially accuracy, since sentiment classification using human face datasets yields higher accuracy than using meme images. This research aims to develop a sentiment analysis model for meme images based on key points. The analyzed meme images contain human faces. The facial features extracted using key points are the eyebrows, eyes, and mouth. In the proposed method, key points of facial features are represented in the form of graphs, specifically directed graphs, weighted graphs, or weighted directed graphs. These graph representations of key points are then used to build a sentiment analysis model based on a Deep Neural Network (DNN) with three hidden layers (i = 64, j = 64, k = 90). The contributions of this study are developing a human facial sentiment detection model using key points, representing key points as various graphs, and constructing a meme dataset with Indonesian text. The proposed model is evaluated using several metrics, namely accuracy, precision, recall, and F1 score. Furthermore, a comparative analysis is conducted to evaluate the performance of the proposed model against existing approaches. The experimental results show that the proposed model, using the directed graph representation of key points, achieved the highest accuracy (83%) and F1 score (81%). Full article
(This article belongs to the Special Issue Practical Applications of Sentiment Analysis)
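
As a rough illustration of the keypoint-graph idea, the sketch below builds a weighted graph from dummy facial landmarks (pairwise Euclidean distances, kept symmetric rather than directed for brevity) and feeds the flattened weights to a three-hidden-layer DNN with 64, 64, and 90 units, as named in the abstract. Landmark counts, labels, and data are hypothetical.

```python
# Minimal sketch: landmarks -> weighted-graph features -> small Keras DNN.
import numpy as np
from tensorflow.keras import layers, models

def graph_features(keypoints: np.ndarray) -> np.ndarray:
    """Flatten the weighted adjacency (Euclidean distances between landmarks)."""
    diff = keypoints[:, None, :] - keypoints[None, :, :]
    weights = np.linalg.norm(diff, axis=-1)                  # dense weighted graph
    return weights[np.triu_indices_from(weights, k=1)]       # upper triangle as features

n_landmarks, n_samples, n_classes = 20, 200, 3               # hypothetical sizes
X = np.stack([graph_features(np.random.rand(n_landmarks, 2)) for _ in range(n_samples)])
y = np.random.randint(0, n_classes, size=n_samples)          # placeholder sentiment labels

model = models.Sequential([
    layers.Input(shape=(X.shape[1],)),
    layers.Dense(64, activation="relu"),
    layers.Dense(64, activation="relu"),
    layers.Dense(90, activation="relu"),
    layers.Dense(n_classes, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=3, batch_size=32, verbose=0)
```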

28 pages, 2524 KB  
Article
A Multimodal Analysis of Automotive Video Communication Effectiveness: The Impact of Visual Emotion, Spatiotemporal Cues, and Title Sentiment
by Yawei He, Zijie Feng and Wen Liu
Electronics 2025, 14(21), 4200; https://doi.org/10.3390/electronics14214200 - 27 Oct 2025
Viewed by 858
Abstract
To quantify the communication effectiveness of automotive online videos, this study constructs a multimodal deep learning framework. Existing research often overlooks the intrinsic and interactive impact of textual and dynamic visual content. To bridge this gap, our framework conducts an integrated analysis of both the textual (titles) and visual (frames) dimensions of videos. For visual analysis, we introduce FER-MA-YOLO, a novel facial expression recognition model tailored to the demands of computational communication research. Enhanced with a Dense Growth Feature Fusion (DGF) module and a multiscale Dilated Attention Module (MDAM), it enables accurate quantification of on-screen emotional dynamics, which is essential for testing our hypotheses on user engagement. For textual analysis, we employ a BERT model to quantify the sentiment intensity of video titles. Applying this framework to 968 videos from the Bilibili platform, our regression analysis—which modeled four distinct engagement dimensions (reach, support, discussion, and interaction) separately, in addition to a composite effectiveness score—reveals several key insights: emotionally charged titles significantly boost user interaction; visually, the on-screen proportion of human elements positively predicts engagement, while excessively high visual information entropy weakens it. Furthermore, neutral expressions boost view counts, and happy expressions drive interaction. This study offers a multimodal computational framework that integrates textual and visual analysis and provides empirical, data-driven insights for optimizing automotive video content strategies, contributing to the growing application of computational methods in communication research. Full article
(This article belongs to the Special Issue Advances in Data-Driven Artificial Intelligence)
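
A minimal sketch of the textual branch and the regression step, assuming a handful of made-up titles and view counts in place of the 968 Bilibili videos: an off-the-shelf BERT sentiment pipeline scores each title, and engagement is regressed on the signed score. The visual FER-MA-YOLO features are not reproduced here.

```python
# Illustrative only: BERT sentiment on titles, then OLS of log views on the score.
import numpy as np
import statsmodels.api as sm
from transformers import pipeline

titles = ["This new EV is absolutely stunning!",
          "Routine maintenance walkthrough",
          "Worst test drive I have ever had",
          "Surprisingly fun budget hatchback review"]
views = np.array([120_000, 8_000, 45_000, 60_000], dtype=float)

clf = pipeline("sentiment-analysis")                 # default English sentiment model
signed = np.array([r["score"] if r["label"] == "POSITIVE" else -r["score"]
                   for r in clf(titles)])            # signed sentiment intensity

fit = sm.OLS(np.log(views), sm.add_constant(signed)).fit()
print(fit.params)                                    # intercept and sentiment slope
```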

23 pages, 1191 KB  
Article
The Power of Interaction: Fan Growth in Livestreaming E-Commerce
by Hangsheng Yang and Bin Wang
J. Theor. Appl. Electron. Commer. Res. 2025, 20(3), 203; https://doi.org/10.3390/jtaer20030203 - 6 Aug 2025
Cited by 2 | Viewed by 3711
Abstract
Fan growth serves as a critical performance indicator for the sustainable development of livestreaming e-commerce (LSE). However, existing research has paid limited attention to this topic. This study investigates the unique interactive advantages of LSE over traditional e-commerce by examining how interactivity drives fan growth through the mediating role of user retention and the moderating role of anchors’ facial attractiveness. To conduct the analysis, real-time data were collected from 1472 livestreaming sessions on Douyin, China’s leading LSE platform, between January and March 2023, using Python-based (3.12.7) web scraping and third-party data sources. This study operationalizes key variables through text sentiment analysis and image recognition techniques. Empirical analyses are performed using ordinary least squares (OLS) regression with robust standard errors, propensity score matching (PSM), and sensitivity analysis to ensure robustness. The results reveal the following: (1) Interactivity has a significant positive effect on fan growth. (2) User retention partially mediates the relationship between interactivity and fan growth. (3) There is a substitution effect between anchors’ facial attractiveness and interactivity in enhancing user retention, highlighting the substitution relationship between anchors’ personal characteristics and livestreaming room attributes. This research advances the understanding of interactivity’s mechanisms in LSE and, notably, is among the first to explore the marketing implications of anchors’ facial attractiveness in this context. The findings offer valuable insights for both academic research and managerial practice in the evolving livestreaming commerce landscape. Full article
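
A minimal sketch of the core estimation, assuming hypothetical stand-ins for the Douyin session variables: fan growth regressed on interactivity, its interaction with facial attractiveness, and retention, with heteroskedasticity-robust (HC1) standard errors. The PSM and sensitivity analyses are omitted.

```python
# Illustrative only: OLS with robust standard errors on simulated session data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "fan_growth": rng.poisson(50, 500),
    "interactivity": rng.normal(0, 1, 500),      # e.g., comment-rate index
    "retention": rng.normal(0, 1, 500),          # mediator in the paper
    "attractiveness": rng.normal(0, 1, 500),     # moderator in the paper
})
model = smf.ols(
    "fan_growth ~ interactivity * attractiveness + retention", data=df
).fit(cov_type="HC1")                            # heteroskedasticity-robust errors
print(model.summary())
```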

20 pages, 1253 KB  
Article
Multimodal Detection of Emotional and Cognitive States in E-Learning Through Deep Fusion of Visual and Textual Data with NLP
by Qamar El Maazouzi and Asmaa Retbi
Computers 2025, 14(8), 314; https://doi.org/10.3390/computers14080314 - 2 Aug 2025
Cited by 2 | Viewed by 2153
Abstract
In distance learning environments, learner engagement directly impacts attention, motivation, and academic performance. Signs of fatigue, negative affect, or critical remarks can warn of growing disengagement and potential dropout. However, most existing approaches rely on a single modality, visual or text-based, without providing a general view of learners’ cognitive and affective states. We propose a multimodal system that integrates three complementary analyses: (1) a CNN-LSTM model augmented with warning signs such as PERCLOS and yawning frequency for fatigue detection, (2) facial emotion recognition by EmoNet and an LSTM to handle temporal dynamics, and (3) sentiment analysis of feedback by a fine-tuned BERT model. It was evaluated on three public benchmarks: DAiSEE for fatigue, AffectNet for emotion, and MOOC Review (Coursera) for sentiment analysis. The results show a precision of 88.5% for fatigue detection, 70% for emotion detection, and 91.5% for sentiment analysis. Aggregating these cues enables accurate identification of disengagement periods and triggers individualized pedagogical interventions. These results, although based on independently sourced datasets, demonstrate the feasibility of an integrated approach to detecting disengagement and open the door to emotionally intelligent learning systems, with potential for future work in real-time content personalization and adaptive learning assistance. Full article
(This article belongs to the Special Issue Present and Future of E-Learning Technologies (2nd Edition))
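
One of the fatigue warning signs named above, PERCLOS, is straightforward to compute once an eye aspect ratio (EAR) is available per frame. The sketch below uses the standard six-landmark EAR formula and synthetic values; it is not the paper's CNN-LSTM branch.

```python
# Minimal sketch: EAR per frame, then PERCLOS over a window of synthetic values.
import numpy as np

def eye_aspect_ratio(eye: np.ndarray) -> float:
    """EAR from 6 eye landmarks (p1..p6): (|p2-p6| + |p3-p5|) / (2 |p1-p4|)."""
    v1 = np.linalg.norm(eye[1] - eye[5])
    v2 = np.linalg.norm(eye[2] - eye[4])
    h = np.linalg.norm(eye[0] - eye[3])
    return (v1 + v2) / (2.0 * h)

def perclos(ear_series: np.ndarray, closed_thresh: float = 0.2) -> float:
    """Proportion of frames in the window where the eyes are judged closed."""
    return float(np.mean(ear_series < closed_thresh))

eye = np.array([[0, 0], [1, 1], [2, 1], [3, 0], [2, -1], [1, -1]], dtype=float)
print(f"sample EAR: {eye_aspect_ratio(eye):.2f}")

ear_series = np.clip(np.random.default_rng(1).normal(0.28, 0.06, 900), 0, None)
print(f"PERCLOS over 30 s at 30 fps: {perclos(ear_series):.2%}")
```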

10 pages, 26406 KB  
Article
Emojis with a Stable Interpretation Among Individuals in Japan
by Gaku Kutsuzawa, Hiroyuki Umemura, Koichiro Eto and Yoshiyuki Kobayashi
Psychol. Int. 2025, 7(1), 27; https://doi.org/10.3390/psycholint7010027 - 19 Mar 2025
Cited by 1 | Viewed by 2377
Abstract
Emojis are widely used to measure users’ emotional states; however, their interpretations can vary over time. While some emojis exhibit consistent meanings, others may be perceived differently at other times. To utilize emojis as indicators in consumer studies, it is essential to ensure that their interpretations remain stable over time. However, the long-term stability of emoji interpretations remains uncertain. Therefore, this study aims to identify emojis with stable and unstable interpretations. We collected 256 responses in an online survey twice, one week apart, in which participants rated the valence and arousal levels of 74 facial emojis on a nine-point scale. Wilcoxon rank-sum tests showed unstable interpretations for seven of the seventy-four emojis. Further, a hierarchical cluster analysis categorized 67 stable emojis into the following four clusters based on valence and arousal dimensions: strong positive sentiment, moderately positive sentiment, neutral sentiment, and negative sentiment. Consequently, we recommend the use of the 67 emojis with stable interpretations as reliable measures of emotional states in consumer studies. Full article
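
A minimal sketch of the two analysis steps, with simulated ratings in place of the 256-respondent survey: a rank-sum test per emoji across the two waves flags unstable interpretations, and the stable emojis are then hierarchically clustered on mean valence and arousal.

```python
# Illustrative only: per-emoji stability test, then Ward clustering of stable emojis.
import numpy as np
from scipy.stats import ranksums
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(42)
n_emojis, n_raters = 10, 128
wave1 = rng.integers(1, 10, size=(n_emojis, n_raters))      # 9-point valence ratings
wave2 = wave1 + rng.integers(-1, 2, size=(n_emojis, n_raters))

stable = [i for i in range(n_emojis)
          if ranksums(wave1[i], wave2[i]).pvalue >= 0.05]    # no detectable shift

# Cluster stable emojis on mean valence and arousal (arousal simulated here).
valence = wave1[stable].mean(axis=1)
arousal = rng.uniform(1, 9, size=len(stable))
Z = linkage(np.column_stack([valence, arousal]), method="ward")
print(fcluster(Z, t=4, criterion="maxclust"))                # four clusters, as in the paper
```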

28 pages, 3886 KB  
Article
Assessment and Improvement of Avatar-Based Learning System: From Linguistic Structure Alignment to Sentiment-Driven Expressions
by Aru Ukenova, Gulmira Bekmanova, Nazar Zaki, Meiram Kikimbayev and Mamyr Altaibek
Sensors 2025, 25(6), 1921; https://doi.org/10.3390/s25061921 - 19 Mar 2025
Cited by 2 | Viewed by 2412
Abstract
This research investigates the improvement of learning systems that utilize avatars by shifting from elementary language compatibility to emotion-driven interactions. An assessment of various instructional approaches indicated marked differences in overall effectiveness, with the system showing steady but slight improvements and little variation, suggesting it has the potential for consistent use. Analysis through one-way ANOVA identified noteworthy disparities in post-test results across different teaching strategies. However, the pairwise comparisons with Tukey’s HSD did not reveal significant group differences. The group variation and limited sample sizes probably affected statistical strength. Evaluation of effect size demonstrated that the traditional approach had an edge over the avatar-based method, with lessons recorded on video displaying more moderate distinctions. The innovative nature of the system might account for its initial lower effectiveness, as students could need some time to adjust. Participants emphasized the importance of emotional authenticity and cultural adaptation, including incorporating a Kazakh accent, to boost the system’s success. In response, the system was designed with sentiment-driven gestures and facial expressions to improve engagement and personalization. These findings show the potential of emotionally intelligent avatars to encourage more profound learning experiences and the significance of fine-tuning the system for widespread adoption in a modern educational context. Full article
(This article belongs to the Section Sensing and Imaging)
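
The statistical workflow described above (one-way ANOVA followed by Tukey's HSD) can be reproduced in outline as follows, with simulated post-test scores standing in for the study's groups.

```python
# Illustrative only: omnibus ANOVA across three teaching conditions, then Tukey HSD.
import numpy as np
from scipy.stats import f_oneway
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(7)
traditional = rng.normal(78, 8, 30)
avatar      = rng.normal(73, 9, 30)
video       = rng.normal(76, 8, 30)

print(f_oneway(traditional, avatar, video))      # omnibus test across the three groups

scores = np.concatenate([traditional, avatar, video])
groups = ["traditional"] * 30 + ["avatar"] * 30 + ["video"] * 30
print(pairwise_tukeyhsd(scores, groups, alpha=0.05))
```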

32 pages, 4102 KB  
Article
A Multimodal Pain Sentiment Analysis System Using Ensembled Deep Learning Approaches for IoT-Enabled Healthcare Framework
by Anay Ghosh, Saiyed Umer, Bibhas Chandra Dhara and G. G. Md. Nawaz Ali
Sensors 2025, 25(4), 1223; https://doi.org/10.3390/s25041223 - 17 Feb 2025
Cited by 4 | Viewed by 2424
Abstract
This study introduces a multimodal sentiment analysis system to assess and recognize human pain sentiments within an Internet of Things (IoT)-enabled healthcare framework. This system integrates facial expressions and speech-audio recordings to evaluate human pain intensity levels. This integration aims to enhance the recognition system’s performance and enable a more accurate assessment of pain intensity. Such a multimodal approach supports improved decision making in real-time patient care, addressing limitations inherent in unimodal systems for measuring pain sentiment. So, the primary contribution of this work lies in developing a multimodal pain sentiment analysis system that integrates the outcomes of image-based and audio-based pain sentiment analysis models. The system implementation contains five key phases. The first phase focuses on detecting the facial region from a video sequence, a crucial step for extracting facial patterns indicative of pain. In the second phase, the system extracts discriminant and divergent features from the facial region using deep learning techniques, utilizing some convolutional neural network (CNN) architectures, which are further refined through transfer learning and fine-tuning of parameters, alongside fusion techniques aimed at optimizing the model’s performance. The third phase performs the speech-audio recording preprocessing; the extraction of significant features is then performed through conventional methods followed by using the deep learning model to generate divergent features to recognize audio-based pain sentiments in the fourth phase. The final phase combines the outcomes from both image-based and audio-based pain sentiment analysis systems, improving the overall performance of the multimodal system. This fusion enables the system to accurately predict pain levels, including ‘high pain’, ‘mild pain’, and ‘no pain’. The performance of the proposed system is tested with the three image-based databases such as a 2D Face Set Database with Pain Expression, the UNBC-McMaster database (based on shoulder pain), and the BioVid database (based on heat pain), along with the VIVAE database for the audio-based dataset. Extensive experiments were performed using these datasets. Finally, the proposed system achieved accuracies of 76.23%, 84.27%, and 38.04% for two, three, and five pain classes, respectively, on the 2D Face Set Database with Pain Expression, UNBC, and BioVid datasets. The VIVAE audio-based system recorded a peak performance of 97.56% and 98.32% accuracy for varying training–testing protocols. These performances were compared with some state-of-the-art methods that show the superiority of the proposed system. By combining the outputs of both deep learning frameworks on image and audio datasets, the proposed multimodal pain sentiment analysis system achieves accuracies of 99.31% for the two-class, 99.54% for the three-class, and 87.41% for the five-class pain problems. Full article
(This article belongs to the Section Physical Sensors)
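
A minimal sketch of the final fusion phase, assuming placeholder softmax outputs from the image-based and audio-based models: the two probability vectors are combined by a weighted average and the fused class is reported. The weighting scheme here is illustrative, not the paper's.

```python
# Illustrative only: late fusion of image-based and audio-based pain probabilities.
import numpy as np

CLASSES = ["no pain", "mild pain", "high pain"]

def fuse(p_image: np.ndarray, p_audio: np.ndarray, w_image: float = 0.6) -> str:
    """Late fusion: convex combination of the two softmax outputs."""
    p = w_image * p_image + (1.0 - w_image) * p_audio
    return CLASSES[int(np.argmax(p))]

p_image = np.array([0.15, 0.55, 0.30])   # hypothetical facial-expression model output
p_audio = np.array([0.10, 0.30, 0.60])   # hypothetical speech-audio model output
print(fuse(p_image, p_audio))            # -> "mild pain" with the 0.6/0.4 weighting
```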

15 pages, 4304 KB  
Article
Face and Voice Recognition-Based Emotion Analysis System (EAS) to Minimize Heterogeneity in the Metaverse
by Surak Son and Yina Jeong
Appl. Sci. 2025, 15(2), 845; https://doi.org/10.3390/app15020845 - 16 Jan 2025
Viewed by 4655
Abstract
The metaverse, where users interact through avatars, is evolving to closely mirror the real world, requiring realistic object responses based on users’ emotions. While technologies like eye-tracking and hand-tracking transfer physical movements into virtual spaces, accurate emotion detection remains challenging. This study proposes the “Face and Voice Recognition-based Emotion Analysis System (EAS)” to bridge this gap, assessing emotions through both voice and facial expressions. EAS utilizes a microphone and camera to gauge emotional states, combining these inputs for a comprehensive analysis. It comprises three neural networks: the Facial Emotion Analysis Model (FEAM), which classifies emotions using facial landmarks; the Voice Sentiment Analysis Model (VSAM), which detects vocal emotions even in noisy environments using MCycleGAN; and the Metaverse Emotion Recognition Model (MERM), which integrates FEAM and VSAM outputs to infer overall emotional states. EAS’s three primary modules—Facial Emotion Recognition, Voice Emotion Recognition, and User Emotion Analysis—analyze facial features and vocal tones to detect emotions, providing a holistic emotional assessment for realistic interactions in the metaverse. The system’s performance is validated through dataset testing, and future directions are suggested based on simulation outcomes. Full article

27 pages, 3711 KB  
Article
An IoT Framework for Assessing the Correlation Between Sentiment-Analyzed Texts and Facial Emotional Expressions
by Sebastian-Ioan Petruc, Razvan Bogdan, Marian-Emanuel Ionascu, Sergiu Nimara and Marius Marcu
Electronics 2025, 14(1), 118; https://doi.org/10.3390/electronics14010118 - 30 Dec 2024
Cited by 1 | Viewed by 1205
Abstract
Emotion monitoring technologies leveraging detection of facial expressions have gained considerable attention in psychological and social research due to their ability to provide objective emotional measurements. This paper addresses a gap in the literature concerning the correlation between emotional facial response and sentiment analysis of written texts, developing a system capable of recognizing real-time emotional responses. The system uses a Raspberry Pi 4 and a Pi Camera module to perform real-time video capture and facial expression analysis with the DeepFace version 0.0.80 model, while sentiment analysis of texts was performed using Afinn version 0.1.0. Secure user authentication and a real-time database were implemented with Firebase. Although suitable for assessing psycho-emotional health in test takers, the system also provides valuable insights into the strong compatibility between the sentiment analysis performed on texts and the monitored facial emotional response, computing a “compatibility” parameter for each testing session. The framework offers an example of a new methodology for comparing different machine learning models, contributing to the enhancement of their efficiency and accuracy. Full article
(This article belongs to the Special Issue Artificial Intelligence in Vision Modelling)
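
A rough sketch of the two signals and a toy agreement check, assuming a placeholder frame path and that DeepFace's analyze call returns a list of per-face dictionaries (as in recent releases); the sign-agreement rule below stands in for the paper's “compatibility” parameter, whose exact definition is not given in the abstract.

```python
# Illustrative only: facial emotion via DeepFace, text sentiment via AFINN, and a
# toy sign-agreement "compatibility" flag. "frame.jpg" is a placeholder path.
from afinn import Afinn
from deepface import DeepFace

POSITIVE_EMOTIONS = {"happy", "surprise"}            # assumed coarse mapping

def compatibility(frame_path: str, text: str) -> bool:
    face = DeepFace.analyze(img_path=frame_path, actions=["emotion"])[0]
    facial_positive = face["dominant_emotion"] in POSITIVE_EMOTIONS
    text_positive = Afinn().score(text) > 0          # AFINN word-list sentiment
    return facial_positive == text_positive          # True when the modalities agree

print(compatibility("frame.jpg", "I really enjoyed writing this answer."))
```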

21 pages, 894 KB  
Article
Emotion Recognition on Call Center Voice Data
by Yüksel Yurtay, Hüseyin Demirci, Hüseyin Tiryaki and Tekin Altun
Appl. Sci. 2024, 14(20), 9458; https://doi.org/10.3390/app14209458 - 16 Oct 2024
Cited by 7 | Viewed by 6111
Abstract
Emotion recognition is a crucial aspect of human–computer interaction, particularly in the field of marketing and advertising. Call centers play a vital role in generating positive client experiences and maintaining relationships. As individuals increasingly rely on computers for daily tasks, there is a growing need to improve human–computer interactions. Research on emotion recognition has been conducted in three main areas: facial expression-based, voice-based, and text-based. This study focuses on emotion recognition in incoming customer calls to call centers, which is central to customer experience and company satisfaction. The study uses real-life customer data provided by Turkish Mobile Operators to analyze the customer’s emotional state and inform call center employees about that state. The model created in this research is a significant milestone for sentiment analysis in the Turkish language, demonstrating the ability to acquire fundamental patterns and categorize emotional expressions. The objective is to analyze the emotional condition of individuals using audio data received from phone calls, focusing on identifying positive, negative, and neutral emotional states. Deep learning techniques are employed, achieving an accuracy of 0.91, which is acceptable to our partner, the “Turkcell Global Bilgi Pazarlama Danışmanlık ve Çağrı Servisi Hizmetleri” Incorporation. Full article
(This article belongs to the Section Computing and Artificial Intelligence)

14 pages, 1893 KB  
Article
A Study of a Drawing Exactness Assessment Method Using Localized Normalized Cross-Correlations in a Portrait Drawing Learning Assistant System
by Yue Zhang, Zitong Kong, Nobuo Funabiki and Chen-Chien Hsu
Computers 2024, 13(9), 215; https://doi.org/10.3390/computers13090215 - 23 Aug 2024
Cited by 3 | Viewed by 1962
Abstract
Nowadays, portrait drawing has gained significance in cultivating painting skills and human sentiments. In practice, novices often struggle with this art form without proper guidance from professionals, since they lack understanding of the proportions and structures of facial features. To solve this limitation, we have developed a Portrait Drawing Learning Assistant System (PDLAS) to assist novices in learning portrait drawing. The PDLAS provides auxiliary lines as references for facial features that are extracted by applying OpenPose and OpenCV libraries to a face photo image of the target. A learner can draw a portrait on an iPad using drawing software where the auxiliary lines appear on a different layer to the portrait. However, in the current implementation, the PDLAS does not offer a function to assess the exactness of the drawing result for feedback to the learner. In this paper, we present a drawing exactness assessment method using a Localized Normalized Cross-Correlation (NCC) algorithm in the PDLAS. NCC gives a similarity score between the original face photo and drawing result images by calculating the correlation of the brightness distributions. For precise feedback, the method calculates the NCC for each face component by extracting the bounding box. In addition, in this paper, we improve the auxiliary lines for the nose. For evaluations, we asked students at Okayama University, Japan, to draw portraits using the PDLAS, and applied the proposed method to their drawing results, where the application results validated the effectiveness by suggesting improvements in drawing components. The system usability was also confirmed through a questionnaire with a SUS score. The main finding of this research is that the implementation of the NCC algorithm within the PDLAS significantly enhances the accuracy of novice portrait drawings by providing detailed feedback on specific facial features, proving the system’s efficacy in art education and training. Full article
(This article belongs to the Special Issue Smart Learning Environments)
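
The localized NCC idea reduces to computing a normalized cross-correlation per facial-component bounding box. A minimal numpy sketch, with random arrays and hypothetical boxes in place of the photo, the drawing, and the OpenPose-derived components:

```python
# Illustrative only: per-component NCC between two same-sized grayscale images.
import numpy as np

def ncc(a: np.ndarray, b: np.ndarray) -> float:
    """Normalized cross-correlation of two same-sized grayscale patches."""
    a = a.astype(float) - a.mean()
    b = b.astype(float) - b.mean()
    denom = np.sqrt((a ** 2).sum() * (b ** 2).sum())
    return float((a * b).sum() / denom) if denom else 0.0

def localized_scores(photo, drawing, boxes):
    """NCC per component; `boxes` maps component name -> (x, y, w, h)."""
    return {name: ncc(photo[y:y + h, x:x + w], drawing[y:y + h, x:x + w])
            for name, (x, y, w, h) in boxes.items()}

rng = np.random.default_rng(3)
photo, drawing = rng.random((200, 200)), rng.random((200, 200))
boxes = {"left_eye": (40, 60, 40, 20), "mouth": (70, 130, 60, 30)}  # placeholder boxes
print(localized_scores(photo, drawing, boxes))
```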

22 pages, 9193 KB  
Article
RS-Xception: A Lightweight Network for Facial Expression Recognition
by Liefa Liao, Shouluan Wu, Chao Song and Jianglong Fu
Electronics 2024, 13(16), 3217; https://doi.org/10.3390/electronics13163217 - 14 Aug 2024
Cited by 12 | Viewed by 3781
Abstract
Facial expression recognition (FER) utilizes artificial intelligence for the detection and analysis of human faces, with significant applications across various scenarios. Our objective is to deploy the facial emotion recognition network on mobile devices and extend its application to diverse areas, including classroom effect monitoring, human–computer interaction, specialized training for athletes (such as in figure skating and rhythmic gymnastics), and actor emotion training. Recent studies have employed advanced deep learning models to address this task, though these models often encounter challenges like subpar performance and an excessive number of parameters that do not align with the requirements of FER for embedded devices. To tackle this issue, we have devised a lightweight network structure named RS-Xception, which is straightforward yet highly effective. Drawing on the strengths of ResNet and SENet, this network integrates elements from the Xception architecture. Our models have been trained on FER2013 datasets and demonstrate superior efficiency compared to conventional network models. Furthermore, we have assessed the model’s performance on the CK+, FER2013, and Bigfer2013 datasets, achieving accuracy rates of 97.13%, 69.02%, and 72.06%, respectively. Evaluation on the complex RAF-DB dataset yielded an accuracy rate of 82.98%. The incorporation of transfer learning notably enhanced the model’s accuracy, with a performance of 75.38% on the Bigfer2013 dataset, underscoring its significance in our research. In conclusion, our proposed model proves to be a viable solution for precise sentiment detection and estimation. In the future, our lightweight model may be deployed on embedded devices for research purposes. Full article
(This article belongs to the Special Issue Advances in Computer Vision and Deep Learning and Its Applications)
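
A minimal sketch of the two building blocks the abstract draws on, squeeze-and-excitation (from SENet) and depthwise-separable convolutions with a residual shortcut (from Xception/ResNet), in Keras. Channel counts and depth are illustrative and do not reproduce the RS-Xception specification.

```python
# Illustrative only: SE block + separable-conv block with shortcut, tiny FER classifier.
from tensorflow.keras import layers, models

def se_block(x, ratio=16):
    """Squeeze-and-excitation: reweight channels by pooled global context."""
    c = x.shape[-1]
    s = layers.GlobalAveragePooling2D()(x)
    s = layers.Dense(max(c // ratio, 1), activation="relu")(s)
    s = layers.Dense(c, activation="sigmoid")(s)
    return layers.Multiply()([x, layers.Reshape((1, 1, c))(s)])

def sep_conv_block(x, filters):
    """Depthwise-separable convolutions + SE, with a 1x1 projection shortcut."""
    shortcut = layers.Conv2D(filters, 1, padding="same")(x)
    y = layers.SeparableConv2D(filters, 3, padding="same", activation="relu")(x)
    y = layers.SeparableConv2D(filters, 3, padding="same")(y)
    y = layers.BatchNormalization()(y)
    y = se_block(y)
    return layers.Activation("relu")(layers.Add()([shortcut, y]))

inputs = layers.Input(shape=(48, 48, 1))             # FER2013-sized grayscale faces
x = layers.Conv2D(32, 3, padding="same", activation="relu")(inputs)
x = sep_conv_block(x, 64)
x = layers.GlobalAveragePooling2D()(x)
outputs = layers.Dense(7, activation="softmax")(x)   # 7 basic expression classes
model = models.Model(inputs, outputs)
model.summary()
```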

20 pages, 5755 KB  
Article
Evaluation of Perceptions Using Facial Expression Scores on Ecological Service Value of Blue and Green Spaces in 61 Parks in Guizhou
by Lan Wang and Changwei Zhou
Sustainability 2024, 16(10), 4108; https://doi.org/10.3390/su16104108 - 14 May 2024
Cited by 3 | Viewed by 1615
Abstract
This study selected 61 parks in Guizhou province as research points and collected 3282 facial expression photos of park visitors in 2021 on the Sina Weibo platform. FireFACE v1.0 software was used to analyze the facial expressions of the visitors and evaluate their emotional perception of the landscape structure and ecosystem service value (ESV) of different landscape types of blue–green spaces. Research shows that the average ESV of green spaces in parks is USD 6.452 million per year, while the average ESV of blue spaces is USD 3.4816 million per year. The ESV of the blue–green space in the park shows no geographical gradient changes, while the happiness score in facial expressions is negatively correlated with latitude. Compared to blue spaces, green spaces can better awaken positive emotions among visitors. The ESV performance of different types of green spaces is as follows: TheroponcedrymV > GrasslandV > Shrubland V. The landscape structure and ESV of the blue–green space in the park can be perceived by visitors, and GreenV and vegetation height are considered the main driving factors for awakening positive emotions among visitors. In Guizhou, when the park area decreases, people are more likely to experience sadness. Regressions indicated that by increasing the green space area of the park and strengthening the hydrological regulation function of the blue–green space, people can achieve a more peaceful mood. Overall, people perceive more positive sentiments with high ESV in blue–green spaces of Karst parks but low ESV in shrubland. Full article
(This article belongs to the Section Health, Well-Being and Sustainability)

23 pages, 18654 KB  
Article
A Multimodal Sentiment Analysis Approach Based on a Joint Chained Interactive Attention Mechanism
by Keyuan Qiu, Yingjie Zhang, Jiaxu Zhao, Shun Zhang, Qian Wang and Feng Chen
Electronics 2024, 13(10), 1922; https://doi.org/10.3390/electronics13101922 - 14 May 2024
Cited by 17 | Viewed by 4736
Abstract
The objective of multimodal sentiment analysis is to extract and integrate feature information from text, image, and audio data accurately, in order to identify the emotional state of the speaker. While multimodal fusion schemes have made some progress in this research field, previous studies still lack adequate approaches for handling inter-modal information consistency and the fusion of different categorical features within a single modality. This study aims to effectively extract sentiment coherence information among video, audio, and text and consequently proposes a multimodal sentiment analysis method named joint chain interactive attention (VAE-JCIA, Video Audio Essay–Joint Chain Interactive Attention). In this approach, a 3D CNN is employed for extracting facial features from video, a Conformer is employed for extracting audio features, and a Funnel-Transformer is employed for extracting text features. Furthermore, the joint attention mechanism is utilized to identify key regions where sentiment information remains consistent across video, audio, and text. This process acquires reinforcing features that encapsulate information regarding consistency among the other two modalities. Inter-modal feature interactions are addressed through chained interactive attention, and multimodal feature fusion is employed to efficiently perform emotion classification. The method is experimentally validated on the CMU-MOSEI dataset and the IEMOCAP dataset. The experimental results demonstrate that the proposed method significantly enhances the performance of the multimodal sentiment analysis model. Full article
(This article belongs to the Section Artificial Intelligence)
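
As a loose illustration of chained cross-modal attention, the PyTorch sketch below lets text features query video features and passes the fused result on to query audio features before pooling and classification. Dimensions, the two-stage chain, and the pooling are assumptions; this is not the VAE-JCIA architecture.

```python
# Illustrative only: a two-stage chained cross-attention fusion over three modalities.
import torch
import torch.nn as nn

class ChainedCrossAttention(nn.Module):
    def __init__(self, dim=128, heads=4, n_classes=3):
        super().__init__()
        self.text_to_video = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.fused_to_audio = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.classifier = nn.Linear(dim, n_classes)

    def forward(self, text, video, audio):                   # each: (batch, seq, dim)
        fused, _ = self.text_to_video(text, video, video)    # text queries video
        fused, _ = self.fused_to_audio(fused, audio, audio)  # result queries audio
        return self.classifier(fused.mean(dim=1))            # pooled sentiment logits

model = ChainedCrossAttention()
t, v, a = (torch.randn(2, 10, 128) for _ in range(3))
print(model(t, v, a).shape)                                  # torch.Size([2, 3])
```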
