Open Access Article
An Emotional AI Chatbot Using an Ontology and a Novel Audiovisual Emotion Transformer for Improving Nonverbal Communication
by Yun Wang 1, Liege Cheung 2, Patrick Ma 3, Herbert Lee 4 and Adela S.M. Lau 1,*
1 Data Science Lab, Department of Statistics and Actuarial Science, School of Computing and Data Science, The University of Hong Kong, Pok Fu Lam, Hong Kong
2 College of Liberal Arts & Sciences, University of Illinois Urbana-Champaign, 601 E John St, Champaign, IL 61820, USA
3 Marvel Digital Ai Limited, 6/F, 16E, 16 Science Park East Avenue, Hong Kong Science Park, Shatin, N.T., Hong Kong
4 Xtreme Business Enterprises Limited, 6/F, 16E, 16 Science Park East Avenue, Hong Kong Science Park, Shatin, N.T., Hong Kong
* Author to whom correspondence should be addressed.
Electronics 2025, 14(21), 4304; https://doi.org/10.3390/electronics14214304
Submission received: 16 September 2025 / Revised: 27 October 2025 / Accepted: 27 October 2025 / Published: 31 October 2025
Abstract
A key limitation of AI chatbots is the lack of human-like nonverbal communication. Although many studies have addressed video or audio emotion recognition for detecting human emotions, none has combined video, audio, and ontology methods to develop an AI chatbot with human-like communication. This research therefore aims to develop an audio-video emotion recognition model and an emotion-ontology-based chatbot engine that improve human-like communication through emotion detection. We propose a novel cluster-based audiovisual emotion recognition model that improves emotion detection by using both video and audio signals, and we compare it with existing methods that use video or audio signals only. Twenty-two audio features, the Mel spectrogram, and facial action units were extracted, and the latter two were fed into a cluster-based independent transformer to learn long-term temporal dependencies. Our model was validated on three public audiovisual datasets: RAVDESS, SAVEE, and RML. The clustered transformer model achieved accuracies of 86.46%, 92.71%, and 91.67% on RAVDESS, SAVEE, and RML, respectively, outperforming the best existing models, which achieved 86.3%, 75%, and 60.2%, respectively. An emotion-ontology-based chatbot engine was implemented to generate inquiry responses based on the detected emotion. A case study of the HKU Campusland metaverse served as a proof of concept of the emotional AI chatbot for nonverbal communication.
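As a rough illustration of the pipeline outlined in the abstract, the snippet below shows how a log-Mel spectrogram could be extracted from an audio clip and passed through a small transformer encoder for per-clip emotion classification. This is a minimal sketch assuming librosa and PyTorch; the library choices, hyperparameters, and mean-pooling classification head are illustrative assumptions, not the authors' implementation of the cluster-based audiovisual transformer.

```python
# Hypothetical sketch of one modality branch: log-Mel spectrogram features
# fed to a transformer encoder. Not the authors' code; librosa/PyTorch and
# all hyperparameters are assumptions for illustration only.
import librosa
import numpy as np
import torch
import torch.nn as nn

def mel_spectrogram(wav_path: str, n_mels: int = 64) -> torch.Tensor:
    """Load an audio clip and return its log-Mel spectrogram as (time, n_mels)."""
    y, sr = librosa.load(wav_path, sr=16000)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    log_mel = librosa.power_to_db(mel, ref=np.max)   # (n_mels, time)
    return torch.from_numpy(log_mel.T).float()       # (time, n_mels)

class ModalityTransformer(nn.Module):
    """Independent transformer encoder for one modality (e.g., audio or facial AUs)."""
    def __init__(self, feat_dim: int, d_model: int = 128, n_heads: int = 4,
                 n_layers: int = 2, n_classes: int = 8):
        super().__init__()
        self.proj = nn.Linear(feat_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, feat_dim) -> per-clip emotion logits
        h = self.encoder(self.proj(x))
        return self.head(h.mean(dim=1))               # temporal average pooling
```

A parallel encoder of the same shape could, in principle, consume the facial action unit sequence, with predictions fused across modalities; the paper itself should be consulted for the actual clustering and fusion design.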
Share and Cite
MDPI and ACS Style
Wang, Y.; Cheung, L.; Ma, P.; Lee, H.; Lau, A.S.M. An Emotional AI Chatbot Using an Ontology and a Novel Audiovisual Emotion Transformer for Improving Nonverbal Communication. Electronics 2025, 14, 4304. https://doi.org/10.3390/electronics14214304
AMA Style
Wang Y, Cheung L, Ma P, Lee H, Lau ASM. An Emotional AI Chatbot Using an Ontology and a Novel Audiovisual Emotion Transformer for Improving Nonverbal Communication. Electronics. 2025; 14(21):4304. https://doi.org/10.3390/electronics14214304
Chicago/Turabian Style
Wang, Yun, Liege Cheung, Patrick Ma, Herbert Lee, and Adela S.M. Lau. 2025. "An Emotional AI Chatbot Using an Ontology and a Novel Audiovisual Emotion Transformer for Improving Nonverbal Communication." Electronics 14, no. 21: 4304. https://doi.org/10.3390/electronics14214304
APA Style
Wang, Y., Cheung, L., Ma, P., Lee, H., & Lau, A. S. M. (2025). An Emotional AI Chatbot Using an Ontology and a Novel Audiovisual Emotion Transformer for Improving Nonverbal Communication. Electronics, 14(21), 4304. https://doi.org/10.3390/electronics14214304
Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.