Search Results (13)

Search Parameters:
Keywords = expressive speech synthesis

23 pages, 10088 KB  
Article
Development of an Interactive Digital Human with Context-Sensitive Facial Expressions
by Fan Yang, Lei Fang, Rui Suo, Jing Zhang and Mincheol Whang
Sensors 2025, 25(16), 5117; https://doi.org/10.3390/s25165117 - 18 Aug 2025
Viewed by 466
Abstract
With the increasing complexity of human–computer interaction scenarios, conventional digital human facial expression systems show notable limitations in handling multi-emotion co-occurrence, dynamic expression, and semantic responsiveness. This paper proposes a digital human system framework that integrates multimodal emotion recognition and compound facial expression generation. The system establishes a complete pipeline for real-time interaction and compound emotional expression, following a sequence of “speech semantic parsing—multimodal emotion recognition—Action Unit (AU)-level 3D facial expression control.” First, a ResNet18-based model is employed for robust emotion classification using the AffectNet dataset. Then, an AU motion curve driving module is constructed on the Unreal Engine platform, where dynamic synthesis of basic emotions is achieved via a state-machine mechanism. Finally, Generative Pre-trained Transformer (GPT) is utilized for semantic analysis, generating structured emotional weight vectors that are mapped to the AU layer to enable language-driven facial responses. Experimental results demonstrate that the proposed system significantly improves facial animation quality, with naturalness increasing from 3.54 to 3.94 and semantic congruence from 3.44 to 3.80. These results validate the system’s capability to generate realistic and emotionally coherent expressions in real time. This research provides a complete technical framework and practical foundation for high-fidelity digital humans with affective interaction capabilities. Full article
(This article belongs to the Special Issue Emotion Recognition Based on Sensors (3rd Edition))
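
A minimal sketch of how a structured emotional weight vector might be blended into AU-level controls, as the last step of the pipeline described above; the AU set, the per-emotion templates, and the example weights are illustrative assumptions, not values or code from the paper.

```python
import numpy as np

# Hypothetical AU templates for a few basic emotions (values are illustrative):
# each row gives target activations for a small set of Action Units.
AU_NAMES = ["AU1", "AU4", "AU6", "AU12", "AU15"]          # brow/cheek/lip units
EMOTION_AU_TEMPLATES = {
    "happiness": np.array([0.1, 0.0, 0.8, 0.9, 0.0]),
    "sadness":   np.array([0.7, 0.5, 0.0, 0.0, 0.8]),
    "anger":     np.array([0.0, 0.9, 0.2, 0.0, 0.3]),
}

def blend_aus(emotion_weights):
    """Weighted blend of per-emotion AU templates into one AU activation set."""
    acc = np.zeros(len(AU_NAMES))
    for emotion, weight in emotion_weights.items():
        acc += weight * EMOTION_AU_TEMPLATES[emotion]
    acc = np.clip(acc, 0.0, 1.0)                          # keep activations in [0, 1]
    return dict(zip(AU_NAMES, acc.round(3)))

# Example: a compound emotional state such as a semantic parser might emit.
print(blend_aus({"happiness": 0.6, "sadness": 0.3, "anger": 0.1}))
```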

25 pages, 1734 KB  
Article
A Multimodal Affective Interaction Architecture Integrating BERT-Based Semantic Understanding and VITS-Based Emotional Speech Synthesis
by Yanhong Yuan, Shuangsheng Duo, Xuming Tong and Yapeng Wang
Algorithms 2025, 18(8), 513; https://doi.org/10.3390/a18080513 - 14 Aug 2025
Viewed by 527
Abstract
Addressing the issues of coarse emotional representation, low cross-modal alignment efficiency, and insufficient real-time response capabilities in current human–computer emotional language interaction, this paper proposes an affective interaction framework integrating BERT-based semantic understanding with VITS-based speech synthesis. The framework aims to enhance the naturalness, expressiveness, and response efficiency of human–computer emotional interaction. By introducing a modular layered design, a six-dimensional emotional space, a gated attention mechanism, and a dynamic model scheduling strategy, the system overcomes challenges such as limited emotional representation, modality misalignment, and high-latency responses. Experimental results demonstrate that the framework achieves superior performance in speech synthesis quality (MOS: 4.35), emotion recognition accuracy (91.6%), and response latency (<1.2 s), outperforming baseline models like Tacotron2 and FastSpeech2. Through model lightweighting, GPU parallel inference, and load balancing optimization, the system validates its robustness and generalizability across English and Chinese corpora in cross-linguistic tests. The modular architecture and dynamic scheduling ensure scalability and efficiency, enabling a more humanized and immersive interaction experience in typical application scenarios such as psychological companionship, intelligent education, and high-concurrency customer service. This study provides an effective technical pathway for developing the next generation of personalized and immersive affective intelligent interaction systems. Full article
(This article belongs to the Section Algorithms for Multidisciplinary Applications)
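
The gated attention idea described above can be illustrated with a toy fusion of a text embedding (e.g., from BERT) and an audio embedding into a six-dimensional emotion vector; all layer sizes, the gate formulation, and the softmax output below are assumptions for illustration, not the architecture reported in the paper.

```python
import torch
import torch.nn as nn

class GatedEmotionFusion(nn.Module):
    """Toy gated fusion of text and audio embeddings into a 6-dim emotion vector."""
    def __init__(self, text_dim=768, audio_dim=256, emo_dim=6):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, 128)
        self.audio_proj = nn.Linear(audio_dim, 128)
        self.gate = nn.Sequential(nn.Linear(256, 128), nn.Sigmoid())
        self.head = nn.Linear(128, emo_dim)

    def forward(self, text_emb, audio_emb):
        t = torch.tanh(self.text_proj(text_emb))
        a = torch.tanh(self.audio_proj(audio_emb))
        g = self.gate(torch.cat([t, a], dim=-1))          # per-dimension modality gate
        fused = g * t + (1.0 - g) * a
        return torch.softmax(self.head(fused), dim=-1)    # weights over six emotions

# Example with dummy embeddings (batch of 2).
model = GatedEmotionFusion()
print(model(torch.randn(2, 768), torch.randn(2, 256)).shape)  # torch.Size([2, 6])
```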

22 pages, 3079 KB  
Article
ECE-TTS: A Zero-Shot Emotion Text-to-Speech Model with Simplified and Precise Control
by Shixiong Liang, Ruohua Zhou and Qingsheng Yuan
Appl. Sci. 2025, 15(9), 5108; https://doi.org/10.3390/app15095108 - 4 May 2025
Viewed by 2843
Abstract
Significant advances have been made in emotional speech synthesis technology; however, existing models still face challenges in achieving fine-grained emotion style control and simple yet precise emotion intensity regulation. To address these issues, we propose Easy-Control Emotion Text-to-Speech (ECE-TTS), a zero-shot TTS model built upon the F5-TTS architecture, simplifying emotion modeling while maintaining accurate control. ECE-TTS leverages pretrained emotion recognizers to extract Valence, Arousal, and Dominance (VAD) values, transforming them into Emotion-Adaptive Spherical Vectors (EASV) for precise emotion style representation. Emotion intensity modulation is efficiently realized via simple arithmetic operations on emotion vectors without introducing additional complex modules or training extra regression networks. Emotion style control experiments demonstrate that ECE-TTS achieves a Word Error Rate (WER) of 13.91%, an Aro-Val-Domin SIM of 0.679, and an Emo SIM of 0.594, surpassing GenerSpeech (WER = 16.34%, Aro-Val-Domin SIM = 0.627, Emo SIM = 0.563) and EmoSphere++ (WER = 15.08%, Aro-Val-Domin SIM = 0.656, Emo SIM = 0.578). Subjective Mean Opinion Score (MOS) evaluations (1–5 scale) further confirm improvements in speaker similarity (3.93), naturalness (3.98), and emotional expressiveness (3.94). Additionally, emotion intensity control experiments demonstrate smooth and precise modulation across varying emotional strengths. These results validate ECE-TTS as a highly effective and practical solution for high-quality, emotion-controllable speech synthesis. Full article
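
The VAD-to-spherical-vector idea and the arithmetic intensity control lend themselves to a small numerical sketch; the neutral reference point, the exact spherical construction, and the scaling rule below are assumptions rather than the published EASV definition.

```python
import numpy as np

def vad_to_spherical(valence, arousal, dominance, neutral=(0.5, 0.5, 0.5)):
    """Toy conversion of a VAD point into (radius, azimuth, elevation) around a
    neutral reference; the paper's EASV construction may differ."""
    v, a, d = np.array([valence, arousal, dominance]) - np.array(neutral)
    radius = np.sqrt(v**2 + a**2 + d**2)           # distance from neutral ~ intensity
    azimuth = np.arctan2(a, v)                     # direction in the valence-arousal plane
    elevation = np.arctan2(d, np.sqrt(v**2 + a**2))
    return radius, azimuth, elevation

def scale_intensity(radius, azimuth, elevation, factor):
    """Intensity control as simple arithmetic: scale the radius, keep the angles."""
    return radius * factor, azimuth, elevation

strong = vad_to_spherical(0.9, 0.8, 0.6)           # a high-arousal positive emotion
print(scale_intensity(*strong, factor=0.5))        # same emotion style, half intensity
```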

17 pages, 3457 KB  
Article
Multimodal Information Fusion and Data Generation for Evaluation of Second Language Emotional Expression
by Jun Yang, Liyan Wang, Yong Qi, Haifeng Chen and Jian Li
Appl. Sci. 2024, 14(19), 9121; https://doi.org/10.3390/app14199121 - 9 Oct 2024
Viewed by 1670
Abstract
This study aims to develop an emotion evaluation method for second language learners, utilizing multimodal information to comprehensively evaluate students’ emotional expressions. Addressing the limitations of existing emotion evaluation methods, which primarily focus on the acoustic features of speech (e.g., pronunciation, frequency, and rhythm) and often neglect the emotional expressions conveyed through voice and facial videos, this paper proposes an emotion evaluation method based on multimodal information. The method includes the following three main parts: (1) generating virtual data using a Large Language Model (LLM) and audio-driven facial video synthesis, as well as integrating the IEMOCAP dataset with self-recorded student videos and audios containing teacher ratings to construct a multimodal emotion evaluation dataset; (2) a graph convolution-based emotion feature encoding network to extract emotion features from multimodal information; and (3) an emotion evaluation network based on Kolmogorov–Arnold Networks (KAN) to compare students’ emotion features with standard synthetic data for precise evaluation. The emotion recognition method achieves an unweighted accuracy (UA) of 68.02% and an F1 score of 67.11% in experiments with the IEMOCAP dataset and TTS data. The emotion evaluation model, using the KAN network, outperforms the MLP network, with a mean squared error (MSE) of 0.811 compared to 0.943, providing a reliable tool for evaluating language learners’ emotional expressions. Full article
(This article belongs to the Section Computing and Artificial Intelligence)
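
The evaluation step, comparing a learner's emotion features with standard synthetic reference features and regressing a rating, can be sketched as a small regression model trained with the same MSE criterion the paper reports; a plain MLP stands in here for the KAN, and the feature dimension and rating scale are assumed.

```python
import torch
import torch.nn as nn

class ScoreRegressor(nn.Module):
    """Regresses a teacher-style rating from a (student, reference) feature pair.
    A plain MLP stands in for the paper's KAN; dimensions are assumptions."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * feat_dim, 64), nn.ReLU(),
            nn.Linear(64, 1))

    def forward(self, student_feat, reference_feat):
        return self.net(torch.cat([student_feat, reference_feat], dim=-1)).squeeze(-1)

model = ScoreRegressor()
loss_fn = nn.MSELoss()                         # same criterion as the reported MSE metric
student, reference = torch.randn(8, 128), torch.randn(8, 128)
ratings = torch.rand(8) * 5                    # dummy teacher ratings on a 0-5 scale
loss = loss_fn(model(student, reference), ratings)
loss.backward()
print(float(loss))
```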

14 pages, 1486 KB  
Article
Semi-Supervised Learning for Robust Emotional Speech Synthesis with Limited Data
by Jialin Zhang, Mairidan Wushouer, Gulanbaier Tuerhong and Hanfang Wang
Appl. Sci. 2023, 13(9), 5724; https://doi.org/10.3390/app13095724 - 6 May 2023
Cited by 4 | Viewed by 3444
Abstract
Emotional speech synthesis is an important branch of human–computer interaction technology that aims to generate emotionally expressive and comprehensible speech from input text. With the rapid development of deep-learning-based speech synthesis, affective speech synthesis has gradually attracted the attention of researchers. However, owing to the scarcity of high-quality emotional speech corpora, emotional speech synthesis under low-resource conditions is prone to overfitting, exposure bias, catastrophic forgetting, and other problems that degrade the generated speech. In this paper, we propose an emotional speech synthesis method that integrates transfer learning, semi-supervised training, and a robust attention mechanism to better adapt to the emotional style of the speech data during fine-tuning. By adopting an appropriate fine-tuning strategy, trade-off parameter configuration, and pseudo-labels expressed as loss terms, we efficiently guide the regularized learning of emotional speech synthesis. The proposed SMAL-ET2 method outperforms the baseline methods in both subjective and objective evaluations. The results demonstrate that our training strategy with stepwise monotonic attention and a semi-supervised loss alleviates overfitting and improves the generalization ability of the text-to-speech model. Our method also enables the model to synthesize different categories of emotional speech with better naturalness and emotion similarity. Full article
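
The combination of a supervised term with pseudo-labels expressed as loss terms and a trade-off parameter can be illustrated roughly as follows; the loss types, tensor shapes, and the weight alpha are assumptions, not the SMAL-ET2 configuration.

```python
import torch
import torch.nn.functional as F

def total_loss(pred_mel, target_mel, pred_unlabeled, pseudo_mel, alpha=0.3):
    """Supervised reconstruction term plus a pseudo-label term weighted by a
    trade-off parameter alpha; purely a sketch of the general recipe."""
    supervised = F.l1_loss(pred_mel, target_mel)
    pseudo = F.l1_loss(pred_unlabeled, pseudo_mel)
    return supervised + alpha * pseudo

# Dummy mel-spectrogram batches: (batch, frames, mel_bins).
pred, target = torch.randn(4, 200, 80), torch.randn(4, 200, 80)
pred_u, pseudo = torch.randn(4, 200, 80), torch.randn(4, 200, 80)
print(float(total_loss(pred, target, pred_u, pseudo)))
```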

24 pages, 5952 KB  
Article
Design of Digital-Twin Human-Machine Interface Sensor with Intelligent Finger Gesture Recognition
by Dong-Han Mo, Chuen-Lin Tien, Yu-Ling Yeh, Yi-Ru Guo, Chern-Sheng Lin, Chih-Chin Chen and Che-Ming Chang
Sensors 2023, 23(7), 3509; https://doi.org/10.3390/s23073509 - 27 Mar 2023
Cited by 12 | Viewed by 7534
Abstract
In this study, the design of a Digital-twin human-machine interface sensor (DT-HMIS) is proposed. This is a digital-twin sensor (DT-Sensor) that can meet the demands of human-machine automation collaboration in Industry 5.0. The DT-HMIS allows users/patients to add, modify, delete, query, and restore their previously memorized DT finger gesture mapping model and programmable logic controller (PLC) logic program, enabling operation of or access to the programmable controller input-output (I/O) interface and extending the limb collaboration capability of users/patients. The system has two main functions. The first is gesture-encoded virtual manipulation, which indirectly accesses the PLC through the DT mapping model and executes logic control program instructions to control electronic peripherals, extending the user's physical reach. The second is gesture-based virtual manipulation that helps non-verbal individuals compose spoken sentences through gesture commands, improving their ability to express themselves. The design method uses primitive image processing and eight-way dual-bit signal processing algorithms to capture the movement of human finger gestures and convert them into digital signals. The system service maps control instructions by observing the digital signals of the DT-HMIS and drives motion control through mechatronic integration, or provides speech synthesis feedback, to express operational requirements when work is inconvenient or involves complex handheld physical tools. Based on DT computer vision, the human-machine interface sensor reflects the user's command status without additional wearable devices and promotes interaction with the virtual world. When used by patients, the system ensures that the user's virtual control is mapped to physical device control, providing the convenience of independent operation while reducing caregiver fatigue. This study shows that the recognition accuracy can reach 99%, demonstrating its practicality and application prospects. In future applications, users/patients will be able to interact virtually with other peripheral devices through the DT-HMIS to meet their own interaction needs and promote industry progress. Full article
(This article belongs to the Special Issue Computer Vision and Smart Sensors for Human-Computer Interaction)
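
The two interaction modes, gesture-encoded PLC control and gesture-based sentence composition, reduce at their core to mapping decoded gesture codes onto commands or phrases, as in this toy sketch; the codes, command names, and phrases are hypothetical placeholders, not the system's actual tables.

```python
# Map a decoded finger-gesture code either to a PLC I/O command or to a spoken
# sentence fragment for speech synthesis. All entries are hypothetical.
GESTURE_TO_PLC = {
    0b0001: "Y0_ON",       # e.g. switch an output coil on
    0b0010: "Y0_OFF",
    0b0100: "MOTOR_FWD",
}
GESTURE_TO_PHRASE = {
    0b0001: "I need water.",
    0b0010: "Please call the nurse.",
}

def handle_gesture(code, mode):
    """Return a PLC command in 'plc' mode, otherwise a phrase for speech output."""
    if mode == "plc":
        return GESTURE_TO_PLC.get(code, "NOP")
    return GESTURE_TO_PHRASE.get(code, "")

print(handle_gesture(0b0001, "plc"))      # -> Y0_ON
print(handle_gesture(0b0010, "speech"))   # -> Please call the nurse.
```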

12 pages, 2233 KB  
Article
An Emotion Speech Synthesis Method Based on VITS
by Wei Zhao and Zheng Yang
Appl. Sci. 2023, 13(4), 2225; https://doi.org/10.3390/app13042225 - 9 Feb 2023
Cited by 15 | Viewed by 11092
Abstract
People and things can be connected through the Internet of Things (IoT), and speech synthesis is one of its key technologies. Current end-to-end speech synthesis systems can synthesize relatively realistic human voices, but commonly used parallel text-to-speech models lose useful information in the two-stage delivery process, and the synthesized speech offers monotonous control with insufficient expression of features such as emotion, making emotional speech synthesis a challenging task. In this paper, we propose a new system named Emo-VITS, built on the highly expressive speech synthesis model VITS, to realize emotion control in text-to-speech synthesis. We design an emotion network to extract the global and local features of the reference audio and then fuse them through an attention-based emotion feature fusion module, achieving more accurate and comprehensive emotional speech synthesis. The experimental results show that the Emo-VITS system's error rate increases only slightly compared with the network without emotion modeling and does not affect semantic understanding, while the system surpasses other networks in naturalness, sound quality, and emotional similarity. Full article
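
The attention-based fusion of global and local emotion features from the reference audio might look roughly like the following sketch; the dimensions, head count, and single-query formulation are assumptions, not the Emo-VITS implementation.

```python
import torch
import torch.nn as nn

class EmotionFeatureFusion(nn.Module):
    """Fuse a global (utterance-level) emotion embedding with local (frame-level)
    features via attention; sizes and attention form are assumed for illustration."""
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.out = nn.Linear(dim, dim)

    def forward(self, global_emb, local_feats):
        # global_emb: (B, dim); local_feats: (B, T, dim)
        query = global_emb.unsqueeze(1)                    # one query per utterance
        fused, _ = self.attn(query, local_feats, local_feats)
        return self.out(fused.squeeze(1))                  # (B, dim) emotion embedding

fusion = EmotionFeatureFusion()
print(fusion(torch.randn(2, 256), torch.randn(2, 120, 256)).shape)  # torch.Size([2, 256])
```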

16 pages, 2854 KB  
Article
ZSE-VITS: A Zero-Shot Expressive Voice Cloning Method Based on VITS
by Jiaxin Li and Lianhai Zhang
Electronics 2023, 12(4), 820; https://doi.org/10.3390/electronics12040820 - 6 Feb 2023
Cited by 6 | Viewed by 11177
Abstract
Voice cloning aims to synthesize speech with a new speaker's timbre from a small amount of that speaker's recordings. Current voice cloning methods, which focus on modeling speaker timbre, can synthesize speech with a similar timbre, but their prosody is flat, lacking both expressiveness and the ability to control the expressiveness of the cloned speech. To solve this problem, we propose ZSE-VITS (zero-shot expressive VITS), a novel method based on the end-to-end speech synthesis model VITS. Specifically, we use VITS as the backbone network and add the speaker recognition model TitaNet as the speaker encoder to realize zero-shot voice cloning. We use explicit prosody information to avoid interference from speaker information and adjust speech prosody directly through prosody prediction and prosody fusion. We widen the pitch distribution of the training datasets using pitch augmentation to improve the generalization ability of the prosody model, and we fine-tune the prosody predictor alone on an emotion corpus to learn prosody prediction for various styles. Objective and subjective evaluations on open datasets show that our method generates more expressive speech and allows prosody to be adjusted at will without affecting speaker timbre similarity. Full article
(This article belongs to the Section Artificial Intelligence)
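
Pitch augmentation to widen the pitch distribution of training data can be sketched with a standard pitch-shift transform; the ±4 semitone range is an assumed value, and the paper may apply the augmentation differently (for example, to extracted F0 contours rather than to waveforms).

```python
import numpy as np
import librosa

def pitch_augment(wav, sr, max_semitones=4.0):
    """Randomly pitch-shift a training utterance to widen the corpus pitch
    distribution; the shift range is an assumption, not the paper's setting."""
    n_steps = np.random.uniform(-max_semitones, max_semitones)
    return librosa.effects.pitch_shift(wav, sr=sr, n_steps=n_steps)

# Example on a synthetic tone standing in for a real training utterance.
sr = 22050
t = np.linspace(0, 1.0, sr, endpoint=False)
wav = 0.1 * np.sin(2 * np.pi * 220.0 * t).astype(np.float32)
augmented = pitch_augment(wav, sr)
print(augmented.shape)
```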

19 pages, 2162 KB  
Article
Prediction of Voice Fundamental Frequency and Intensity from Surface Electromyographic Signals of the Face and Neck
by Jennifer M. Vojtech, Claire L. Mitchell, Laura Raiff, Joshua C. Kline and Gianluca De Luca
Vibration 2022, 5(4), 692-710; https://doi.org/10.3390/vibration5040041 - 13 Oct 2022
Cited by 4 | Viewed by 3789
Abstract
Silent speech interfaces (SSIs) enable speech recognition and synthesis in the absence of an acoustic signal. Yet, the archetypal SSI fails to convey the expressive attributes of prosody such as pitch and loudness, leading to lexical ambiguities. The aim of this study was to determine the efficacy of using surface electromyography (sEMG) as an approach for predicting continuous acoustic estimates of prosody. Ten participants performed a series of vocal tasks including sustained vowels, phrases, and monologues while acoustic data was recorded simultaneously with sEMG activity from muscles of the face and neck. A battery of time-, frequency-, and cepstral-domain features extracted from the sEMG signals were used to train deep regression neural networks to predict fundamental frequency and intensity contours from the acoustic signals. We achieved an average accuracy of 0.01 ST and precision of 0.56 ST for the estimation of fundamental frequency, and an average accuracy of 0.21 dB SPL and precision of 3.25 dB SPL for the estimation of intensity. This work highlights the importance of using sEMG as an alternative means of detecting prosody and shows promise for improving SSIs in future development. Full article
(This article belongs to the Special Issue Feature Papers in Vibration)
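
A deep regression network mapping per-frame sEMG features to fundamental frequency and intensity targets could be sketched as follows; the feature dimensionality, layer sizes, and frame-level formulation are assumptions, not the architecture used in the study.

```python
import torch
import torch.nn as nn

class ProsodyRegressor(nn.Module):
    """Map per-frame sEMG feature vectors to two prosodic targets:
    fundamental frequency (semitones) and intensity (dB SPL)."""
    def __init__(self, n_features=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
            nn.Linear(128, 2))                 # [f0_semitones, intensity_db]

    def forward(self, emg_feats):              # (B, T, n_features)
        return self.net(emg_feats)             # (B, T, 2)

model = ProsodyRegressor()
frames = torch.randn(1, 300, 64)               # 300 frames of dummy sEMG features
print(model(frames).shape)                     # torch.Size([1, 300, 2])
```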

14 pages, 1207 KB  
Article
Contribution of Vocal Tract and Glottal Source Spectral Cues in the Generation of Acted Happy and Aggressive Spanish Vowels
by Marc Freixes, Joan Claudi Socoró and Francesc Alías
Appl. Sci. 2022, 12(4), 2055; https://doi.org/10.3390/app12042055 - 16 Feb 2022
Cited by 1 | Viewed by 2196
Abstract
The source-filter model is one of the main techniques applied to speech analysis and synthesis. Recent advances in voice production by means of three-dimensional (3D) source-filter models have overcome several limitations of classic one-dimensional techniques. Despite preliminary attempts to improve the expressiveness of 3D-generated voices, they are still far from achieving realistic results. Towards this goal, this work analyses the contribution of both the vocal tract (VT) and the glottal source spectral (GSS) cues in the generation of happy and aggressive speech through a GlottDNN-based analysis-by-synthesis methodology. Paired neutral and expressive utterances are parameterised to generate different combinations of expressive vowels, applying the target expressive GSS and/or VT cues to the neutral vowels after transplanting the expressive prosody onto these utterances. The conducted objective tests, focused on the Spanish [a], [i] and [u] vowels, show that both GSS and VT cues significantly reduce the spectral distance to the expressive target. The results of the perceptual test show that VT cues make a statistically significant contribution to the expression of happy and aggressive emotions for [a] vowels, while the GSS contribution is significant for [i] and [u] vowels. Full article
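
The objective comparison against the expressive target rests on a spectral distance; the sketch below uses a generic log-spectral distance, which is only one of several possible measures and not necessarily the one used in the paper.

```python
import numpy as np

def log_spectral_distance(spec_a, spec_b):
    """RMS log-spectral distance (dB) between two magnitude spectrograms of the
    same shape; illustrates the kind of objective comparison to the target."""
    eps = 1e-10
    diff_db = 20.0 * (np.log10(spec_a + eps) - np.log10(spec_b + eps))
    return float(np.sqrt(np.mean(diff_db ** 2)))

# Dummy magnitude spectrograms standing in for a vowel with transplanted cues and
# its expressive target (513 frequency bins x 100 frames).
rng = np.random.default_rng(0)
candidate = rng.random((513, 100)) + 0.1
target = rng.random((513, 100)) + 0.1
print(log_spectral_distance(candidate, target))
```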

27 pages, 5643 KB  
Article
Wearable Travel Aid for Environment Perception and Navigation of Visually Impaired People
by Jinqiang Bai, Zhaoxiang Liu, Yimin Lin, Ye Li, Shiguo Lian and Dijun Liu
Electronics 2019, 8(6), 697; https://doi.org/10.3390/electronics8060697 - 20 Jun 2019
Cited by 81 | Viewed by 9810
Abstract
Assistive devices for visually impaired people (VIP) that support daily traveling and improve social inclusion are developing fast. Most of them try to solve the problem of navigation or obstacle avoidance, while other works focus on helping VIP recognize surrounding objects; however, very few couple both capabilities (i.e., navigation and recognition). Addressing these needs, this paper presents a wearable assistive device that allows VIP to (i) navigate safely and quickly in unfamiliar environments and (ii) recognize objects in both indoor and outdoor environments. The device consists of a consumer Red, Green, Blue and Depth (RGB-D) camera and an Inertial Measurement Unit (IMU), both mounted on a pair of eyeglasses, together with a smartphone. The device leverages the continuity of ground height among adjacent image frames to segment the ground accurately and rapidly, and then searches for the moving direction based on the segmented ground. A lightweight Convolutional Neural Network (CNN)-based object recognition system is developed and deployed on the smartphone to increase the perception ability of VIP and enhance the navigation system. It provides semantic information about the surroundings, such as the categories, locations, and orientations of objects. Human–machine interaction is performed through an audio module (a beeping sound for obstacle alerts, speech recognition for understanding user commands, and speech synthesis for expressing the semantic information of the surroundings). We evaluated the performance of the proposed system through many experiments conducted in both indoor and outdoor scenarios, demonstrating the efficiency and safety of the proposed assistive system. Full article
(This article belongs to the Special Issue Wearable Electronic Devices)
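
The ground segmentation based on height continuity between adjacent frames can be caricatured as a simple threshold on a per-pixel height estimate; the tolerance value and the interface below are assumptions, not the device's actual algorithm.

```python
import numpy as np

def segment_ground(height_map, prev_ground_height, tol=0.05):
    """Mark pixels whose estimated height (metres, derived from the RGB-D camera
    and IMU) stays within a tolerance of the ground height found in the previous
    frame. Tolerance and interface are assumed for illustration."""
    return np.abs(height_map - prev_ground_height) < tol

# Dummy 4x6 height map (values near 0 ~ ground plane, larger values ~ obstacles).
heights = np.array([[0.01, 0.02, 0.00, 0.30, 0.02, 0.01],
                    [0.00, 0.01, 0.02, 0.28, 0.01, 0.00],
                    [0.02, 0.00, 0.01, 0.02, 0.00, 0.02],
                    [0.01, 0.02, 0.00, 0.01, 0.02, 0.01]])
print(segment_ground(heights, prev_ground_height=0.0).astype(int))
```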

13 pages, 12431 KB  
Article
Es-Tacotron2: Multi-Task Tacotron 2 with Pre-Trained Estimated Network for Reducing the Over-Smoothness Problem
by Yifan Liu and Jin Zheng
Information 2019, 10(4), 131; https://doi.org/10.3390/info10040131 - 9 Apr 2019
Cited by 10 | Viewed by 10811
Abstract
Text-to-speech synthesis is a computational technique for producing synthetic, human-like speech. In recent years, speech synthesis techniques have advanced and been employed in many applications, such as automatic translation and car navigation systems. End-to-end text-to-speech synthesis has gained considerable research interest because, compared with traditional models, end-to-end models are easier to design and more robust. Tacotron 2 is an integrated state-of-the-art end-to-end speech synthesis system that can directly predict close-to-natural human speech from raw text. However, there remains a gap between synthesized and natural speech. Suffering from an over-smoothness problem, Tacotron 2 produces ’averaged’ speech, making the synthesized speech sound unnatural and inflexible. In this work, we first propose an estimated network (Es-Network), which captures general features from a raw mel spectrogram in an unsupervised manner. We then design Es-Tacotron2 by employing the Es-Network to calculate the estimated mel spectrogram residual and setting it as an additional prediction task of Tacotron 2, allowing the model to focus more on predicting the individual features of the mel spectrogram. The experiments show that, compared with the original Tacotron 2 model, Es-Tacotron2 produces more variable decoder output and synthesizes more natural and expressive speech. Full article
(This article belongs to the Section Artificial Intelligence)
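
The additional prediction task, the residual between the target mel spectrogram and the Es-Network's estimated ("averaged") mel spectrogram, suggests a multi-task loss roughly like the sketch below; the loss types and weighting are assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn.functional as F

def multitask_loss(pred_mel, target_mel, pred_residual, estimated_mel, weight=1.0):
    """Usual mel reconstruction loss plus an extra task: predict the residual
    between the target mel and an estimated 'averaged' mel. Purely illustrative."""
    mel_loss = F.mse_loss(pred_mel, target_mel)
    residual_target = target_mel - estimated_mel       # the individual detail
    residual_loss = F.mse_loss(pred_residual, residual_target)
    return mel_loss + weight * residual_loss

B, T, M = 2, 400, 80
pred, target = torch.randn(B, T, M), torch.randn(B, T, M)
pred_res, est = torch.randn(B, T, M), torch.randn(B, T, M)
print(float(multitask_loss(pred, target, pred_res, est)))
```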

27 pages, 12685 KB  
Article
Muecas: A Multi-Sensor Robotic Head for Affective Human Robot Interaction and Imitation
by Felipe Cid, Jose Moreno, Pablo Bustos and Pedro Núñez
Sensors 2014, 14(5), 7711-7737; https://doi.org/10.3390/s140507711 - 28 Apr 2014
Cited by 50 | Viewed by 17386
Abstract
This paper presents a multi-sensor humanoid robotic head for human robot interaction. The design of the robotic head, Muecas, is based on ongoing research on the mechanisms of perception and imitation of human expressions and emotions. These mechanisms allow direct interaction between the robot and its human companion through the different natural language modalities: speech, body language and facial expressions. The robotic head has 12 degrees of freedom, in a human-like configuration, including eyes, eyebrows, mouth and neck, and has been designed and built entirely by IADeX (Engineering, Automation and Design of Extremadura) and RoboLab. A detailed description of its kinematics is provided along with the design of the most complex controllers. Muecas can be directly controlled by FACS (Facial Action Coding System), the de facto standard for facial expression recognition and synthesis. This feature facilitates its use by third party platforms and encourages the development of imitation and of goal-based systems. Imitation systems learn from the user, while goal-based ones use planning techniques to drive the user towards a final desired state. To show the flexibility and reliability of the robotic head, the paper presents a software architecture that is able to detect, recognize, classify and generate facial expressions in real time using FACS. This system has been implemented using the robotics framework, RoboComp, which provides hardware-independent access to the sensors in the head. Finally, the paper presents experimental results showing the real-time functioning of the whole system, including recognition and imitation of human facial expressions. Full article
(This article belongs to the Special Issue State-of-the-Art Sensors Technology in Spain 2013)
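
Driving a FACS-controllable head ultimately means translating Action Unit intensities into joint targets; the sketch below shows the idea with a hypothetical AU-to-joint table that does not reflect the Muecas kinematics or its RoboComp interface.

```python
# Hypothetical mapping from Action Units to normalised joint targets.
AU_TO_JOINTS = {
    "AU1":  {"eyebrow_left": 0.8, "eyebrow_right": 0.8},            # inner brow raiser
    "AU12": {"mouth_corner_left": 1.0, "mouth_corner_right": 1.0},  # lip corner puller
    "AU26": {"jaw": 0.6},                                           # jaw drop
}

def aus_to_joint_targets(au_intensities):
    """Accumulate joint targets (clipped to [0, 1]) from weighted AU activations."""
    targets = {}
    for au, intensity in au_intensities.items():
        for joint, gain in AU_TO_JOINTS.get(au, {}).items():
            targets[joint] = min(1.0, targets.get(joint, 0.0) + gain * intensity)
    return targets

print(aus_to_joint_targets({"AU1": 0.5, "AU12": 0.9}))
```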