1. Introduction
Humans are compelled to create social relationships due to their need to be part of a group, even if the conditions for establishing such relationships are adverse [1]. The existence of a social bond between two persons can alter the way that one of them perceives information provided by the other [1]. This shows the benefits of establishing social relationships with the people who surround us. However, for the interactions required to forge this bond to happen, it is necessary for both interlocutors to recognise each other as acceptable partners for establishing a mental connection [2]. Based on this, it is reasonable to assume that for a robot to be able to bond with a person, it also has to be considered an acceptable partner by that person. This is particularly important in social robotics, where the main goal of the robot is to be able to interact with humans in accordance with a social model applied by the users [3]. The process of creating this bond with the users starts with giving the robot a lively appearance.
There is a large body of evidence in the field of robotics that supports the idea that an animate appearance is important for a robot that has to engage in a social role [4,5,6]. Gelman and Spelke [7] proposed that there are four features that are exhibited by animate beings and are not exhibited by inanimate ones. Out of the four, the two that roboticists tend to focus on are (i) giving the robot the ability to generate different intentions, desires, and beliefs (feature 3); and (ii) designing the expressions (in this work, we will use both ‘expression’ and ‘gesture’ interchangeably to refer to any coherent combination of multimodal information oriented to achieve a particular communicative goal) that the robot can perform (feature 1) [8,9,10]. In this work, we focus on the latter, taking the idea of animate entities being able to move on their own and extending it to give robots the ability to perform multimodal expressions that combine motions, light patterns, gazes, and the display of multimedia content (showing images or videos on a touch screen). However, while this task is easier to complete when the robot is silent, it becomes a harder challenge when the robot is simultaneously emitting a verbal message. The gestures used in these situations are known as co-speech gestures.
Co-speech gestures are non-verbal expressions that accompany verbal messages during face-to-face interactions. While the speaker with the initiative in the interaction uses these gestures to enhance the message being conveyed, the other participants in the interaction need to integrate both the verbal and non-verbal components of the message in order to fully understand what is being communicated [11]. Although some research suggests that the production of speech and the production of gestures are connected (for example, a person talking remotely to another person might still gesticulate even if the other participant in the conversation cannot see the gestures) [12], other researchers disagree and decouple the generation of these components [13]. According to the work of McNeill [14], the combination of speech and gestures can be viewed as integrating two contrasting semiotic systems with a common core idea, but with different ways of expressing it. In the same work, he also stated that although in these situations speech and gestures co-occur and share a common meaning, each component can also convey separate information. Co-speech gestures have attracted significant attention in the area of Human–Robot Interaction. On the one hand, a natural interaction requires that the robot be capable of identifying and understanding the user’s expressions. Following this idea, Hegde et al. [15] proposed three tasks that can serve as a proxy for assessing applications of gesture recognition using the JEGAL model, which matches gestures to words and phrases in the speech. Their approach aims at tackling the problems of sparse and weak cross-modal correlations and is capable of outperforming modern approaches like large vision–language models. Another example is the work of Lee et al. [16], where mobile robots were used for caretaker recognition, tracking, re-identification, and gesture recognition with an LSTM-based model in a caregiving scenario. Their gesture recognition system achieved above 90% accuracy in real time. Ghaleb et al. [17] presented an approach to learning embeddings for representational gestures that leverages self-supervised learning techniques and semantic information from features extracted by large language models. While these works showcase the importance of understanding co-speech gestures, there is also an extensive body of work focused on the task of co-speech gesture generation.
Methods for endowing social robots with the ability to use co-speech gestures can be categorised into two main groups: (i) generating the expressions automatically; and (ii) handcrafting the expressions beforehand and synchronising them with the robot’s speech. In recent years, with the advent of machine learning models, the first approach (which we will refer to as gesture generation in this manuscript) has become the predominant one. The solutions in this category receive the robot’s speech (this can mean a transcription of the speech, the prosodic features of the speech, or both), along with other external and internal factors that might play a part in expression generation (for example, the affective state of the robot or the identity of the user), and create a sequence of non-verbal actions synchronised with this speech [18,19]. Usually, these types of systems tend to focus on body postures and motions. Using a machine learning model to dynamically generate all the expressions required enhances the variability of the robot’s expressiveness without forcing developers to handcraft a large library of gestures. However, expressions generated through this method can be more generic than those handcrafted by roboticists. Also, these methods tend to focus on generating expressions of one modality (with body motions being the most common modality) instead of generating multimodal expressions. Finally, gesture generation systems usually require large datasets to learn how to generate expressions. While these datasets are easier to compile when working with humanoid robots (as we can imitate human behaviour), this can become an issue when considering robots with non-humanoid forms or constrained expressiveness. In these cases, handcrafting the gestures the robot will use can ensure that they convey the desired message, given each robot’s configuration. When developing an expressiveness architecture that relies on a library of predefined gestures, there are two problems that we need to solve: (i) selecting the most appropriate expressions given the robot’s speech and other related factors; and (ii) synchronising the speech and the gestures. Traditionally, both problems were solved manually, which can be tedious and can lead to repetitive interactions. Relying on deep learning models can solve the issue of having to select which expressions should accompany the robot’s speech, and it can also help with the synchronisation task [20]. Automating the selection and synchronisation of expressions allows the developers of robotics applications to focus exclusively on the verbal dimension of interactions, simplifying their work. This approach, which we will refer to as gesture prediction, is the one that we have followed in this research.
In this work, we present a co-speech gesture prediction and synchronisation method for social robots. This method predicts the types of gestures that should accompany the robot’s speech, selects expressions of the correct type that fit the length of the speech chunk that they will be combined with, and finally computes the appropriate point in time when each gesture should be performed to achieve a proper synchronisation of the verbal and non-verbal components. The gesture prediction method receives the robot’s speech and generates a list of labels representing the types of gestures that should accompany this speech. Using labels to represent the types of gestures simplifies the use of multimodal gestures, as they can be designed beforehand, ensuring a proper combination of modes. Initially, we implemented a gesture prediction solution that combined long short-term memory (LSTM) neural networks, which are used to encode inputs, and conditional random fields (CRFs), which are used to generate the sequence of labels. More recently, we developed a new approach that takes advantage of a machine learning model that has fuelled significant growth in the area of natural language processing in the past few years: the transformer model.
Regardless of the prediction method used, the output is the same: a list of labels that indicate not only the types of gestures that should accompany the speech but also the point in the speech where each expression should start. Our synchronisation module receives the robot’s utterance divided into a list of tokens, along with the gesture type labels for each token, computes the lengths of the different speech chunks (fragments of speech that have been tagged with the same label) based on the number of characters in the text, selects appropriate expressions for each gesture label, and combines the speech and non-verbal gestures into a single expression that can be executed by the robot.
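The chunk-length computation just described can be sketched in a few lines of Python. This is only an illustration: the function name is ours, and it assumes that time is distributed proportionally to the number of characters spoken before each chunk, which is a simplification of the actual synchronisation module.

```python
def chunk_start_times(chunks, total_duration):
    """Estimate when each gesture chunk should start, in seconds from
    the beginning of the speech.

    `chunks` is a list of (label, text) pairs, one per run of tokens
    tagged with the same gesture label; the start offset of each chunk
    is taken to be proportional to the characters spoken before it.
    """
    total_chars = sum(len(text) for _, text in chunks)
    starts, elapsed = [], 0
    for label, text in chunks:
        starts.append((label, round(total_duration * elapsed / total_chars, 2)))
        elapsed += len(text)
    return starts

# Two hypothetical chunks in a 2-second utterance: the first gesture
# starts at t = 0, the second when its chunk begins.
print(chunk_start_times([("SELF", "My name "), ("GREET", "is Mini")], 2.0))
```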
In summary, the contributions of this work are the following:
The development of a gesture prediction method that receives the robot’s speech and generates a sequence of labels representing the gesture(s) that should accompany the speech.
The development of a rule-based gesture synchronisation system that connects speech and expressions and allows developers to define individual rules for each category of gestures.
The work described in this manuscript focuses on describing the implementation of the proposed solution and on validating this approach from a technical perspective. Because the proposed system will be integrated into a social robot that needs to be able to interact in a natural way, it is essential that the gesture prediction and synchronisation approach can complete its task at a speed that abides by the constraints imposed by real-time interactions. It is also important to optimise the use of resources, as computational power tends to be limited in robotic platforms. The subjective evaluation of the effect that a proper selection and synchronisation of co-speech gestures has on users’ perception of the robot is outside the scope of this work.
The rest of the manuscript is structured as follows. Section 2 presents an analysis of related works and compares them with the solutions presented in this manuscript. The contributions of this paper are described in Section 3, which details the co-speech gesture prediction and synchronisation module. Section 4 presents a series of evaluations used to measure the performance of the gesture prediction model. The results obtained in these evaluations, along with the limitations of the proposed approach, are discussed in Section 5. Finally, Section 6 presents the conclusions extracted from this work and analyses the accomplished objectives.
3. The Co-Speech Gesture Prediction and Synchronisation Module
In this section, we present our solution for predicting and synchronising gestures and speech based on the verbal message that the robot has to convey. While the prediction stage is platform-independent, the final steps in the synchronisation stage (creating a multimodal behaviour combining verbal and non-verbal information in a way that can be executed by the robot) will have to be adapted when the system is deployed on a new platform. The idea behind this system is to allow developers to design applications for robots by focusing exclusively on the verbal side of communication, simplifying their task. The non-verbal expressions that will be synchronised with the speech seek not only to enhance the robot’s animacy but also to help the verbal message to achieve its communicative goal.
Relying on a closed set of expressions for combining speech and gestures requires two main tasks to be solved: (i) predicting the expression(s) that should accompany a given utterance; and (ii) synchronising both components in order to ensure that the timing between them is correct.
Figure 1 shows the pipeline of the proposed co-speech gesture prediction and synchronisation module. Our solution performs three distinct steps:
Gesture prediction: The prediction stage receives the speech of the robot (this corresponds to the sentence ‘My name is Mini’ in Figure 1) and generates a list of labels that represent the types of expressions that should be associated with each part of the speech. In Figure 1, this corresponds to the sequence SELF-SELF-GREET-SELF, where SELF and GREET are two of the gesture types that we consider. Because each label is connected to a particular token, this also tells us at what point in the speech each gesture should be performed.
Prediction filtering: The list of labels generated during the prediction step is passed through a filter to ensure continuity between labels. In particular, this filtering seeks to correct cases in which a random single label appears in the middle of a sequence of identical labels due to a prediction error. In the example shown in Figure 1, the label GREET was considered a prediction error and was replaced with the label SELF, matching the other three labels.
Gesture selection and synchronisation: The system selects one expression from the robot’s library for each sequence of consecutive identical labels (labels that represent the same gesture type) and synchronises it with the speech by computing the gesture’s starting point in seconds from the moment the speech starts. In Figure 1, the expression gesture_self was the one selected from the expressions connected to the SELF label, and it was connected to the beginning of the robot’s speech, as shown by the arrow connecting these items.
3.1. Gesture Prediction
3.1.1. Dataset
In this work, we have decided to frame the problem of gesture prediction as a token classification task, where an input text is first split into tokens, which are then labelled with a tag that indicates the class they belong to. While each token receives an individual label, the process is influenced by the entirety of the text. Two common examples of token classification in the area of natural language processing (NLP) are part-of-speech (PoS) tagging, which involves assigning to each token a label that represents its syntactic function in the text (whether it is a noun, adjective, verb, etc.), and named entity recognition (NER), which searches for known entities in a text and assigns them a label that indicates the type of entity they are (for example, a location, a person, or an organisation). We opted for modelling our problem this way because we consider that gesture prediction is a task that has to be performed at a sub-sentence level; that is, a single sentence could have multiple gestures attached in sequence. Dividing the sentence into tokens and classifying each of them separately can help to define the boundaries between gestures. Initially, we tackled the gesture prediction task using a combination of recurrent neural networks (RNNs) and CRFs. The process is divided into two steps: the first model extracts the communicative intentions of the robot’s speech, and the second model then generates the list of gesture labels. While we consider that each sentence has a single communicative intention, it can have multiple gesture types associated with it. Thus, the division of the gesture prediction task into two steps simplifies the process, but at the cost of increasing the latency. More recently, we decided to evaluate whether more modern transformer models could outperform our original CRF-based approach. This new approach obtains the list of gesture labels directly from the transcription of the speech. Completing the task in one step (gestures from speech) removes intermediate tasks, simplifying the pipeline and reducing the latency introduced by the prediction system.
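Under this framing, an utterance becomes two parallel sequences, tokens and labels, and gesture boundaries fall wherever the predicted label changes. A minimal sketch with hypothetical tokens and labels:

```python
from itertools import groupby

# Hypothetical utterance with one gesture label per token, following
# the tagging scheme described above (the labels here are illustrative).
tokens = ["Hello", "I", "am", "Mini"]
labels = ["GREET", "SELF", "SELF", "SELF"]

# Merge consecutive tokens that share a label into gesture chunks; the
# chunk boundaries are exactly the points where the label changes.
pairs = list(zip(labels, tokens))
chunks = [(label, [tok for _, tok in group])
          for label, group in groupby(pairs, key=lambda p: p[0])]
print(chunks)
```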
For training our prediction models, we decided to generate our own gesture prediction dataset. This gives us the freedom to choose the list of gesture types that we want our system to consider. The first step in creating the dataset was to define the list of gesture types and communicative intention labels that we wanted to use. The utterances used to create the dataset were extracted from the Cornell Movie Dialogs Corpus [46], a corpus that contains 304,713 utterances extracted from raw movie scripts. We randomly drew utterances from this corpus to create the instances for our dataset (we did not use all the utterances in the corpus due to its large size). The utterances in the corpus correspond to everyday informal conversations between people, which suits the task in which our robots will be used: acting as a companion for older adults, either in the user’s home or at daycare centres.
One of the features used during the gesture prediction process when using the CRF-based model is the communicative intention of the robot’s utterance. To compile the list of all the intentions considered, an annotator extracted a subset of utterances from the dataset and analysed them one by one. For each utterance, the annotator compared the intention that he perceived in it with the set of communicative intention labels already defined for previous utterances. If none of the existing labels fitted the intention perceived in the new utterance, the annotator would then add a new label to the set. This process resulted in a list of 28 different labels, including intentions like greet (the robot seeks to greet or say goodbye to a user), state_user_fact (the robot states a fact/opinion about the user), or agree (the robot shows agreement with the last statement made by the user), among others.
While the communicative intention of the utterance is only considered by the CRF-based model, both solutions generate the same output: a sequence of gesture type labels. These labels were defined based on an analysis of all the multimodal expressions that the robotic platform on which we tested the system can use. An annotator observed the robot perform all the expressions one by one and wrote down the message perceived in each expression (for example, a gesture in which the robot raises and waves its hand with the palm facing the user can convey the idea of either greeting or saying goodbye to the user). Once the annotator had observed all the expressions, those that conveyed similar messages were clustered together. Finally, a single label was defined for each group of expressions. The labels were chosen to define general gesture categories, as opposed to highly specific situations, with the goal that they can cover every situation that the robot might encounter. The final set of gesture types contained 21 different labels, including question (the robot asks a generic question), self (a reflexive expression in which the robot points at itself with the palm of the hand), and sorry (the robot shows remorse for something that happened and asks for forgiveness). The entire list of communicative intention and gesture type labels can be seen in Appendix A. Figure 2 and Figure 3 show the processes followed to obtain the lists of communicative intentions and gesture types, respectively.
We validated the library of expressions connected to the gesture labels to ensure that they convey the intended communicative messages. Fifteen participants who had prior experience with our robotic platform watched the robot perform each gesture alongside different utterances (all extracted from our dataset) and had to match each gesture to the utterance that fitted it best. For each gesture, participants were presented with four possible utterances and were allowed to watch the robot perform the gesture alongside each utterance as many times as they needed. They then selected the utterance that best matched the expression, repeating this process for each gesture label. The results of the validation showed that 16 out of the 20 expressions were correctly matched by a majority of participants. The four expressions with the worst recognition results were the ones for the labels but, other_peer, question, and self. These four expressions will have to be redesigned so that they convey the intended message more clearly.
In total, the dataset generated contains 2600 instances. The average length of the utterances in the dataset is 8.45 words, with the shortest having 1 word and the longest having 43 words. Because we used different frameworks for developing the two solutions proposed in this manuscript (CRF-based and transformer-based), we created two identical versions of this dataset. While the dataset used for training the CRF-based approach has each instance in a separate file, for the transformer-based approach we used the Datasets library provided by HuggingFace to structure our dataset. The latter version of our dataset is publicly available on HuggingFace (https://huggingface.co/qfrodicio, accessed on 12 November 2025). Listing 1 shows the structure of one of the instances in the dataset used to train the CRF-based approach. The example only shows the first token extracted from the utterance, along with its corresponding communicative intention and gesture type labels. The complete instance can be seen in Appendix B. Each instance contains an utterance that can include one or more sentences, the list of tokens into which this utterance is split (where each token contains the word, lemma, and PoS tag), the communicative intention(s) associated with the utterance (each token is labelled with a communicative intention label), and the list of gesture types that should be predicted for that utterance. While each sentence in the utterance tends to be labelled with a unique intention, gestures can be attached to only a specific part of a sentence, so a sentence can be labelled with a sequence of gestures. Communicative intention and gesture labels are defined using the IOB format, which uses prefixes to indicate whether a label is at the beginning (B-), inside (I-), or outside (O) of an entity (a group of identical labels). In our particular case, all the tokens in the utterance belong to an entity, so the O tag is not used.
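The IOB chunking described above can be sketched as follows (an illustrative helper, not part of the released code): B- starts a new entity and I- extends the current one, yielding one span per gesture.

```python
def iob_spans(labels):
    """Group IOB labels into (gesture_type, start, end) token spans.

    B- starts a new entity; I- continues the current one.  Since every
    token in our dataset belongs to an entity, the O tag never appears.
    """
    spans = []
    for i, label in enumerate(labels):
        prefix, tag = label.split("-", 1)
        if prefix == "B" or not spans or spans[-1][0] != tag:
            spans.append([tag, i, i])      # open a new span
        else:
            spans[-1][2] = i               # extend the current span
    return [tuple(s) for s in spans]

print(iob_spans(["B-NEUTRAL", "I-NEUTRAL", "B-SELF", "I-SELF", "I-SELF"]))
```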
The instances in the dataset were generated through an iterative process. First, a small subset of instances was handcrafted and used to train a preliminary version of the CRF-based model. Then, the trained models were deployed on a server. An annotator extracted utterances from the corpus one by one and passed them through the prediction models to generate the lists of communicative intentions and gesture types. The annotator manually reviewed the output of the models and corrected any label that, in his opinion, was incorrect. Finally, the utterance, the tokens, the communicative intentions, and the gesture types were stored in a file, and the process was repeated for a new utterance. Every 300 instances, the model was retrained with all the instances generated up to that point. This strategy was used to speed up and simplify the process of generating the dataset, as correcting a wrong prediction requires less time and effort than creating the instance from scratch. Also, the number of corrections that instances require diminishes as the dataset grows and is used to improve the performance of the model. The dataset generation process can be seen in Figure 4. To train the CRF-based and transformer-based models, the instances were split into training, validation, and testing subsets with a 60-20-20 distribution, and it was ensured that the labels that the models need to predict were represented proportionally in each subset (e.g., the training subset contains approximately 60% of all the occurrences of each label).
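A simplified version of the split might look as follows. Note that this sketch only shuffles and cuts at the 60-20-20 boundaries; it omits the per-label stratification check described above, which the actual pipeline enforces.

```python
import random

def split_dataset(instances, seed=13):
    """Shuffle and split instances 60/20/20 into train/validation/test.

    Sketch only: the split described in the paper additionally verifies
    that each gesture and intention label is proportionally represented
    in every subset, which this simple version does not enforce.
    """
    items = list(instances)
    random.Random(seed).shuffle(items)     # fixed seed for reproducibility
    n = len(items)
    n_train, n_val = int(0.6 * n), int(0.2 * n)
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])

train, val, test = split_dataset(range(2600))
print(len(train), len(val), len(test))
```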
Listing 1. Schematic example of an instance in the dataset.

<?xml version="1.0" encoding="UTF-8"?>
<example id="1573945815700">
  <sentence>Well, for starters, what do you do?</sentence>
  <tokens>
    <token id="1" lemma="well" pos="INTJ" surface="Well" />
    …
  </tokens>
  <intentions>
    <token id="1" value="B-REQUEST_PERSONAL_INFO" />
    …
  </intentions>
  <gestures>
    <token id="1" value="B-NEUTRAL" />
    …
  </gestures>
</example>
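Since each instance is stored as XML, reading it back is straightforward with the Python standard library. The following sketch parses a trimmed copy of the instance from Listing 1 and lines up the per-token annotations (illustrative code, not the project’s actual loader):

```python
import xml.etree.ElementTree as ET

# A trimmed copy of the instance from Listing 1, with a single token,
# used here to show how the three parallel annotation lists line up.
INSTANCE = """\
<example id="1573945815700">
  <sentence>Well, for starters, what do you do?</sentence>
  <tokens><token id="1" lemma="well" pos="INTJ" surface="Well"/></tokens>
  <intentions><token id="1" value="B-REQUEST_PERSONAL_INFO"/></intentions>
  <gestures><token id="1" value="B-NEUTRAL"/></gestures>
</example>"""

root = ET.fromstring(INSTANCE)
token = root.find("tokens/token")
row = (token.get("surface"),
       token.get("pos"),
       root.find("intentions/token").get("value"),
       root.find("gestures/token").get("value"))
print(row)
```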
3.1.2. CRF-Based Prediction Module
The original approach that we developed was the CRF-based solution. We chose this particular model due to the results that CRFs have achieved in token classification tasks, including gesture prediction [20,38]. This makes this approach a solid baseline against which to compare the performance of the transformer models. The CRF-based prediction module is divided into two separate models. The first one takes the tokenised utterance and predicts the communicative intention behind it, and the second one takes both the tokenised text and the predicted intention and predicts the list of gestures that should accompany the speech. Both models follow the same hybrid approach based on a combination of RNNs and CRFs. The former gives the model the ability to learn dependencies between all elements in the sequence of tokens that represent the robot’s speech, as well as their PoS and communicative intention labels. The latter, on the other hand, gives the model the ability to take into account dependencies that might exist between the labels generated by the model (either the communicative intention of the speech or the predicted gesture types) based on features present in the input of the CRF (in our case, these inputs are the encodings generated by the neural networks). This combination of RNNs and CRFs has already been tested in labelling problems such as NER [47], dialogue act tagging [48], and keyphrase extraction [49].
Because the CRF-based model requires the inputs to be a list of tokens and PoS labels, we need to add a pre-processing stage. The first step is removing any special characters in the text, except for the delimiters (as they indicate separations between sentences and thus possible boundaries between communicative intentions and gesture types). Then, the text is passed through a PoS tagger provided by the spaCy library (https://spacy.io/, accessed on 12 November 2025). This transforms the sentence into a sequence of tokens, each containing a word, its lemma, and its PoS label. To improve the performance of the PoS labelling stage, contractions are transformed so that adverbs or contracted verbs are separated (e.g., don’t is transformed into do n’t).
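A rough approximation of these two pre-processing steps in plain Python. The actual system relies on spaCy for tokenisation and PoS tagging; the regular expressions below are only illustrative stand-ins for the character cleaning and contraction splitting:

```python
import re

def preprocess(text):
    """Illustrative pre-processing sketch: split negative contractions
    ("don't" -> "do n't") and drop special characters while keeping
    sentence delimiters.  The actual pipeline delegates tokenisation
    and PoS tagging to spaCy; this only mimics the text clean-up.
    """
    text = re.sub(r"(\w)n't\b", r"\1 n't", text)   # don't -> do n't
    text = re.sub(r"[^\w\s.!?']", "", text)        # keep delimiters . ! ?
    return text

print(preprocess("Well, don't you worry!"))
```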
We have used the AllenNLP library [50] to develop the prediction models used in our co-speech gesture prediction methods. This library has been developed using PyTorch (version 1.13), and it is designed for developing deep learning models for NLP tasks. The abstractions provided by AllenNLP make it possible to modularise the design of the models and create configuration files for training and evaluating these models. This library also provides a series of methods for solving common NLP tasks.
The architecture of one of our prediction models can be seen in Figure 5. We have developed two models with similar architectures. The first one receives the robot’s speech and the sequence of PoS labels associated with it, and generates a sequence of labels that represent the speech’s communicative intention. The second model receives the same inputs as the first one, but it also receives the labels predicted by the first model; it generates a second sequence of labels representing the type of gesture that should be performed alongside the speech. In both cases, the same steps are followed after the model receives its inputs:
First, the model tokenises each of the inputs and represents each of these tokens with an ID.
Dense representations in a vector space are obtained for each ID by passing it through an embedding layer. We use a combination of the basic embedder provided by AllenNLP and ELMo [51] for obtaining these representations. The advantage of ELMo is that it considers not only the word itself but also the context in which it is used. We pass the inputs through each of the two embedders and then concatenate the representations generated.
The output of the embedding layer for each type of input (words, PoS labels, and, in the case of the second model, intention labels) is passed through an independent encoder, modelled as a stacked bi-directional LSTM network (two LSTM networks connected sequentially, so that the output of the first one is used as the input of the second one).
The outputs of the encoders are concatenated into a single vector and then sent to the linear-chain CRF model.
The CRF uses the Viterbi algorithm [52] to find the sequence of labels that has the highest probability based on the inputs of the model.
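The decoding step can be illustrated with a toy linear-chain Viterbi over explicit emission and transition scores. In the real model, these scores come from the concatenated LSTM encodings and the learned CRF parameters; the numbers below are made up to show how transitions can smooth out a weak preference for a different label at one token.

```python
def viterbi(emissions, transitions):
    """Find the highest-scoring label sequence.

    emissions[t][j]: score of label j at token t (from the encoders);
    transitions[i][j]: score of moving from label i to label j.
    Toy sketch of the decoding performed by the linear-chain CRF.
    """
    n_labels = len(emissions[0])
    scores = list(emissions[0])          # best score ending in each label
    backpointers = []
    for em in emissions[1:]:
        prev = scores
        scores, pointers = [], []
        for j in range(n_labels):
            best_i = max(range(n_labels),
                         key=lambda i: prev[i] + transitions[i][j])
            scores.append(prev[best_i] + transitions[best_i][j] + em[j])
            pointers.append(best_i)
        backpointers.append(pointers)
    # Trace the best path backwards from the highest final score.
    best = max(range(n_labels), key=lambda j: scores[j])
    path = [best]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]

# Two labels (0 and 1); transitions penalise label changes, so the weak
# preference for label 1 at the last token is overridden.
print(viterbi([[2.0, 0.0], [1.5, 0.5], [0.4, 0.6]],
              [[0.5, -0.5], [-0.5, 0.5]]))
```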
3.1.3. Transformer-Based Prediction Model
The development of new deep learning models and training techniques that improve performance while optimising the training process has resulted in significant growth in the area of natural language understanding (NLU) in the last few years. One of the models that has contributed the most to these advances is the transformer model [53]. This is a deep learning architecture based on the concept of self-attention, that is, learning to focus on certain elements in the input data depending on their importance. While other models have incorporated attention mechanisms, transformers rely on attention alone, which has allowed them to outperform these other approaches. As mentioned before, we have framed the gesture prediction problem as a token classification task. This allowed us to take pre-trained transformer models and fine-tune them with our own dataset for gesture prediction to see if they could outperform our original method. To do this, we took three transformer models pre-trained for a general task, replaced the head of each model with a new one for the task of token classification, and then fine-tuned them with our dataset for gesture prediction. This process is shown in Figure 6.
One of the transformer models that has been widely used in multiple NLP tasks is the bidirectional encoder representations from transformers (BERT) model [54]. BERT is designed as a multi-layer bidirectional encoder that uses either a single text or a pair of texts as input representations. In this case, text can refer to one or multiple contiguous sentences. While traditional language models only consider the context that either precedes or follows the section of text that is being evaluated, BERT is bidirectional, which means that the context from both sides is considered for each text section. To achieve this, two pre-training objectives are used simultaneously: (i) for the masked language modelling objective, the model is asked to predict masked words in an input sentence; and (ii) for next sentence prediction, the model is asked to predict whether two concatenated sentences were contiguous in the original text. Thanks to its bidirectional nature, BERT can be fine-tuned for different tasks by adding an extra output layer. Initially, BERT achieved state-of-the-art results on 11 NLP tasks, including those under the General Language Understanding Evaluation (GLUE) benchmark, the Stanford Question Answering Dataset (v1.1 and v2.0), and the Situations With Adversarial Generations (SWAG) dataset. Because of the performance achieved, it has since become the baseline in many NLP works. Due to its widespread use, we have decided to test its applicability to the task of gesture prediction.
When selecting which implementation of BERT to use, we decided to evaluate the correlation that might exist between a model’s size and its performance for the task at hand. Thus, in addition to fine-tuning the base implementation of BERT, we also selected a smaller model and a bigger model (in this case, size refers to the number of parameters of the model). We selected DistilBERT as the smaller model and RoBERTa as the bigger model. DistilBERT [55] was born from the desire to develop a model that could come close to BERT’s performance while also being smaller and faster, so that it could be integrated into systems with more constrained computational capabilities. RoBERTa (robustly optimised BERT pre-training approach) is an optimisation of BERT that seeks to correct the significant under-training of the original model [56].
The pre-trained implementations of all three BERT-based models used in this work were obtained from HuggingFace, a platform for sharing machine learning models and datasets. Among the different features it offers, the one that has been used for this work is the Transformers library, a package that simplifies the process of downloading pre-trained models from repositories, fine-tuning them, and generating inferences. The dataset and the fine-tuned models developed in this work are available on HuggingFace (
https://huggingface.co/qfrodicio, accessed on 12 November 2025). We have used the same dataset described in
Section 3.1.1, but with two modifications. First, the CRF-based model that the dataset was originally compiled for received the intention and gesture labels directly as strings, while the models we selected from HuggingFace expect integer values. Thus, it was necessary to create a mapping from strings to integers and apply it to the dataset. This mapping has to be undone to convert the outputs of the model back into a list of string labels. The second modification was related to the tokenisation process. While the CRF-based model received the input text already converted into a list of tokens, the transformer-based models receive the utterance directly and then tokenise it with their own tokeniser. This can result in situations in which the token list generated by these new tokenisers and the list of tokens in the dataset instance do not align. In these cases, we need to apply a correction to the corresponding list of gesture labels to ensure that the number of output labels matches the number of tokens and that the label distribution of the original instance is maintained. For example, if a word is considered a single token in the dataset but is split into two tokens by the new tokeniser, the gesture label associated with the original token is used to tag both of the tokens generated by the tokeniser.
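The two modifications above can be sketched in a few lines of pure Python. The label set below is illustrative (the real dataset uses the gesture types described in Section 3.1.1), and `word_ids` mimics the output of a HuggingFace fast tokenizer's `word_ids()` method rather than calling a real tokeniser:

```python
# Illustrative label set; the real dataset uses the gesture types from Section 3.1.1.
GESTURE_LABELS = ["greet", "other_peer", "self", "thanks"]
label2id = {lab: i for i, lab in enumerate(GESTURE_LABELS)}
id2label = {i: lab for lab, i in label2id.items()}

def align_labels(word_ids, word_labels):
    """Spread word-level gesture labels over sub-word tokens.

    `word_ids` mimics a HuggingFace fast tokenizer's `word_ids()` output:
    one entry per sub-token giving the index of the source word, or None
    for special tokens such as [CLS]/[SEP]. Special tokens receive -100
    so that the loss ignores them, and every sub-token of a split word
    inherits the label of the original word.
    """
    return [-100 if wid is None else label2id[word_labels[wid]]
            for wid in word_ids]
```

Inverting the mapping after inference is then a matter of dropping the `-100` entries and looking the remaining ids up in `id2label`.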
3.1.4. Training the Gesture Prediction Approaches
For training both approaches, we determined the most appropriate hyperparameters using Optuna [
57], an automatic hyperparameter optimisation framework.
Table 1 presents the complete list of hyperparameters used for the transformer-based and CRF-based approaches, along with the maximum and minimum values defined for each during the automatic search.
In all experiments, our goal was to identify the best-performing configuration for each model while maintaining a fair and practically deployable comparison between models on our robotic platform. We therefore used Optuna to run the same automated search procedure for all architectures over the hyperparameters listed in
Table 1. The ranges for learning rate, weight decay, and batch size were centred around values commonly used for sequence labelling with LSTM–CRF and transformer models. Preliminary runs showed that values much larger led to unstable training, whereas values much smaller mostly slowed down convergence without improving validation performance. For the CRF-based models, we also optimised the size of the task-specific embedding layer, the hidden-layer size and number of recurrent layers, the dropout probability, the gradient norm clipping threshold, and the early stopping patience. Here, the bounds were set to cover a spectrum from compact to moderately large architectures that can still be run in real time on the Mini robot, rather than very large models that would be difficult to deploy on-board. For each model family, Optuna evaluated 20 trials and selected the configuration that maximised the F1 score on the validation split; the resulting settings are reported in
Table 2.
Among these hyperparameters, Embedding layer size indicates the number of nodes in the embedding layer used in combination with ELMo in the CRF-based approach. Hidden layer size and Number of recurrent layers indicate, for the LSTM-based encoders, the number of nodes in the first layer and the number of LSTM networks connected sequentially, respectively. Dropout indicates the probability for the dropout layer introduced during training after every layer of the LSTM-based encoders (except for the last one) to reduce overfitting. Weight decay indicates the decay applied during training to the weights in all layers of the model (except for bias and normalisation layers). Patience refers to the number of epochs the trainer will wait before early stopping the training if no improvement is observed. Finally, Gradient norm indicates the maximum value to which the computed gradient norms will be scaled.
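The overall search loop can be illustrated with a simplified stand-in. The authors used Optuna, which samples configurations more intelligently than the random sampler below, but the procedure is the same: draw a configuration from the ranges in Table 1, train, and keep the best validation F1 over 20 trials. The ranges shown here are illustrative placeholders, not the exact bounds:

```python
import math
import random

# Illustrative search ranges standing in for Table 1 (not the exact bounds).
SEARCH_SPACE = {
    "learning_rate": (1e-5, 1e-3),   # sampled log-uniformly
    "weight_decay": (1e-6, 1e-2),    # sampled log-uniformly
    "batch_size": [8, 16, 32, 64],   # categorical
}

def sample_config(rng):
    # Draw one configuration from the search space.
    def log_uniform(lo, hi):
        return math.exp(rng.uniform(math.log(lo), math.log(hi)))
    return {
        "learning_rate": log_uniform(*SEARCH_SPACE["learning_rate"]),
        "weight_decay": log_uniform(*SEARCH_SPACE["weight_decay"]),
        "batch_size": rng.choice(SEARCH_SPACE["batch_size"]),
    }

def search(objective, n_trials=20, seed=0):
    """Evaluate n_trials configurations and keep the one with the best
    validation F1 (`objective` trains a model and returns that F1)."""
    rng = random.Random(seed)
    best_cfg, best_f1 = None, float("-inf")
    for _ in range(n_trials):
        cfg = sample_config(rng)
        f1 = objective(cfg)
        if f1 > best_f1:
            best_cfg, best_f1 = cfg, f1
    return best_cfg, best_f1
```

In Optuna, the same loop is expressed by defining an objective over a `Trial` object and calling `study.optimize(objective, n_trials=20)`.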
For the transformer-based approach, we fine-tuned versions of the BERT base (
https://huggingface.co/bert-base-cased, accessed on 12 November 2025), RoBERTa (
https://huggingface.co/roberta-base, accessed on 12 November 2025), and DistilBERT models (
https://huggingface.co/distilbert-base-cased, accessed on 12 November 2025), which were downloaded from HuggingFace. The same Optuna procedure and search space for learning rate, weight decay, and batch size were applied to each of these models, and the best configurations according to the validation F1 score are also summarised in
Table 2. The number of epochs was chosen so that we could monitor the evolution of the training metrics; in all three cases, we kept the model from the epoch with the lowest validation loss, which occurred at epoch 4.
3.2. Smoothing the Output of the Prediction Models
Regardless of the prediction model used, the output of the previous step is always the same: a sequence of labels. This can be a problem, as mistakes made during the prediction stage for communicative intentions could change the way that gestures are selected and synchronised, hindering the robot’s expressiveness. For example, if the robot wants to utter the sentence ‘Hello, how are you?’, the gesture that should be performed alongside this sentence should be a greeting gesture. However, if, due to a prediction mistake, the sequence of gesture type labels ends up having, for example, a thanking gesture type label in the middle, this would lead to the module trying to synchronise three gestures: a greeting gesture, a thanking gesture, and finally a second greeting gesture. In order to mitigate this problem, we have implemented a filter for smoothing the predictions of the model. The filter is based on two main ideas: (i) identical labels tend to be together in the sequence; and (ii) a label will never appear alone in a sequence. Based on these ideas, the filter does the following:
First, the filter stores the positions in the sequence in which each label appears.
If a label appears only in consecutive positions, we consider that those positions form a closed cluster.
For any label that does not form a closed cluster, the filter splits the sequence of positions in which it appears into subsequences of consecutive positions. If any of these subsequences is larger than a threshold, then it is also labelled as a closed cluster. Otherwise, it is considered an open cluster.
We consider that the labels for the positions in closed clusters have been correctly predicted, and as such, they are stored in a list L.
The filter then selects the largest open cluster A, defined by its lowest and highest positions. If none of the correctly predicted labels in L fall between these two positions, then all the positions between them are filled with the label from A, and the cluster is stored in L.
The previous step is repeated until none of the remaining open clusters can fit into L.
The clusters in L are ordered based on their lowest position, in ascending order.
Finally, if any position i in the label sequence is not present in L, then it is assigned the label from the nearest cluster in L.
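The steps above can be sketched as follows. This is one possible reading of the filter: the threshold value and the tie-breaking choices (including how leftover positions are filled) are assumptions, not the authors' exact implementation:

```python
from collections import defaultdict

def smooth_labels(seq, threshold=2):
    """Smooth a predicted label sequence (one possible reading of the
    filter described above; threshold and tie-breaking are assumptions)."""
    # Step 1: store the positions at which each label appears.
    positions = defaultdict(list)
    for i, lab in enumerate(seq):
        positions[lab].append(i)

    def runs(ps):
        # Split a sorted position list into runs of consecutive positions.
        rs = [[ps[0]]]
        for p in ps[1:]:
            if p == rs[-1][-1] + 1:
                rs[-1].append(p)
            else:
                rs.append([p])
        return rs

    # Steps 2-4: runs longer than the threshold are trusted ("closed").
    accepted, open_clusters = [], []
    for lab, ps in positions.items():
        for r in runs(ps):
            if len(r) > threshold:
                accepted.append((lab, r[0], r[-1]))
            else:
                open_clusters.append((lab, r[0], r[-1]))

    covered = {p for _, lo, hi in accepted for p in range(lo, hi + 1)}
    # Steps 5-7: accept the largest open clusters whose span contains no
    # trusted label; singleton runs are never accepted (idea ii: a label
    # should never appear alone in the sequence).
    for lab, lo, hi in sorted(open_clusters, key=lambda c: c[2] - c[1], reverse=True):
        if hi > lo and not any(p in covered for p in range(lo, hi + 1)):
            accepted.append((lab, lo, hi))
            covered.update(range(lo, hi + 1))

    # Step 8: fill every position from the trusted clusters, assigning
    # leftover positions the label of the nearest trusted cluster.
    accepted.sort(key=lambda c: c[1])
    out = list(seq)
    if not accepted:          # degenerate case: nothing trusted, keep as-is
        return out
    for i in range(len(out)):
        inside = [lab for lab, lo, hi in accepted if lo <= i <= hi]
        if inside:
            out[i] = inside[0]
        else:
            lab, _, _ = min(accepted, key=lambda c: min(abs(i - c[1]), abs(i - c[2])))
            out[i] = lab
    return out
```

On the greeting example above, a stray label in the middle of a greeting sequence forms a singleton open cluster, is never accepted, and is overwritten by the surrounding greeting label.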
Because the CRF-based prediction module includes two separate models (one for predicting communicative intentions and another for predicting expressions), we apply this filtering step to the output of both models, as a mislabelled communicative intention label in the middle of the sequence can also affect the quality of the prediction provided by the gesture prediction model. For the transformer-based model, we only need to apply the filter to the sequence of gesture labels.
3.3. Gesture Selection and Synchronisation
Finally, once we have our sequence of gesture labels and it has been passed through the filter to ensure continuity, the last step in the process is selecting the appropriate gestures from the robot’s library and connecting them to the appropriate points of the speech. Our robots rely on a library of handcrafted multimodal expressions that have been created for different tasks and situations. Each of the gesture labels that we have defined is associated with one or multiple expressions of different lengths. This will allow our system to synchronise appropriate gestures with sentences of different lengths.
In this work, we synchronise the verbal and non-verbal components based on the length of the robot’s speech using a rule-based method. Developers can create multiple rules for each gesture type and order them based on priority. These rules can attach the beginning of the gesture either to a fixed position in the speech chunk (beginning, middle, or end) or to the appearance of a specific word or PoS label (e.g., attach the beginning of a negative gesture to the appearance of the word
no, or to the appearance of the first adverb). Using the list of gesture type labels obtained from the prediction phase, the utterance is split into speech chunks, where each chunk corresponds to a segment of the utterance labelled with the same gesture type. The synchronisation process will be performed independently for each chunk. For example, if the module receives the utterance
‘What is your favourite food? Mine is pizza’, the prediction stage would label the first question with the
other peer gesture type (a gesture in which the robot points at the user in a non-threatening manner) and the second statement with the
self gesture type (a reflexive gesture in which the robot looks at its own shoulder). Then, the utterance would be divided into two chunks:
‘What is your favourite food?’ and
‘Mine is pizza’. This example can be seen in
Figure 7.
For each of the speech chunks, the next step is computing the length in seconds. For this, the system retrieves the rules associated with the gesture to be synchronised, and checks them one by one. If the rule specifies the appearance of a word or PoS label, the synchronisation method searches the chunk for this item and, if it appears, computes the time length from the item to the end of the chunk. If it has to be attached to a specific point in the chunk, then it computes the duration from that point to the end of the chunk. Because the TTS integrated in our platforms does not provide information regarding the length of the utterance or other timing information, we decided to calculate the utterances’ duration empirically, based on the number of characters in them. To do this, we first remove all special characters in the text (including separators), take the number of remaining characters in the chunk, and multiply it by a conversion factor in order to obtain its duration in seconds. This conversion factor has been calculated experimentally for the TTS used in our robotic platforms. Once the duration of the speech chunk has been computed, the synchronisation module extracts all the expressions connected to the gesture label the chunk was tagged with, discards those that cannot be performed in the amount of time required for uttering the speech chunk, and selects one of the remaining gestures randomly. If all of the expressions associated with a given label are too long or no rule can be applied (e.g., the required word or PoS label does not appear in the speech chunk), then no gesture is synchronised with that particular chunk. Our solution does not modify expressions so they fit a particular utterance due to the gestures being multimodal and having a specific communicative meaning. We consider that changing aspects of the expression could create a mismatch between the different modalities, which could in turn result in the gesture losing its meaning.
If an expression has been selected, the synchronisation module computes its starting point. First, the synchronisation module takes the durations of every chunk preceding the one that is being processed and sums them. This gives us the starting point of the chunk in seconds from the moment that the robot starts to utter the entire speech. If the rule used specifies a synchronisation point different from the beginning of the chunk, the system adds to the starting point that was just calculated the duration of the fragment of the chunk before the expression’s starting point. The process of finding the duration of the chunk, selecting an appropriate expression (if possible), and computing the chunk’s starting point will be repeated for every speech chunk. Finally, the synchronisation module generates a single message containing the speech and the list of expressions, along with their starting points, and sends it to the robot’s expressiveness management module to be executed.
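The duration estimate and start-point computation described above can be sketched as follows. The conversion factor and the gesture library below are hypothetical placeholders (the real factor was calibrated empirically for Mini's TTS), and for brevity the sketch anchors every gesture to the beginning of its chunk and picks the first fitting expression instead of a random one:

```python
import re

# Hypothetical value: the real seconds-per-character factor was calibrated
# experimentally for the TTS used on Mini.
SECONDS_PER_CHAR = 0.065

def chunk_duration(chunk):
    # Strip special characters (including separators) and convert the
    # remaining character count into seconds.
    clean = re.sub(r"[^A-Za-z0-9]", "", chunk)
    return len(clean) * SECONDS_PER_CHAR

def synchronise(chunks, library):
    """chunks: list of (text, gesture_label); library: label -> list of
    (expression_name, duration_s). Returns a list of (expression, start_s).

    Expressions longer than their chunk are discarded; if none fit, the
    chunk is left without a gesture, as in the text.
    """
    plan, elapsed = [], 0.0
    for text, label in chunks:
        dur = chunk_duration(text)
        # Discard expressions that cannot be performed within the chunk.
        fitting = [e for e in library.get(label, []) if e[1] <= dur]
        if fitting:
            plan.append((fitting[0][0], elapsed))
        elapsed += dur   # the next chunk starts when this one ends
    return plan
```

Running this on the two chunks of the pizza example from Figure 7 would place the first gesture at 0 s and the second at the estimated end of the first chunk.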
4. Evaluation
To test the capabilities of the proposed co-speech gesture prediction and synchronisation system, we present two different sets of experimental results: (i) the results of the objective metrics measured during the training/fine-tuning and evaluation of the gesture prediction modules presented in
Section 3.1; and (ii) an analysis of the resources used by the models (GPU and GPU memory) and the time required to predict and synchronise gestures for the robot’s speech.
4.1. The Social Robot Mini
The proposed gesture prediction approach has been integrated in Mini [
58], a social robot designed with the goal of providing 24/7 assistance to older adults who suffer from mild cases of cognitive impairment. Mini, shown in
Figure 8, is designed as a tabletop robot for one-to-one interactions. Its expressiveness capabilities include five joints (one per shoulder, two in the neck, and one in the base), an LED heart that can light up in different colours, two OLED screens used for displaying the robot’s eyes, integrated speakers for uttering speech and other non-verbal sounds, and a touch screen that is used for both displaying multimedia content and interacting with the user via menus. It is important to mention that, while Mini has been designed to interact with older adults, the proposed gesture prediction and synchronisation architecture was developed to be used in general-purpose robots, and thus the training of the models was not tailored to suit any particular demographic. Regarding its computational power, Mini is equipped with an Intel i5-3550 CPU with four cores running at 3.3 GHz and 16 GB of RAM. Mini’s software architecture runs on Ubuntu 16.04 (64-bit).
Mini’s hardware has a limitation for deploying deep learning models, as it lacks a dedicated GPU and does not have enough computational power for running these types of models with a performance that matches the temporal constraints imposed by real-world interactions. In order to overcome this limitation, Mini can connect through a socket-based connection to an external server equipped with an Intel Core i9-10900K CPU running at 3.7 GHz, two NVIDIA GeForce RTX 3090 GPUs, and 64 GB of RAM. When launching the gesture prediction and synchronisation architecture, we can choose to deploy it in the robot or in the external server. For the latter, Mini sends the utterance that needs to be passed through the model, and receives the gestures predicted, with the points in time at which each of them has to start. The following YouTube video (
https://youtu.be/OxBceJ-G3CI, accessed on 12 December 2025) shows an example of how the proposed system works in a real task with Mini. In the video, Mini and a user play a quiz game (the video shows only one question for the sake of brevity).
4.2. Performance of the Prediction Models Used
We decided to evaluate all models using the multi-label classification metrics provided by the Scikit-learn library [
59]. In particular, we opted for computing the weighted averages for the precision, recall, and F-score as a way to compensate for any imbalance in the presence of the different labels in the dataset. For this, these metrics are first computed for each class (gesture/intention label) independently. Then, the per-class results are weighted by the number of appearances of the class in the instance. Finally, the average value for each of the three metrics is computed. We selected this metric over seqeval [
60] (the metric traditionally used for evaluating models trained for token classification) because it allows for partial matching of sequences. For example, if the model predicts the sequence of labels
AAABB for an instance for which the correct sequence is
AABBB, seqeval would return an F-score, precision, and recall of 0 (the same as if the model had predicted all wrong labels). Allowing for partial matching makes it so that the result of the evaluation reflects how close the predicted and correct sequences are.
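To make the difference concrete, the weighted averaging can be reproduced in a few lines of pure Python (a simplified equivalent of Scikit-learn's `average='weighted'` option, ignoring its zero-division handling):

```python
def weighted_prf(y_true, y_pred):
    """Weighted-average precision, recall, and F-score over a label
    sequence, mirroring Scikit-learn's average='weighted' behaviour."""
    labels, n = set(y_true), len(y_true)
    P = R = F = 0.0
    for lab in labels:
        tp = sum(t == p == lab for t, p in zip(y_true, y_pred))
        pred = sum(p == lab for p in y_pred)
        true = sum(t == lab for t in y_true)
        prec = tp / pred if pred else 0.0
        rec = tp / true if true else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        w = true / n                 # weight by support in the reference
        P, R, F = P + w * prec, R + w * rec, F + w * f1
    return P, R, F
```

For the AAABB vs. AABBB example above, this returns a weighted F-score of 0.8 (partial credit for the four correct positions), whereas seqeval scores the mismatched sequence as 0.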
After the models were trained, they were evaluated using the test split of the datasets, which contains 423 instances. To conduct this evaluation, we deployed the different models, passed each utterance in the test split of the dataset through them, compared the predicted sequences of gesture type labels with the correct labels defined in each instance from the test split using the metrics described above, and computed the average values for the full test split. These results are shown in
Table 3.
If we compare the results obtained, we see that, while the differences in F-score between the transformer models are minimal, the gap between these models and the CRF-based approach is noticeably larger. For the other metrics, RoBERTa showed the highest precision, DistilBERT showed the highest accuracy and F-score, and BERT showed the highest recall.
We also evaluated the effect that the filtering stage had on the predicted sequences of labels. While the transformer-based approaches have a single filtering stage after the gesture prediction is completed, the CRF-based approach filters both the communicative intentions and the gesture labels. After comparing the results, we found that the variation in F-score is negligible in all cases, with the DistilBERT-based approach showing the largest difference.
Finally, we also extracted confusion matrices for all approaches, showing the results per label. These matrices are shown in
Figure 9. For the sake of readability, the complete matrices with the data per cell can be found in
Appendix C. In these matrices, the rows have been ordered from the most frequent label in the dataset to the least frequent, meaning that
other_peer (the label in the first row) is the gesture type that appears in the most instances, while
thinking is the one that appears in the fewest. When observing the results, we see that, while both the BERT-based and DistilBERT-based models show consistently high prediction success for labels with a higher presence in the dataset (for all top 10 labels, the correct option was predicted the majority of times), the success rate decreases for labels with less presence (for the bottom 11 labels, the BERT-based model only predicted the correct label the majority of times for 5 labels, while the DistilBERT-based model did so for 4). For the RoBERTa-based and CRF-based approaches, we see that the results are more consistent, with the RoBERTa-based model predicting the correct label the majority of times for 7 out of the 11 least present labels, and the CRF-based approach doing the same for 8 labels.
4.3. Resource Usage and Task-Completion Time
After evaluating the level of performance that the different models selected could achieve for the task of gesture prediction, we measured the system’s use of resources and the time required to complete the task (execution time from now on). For this evaluation, we prepared two different experimental setups. In our first test, the gesture prediction module was deployed on an external server, while the utterances used to evaluate the models were deployed in Mini. For the second evaluation, we deployed the prediction models directly in the robot to evaluate if it would be possible to dispense with the external server. This would allow our system to be used in situations in which the communication between the robot and external machines (like our server) is unreliable or simply not possible (for example, if the robot is placed at a user’s home in an area with a bad network connection).
For this test, we extracted the utterances from 213 instances contained in the test split of our dataset and sent them to our prediction and synchronisation module one by one. These utterances vary considerably in length, with the shortest texts having only 7 characters and the longest having 229. This is relevant because the length of the utterance affects the inference time. The instances of the dataset are loaded one by one into Mini and then sent to the prediction and synchronisation module (deployed either on Mini or on the external server). For each utterance, our system predicts the gesture labels that best suit it, synchronises the appropriate gestures with the utterance using the rule-based approach, and returns the resulting multimodal expression. For the evaluation in which the models were deployed remotely, Mini was connected to the internet via Wi-Fi throughout the entire evaluation, while the server was connected through an Ethernet connection. To measure the system’s resource usage, we took different measurements depending on where the models were deployed. When deploying the models on the external server, we were able to take advantage of the computational capabilities that GPUs provide when dealing with deep learning models. Thus, during the evaluation, we measured the GPU computing time usage (understood as the percentage of time that at least one of the GPU’s cores performs operations) and the GPU memory used every 0.5 s, starting from the moment the evaluation began (when our system loaded the first utterance) and continuing until the last utterance was returned with its corresponding gestures; then, the average value was calculated. On the other hand, Mini is not equipped with a GPU, and the models had to be deployed on a CPU. Because of this, during this second evaluation, we measured the CPU’s computational load and the amount of RAM used.
In both evaluations, we measured the system’s task-completion time (the time required to send a sentence to the server, predict and synchronise appropriate gestures, and return the result to the robot) as the average value of the time required to predict and synchronise gestures for each of the utterances in our test dataset.
When we analyse the results obtained when the models were deployed on the external server (
Table 4), we see that the results are positive for all transformer-based approaches when it comes to the GPU computing time consumed by the models, as they all consumed between 1 and 2% of the available GPU computing time. This percentage increased sharply for the CRF-based approach, which required 23.1% of the GPU computing time. If we focus on the memory requirements, all the models show similar performances, ranging from the 5.25% of the RAM consumed by DistilBERT (the lightest of the models tested) to the 9.11% consumed by the CRF-based approach. As expected, DistilBERT shows the best overall results (as it is the smallest model), requiring only 1.15% of the GPU computing time and 5.25% of the RAM. It is followed by RoBERTa (2.02% and 6.24%, respectively) and BERT (2.20% and 6.00%, respectively). However, the large deviation of the measurements indicates that all three transformer-based models performed similarly.
If we focus on the results obtained when the models were deployed locally in Mini (
Table 4), we can see that the results are still positive in terms of the use of RAM but not in terms of the computational power required to run the models. The results follow the same trend observed when the models were deployed remotely, with the CRF-based approach requiring a larger amount of resources and the transformer-based models showing more similar results. If we analyse the use of the CPU’s computing power, we see that the CRF-based approach shows the worst performance, requiring 560.5% of the computational power provided by one of the CPU’s cores (meaning that the model would keep roughly 5.6 of the available cores busy). Among the transformer-based models, DistilBERT again showed the best overall results, requiring 148.28% of the computational power provided by one of the CPU’s cores and 5.54% of the available RAM, compared to the requirements of BERT (205.67% and 6.92%, respectively) and RoBERTa (222.92% and 7.42%, respectively). While in the previous test the high deviation indicated that all transformer-based models performed similarly, here the low deviation of the measurements does indicate a difference between the models’ performances.
Regarding the evaluation of the execution times, we had to define an appropriate threshold that would allow us to classify the performance of our system as either acceptable or not acceptable. During interactions with users, introducing a long delay in the robot’s responses can affect how users perceive the robot. A common rule used in human–robot interaction (HRI) is the
‘two seconds’ rule [
61], which states that the system’s response to a user’s input should be delivered in less than 2 s for the response to be perceived as natural. While this rule was originally proposed for human-computer interactions, researchers like [
62] have tested its applicability to HRI. Other researchers like [
63] observed that users preferred robots that introduced a small delay into their responses and that the preferred response times were around 1 s or less. This is why, in this work, we have decided to consider both thresholds: the robot should be able to respond to the user in 1 s, and in any case, this time should never be higher than 2 s.
During the evaluation in which the models were deployed remotely, we measured two time intervals for each utterance: (i) the time that passed between the moment Mini sent the utterance to the server and the moment it received the resulting multimodal expression (the light blue bar in
Figure 10); and (ii) the time that passed between the moment that the prediction and synchronisation module received the utterance from the robot and the moment that it generated the multimodal expression (the dark blue bar in
Figure 10). This allows us to evaluate the average time that our system requires for generating a response, as well as the delay introduced by the communication between Mini and the server. When the models are deployed locally, the delay due to communications disappears, and thus we measured only the time that passed between the moment each utterance is sent to the prediction and synchronisation module and the moment that a response is received (this would equate to the first of the two time intervals described above). The results are shown in
Figure 10.
If we compare the results obtained, we see once again that the transformer-based models outperform the CRF-based approach. Among the transformer-based models, the one that shows the best results is DistilBERT, which is able to predict appropriate expressions and synchronise them with the speech in 0.0362 s. BERT and RoBERTa showed very similar times (0.0385 and 0.0387 s, respectively), while the time required by the CRF-based model was significantly higher (0.103 s). The results also show that the time required by our system to predict and synchronise gestures is significantly smaller than the delay introduced by the communication between the robot and the server. This makes the differences in performance between the models almost negligible, except in the case of the CRF-based approach. Additionally, while on average all approaches were fast enough to comply with the requirements that we have defined (the two seconds rule) without any issues, there were cases in which problems with the network connection caused the total time to spike, although it always stayed below 1 s (around 0.9 s in the worst cases observed).
When comparing the results obtained when the models were deployed locally and remotely, we see that for the transformer-based models, the results are similar in both cases. However, this is not true for the CRF-based approach, as the time required to predict and synchronise non-verbal gestures based on the robot’s speech increases to 1.23 s. This can be explained by the fact that our model uses 5 LSTM networks for encoding the different inputs used when predicting gestures. While the prediction time is below the two-second threshold that we have defined, in practice, it would leave a very tight time window in which the rest of the robot’s architecture would have to convey the generated multimodal expression to the user, almost ensuring that the two-second threshold would not be met.
5. Discussion
The results of the evaluations presented in
Section 4 led us to several conclusions about the proposed system. The evaluation of the models showed that the new transformer-based approaches outperformed our original CRF-based solution. This matches our expectations, considering the improvement in performance that transformer models have shown in multiple NLP tasks when compared with other approaches. The fact that BERT and DistilBERT show similar results could be expected, as the original authors of the paper in which DistilBERT was presented [
55] reported that this model was able to retain 97% of BERT’s performance on several tasks. However, we found it surprising that these two approaches performed at the same level as RoBERTa, an optimised version of BERT. When adding the filtering stage (two stages in the case of the CRF-based approach), we observed little variation in the results. This also matches our expectations, as the filtering stage simply smooths the label sequences, which is equally likely to correct a mispredicted label or to replace a properly predicted label with a wrong one in order to ensure continuity with the previous and following labels.
The transformer models show a second advantage, which is their lower resource usage and faster task-completion time compared to the CRF-based module. This agrees with our expectations, as the CRF-based module consists of two models running sequentially, with five total LSTM-based encoders and two CRFs. This was particularly clear when the models were deployed directly in Mini, as the amount of available resources is more constrained. In these situations, the CRF-based approach was not able to match the desired performance in terms of the time required to generate the predictions or the amount of resources used. The transformer-based approaches, on the other hand, were able to perform at an acceptable speed, meeting the threshold defined (the two seconds rule), but they still consumed more resources than desired (1.5 full cores of the CPU had to be exclusively dedicated to running DistilBERT, the best-performing model). While deploying the models on an external server specifically designed to run machine learning models solves the resource usage problem, it introduces other issues. In particular, we observed that there were instances when the communication between the robot and the server would slow down due to network problems. Although these cases were rare, this is still an issue that needs to be taken into account, particularly when the robot is deployed in environments where a stable network connection is not guaranteed. Regardless, our results show that the proposed system is able to work at a speed suitable for real-world interactions the majority of the time. On the other hand, centralising the deployment of these modules on an external server can lead to scalability issues if the number of robots attempting to use the proposed architecture is too large. Overall, the results seem to suggest that the fine-tuned version of DistilBERT is the best option, giving the same level of performance at a smaller cost.
When we analysed the performance of the different models depending on the gestures that had to be predicted, we saw that all models performed best when predicting gestures that are more common in the dataset, while showing worse results when faced with less common gestures. However, there are exceptions: the come_on and sorry labels showed high recognition rates for all models, while the thanks label also showed good results when using the CRF-based and RoBERTa-based solutions. One fact needs to be mentioned when analysing these results: in the gesture prediction task, an objectively wrong prediction does not necessarily hinder the interaction. For example, if the robot wants to ask a personal question about the user, the predicted gesture should be other_peer (the robot extends the arm towards the user in a non-threatening way, inviting them to answer). If the model instead labels the utterance with the question label (an expression for when the robot wants to ask a generic question), this would be an incorrect prediction. However, we believe that the selected gesture type could still be perceived as natural by the users, given the context of the situation (asking a question), and thus this prediction error might not have a negative effect on the user’s experience. This situation can be observed, for example, in the results for the third_person label when using the BERT-based and RoBERTa-based models. This gesture type is associated with situations where the robot aggressively discusses facts about a third party not present in the conversation (equivalent to a person extending the arm perpendicularly to the direction in which the other participant in the conversation is standing).
Both models tended to label utterances that should be accompanied by this type of gesture with the emphatic label, which corresponds to a generic expression where the robot gesticulates in an aggressive manner, intended for heated conversations about a generic topic. It could be argued that, while the models made a mistake, the gesture that they ended up selecting could also fit the utterance, as both gestures are meant to be used during an aggressive discussion. Another result worth mentioning is the case of the enthusiastic gesture type. We observed that all models tended to tag utterances containing this label with the emphatic label instead. Both gesture types involve the robot performing energetic motions, but enthusiastic expressions should be associated with more positive utterances (for example, the robot talking about something that it likes). A possible reason for this is that, without the context of the conversation and other speech features like intonation, the models could be paying too much attention to factors like how the utterances are constructed or the presence of exclamation marks. The fact that enthusiastic is the second least common label in the dataset also plays a role. After analysing all the results obtained with our original CRF-based solution and the new transformer-based approach, we conclude that the latter is the optimal choice, and it will be the one integrated into our robotic platforms.
Finally, while the scope of this work focuses on the technical aspects of the development and integration of a co-speech prediction and synchronisation module on a social robot, we have also studied the effect that such an approach has on how users perceive a robot capable of using appropriate co-speech gestures. In the work described in [64], we conducted a within-subjects evaluation where participants interacted with two robots: one that used co-speech gestures matching the content of its speech, and another that used generic, neutral expressions. In the evaluation scenario, participants first observed in person an argument between a researcher and the two robots, and were then asked to play a card game with the robots. The entire interaction was scripted and, for the robot that used the proper co-speech gestures, the robot’s speech was passed offline through the co-speech prediction module. All predicted expressions were synchronised to the beginning of their corresponding speech chunks. After completing the interaction with the robots, participants filled in a questionnaire where they evaluated the overall appearance of the robot in terms of sociability, agency, animacy, and disturbance; rated the naturalness, semantic coherence, and expressiveness of the robot’s gestures; and answered general questions about which robot performed better during the game. The results showed that the robot using the predicted co-speech gestures was perceived as having a higher level of agency, and its gestures were perceived as more expressive. These results, while not explicitly validating the performance of our co-speech prediction and synchronisation module, do point towards the importance that such a system can have during human–robot interactions.
Limitations
The proposed gesture prediction architecture has proven to be usable in real situations, as shown in the video included in Section 3. However, there are still some limitations of the work presented in this manuscript that will have to be addressed in future works. First, when compiling the dataset used for training the models presented in this paper, a single annotator was in charge of assigning gesture types and communicative intentions to the utterances in the corpus. This introduces a bias in the dataset and could alter how people perceive our robot when using the proposed gesture prediction approach. The main objective of this work is to develop a module able to automatically select the most appropriate expressions for a social robot from a gesture library based on the robot’s speech. Because of this, we have focused on presenting objective measurements that demonstrate whether the system can be used in embodied agents during real interactions. In this case, we consider that any bias introduced by having a single annotator should not impact the results of these evaluations, as assessing how users perceive these expressions is outside the scope of this work. Regardless, in future works, the dataset needs to be validated through a crowdsourced process, where multiple participants assess the validity of the original annotations. For the communicative intentions, participants can be presented with the utterances in the dataset and asked to select the intention label that best suits the content of the speech. We can then compare the selections made by the participants with our original labels. For validating the gesture labels, the process would involve showing participants the robot saying each utterance while performing the gestures, and asking them to rate how natural the whole expression was.
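When comparing crowdsourced selections against the original annotations, a chance-corrected agreement measure such as Cohen’s kappa is a natural choice. A minimal sketch, assuming one categorical label per utterance and two equal-length label lists (the function name and data layout are illustrative assumptions):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators.

    `labels_a` and `labels_b` are equal-length, non-empty lists
    holding one categorical label per utterance.
    """
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    # Observed agreement: fraction of utterances labelled identically.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement under each annotator's label distribution.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[l] * counts_b.get(l, 0) for l in counts_a) / (n * n)
    if expected == 1.0:  # both annotators used a single identical label
        return 1.0
    return (observed - expected) / (1.0 - expected)
```

A kappa close to 1 would indicate that the crowdsourced labels corroborate the original single-annotator labels; values near 0 would indicate agreement no better than chance.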
A second issue that needs to be mentioned is the process that we followed for creating the dataset. Because we extracted sentences randomly from the corpus to create our dataset instances, this led to an imbalance in the number of appearances of each label, with those used in more common situations (asking a general question, for example) appearing more often than those used in very specific situations (for example, when the robot pretends to be thinking about something). As an initial step towards understanding the effect that this imbalance has on the performance of the proposed model, we created two other versions of our dataset. For the first one, we removed all gesture labels that appeared in fewer than 100 instances, while for the second one, we removed those that appeared in fewer than 300 instances. In both cases, we took the original dataset and removed any instance that was labelled with one or more of the discarded gesture classes. With the former approach, we were left with 9 gesture labels and 2195 instances, and with the latter, we kept 5 gesture labels and 1748 instances. We used these modified datasets to fine-tune the three transformer models and compared the results to the ones obtained with the original dataset. This process resulted in an increase in F1 score of between 1% and 4% for the dataset with 9 classes, and between 5% and 8% for the dataset with 5 classes. This shows that a proper rebalancing of the dataset could significantly enhance the performance of the model. To achieve this, in future works, the dataset needs to be extended by adding more instances for the underrepresented classes. This can be achieved by using data augmentation techniques that generate new instances similar, but not identical, to the ones already present in the dataset.
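The pruning procedure described above can be sketched as follows; the `(utterance, labels)` representation of a dataset instance is an assumption made for illustration:

```python
from collections import Counter

def prune_rare_labels(instances, min_count):
    """Drop gesture classes seen in fewer than `min_count` instances,
    then drop every instance that uses any discarded class.

    `instances` is a list of (utterance, labels) pairs, where `labels`
    holds the gesture classes attached to that utterance.
    """
    # Count, per gesture class, the number of instances it appears in.
    counts = Counter(label for _, labels in instances for label in set(labels))
    kept_classes = {l for l, c in counts.items() if c >= min_count}
    pruned = [(utt, labels) for utt, labels in instances
              if set(labels) <= kept_classes]
    return pruned, kept_classes
```

Applying this with thresholds of 100 and 300 would reproduce the two reduced datasets described above (9 classes / 2195 instances and 5 classes / 1748 instances, respectively).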
A second potential solution that we will evaluate in future work is using our prediction models to identify all utterances in the corpus that could be labelled with one of the underrepresented classes, and then validating each one manually. Also, switching to a few-shot learning approach could remove the need to rebalance the dataset, although it could also introduce higher latencies and present other limitations that would have to be dealt with. Regardless, it would be interesting to test this approach and see how it performs compared to our current solution. Additionally, we currently have a limited number of expressions for each gesture label, so it would be beneficial to expand the gesture library. These new expressions would have to be validated later through a user study to ensure that they convey the appropriate communicative messages. One last limitation of the proposed method is related to the synchronisation process. If a gesture’s duration exceeds that of its speech chunk, the gesture is directly discarded. A method for dynamically adapting the length of the expressions so that they fit their associated chunk could enhance the performance of the proposed solution, provided that the integrity of the communicative message conveyed by the expression is maintained.
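One simple way to soften the hard discard rule would be to select, among several variants of the same gesture type, the longest one that still fits the chunk. The following is a minimal sketch; the existence of multiple duration variants per gesture type in the library is an assumption made for illustration:

```python
def pick_fitting_gesture(candidates, chunk_duration):
    """Select the longest gesture variant that fits inside the chunk.

    `candidates` maps variant names to durations in seconds. Returns
    None when no variant fits, i.e. the current behaviour of
    discarding the gesture.
    """
    fitting = {name: dur for name, dur in candidates.items()
               if dur <= chunk_duration}
    if not fitting:
        return None
    return max(fitting, key=fitting.get)
```

A fallback to a generic short expression, instead of returning None, would be another option worth evaluating.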
Regardless of these limitations, the performance of the co-speech gesture pipeline has met the expectations that we had at the beginning of this work, which leads us to consider the development of this module a success.
6. Conclusions
In this paper, we have presented a machine learning-based architecture for predicting the most appropriate non-verbal expressions that a social robot should perform while uttering a given verbal message and synchronising both components. We have tackled the gesture prediction task as a labelling problem, where the model receives the sentences that the robot has to utter and predicts a set of labels that represent the gesture semantic values that should be connected to each word. For this stage, we proposed two different approaches. The first one used a combination of LSTM networks and CRFs and divided the prediction task into two steps: (i) predicting the communicative intention of the robot’s verbal message based on the words in the utterance and their part-of-speech labels; and (ii) using the words, PoS labels, and predicted communicative intentions to select the types of gestures that should accompany the speech. The second approach sought to simplify this pipeline by using transformer models, which eliminated the need to divide the problem into two steps; the transformer models are able to predict the appropriate gesture types from the words in the utterance alone. In both cases, we tried to improve the robustness of the proposed pipeline by adding filtering steps that try to correct potential prediction errors that could lead to discontinuities in the gesture label sequence generated by the models.
To synchronise the speech and gestures, our approach takes the list of gesture labels generated by the prediction stage, uses it to divide the robot’s utterance into speech chunks, and connects each gesture to the beginning of the corresponding chunk. The prediction and synchronisation stages are completely independent (as long as the output of the prediction stage is a sequence of gesture type labels), which allows us to modify or replace either stage without affecting the other, simplifying the process of testing new approaches for solving either task that might appear in the future. The proposed system has been integrated into a real robot, as shown in the video presented in Section 4, in which one of our robots plays a quiz game with a user. Along with this example, we have also presented an objective evaluation that measures the performance of the models.
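The chunking step described above can be sketched as follows, assuming the prediction stage outputs one gesture label per word (the function name is illustrative):

```python
def chunk_by_gesture(words, labels):
    """Split an utterance into speech chunks at gesture-label changes.

    Returns a list of (gesture_label, chunk_text) pairs; each gesture
    is meant to be triggered at the start of its chunk.
    """
    assert len(words) == len(labels)
    chunks = []
    for word, label in zip(words, labels):
        if chunks and chunks[-1][0] == label:
            # Same gesture as the previous word: extend the open chunk.
            chunks[-1] = (label, chunks[-1][1] + " " + word)
        else:
            # Label change: open a new chunk anchored to this word.
            chunks.append((label, word))
    return chunks
```

Because this step only consumes a label sequence, any prediction model (CRF-based or transformer-based) can feed it without modification, which is the decoupling property highlighted above.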
The results of the evaluations showed that the transformer-based approaches were able to outperform the CRF-based solution, while reducing the computational load that the proposed architecture imposes on our robotic platforms and lowering the time required to perform the entire task (prediction + synchronisation). While the results show that any of the approaches tested can predict appropriate gesture types given the robot’s speech and synchronise both components, and that they are all fast enough to be used in real interactions (when deployed on hardware capable of running deep learning models), there are still aspects of this research that require further development. In particular, the main limitation of the proposed work is that the evaluation of the models is based exclusively on objective metrics. This should be complemented in future works with a user study that shows whether the use of the proposed architecture enhances the way in which users perceive our robots.