Computer Vision and AI for Interactive Robotics

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Robotics and Automation".

Deadline for manuscript submissions: 20 June 2024 | Viewed by 689

Special Issue Editors


Guest Editor
Automation, Electrical Engineering and Electronic Technology Department, Industrial Engineering Technical School, Technical University of Cartagena, 30202 Cartagena, Spain
Interests: smart mobile robots; e-health systems; cloud and edge computing; human–robot interaction; artificial intelligence

Guest Editor
Automation, Electrical Engineering and Electronic Technology Department, Industrial Engineering Technical School, Technical University of Cartagena, 30202 Cartagena, Spain
Interests: embedded systems; wireless sensor networks; e-health

Special Issue Information

Dear Colleagues,

The rise of robotics technology has ushered in a new era of intelligence. Robots are widely used in industrial manufacturing, medicine, transportation, agriculture, and other fields, greatly improving productivity and living standards. Computer vision and artificial intelligence are key enablers of intelligent human–robot interaction. With the help of these technologies, robots can recognize objects, track their motion, comprehend spatial relationships, and navigate and operate autonomously in complex, dynamic environments. In addition, the application of speech and emotion recognition makes interaction between humans and robots more intuitive and natural. As machine learning, computer vision, natural language processing, and artificial intelligence continue to advance, interactive robots will become more intelligent and efficient, providing humans with better services and experiences.

We invite authors to submit high-quality, original research papers in the field of interactive robotics. The topics of interest for this Special Issue include, but are not limited to, the following:

  • Human–robot interaction;
  • Mobile robots;
  • Industrial robots;
  • Robots for medicine and healthcare;
  • Obstacle-avoidance robots;
  • Robot vision;
  • Robot design and control;
  • Robot perception;
  • Robot motion planning;
  • Vision-based robot navigation;
  • Image processing for robotics;
  • Object detection for robotics and automation;
  • Robots and cloud and edge computing.

Prof. Dr. Nieves Pavón-Pulido
Prof. Dr. Juan Antonio López Riquelme
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • interactive robots
  • computer vision
  • artificial intelligence
  • machine learning
  • deep learning
  • neural network
  • natural language processing
  • emotion recognition technology

Published Papers (1 paper)


Research

31 pages, 9940 KiB  
Article
Combining Transformer, Convolutional Neural Network, and Long Short-Term Memory Architectures: A Novel Ensemble Learning Technique That Leverages Multi-Acoustic Features for Speech Emotion Recognition in Distance Education Classrooms
by Eman Abdulrahman Alkhamali, Arwa Allinjawi and Rehab Bahaaddin Ashari
Appl. Sci. 2024, 14(12), 5050; https://doi.org/10.3390/app14125050 - 10 Jun 2024
Viewed by 452
Abstract
Speech emotion recognition (SER) is a technology that can be applied to distance education to analyze speech patterns and evaluate speakers’ emotional states in real time. It provides valuable insights and can be used to enhance students’ learning experiences by enabling the assessment of their instructors’ emotional stability, a factor that significantly impacts the effectiveness of information delivery. Students demonstrate different engagement levels during learning activities, and assessing this engagement is important for controlling the learning process and improving e-learning systems. An important aspect that may influence student engagement is their instructors’ emotional state. Accordingly, this study used deep learning techniques to create an automated system for recognizing instructors’ emotions in their speech when delivering distance learning. This methodology entailed integrating transformer, convolutional neural network, and long short-term memory architectures into an ensemble to enhance the SER. Feature extraction from audio data used Mel-frequency cepstral coefficients; chroma; a Mel spectrogram; the zero-crossing rate; spectral contrast, centroid, bandwidth, and roll-off; and the root-mean square, with subsequent optimization processes such as adding noise, conducting time stretching, and shifting the audio data. Several transformer blocks were incorporated, and a multi-head self-attention mechanism was employed to identify the relationships between the input sequence segments. The preprocessing and data augmentation methodologies significantly enhanced the precision of the results, with accuracy rates of 96.3%, 99.86%, 96.5%, and 85.3% for the Ryerson Audio–Visual Database of Emotional Speech and Song, Berlin Database of Emotional Speech, Surrey Audio–Visual Expressed Emotion, and Interactive Emotional Dyadic Motion Capture datasets, respectively. Furthermore, it achieved 83% accuracy on another dataset created for this study, the Saudi Higher-Education Instructor Emotions dataset. The results demonstrate the considerable accuracy of this model in detecting emotions in speech data across different languages and datasets.
(This article belongs to the Special Issue Computer Vision and AI for Interactive Robotics)
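As a concrete illustration of the acoustic front end described in the abstract, the sketch below shows how the listed features (MFCCs, chroma, Mel spectrogram, zero-crossing rate, spectral contrast, centroid, bandwidth, roll-off, and root-mean-square energy) and the noise/time-stretch/shift augmentations could be computed with the librosa library. This is a minimal, hypothetical reconstruction, not the authors’ code; parameter choices such as the number of MFCC coefficients, the noise level, and the shift amount are assumptions.

```python
# Hypothetical sketch of multi-acoustic feature extraction and augmentation
# as described in the abstract; not the authors' implementation.
import numpy as np
import librosa


def extract_features(y: np.ndarray, sr: int) -> np.ndarray:
    """Concatenate time-averaged acoustic features from one audio clip."""
    feats = [
        librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40),      # MFCCs (n_mfcc=40 is an assumption)
        librosa.feature.chroma_stft(y=y, sr=sr),          # chroma
        librosa.feature.melspectrogram(y=y, sr=sr),       # Mel spectrogram
        librosa.feature.zero_crossing_rate(y),            # zero-crossing rate
        librosa.feature.spectral_contrast(y=y, sr=sr),    # spectral contrast
        librosa.feature.spectral_centroid(y=y, sr=sr),    # spectral centroid
        librosa.feature.spectral_bandwidth(y=y, sr=sr),   # spectral bandwidth
        librosa.feature.spectral_rolloff(y=y, sr=sr),     # spectral roll-off
        librosa.feature.rms(y=y),                         # root-mean-square energy
    ]
    # Average each feature over time and stack into one fixed-length vector.
    return np.concatenate([f.mean(axis=1) for f in feats])


def augment(y: np.ndarray, sr: int) -> list:
    """Simple augmentations: additive noise, time stretching, and time shifting."""
    noisy = y + 0.005 * np.random.randn(len(y))            # noise level is an assumption
    stretched = librosa.effects.time_stretch(y, rate=0.9)  # slow down by ~10%
    shifted = np.roll(y, int(0.1 * sr))                    # shift by 100 ms
    return [y, noisy, stretched, shifted]


# Usage: load a clip ("clip.wav" is a placeholder), augment it, and extract
# one feature vector per variant for a downstream classifier.
y, sr = librosa.load("clip.wav", sr=22050)
X = np.stack([extract_features(v, sr) for v in augment(y, sr)])
print(X.shape)  # (4, feature_dim)
```

The resulting feature matrix would then be fed to the transformer/CNN/LSTM ensemble described in the abstract; that architecture is not sketched here.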