Search Results (41)

Search Parameters:
Keywords = audiovisual database

39 pages, 3511 KB  
Systematic Review
From Senses to Memory During Childhood: A Systematic Review and Bayesian Meta-Analysis Exploring Multisensory Processing and Working Memory Development
by Areej A. Alhamdan, Hayley E. Pickering, Melanie J. Murphy and Sheila G. Crewther
Eur. J. Investig. Health Psychol. Educ. 2025, 15(8), 157; https://doi.org/10.3390/ejihpe15080157 - 12 Aug 2025
Viewed by 1791
Abstract
Multisensory processing has long been recognized to enhance perception, cognition, and actions in adults. However, there is currently limited understanding of how multisensory stimuli, in comparison to unisensory stimuli, contribute to the development of both motor and verbally assessed working memory (WM) in children. Thus, the current study aimed to systematically review and meta-analyze the associations between the multisensory processing of auditory and visual stimuli, and performance on simple and more complex WM tasks, in children from birth to 15 years old. We also aimed to determine whether there are differences in WM capacity for audiovisual compared to unisensory auditory or visual stimuli alone after receptive and spoken language develop. Following PRISMA guidelines, a systematic search of PsycINFO, MEDLINE, Embase, PubMed, CINAHL and Web of Science databases identified that 21 out of 3968 articles met the inclusion criteria for Bayesian meta-analysis and the AXIS risk of bias criteria. The results showed at least extreme/decisive evidence for associations between verbal and motor reaction times on multisensory tasks and a variety of visual and auditory WM tasks, with verbal multisensory stimuli contributing more to verbally assessed WM capacity than unisensory auditory or visual stimuli alone. Furthermore, a meta-regression confirmed that age significantly moderates the observed association between multisensory processing and both visual and auditory WM tasks, indicating that verbal- and motor-assessed multisensory processing contribute differentially to WM performance, and to different age-determined extents. These findings have important implications for school-based learning methods and other educational activities where the implementation of multisensory stimuli is likely to enhance outcomes. Full article

15 pages, 4273 KB  
Article
Speech Emotion Recognition: Comparative Analysis of CNN-LSTM and Attention-Enhanced CNN-LSTM Models
by Jamsher Bhanbhro, Asif Aziz Memon, Bharat Lal, Shahnawaz Talpur and Madeha Memon
Signals 2025, 6(2), 22; https://doi.org/10.3390/signals6020022 - 9 May 2025
Cited by 5 | Viewed by 3815
Abstract
Speech Emotion Recognition (SER) technology helps computers understand human emotions in speech, which fills a critical niche in advancing human–computer interaction and mental health diagnostics. The primary objective of this study is to enhance SER accuracy and generalization through innovative deep learning models. Despite its importance in various fields like human–computer interaction and mental health diagnosis, accurately identifying emotions from speech can be challenging due to differences in speakers, accents, and background noise. The work proposes two innovative deep learning models to improve SER accuracy: a CNN-LSTM model and an Attention-Enhanced CNN-LSTM model. These models were tested on the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), collected between 2015 and 2018, which comprises 1440 audio files of male and female actors expressing eight emotions. Both models achieved impressive accuracy rates of over 96% in classifying emotions into eight categories. By comparing the CNN-LSTM and Attention-Enhanced CNN-LSTM models, this study offers comparative insights into modeling techniques, contributes to the development of more effective emotion recognition systems, and offers practical implications for real-time applications in healthcare and customer service. Full article
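For readers who want to prototype a comparable baseline, the sketch below builds a small CNN-LSTM emotion classifier over MFCC inputs in Keras. It is only a generic illustration of the architecture family named in the abstract, not the authors' model; the input shape, layer sizes, and the eight-class output are assumptions.

```python
# Minimal CNN-LSTM sketch for 8-class speech emotion recognition (illustrative only).
# Assumes inputs are MFCC matrices of shape (time_steps, n_mfcc), e.g. padded RAVDESS clips.
from tensorflow.keras import layers, models

def build_cnn_lstm(time_steps=300, n_mfcc=40, n_classes=8):
    inputs = layers.Input(shape=(time_steps, n_mfcc))
    # 1-D convolutions extract local spectral-temporal patterns.
    x = layers.Conv1D(64, kernel_size=5, padding="same", activation="relu")(inputs)
    x = layers.MaxPooling1D(2)(x)
    x = layers.Conv1D(128, kernel_size=5, padding="same", activation="relu")(x)
    x = layers.MaxPooling1D(2)(x)
    # The LSTM summarizes the sequence of convolutional feature frames.
    x = layers.LSTM(128)(x)
    x = layers.Dropout(0.3)(x)
    outputs = layers.Dense(n_classes, activation="softmax")(x)
    model = models.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
    return model

model = build_cnn_lstm()
model.summary()
```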

20 pages, 917 KB  
Article
Developing a Dataset of Audio Features to Classify Emotions in Speech
by Alvaro A. Colunga-Rodriguez, Alicia Martínez-Rebollar, Hugo Estrada-Esquivel, Eddie Clemente and Odette A. Pliego-Martínez
Computation 2025, 13(2), 39; https://doi.org/10.3390/computation13020039 - 5 Feb 2025
Cited by 3 | Viewed by 3959
Abstract
Emotion recognition in speech has gained increasing relevance in recent years, enabling more personalized interactions between users and automated systems. This paper presents the development of a dataset of features obtained from RAVDESS (Ryerson Audio-Visual Database of Emotional Speech and Song) to classify emotions in speech. The paper highlights audio processing techniques such as silence removal and framing to extract features from the recordings. The features are extracted from the audio signals using spectral techniques, time-domain analysis, and the discrete wavelet transform. The resulting dataset is used to train a neural network and the support vector machine learning algorithm. Cross-validation is employed for model training. The developed models were optimized using a software package that performs hyperparameter tuning to improve results. Finally, the emotional classification outcomes were compared. The results showed an emotion classification accuracy of 0.654 for the perceptron neural network and 0.724 for the support vector machine algorithm, demonstrating satisfactory performance in emotion classification. Full article
(This article belongs to the Section Computational Engineering)
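As a rough, hedged illustration of the kind of pipeline the abstract describes (silence removal, feature extraction, and SVM classification with cross-validation), the snippet below uses librosa and scikit-learn; the feature set, trim threshold, and the `files`/`labels` variables are assumptions, not the authors' actual processing.

```python
# Illustrative feature-extraction and SVM sketch (not the authors' code).
import numpy as np
import librosa
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def extract_features(path):
    y, sr = librosa.load(path, sr=None)
    y, _ = librosa.effects.trim(y, top_db=30)            # crude silence removal
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)   # spectral features
    zcr = librosa.feature.zero_crossing_rate(y)          # time-domain feature
    rms = librosa.feature.rms(y=y)
    return np.hstack([mfcc.mean(axis=1), mfcc.std(axis=1), zcr.mean(), rms.mean()])

# Usage (assuming `files`, a list of RAVDESS wav paths, and `labels`, their emotion codes):
#   X = np.vstack([extract_features(f) for f in files])
#   scores = cross_val_score(SVC(kernel="rbf", C=10), X, labels, cv=5)
#   print("mean cross-validated accuracy:", scores.mean())
```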

17 pages, 1741 KB  
Review
Effectiveness of Non-Pharmacological Interventions in Reducing Dental Anxiety Among Children with Special Needs: A Scoping Review with Conceptual Map
by Zuhair Motlak Alkahtani
Children 2025, 12(2), 165; https://doi.org/10.3390/children12020165 - 29 Jan 2025
Viewed by 3510
Abstract
Background: Children with special needs often need tailored approaches to oral healthcare to address their unique needs effectively. It is essential to analyze the effectiveness of non-pharmacological management in reducing dental anxiety among special needs children during dental treatment. Methods: Five electronic databases, PubMed, Scopus, Web of Science, Embase, and Google Scholar, were searched from 2007 to August 2024 for randomized control trials and observational studies comparing the effectiveness of non-pharmacological techniques in reducing dental anxiety during invasive and noninvasive dental treatment. The primary outcomes of the studied intervention were reduced dental anxiety and improved behavior during dental treatment. The conceptual map was created to understand the need for assessment and behavior management for special needs children (SN). Results: Nineteen articles qualified for the final analysis from 250 screened articles. Included studies evaluated the effect of strategies applied clinically, such as audio–visual distraction, sensory-adapted environment, and virtual reality. The included studies measured the trivial to large effect of measured interventions and supported non-pharmacological interventions in clinical settings. Conclusions: Most basic non-pharmacological interventions showed a trivial to large reduction in dental anxiety among SN patients. The conceptual map developed in this study supports the need for non-pharmacological interventions as they are cost-effective and create a positive environment in dental clinics. However, more studies need to focus on non-pharmacological behavior interventions in SN children to support the findings of this scoping review. Full article
(This article belongs to the Section Pediatric Dentistry & Oral Medicine)

25 pages, 2085 KB  
Article
How Much Does the Dynamic F0 Curve Affect the Expression of Emotion in Utterances?
by Tae-Jin Yoon
Appl. Sci. 2024, 14(23), 10972; https://doi.org/10.3390/app142310972 - 26 Nov 2024
Viewed by 1558
Abstract
The modulation of vocal elements, such as pitch, loudness, and duration, plays a crucial role in conveying both linguistic information and the speaker’s emotional state. While acoustic features like fundamental frequency (F0) variability have been widely studied in emotional speech analysis, accurately classifying emotion remains challenging due to the complex and dynamic nature of vocal expressions. Traditional analytical methods often oversimplify these dynamics, potentially overlooking intricate patterns indicative of specific emotions. This study examines the influences of emotion and temporal variation on dynamic F0 contours in the analytical framework, utilizing a dataset valuable for its diverse emotional expressions. However, the analysis is constrained by the limited variety of sentences employed, which may affect the generalizability of the findings to broader linguistic contexts. We utilized the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), focusing on eight distinct emotional states performed by 24 professional actors. Sonorant segments were extracted, and F0 measurements were converted into semitones relative to a 100 Hz baseline to standardize pitch variations. By employing Generalized Additive Mixed Models (GAMMs), we modeled non-linear trajectories of F0 contours over time, accounting for fixed effects (emotions) and random effects (individual speaker variability). Our analysis revealed that incorporating emotion-specific, non-linear time effects and individual speaker differences significantly improved the model’s explanatory power, ultimately explaining up to 66.5% of the variance in the F0. The inclusion of random smooths for time within speakers captured individual temporal modulation patterns, providing a more accurate representation of emotional speech dynamics. The results demonstrate that dynamic modeling of F0 contours using GAMMs enhances the accuracy of emotion classification in speech. This approach captures the nuanced pitch patterns associated with different emotions and accounts for individual variability among speakers. The findings contribute to a deeper understanding of the vocal expression of emotions and offer valuable insights for advancing speech emotion recognition systems. Full article
(This article belongs to the Special Issue Advances and Applications of Audio and Speech Signal Processing)
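The semitone conversion relative to a 100 Hz baseline mentioned in the abstract follows the standard formula st = 12·log2(F0/100); a minimal sketch, assuming F0 values in Hz have already been extracted:

```python
# Convert F0 values in Hz to semitones relative to a reference frequency (standard formula).
import numpy as np

def hz_to_semitones(f0_hz, ref_hz=100.0):
    return 12.0 * np.log2(np.asarray(f0_hz, dtype=float) / ref_hz)

print(hz_to_semitones([100.0, 200.0, 250.0]))   # -> [ 0.  12.  ~15.86]
```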

23 pages, 3094 KB  
Article
Risk and Complexity Assessment of Autonomous Vehicle Testing Scenarios
by Zhiyuan Wei, Hanchu Zhou and Rui Zhou
Appl. Sci. 2024, 14(21), 9866; https://doi.org/10.3390/app14219866 - 28 Oct 2024
Cited by 7 | Viewed by 3178
Abstract
Autonomous vehicles (AVs) must fulfill adequate safety requirements before formal application, and performing an effective functional evaluation to verify vehicle safety requires extensive testing in different scenarios. However, it is crucial to rationalize the application of different scenarios to support different testing needs; thus, one of the current challenges limiting the development of AVs is the critical evaluation of scenarios, i.e., the lack of quantitative criteria for scenario design. This study introduces a method using the Spherical Fuzzy-Analytical Network Process (SF-ANP) to evaluate these scenarios, addressing their inherent risks and complexities. The method involves constructing a five-layer model to decompose scenario elements and using SF-ANP to calculate weights based on element interactions. The study evaluates 700 scenarios from the China In-depth Traffic Safety Study–Traffic Accident (CIMSS-TA) database, incorporating fuzzy factors and element weights. Virtual simulation of vehicles in the scenarios was performed using Baidu Apollo, and the performance of the scenarios was assessed by collecting the vehicle test results. The correlation between the obtained alternative safety indicators and the quantitative values confirms the validity and scientific soundness of this approach. This will provide valuable guidance for categorizing AV test scenarios and selecting corresponding scenarios to challenge different levels of vehicle functionality. At the same time, it can be used as a design basis to generate a large number of effective scenarios, accelerating the construction of scenario libraries and promoting the commercialization of AVs. Full article
(This article belongs to the Section Transportation and Future Mobility)

13 pages, 2148 KB  
Systematic Review
The Role of Different Feedback Devices in the Survival of Patients in Cardiac Arrest: Systematic Review with Meta-Analysis
by Luca Gambolò, Pasquale Di Fronzo, Giuseppe Ristagno, Sofia Biserni, Martina Milazzo, Delia Marta Socaci, Leopoldo Sarli, Giovanna Artioli, Antonio Bonacaro and Giuseppe Stirparo
J. Clin. Med. 2024, 13(19), 5989; https://doi.org/10.3390/jcm13195989 - 8 Oct 2024
Cited by 6 | Viewed by 1757
Abstract
Background: Cardiac arrest is a critical condition affecting approximately 1 in every 1000 people in Europe. Feedback devices have been developed to enhance the quality of chest compressions during CPR, but their clinical impact remains uncertain. This study aims to evaluate the effect of feedback devices on key clinical outcomes in adult patients experiencing both out-of-hospital (OHCA) and in-hospital cardiac arrest (IHCA). The primary objective is to assess their impact on the return of spontaneous circulation (ROSC); secondary objectives include the evaluation of neurological status and survival to discharge. Methods: A systematic review was conducted following PRISMA guidelines, utilizing databases including PubMed, Scopus, Web of Science, and Embase. Studies published between 2000 and 2023 comparing CPR with and without the use of feedback devices were included. A fixed-effects network meta-analysis was performed for ROSC and survival, while a frequentist meta-analysis was conducted for neurological outcomes. Results: Twelve relevant studies met the inclusion criteria. The meta-analysis results showed that the use of audiovisual feedback devices significantly increases the likelihood of ROSC (OR 1.26, 95% CI 1.13–1.41, p < 0.0001) and survival (OR 1.52, 95% CI 1.27–1.81, p < 0.0001) compared to CPR without feedback. However, the effect of metronomes did not reach statistical significance. Conclusions: Feedback devices, particularly audiovisual ones, are associated with improved clinical outcomes in cardiac arrest patients. Their use should be encouraged in both training settings and real-life emergency scenarios to enhance survival rates and ROSC. However, further studies are needed to confirm long-term impacts and to explore the potential benefits of metronomes. Full article
(This article belongs to the Section Epidemiology & Public Health)
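As a hedged illustration of how odds ratios are typically combined in a fixed-effect framework (a simpler, pairwise relative of the network meta-analysis used in the review), the sketch below pools study-level ORs by inverse-variance weighting on the log scale; the input values are hypothetical placeholders, not the review's data.

```python
# Inverse-variance fixed-effect pooling of odds ratios (generic method sketch).
import numpy as np

def pooled_or(ors, ci_lows, ci_highs):
    log_or = np.log(ors)
    se = (np.log(ci_highs) - np.log(ci_lows)) / (2 * 1.96)   # SE recovered from the 95% CI width
    w = 1.0 / se**2                                          # inverse-variance weights
    pooled = np.sum(w * log_or) / np.sum(w)
    pooled_se = np.sqrt(1.0 / np.sum(w))
    ci = np.exp([pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se])
    return np.exp(pooled), ci

# Hypothetical placeholder studies (NOT values from the review), for illustration only:
print(pooled_or(np.array([1.2, 1.4]), np.array([1.0, 1.1]), np.array([1.5, 1.8])))
```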

21 pages, 1315 KB  
Review
The Use of Audiovisual Distraction Tools in the Dental Setting for Pediatric Subjects with Special Healthcare Needs: A Review and Proposal of a Multi-Session Model for Behavioral Management
by Massimo Pisano, Alessia Bramanti, Giuseppina De Benedetto, Carmen Martin Carreras-Presas and Federica Di Spirito
Children 2024, 11(9), 1077; https://doi.org/10.3390/children11091077 - 2 Sep 2024
Cited by 5 | Viewed by 3608
Abstract
Background: A Special Health Care Need (SHCN) is characterized by any type of physical, mental, sensorial, cognitive, emotional, or developmental condition that requires medical treatment, specialized services, or healthcare interventions. These conditions can negatively impact oral health, as SHCN children can hardly cooperate or communicate and experience higher levels of dental fear/anxiety, which interfere with regular appointments. The present narrative review aims to analyze the use of audiovisual (AV) tools in the dental setting for the management of SHCN children during dental treatment and to evaluate their effectiveness in anxiety/behavior control from the child, dentist, and care-giver perspectives. This analysis leads to the proposal of a new multi-session model for the behavioral management of SHCN pediatric subjects. Methods: An electronic search of the MEDLINE/PubMed, Scopus, and Web of Science databases was carried out, and through this analysis a new model was proposed, the “UNISA-Virtual Stepwise Distraction model”, a multi-session workflow combining traditional behavior management and the progressive introduction of AV media to familiarize the SHCN child with the dental setting and manage behavior. Results: AV tools helped in most cases to manage SHCN behavior and decreased stress in both the dentist and the child during dental treatments. Care-givers also welcomed AV distractors, reporting positive feedback on using them during future treatments. Conclusions: The present narrative review found increasing evidence for the use of AV media as distraction tools for SHCN pediatric subjects during dental treatment. In the majority of the studies, AV tools proved to be effective for the management of anxiety, dental fear, and behavior in the dental setting. Full article

18 pages, 2938 KB  
Article
Facial Animation Strategies for Improved Emotional Expression in Virtual Reality
by Hyewon Song and Beom Kwon
Electronics 2024, 13(13), 2601; https://doi.org/10.3390/electronics13132601 - 2 Jul 2024
Cited by 6 | Viewed by 4195
Abstract
The portrayal of emotions by virtual characters is crucial in virtual reality (VR) communication. Effective communication in VR relies on a shared understanding, which is significantly enhanced when virtual characters authentically express emotions that align with their spoken words. While human emotions are often conveyed through facial expressions, existing facial animation techniques have mainly focused on lip-syncing and head movements to improve naturalness. This study investigates the influence of various factors in facial animation on the emotional representation of virtual characters. We conduct a comparative and analytical study using an audio-visual database, examining the impact of different animation factors. To this end, we utilize a total of 24 voice samples, representing 12 different speakers, with each emotional voice segment lasting approximately 4–5 s. Using these samples, we design six perceptual experiments to investigate the impact of facial cues—including facial expression, lip movement, head motion, and overall appearance—on the expression of emotions by virtual characters. Additionally, we engaged 20 participants to evaluate and select appropriate combinations of facial expressions, lip movements, head motions, and appearances that align with the given emotion and its intensity. Our findings indicate that emotional representation in virtual characters is closely linked to facial expressions, head movements, and overall appearance. Conversely, lip-syncing, which has been a primary focus in prior studies, seems less critical for conveying emotions, as its accuracy is difficult to perceive with the naked eye. The results of our study can significantly benefit the VR community by aiding in the development of virtual characters capable of expressing a diverse range of emotions. Full article

31 pages, 9940 KB  
Article
Combining Transformer, Convolutional Neural Network, and Long Short-Term Memory Architectures: A Novel Ensemble Learning Technique That Leverages Multi-Acoustic Features for Speech Emotion Recognition in Distance Education Classrooms
by Eman Abdulrahman Alkhamali, Arwa Allinjawi and Rehab Bahaaddin Ashari
Appl. Sci. 2024, 14(12), 5050; https://doi.org/10.3390/app14125050 - 10 Jun 2024
Cited by 5 | Viewed by 2595
Abstract
Speech emotion recognition (SER) is a technology that can be applied to distance education to analyze speech patterns and evaluate speakers’ emotional states in real time. It provides valuable insights and can be used to enhance students’ learning experiences by enabling the assessment of their instructors’ emotional stability, a factor that significantly impacts the effectiveness of information delivery. Students demonstrate different engagement levels during learning activities, and assessing this engagement is important for controlling the learning process and improving e-learning systems. An important aspect that may influence student engagement is their instructors’ emotional state. Accordingly, this study used deep learning techniques to create an automated system for recognizing instructors’ emotions in their speech when delivering distance learning. This methodology entailed integrating transformer, convolutional neural network, and long short-term memory architectures into an ensemble to enhance the SER. Feature extraction from audio data used Mel-frequency cepstral coefficients; chroma; a Mel spectrogram; the zero-crossing rate; spectral contrast, centroid, bandwidth, and roll-off; and the root-mean square, with subsequent optimization processes such as adding noise, conducting time stretching, and shifting the audio data. Several transformer blocks were incorporated, and a multi-head self-attention mechanism was employed to identify the relationships between the input sequence segments. The preprocessing and data augmentation methodologies significantly enhanced the precision of the results, with accuracy rates of 96.3%, 99.86%, 96.5%, and 85.3% for the Ryerson Audio–Visual Database of Emotional Speech and Song, Berlin Database of Emotional Speech, Surrey Audio–Visual Expressed Emotion, and Interactive Emotional Dyadic Motion Capture datasets, respectively. Furthermore, it achieved 83% accuracy on another dataset created for this study, the Saudi Higher-Education Instructor Emotions dataset. The results demonstrate the considerable accuracy of this model in detecting emotions in speech data across different languages and datasets. Full article
(This article belongs to the Special Issue Computer Vision and AI for Interactive Robotics)
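A minimal sketch of the acoustic feature set and augmentation steps listed in the abstract, assuming librosa is available; the exact parameters, frame-level pooling, and the transformer/CNN/LSTM ensemble used by the authors are not reproduced here.

```python
# Sketch of the named feature set and augmentations (illustrative, not the authors' code).
import numpy as np
import librosa

def acoustic_features(y, sr):
    feats = [
        librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40).mean(axis=1),
        librosa.feature.chroma_stft(y=y, sr=sr).mean(axis=1),
        librosa.feature.melspectrogram(y=y, sr=sr).mean(axis=1),
        librosa.feature.zero_crossing_rate(y).mean(axis=1),
        librosa.feature.spectral_contrast(y=y, sr=sr).mean(axis=1),
        librosa.feature.spectral_centroid(y=y, sr=sr).mean(axis=1),
        librosa.feature.spectral_bandwidth(y=y, sr=sr).mean(axis=1),
        librosa.feature.spectral_rolloff(y=y, sr=sr).mean(axis=1),
        librosa.feature.rms(y=y).mean(axis=1),
    ]
    return np.concatenate(feats)

def augment(y, sr):
    noisy = y + 0.005 * np.random.randn(len(y))             # add noise
    stretched = librosa.effects.time_stretch(y, rate=1.1)   # time stretching
    shifted = np.roll(y, sr // 10)                          # shift audio by 0.1 s
    return [noisy, stretched, shifted]
```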

22 pages, 6938 KB  
Article
Streamline Intelligent Crowd Monitoring with IoT Cloud Computing Middleware
by Alexandros Gazis and Eleftheria Katsiri
Sensors 2024, 24(11), 3643; https://doi.org/10.3390/s24113643 - 4 Jun 2024
Cited by 1 | Viewed by 3307
Abstract
This article introduces a novel middleware that utilizes cost-effective, low-power computing devices like Raspberry Pi to analyze data from wireless sensor networks (WSNs). It is designed for indoor settings like historical buildings and museums, tracking visitors and identifying points of interest. It serves as an evacuation aid by monitoring occupancy and gauging the popularity of specific areas, subjects, or art exhibitions. The middleware employs a basic form of the MapReduce algorithm to gather WSN data and distribute it across available computer nodes. Data collected by RFID sensors on visitor badges is stored on mini-computers placed in exhibition rooms and then transmitted to a remote database after a preset time frame. Utilizing MapReduce for data analysis and a leader election algorithm for fault tolerance, this middleware showcases its viability through metrics, demonstrating applications like swift prototyping and accurate validation of findings. Despite using simpler hardware, its performance matches resource-intensive methods involving audiovisual and AI techniques. This design’s innovation lies in its fault-tolerant, distributed setup using budget-friendly, low-power devices rather than resource-heavy hardware or methods. Successfully tested at a historical building in Greece (M. Hatzidakis’ residence), it is tailored for indoor spaces. This paper compares its algorithmic application layer with other implementations, highlighting its technical strengths and advantages. Particularly relevant in the wake of the COVID-19 pandemic and general monitoring middleware for indoor locations, this middleware holds promise in tracking visitor counts and overall building occupancy. Full article
(This article belongs to the Section Internet of Things)
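As a conceptual illustration of the basic MapReduce idea the middleware applies to WSN data, the toy sketch below counts badge readings per room; the record format and room names are hypothetical and unrelated to the actual system.

```python
# Toy map-reduce style aggregation of RFID badge readings per room (conceptual sketch only).
from collections import defaultdict

readings = [  # hypothetical (badge_id, room) tuples collected by the room mini-computers
    ("badge1", "room_a"), ("badge2", "room_a"), ("badge1", "room_b"),
]

def map_phase(records):
    # Emit one (room, 1) pair per reading.
    return [(room, 1) for _, room in records]

def reduce_phase(pairs):
    # Sum the counts per room key.
    counts = defaultdict(int)
    for room, n in pairs:
        counts[room] += n
    return dict(counts)

print(reduce_phase(map_phase(readings)))  # {'room_a': 2, 'room_b': 1}
```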

7 pages, 167 KB  
Editorial
Special Issue on IberSPEECH 2022: Speech and Language Technologies for Iberian Languages
by José L. Pérez-Córdoba, Francesc Alías-Pujol and Zoraida Callejas
Appl. Sci. 2024, 14(11), 4505; https://doi.org/10.3390/app14114505 - 24 May 2024
Viewed by 1147
Abstract
This Special Issue presents the latest advances in research and novel applications of speech and language technologies based on the works presented at the sixth edition of the IberSPEECH conference, held in Granada in 2022, paying special attention to those focused on Iberian languages. IberSPEECH is the international conference of the Special Interest Group on Iberian Languages (SIG-IL) of the International Speech Communication Association (ISCA) and the Spanish Thematic Network on Speech Technologies (Red Temática en Tecnologías del Habla, or RTTH for short). Several researchers were invited to extend the contributions presented at IberSPEECH2022 due to their interest and quality. As a result, the Special Issue is composed of 11 papers that cover different research topics related to speech perception, speech analysis and enhancement, speaker verification and identification, speech production and synthesis, and natural language processing, together with several applications and evaluation challenges. Full article

15 pages, 483 KB  
Article
A Feature Selection Algorithm Based on Differential Evolution for English Speech Emotion Recognition
by Liya Yue, Pei Hu, Shu-Chuan Chu and Jeng-Shyang Pan
Appl. Sci. 2023, 13(22), 12410; https://doi.org/10.3390/app132212410 - 16 Nov 2023
Cited by 4 | Viewed by 2219
Abstract
The automatic identification of emotions from speech holds significance in facilitating interactions between humans and machines. To improve the recognition accuracy of speech emotion, we extract mel-frequency cepstral coefficients (MFCCs) and pitch features from raw signals, and an improved differential evolution (DE) algorithm is utilized for feature selection based on K-nearest neighbor (KNN) and random forest (RF) classifiers. The proposed multivariate DE (MDE) adopts three mutation strategies to solve the slow convergence of the classical DE and maintain population diversity, and employs a jumping method to avoid falling into local traps. The simulations are conducted on four public English speech emotion datasets: eNTERFACE05, the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), the Surrey Audio-Visual Expressed Emotion (SAVEE) database, and the Toronto Emotional Speech Set (TESS), which cover a diverse range of emotions. The MDE algorithm is compared with PSO-assisted biogeography-based optimization (BBO_PSO), DE, and the sine cosine algorithm (SCA) on emotion recognition error, number of selected features, and running time. MDE obtains errors of 0.5270, 0.5044, 0.4490, and 0.0420 on eNTERFACE05, RAVDESS, SAVEE, and TESS with the KNN classifier, and errors of 0.4721, 0.4264, 0.3283, and 0.0114 with the RF classifier. The proposed algorithm demonstrates excellent performance in emotion recognition accuracy, and it finds meaningful acoustic features from MFCCs and pitch. Full article
(This article belongs to the Special Issue Recent Applications of Explainable AI (XAI))
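A hedged sketch of wrapper-style feature selection with a classical DE/rand/1 loop scored by a KNN classifier is shown below; it uses a stand-in dataset and omits the paper's multivariate mutation strategies and jumping method, so it conveys only the general idea.

```python
# Simple DE-style wrapper feature selection scored with KNN (generic sketch, not the paper's MDE).
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X, y = load_iris(return_X_y=True)       # stand-in data; the paper selects speech features (MFCCs, pitch)
n_feat, pop_size, gens, F, CR = X.shape[1], 20, 30, 0.5, 0.9

def fitness(vec):
    mask = vec > 0.5                    # threshold the continuous DE vector into a feature mask
    if not mask.any():
        return 0.0
    return cross_val_score(KNeighborsClassifier(5), X[:, mask], y, cv=3).mean()

pop = rng.random((pop_size, n_feat))
fit = np.array([fitness(p) for p in pop])
for _ in range(gens):
    for i in range(pop_size):
        idx = rng.choice([j for j in range(pop_size) if j != i], size=3, replace=False)
        a, b, c = pop[idx]
        mutant = np.clip(a + F * (b - c), 0.0, 1.0)          # DE/rand/1 mutation
        cross = rng.random(n_feat) < CR
        trial = np.where(cross, mutant, pop[i])
        f = fitness(trial)
        if f >= fit[i]:                                       # greedy selection
            pop[i], fit[i] = trial, f
best_mask = pop[np.argmax(fit)] > 0.5
print("selected feature indices:", np.where(best_mask)[0], "CV accuracy:", fit.max())
```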

14 pages, 406 KB  
Article
English Speech Emotion Classification Based on Multi-Objective Differential Evolution
by Liya Yue, Pei Hu, Shu-Chuan Chu and Jeng-Shyang Pan
Appl. Sci. 2023, 13(22), 12262; https://doi.org/10.3390/app132212262 - 13 Nov 2023
Cited by 7 | Viewed by 1669
Abstract
Speech signals carry speakers’ emotional states as well as language information, which is very important for human–computer interaction systems that recognize speakers’ emotions. Feature selection is a common method for improving recognition accuracy. In this paper, we propose a multi-objective optimization method based on differential evolution (MODE-NSF) that maximizes recognition accuracy and minimizes the number of selected features (NSF). First, Mel-frequency cepstral coefficient (MFCC) features and pitch features are extracted from the speech signals. Then, the proposed algorithm implements feature selection, where the NSF guides the initialization, crossover, and mutation of the algorithm. We used four English speech emotion datasets, and K-nearest neighbor (KNN) and random forest (RF) classifiers, to validate the performance of the proposed algorithm. The results illustrate that MODE-NSF is superior to other multi-objective algorithms in terms of the hypervolume (HV), inverted generational distance (IGD), Pareto optimal solutions, and running time. MODE-NSF achieved an accuracy of 49% using eNTERFACE05, 53% using the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), 76% using the Surrey Audio-Visual Expressed Emotion (SAVEE) database, and 98% using the Toronto Emotional Speech Set (TESS). MODE-NSF obtained good recognition results, which provides a basis for the establishment of emotional models. Full article
(This article belongs to the Special Issue Multi-Modal Deep Learning and Its Applications)

31 pages, 13108 KB  
Article
Speech Emotion Recognition Using Convolutional Neural Networks with Attention Mechanism
by Konstantinos Mountzouris, Isidoros Perikos and Ioannis Hatzilygeroudis
Electronics 2023, 12(20), 4376; https://doi.org/10.3390/electronics12204376 - 23 Oct 2023
Cited by 16 | Viewed by 7055
Abstract
Speech emotion recognition (SER) is an interesting and difficult problem to handle. In this paper, we deal with it through the implementation of deep learning networks. We have designed and implemented six different deep learning networks: a deep belief network (DBN), a simple deep neural network (SDNN), an LSTM network (LSTM), an LSTM network with the addition of an attention mechanism (LSTM-ATN), a convolutional neural network (CNN), and a convolutional neural network with the addition of an attention mechanism (CNN-ATN), with the aim, apart from solving the SER problem, of testing the impact of the attention mechanism on the results. Dropout and batch normalization techniques are also used to improve the generalization ability (prevention of overfitting) of the models as well as to speed up the training process. The Surrey Audio–Visual Expressed Emotion (SAVEE) database and the Ryerson Audio–Visual Database of Emotional Speech and Song (RAVDESS) were used for the training and evaluation of our models. The results showed that the networks with the addition of the attention mechanism did better than the others. Furthermore, they showed that the CNN-ATN was the best among the tested networks, achieving an accuracy of 74% for the SAVEE database and 77% for RAVDESS, and exceeding existing state-of-the-art systems for the same datasets. Full article
(This article belongs to the Special Issue Feature Papers in Computer Science & Engineering)
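To illustrate the general idea of adding an attention mechanism on top of convolutional features (not the paper's exact CNN-ATN), the sketch below defines a simple attention-pooling layer in Keras that weights time frames before classification; the input shape and layer sizes are assumptions.

```python
# Illustrative attention pooling over CNN feature frames (generic sketch, not the paper's model).
import tensorflow as tf
from tensorflow.keras import layers

class AttentionPooling(layers.Layer):
    """Learns one score per time frame and returns the attention-weighted sum of frames."""
    def build(self, input_shape):
        self.w = self.add_weight(shape=(int(input_shape[-1]), 1),
                                 initializer="glorot_uniform", name="att_w")
    def call(self, x):                                            # x: (batch, time, features)
        scores = tf.squeeze(tf.tensordot(x, self.w, axes=1), -1)  # (batch, time)
        alpha = tf.nn.softmax(scores, axis=-1)                    # attention weights over time
        return tf.reduce_sum(x * tf.expand_dims(alpha, -1), axis=1)

# Example: attention pooling over Conv1D frames before an 8-class softmax head.
inp = layers.Input(shape=(300, 40))                               # assumed (frames, features) input
feat = layers.Conv1D(64, 5, padding="same", activation="relu")(inp)
out = layers.Dense(8, activation="softmax")(AttentionPooling()(feat))
model = tf.keras.Model(inp, out)
model.summary()
```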