MDPI - Publisher of Open Access Journals

28 pages, 6576 KiB

Open Access Article

Assessment of Pepper Robot’s Speech Recognition System through the Lens of Machine Learning

by Akshara Pande and Deepti Mishra

Biomimetics 2024, 9(7), 391; https://doi.org/10.3390/biomimetics9070391 - 27 Jun 2024

Cited by 1 | Viewed by 3083

Speech comprehension can be challenging due to multiple factors, causing inconvenience for both the speaker and the listener. In such situations, a humanoid robot such as Pepper can help by displaying the corresponding text on its screen. Before that is possible, however, the accuracy of the audio recordings captured by Pepper must be carefully assessed. In this study, an experiment was therefore conducted with eight participants, with the primary objective of examining Pepper’s speech recognition system using audio features such as Mel-Frequency Cepstral Coefficients, spectral centroid, spectral flatness, the Zero-Crossing Rate, pitch, and energy. The K-means algorithm was then employed to create clusters based on these features, and the most suitable cluster was selected with the help of the speech-to-text conversion tool Whisper. The best cluster was taken to be the one containing the largest number of high-accuracy data points, where data points with a word error rate (WER) above 0.3 were discarded. The findings suggest that a distance of up to one meter from Pepper is suitable for capturing the best speech recordings, whereas age and gender do not influence the accuracy of the recorded speech. The proposed system could be a significant asset in settings where subtitles are required to improve the comprehension of spoken statements.

(This article belongs to the Special Issue Intelligent Human-Robot Interaction: 2nd Edition)
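
The cluster-selection rule described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes cluster labels have already been produced by K-means and that each recording has a reference sentence and a transcript (for example from Whisper). The function names `word_error_rate` and `best_cluster` are hypothetical, and WER is computed here with the usual word-level edit distance, which may differ in normalization details from the paper.

```python
from collections import Counter


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def best_cluster(labels, wers, threshold=0.3):
    """Return the cluster label with the most recordings at or below the
    WER threshold, or None if every recording is discarded.

    `labels` are assumed to come from a prior K-means run (hypothetical input).
    """
    kept = Counter(label for label, wer in zip(labels, wers) if wer <= threshold)
    if not kept:
        return None
    return kept.most_common(1)[0][0]


if __name__ == "__main__":
    wer = word_error_rate("the robot hears speech", "the robot here speech")
    print(wer)
    print(best_cluster([0, 0, 1, 1, 1], [0.1, 0.5, 0.2, 0.0, 0.3]))
```

Recordings with a WER above the threshold are dropped before counting, so a cluster is judged only by how many accurate recordings it contains rather than by its overall size.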


12 pages, 2286 KiB

Open Access Article

Whispered Speech Conversion Based on the Inversion of Mel Frequency Cepstral Coefficient Features

by Qiang Zhu, Zhong Wang, Yunfeng Dou and Jian Zhou

Algorithms 2022, 15(2), 68; https://doi.org/10.3390/a15020068 - 20 Feb 2022

Cited by 6 | Viewed by 3453

Abstract

A conversion method based on the inversion of Mel frequency cepstral coefficient (MFCC) features was proposed to convert whispered speech into normal speech. First, the MFCC features of whispered speech and normal speech were extracted, and a mapping between the MFCC feature parameters of whispered speech and those of normal speech was established with a Gaussian mixture model (GMM). Then, the MFCC feature parameters of the normal speech corresponding to the whispered speech were obtained from the GMM. Finally, the whispered speech was converted into normal speech through the inversion of the MFCC features. The experimental results showed that the cepstral distortion (CD) of the normal speech converted by the proposed method was 21% lower than that of the normal speech converted using linear predictive coefficient (LPC) features, the mean opinion score (MOS) was 3.56, and a satisfactory outcome in both intelligibility and sound quality was achieved.
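
To make the cepstral distortion (CD) figure in the abstract above more concrete, the following is a minimal sketch of one commonly used definition, the frame-averaged mel-cepstral distortion in decibels. It is an assumption that this matches the paper's metric; the abstract does not state the exact formula. The function name `cepstral_distortion` and the input layout (time-aligned lists of per-frame coefficient vectors) are hypothetical.

```python
import math


def cepstral_distortion(reference_frames, converted_frames):
    """Frame-averaged cepstral distortion in dB between two aligned sequences.

    Each argument is a list of frames, and each frame is a list of cepstral
    coefficients. Uses the common form
        CD = (10 / ln 10) * sqrt(2 * sum_d (c_d - c'_d)^2)
    averaged over frames (assumed definition, not taken from the paper).
    """
    if not reference_frames or len(reference_frames) != len(converted_frames):
        raise ValueError("sequences must be non-empty and time-aligned")
    scale = 10.0 / math.log(10.0)
    total = 0.0
    for ref, conv in zip(reference_frames, converted_frames):
        if len(ref) != len(conv):
            raise ValueError("frames must have the same number of coefficients")
        squared_error = sum((r - c) ** 2 for r, c in zip(ref, conv))
        total += scale * math.sqrt(2.0 * squared_error)
    return total / len(reference_frames)


if __name__ == "__main__":
    target = [[1.0, 0.5, -0.2], [0.9, 0.4, -0.1]]
    converted = [[1.1, 0.4, -0.2], [0.8, 0.5, -0.1]]
    print(cepstral_distortion(target, converted))
```

A lower value means the converted cepstra are closer to the target normal speech, so a 21% reduction relative to the LPC baseline corresponds to the proposed method's CD being 0.79 times the baseline's.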


Search Results (2)
