
Search Results (2)

Search Parameters:
Keywords = whispered speech conversion

28 pages, 6576 KiB  
Article
Assessment of Pepper Robot’s Speech Recognition System through the Lens of Machine Learning
by Akshara Pande and Deepti Mishra
Biomimetics 2024, 9(7), 391; https://doi.org/10.3390/biomimetics9070391 - 27 Jun 2024
Cited by 1 | Viewed by 3083
Abstract
Speech comprehension can be challenging for multiple reasons, causing inconvenience for both the speaker and the listener. In such situations, the humanoid robot Pepper can be beneficial, as it can display the corresponding text on its screen. Before that, however, the accuracy of the audio recordings captured by Pepper must be carefully assessed. In this study, an experiment was conducted with eight participants, with the primary objective of examining Pepper’s speech recognition system using audio features such as Mel-Frequency Cepstral Coefficients (MFCCs), spectral centroid, spectral flatness, the Zero-Crossing Rate, pitch, and energy. The K-means algorithm was then used to cluster the recordings based on these features, and the most suitable cluster was selected with the help of the speech-to-text conversion tool Whisper. The best cluster is the one containing the most high-accuracy data points; to identify it, data points with a word error rate (WER) above 0.3 were discarded. The findings suggest that a distance of up to one meter from Pepper is suitable for capturing the best speech recordings, whereas age and gender do not influence the accuracy of the recorded speech. The proposed system should be particularly useful in settings where subtitles are required to improve the comprehension of spoken statements.
(This article belongs to the Special Issue Intelligent Human-Robot Interaction: 2nd Edition)
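The WER-based cluster selection described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code. It assumes that each recording already has a K-means cluster label and a Whisper transcript, that WER is the standard word-level edit distance divided by the number of reference words, and that the "best cluster" is the one retaining the most recordings after the WER > 0.3 filter. The helper names `word_error_rate` and `best_cluster` are hypothetical.

```python
from collections import Counter


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words,
    computed with a word-level Levenshtein distance (standard definition, assumed)."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference words processed so far
    # and the first j hypothesis words (one row of the dynamic-programming table).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def best_cluster(labels, wers, threshold=0.3):
    """Return the cluster label (hypothetical K-means output) holding the most
    recordings whose WER is at or below `threshold`; recordings above it are
    discarded. Returns None if every recording is discarded."""
    counts = Counter(label for label, wer in zip(labels, wers) if wer <= threshold)
    if not counts:
        return None
    return counts.most_common(1)[0][0]
```

For example, `word_error_rate("the cat sat on the mat", "the cat sat on mat")` gives 1/6 (one deletion over six reference words). Ties between clusters are resolved by taking the label encountered first, which is an arbitrary choice in this sketch.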

12 pages, 2286 KiB  
Article
Whispered Speech Conversion Based on the Inversion of Mel Frequency Cepstral Coefficient Features
by Qiang Zhu, Zhong Wang, Yunfeng Dou and Jian Zhou
Algorithms 2022, 15(2), 68; https://doi.org/10.3390/a15020068 - 20 Feb 2022
Cited by 6 | Viewed by 3453
Abstract
A conversion method based on the inversion of Mel frequency cepstral coefficient (MFCC) features was proposed to convert whispered speech into normal speech. First, the MFCC features of whispered and normal speech were extracted, and a mapping between the MFCC feature parameters of whispered speech and those of normal speech was established with a Gaussian mixture model (GMM). Then, the MFCC feature parameters of normal speech corresponding to the whispered input were estimated with the GMM, and finally the whispered speech was converted into normal speech by inverting the MFCC features. The experimental results showed that the cepstral distortion (CD) of speech converted by the proposed method was 21% lower than that of speech converted using linear predictive coefficient (LPC) features. The mean opinion score (MOS) was 3.56, and satisfactory intelligibility and sound quality were achieved.
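The abstract does not state the exact form of the GMM mapping or of the cepstral distortion measure. The sketch below shows one common choice, offered as an assumption rather than the authors' method. It uses the joint-density GMM conversion function, which estimates the normal-speech coefficient as the conditional mean E[y | x] given the whispered coefficient x, together with the usual mel-cepstral distortion formula. To keep it short, each cepstral dimension is modeled independently with scalar parameters, which is a simplification. `JointComponent`, `convert`, and `cepstral_distortion` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class JointComponent:
    """One mixture component of a GMM trained on time-aligned (x, y) pairs for a
    single cepstral dimension, where x is the whispered-speech coefficient and y
    the normal-speech coefficient (scalar per-dimension model: an assumption)."""
    weight: float   # mixture weight w_m
    mu_x: float     # mean of x in component m
    mu_y: float     # mean of y in component m
    var_xx: float   # variance of x in component m
    cov_yx: float   # covariance between y and x in component m


def gaussian_pdf(x: float, mean: float, var: float) -> float:
    """Univariate normal density N(x; mean, var)."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def convert(x: float, components: list[JointComponent]) -> float:
    """Conditional-mean estimate E[y | x] under the joint GMM:
    sum_m P(m | x) * (mu_y + cov_yx / var_xx * (x - mu_x))."""
    likelihoods = [c.weight * gaussian_pdf(x, c.mu_x, c.var_xx) for c in components]
    total = sum(likelihoods)
    if total == 0.0:
        raise ValueError("x has zero likelihood under every component")
    y_hat = 0.0
    for c, lik in zip(components, likelihoods):
        posterior = lik / total  # P(m | x)
        y_hat += posterior * (c.mu_y + c.cov_yx / c.var_xx * (x - c.mu_x))
    return y_hat


def cepstral_distortion(target: list[float], converted: list[float]) -> float:
    """Mel-cepstral distortion in dB for one frame, skipping the 0th (energy)
    coefficient: (10 / ln 10) * sqrt(2 * sum_d (c_d - c'_d)^2)."""
    sq = sum((a - b) ** 2 for a, b in zip(target[1:], converted[1:]))
    return 10.0 / math.log(10.0) * math.sqrt(2.0 * sq)
```

In practice, the component parameters would come from fitting a GMM (for example, with EM) to time-aligned whispered/normal frame pairs. The mapping would then be applied frame by frame, and the converted MFCC sequence inverted back to a waveform. None of those steps is shown here, and the paper may define CD differently (for example, averaged over frames or with a different coefficient range).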
