Article

Enrichment of Oesophageal Speech: Voice Conversion with Duration–Matched Synthetic Speech as Target

HiTZ-Aholab, University of the Basque Country (UPV/EHU), 48013 Bilbao, Spain
* Authors to whom correspondence should be addressed.
Academic Editor: Francesc Alías
Appl. Sci. 2021, 11(13), 5940; https://doi.org/10.3390/app11135940
Received: 9 April 2021 / Revised: 15 June 2021 / Accepted: 18 June 2021 / Published: 26 June 2021
(This article belongs to the Special Issue Applications of Speech and Language Technologies in Healthcare)
Pathological speech such as Oesophageal Speech (OS) is difficult to understand because it contains undesired artefacts and lacks the characteristics of normal healthy speech. Modern speech technologies and machine learning make it possible to transform pathological speech to improve its intelligibility and quality. We used a neural-network-based voice conversion method to improve the intelligibility and reduce the listening effort (LE) of four OS speakers of varying speaking proficiency. The novelty of this method is that the target is synthetic speech matched in duration with the source OS, instead of parallel aligned healthy speech. We evaluated the converted samples using a collection of Automatic Speech Recognition (ASR) systems, an objective intelligibility metric (STOI) and a subjective test. The ASR evaluation shows that the proposed system achieved significantly better word recognition accuracy than both unprocessed OS and baseline systems that used aligned healthy speech as the target. STOI scores improved by at least 15%, indicating higher intelligibility for the proposed system than for unprocessed OS, and higher target similarity for the proposed system than for the baseline systems. The subjective test reveals a significant preference for the proposed system over unprocessed OS for all OS speakers except one, who was the least proficient OS speaker in the data set.
Keywords: pathological speech; voice conversion; intelligibility; speech recognition
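
The abstract reports word recognition accuracy from ASR systems without defining how it is computed. As a rough illustration only, the sketch below shows one common way such a score could be obtained: count word-level edit operations (substitutions, deletions, insertions) between a reference transcript and an ASR hypothesis, then report one minus the error rate. The function names `word_errors` and `word_accuracy` are hypothetical, and the article's actual scoring procedure may differ.

```python
def word_errors(reference, hypothesis):
    """Return (edit_distance, reference_length) at the word level.

    Assumption: transcripts are compared case-insensitively after
    whitespace tokenisation; the article does not specify this.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # prev[j] holds the distance between the reference words seen so far
    # (one fewer than the current row) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into nothing
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1], len(ref)


def word_accuracy(pairs):
    """Pooled word accuracy over (reference, hypothesis) pairs."""
    total_errors = 0
    total_words = 0
    for reference, hypothesis in pairs:
        errors, words = word_errors(reference, hypothesis)
        total_errors += errors
        total_words += words
    if total_words == 0:
        raise ValueError("no reference words to score")
    return 1.0 - total_errors / total_words
```

With this definition, a hypothesis containing many inserted words can push the score below zero, so some toolkits clip it or report the word error rate instead.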
MDPI and ACS Style

Raman, S.; Sarasola, X.; Navas, E.; Hernaez, I. Enrichment of Oesophageal Speech: Voice Conversion with Duration–Matched Synthetic Speech as Target. Appl. Sci. 2021, 11, 5940. https://doi.org/10.3390/app11135940
