Open Access Article

The BioVisualSpeech Corpus of Words with Sibilants for Speech Therapy Games Development

NOVA LINCS, Department of Computer Science, Faculdade de Ciências e Tecnologia, Universidade NOVA de Lisboa, 2829-516 Caparica, Portugal
Escola Superior de Saúde do Alcoitão, Rua Conde Barão, Alcoitão, 2649-506 Alcabideche, Portugal
Clinical Pharmacological Unit, Instituto de Medicina Molecular, Faculdade de Medicina, Universidade de Lisboa, Av. Prof. Egas Moniz, 1649-028 Lisboa, Portugal
INESC-ID, Rua Alves Redol 9, 1000-029 Lisboa, Portugal
Instituto Superior Técnico, Universidade de Lisboa, Av. Rovisco Pais, 1 1049-001 Lisboa, Portugal
Language Technologies Institute, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213, USA
Authors to whom correspondence should be addressed.
Information 2020, 11(10), 470
Received: 4 August 2020 / Revised: 20 September 2020 / Accepted: 23 September 2020 / Published: 2 October 2020
(This article belongs to the Special Issue Selected Papers from PROPOR 2020)
Developing computer tools for speech therapy that reliably classify speech productions requires speech production corpora that characterize the target population in terms of age, gender, and native language. To characterize that population, the corpora should include not only correct speech productions but also samples from people with speech sound disorders. In addition, the annotation of the data should include information on the correctness of the speech productions. Following these criteria, we collected a corpus that can be used to develop computer tools for speech and language therapy of Portuguese children with sigmatism. The proposed corpus contains European Portuguese children’s productions of words that contain sibilant consonants. The corpus has productions from 356 children from 5 to 9 years of age. Some important characteristics of this corpus that are relevant to speech and language therapy and computer science research are that (1) the corpus includes data from children with speech sound disorders; and (2) the productions were annotated according to the criteria of speech and language pathologists and include information about the speech production errors. These are relevant features for the development and assessment of speech processing tools for speech therapy of Portuguese children. In addition, as an illustration of how to use the corpus, we present three speech therapy games that use a convolutional neural network sibilant classifier trained with data from this corpus and a word recognition module trained on additional children’s speech data and calibrated and evaluated with the collected corpus.
Keywords: sibilant consonants; children’s speech corpus; speech sound disorders; serious games for speech and language therapy
MDPI and ACS Style

Cavaco, S.; Guimarães, I.; Ascensão, M.; Abad, A.; Anjos, I.; Oliveira, F.; Martins, S.; Marques, N.; Eskenazi, M.; Magalhães, J.; Grilo, M. The BioVisualSpeech Corpus of Words with Sibilants for Speech Therapy Games Development. Information 2020, 11, 470.
