Search Results (13)

Search Parameters:
Keywords = prosodic parameters

29 pages, 2669 KB  
Article
How Has Poets’ Reading Style Changed? A Phonetic Analysis of the Effects of Historical Phases and Gender on 20th Century Spanish Poetry Reading
by Valentina Colonna
Languages 2025, 10(10), 255; https://doi.org/10.3390/languages10100255 - 30 Sep 2025
Viewed by 511
Abstract
Poetry reading remains a largely underexplored area in phonetic research. While previous studies have highlighted its potential and challenges, experimental research in the Spanish context is still limited. This study aims to examine the evolution of Spanish poetry reading over time, focusing on its main prosodic features. Applying the VIP-VSP phonetic model to 40 poetry recordings, we analyzed the organizational and prosodic indices that characterize poetry reading. Mean speech rate, plenus (the ratio of speaking time to pausing), and pitch span emerged as key parameters for capturing change. The results identified two distinct historical phases—first and second radio-television—showing significant effects on speech rate, plenus, and pitch span: speech rate and pitch span increased over time, while plenus decreased. Gender also played a key role, with female voices exhibiting significantly higher values in both pitch span and plenus. Variability and recurring strategies were observed within and across authors. This study confirms that poetry reading has evolved along a ‘stylistic-chronological’ trajectory, while also reflecting gender-based distinctions. These findings underscore the need for interdisciplinary analytical approaches and diversified classification groupings to fully capture the complexity of this mode of speech.
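
As a rough illustration of how the three key parameters above can be computed, the sketch below derives mean speech rate, plenus, and pitch span from a time-aligned annotation. The interval and F0 representations are hypothetical stand-ins; the VIP-VSP model's exact definitions may differ.

```python
# Sketch: deriving the three parameters from a time-aligned annotation.
# The (start, end) interval lists and the F0 track are hypothetical
# stand-ins; the VIP-VSP model's exact definitions may differ.
import math

def speech_rate(n_syllables, speech_intervals):
    """Syllables per second of total speaking time."""
    speaking = sum(end - start for start, end in speech_intervals)
    return n_syllables / speaking

def plenus(speech_intervals, pause_intervals):
    """Ratio of speaking time to pausing time."""
    speaking = sum(end - start for start, end in speech_intervals)
    pausing = sum(end - start for start, end in pause_intervals)
    return speaking / pausing

def pitch_span_semitones(f0_hz):
    """F0 span in semitones between the lowest and highest values."""
    lo, hi = min(f0_hz), max(f0_hz)
    return 12 * math.log2(hi / lo)

# Toy recording: 20 syllables over two speech stretches and one pause.
speech = [(0.0, 2.5), (3.0, 5.5)]
pauses = [(2.5, 3.0)]
f0 = [110, 180, 145, 220, 95]  # Hz samples

print(round(speech_rate(20, speech), 2))   # 4.0 (syllables/s)
print(round(plenus(speech, pauses), 1))    # 10.0
print(round(pitch_span_semitones(f0), 2))  # 14.54
```

Expressing pitch span in semitones rather than raw Hz is the usual way to make spans comparable across male and female voices.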

23 pages, 3556 KB  
Article
The Neglected Group: Cognitive Discourse Markers as Signposts of Prosodic Unit Boundaries
by Simona Majhenič, Mitja Beras and Janez Križaj
Languages 2025, 10(7), 159; https://doi.org/10.3390/languages10070159 - 27 Jun 2025
Viewed by 1222
Abstract
The present paper examines and compares the role of cognitive discourse markers (DMs), such as uhm, like, or I mean, and a set of prosodic parameters as indicators of prosodic boundaries. Cognitive DMs are traditionally not studied as a separate DM group on par with the ideational, sequential, rhetorical, or interpersonal groups. However, as they reflect the speaker’s mental processes during speech production, they offer an exceptional glimpse into how speakers construct their verbalisations. Along with the analysis of DMs, prosodic parameters, including pitch and intensity reset, speech rate change, and pauses, were automatically annotated to determine how well they overlapped with the manually annotated prosodic boundaries. To accommodate the natural variability in speech, the parameters were evaluated using relative comparison methods. Among the prosodic parameters, pauses overlapped most often with the manually annotated prosodic boundaries. Cognitive DMs in the function of realising new information, restructuring, and emphasis indeed proved to be relevant boundary indicators; however, the group of cognitive DMs as a whole fell behind the sequential and rhetorical DM groups, which overlapped most frequently with the manually annotated prosodic boundaries.

(This article belongs to the Special Issue Current Trends in Discourse Marker Research)

15 pages, 1134 KB  
Article
Is the Prosodic Structure of Texts Reflected in Silent Reading? An Eye-Tracking Corpus Analysis
by Marijan Palmović and Kristina Cergol
J. Eye Mov. Res. 2025, 18(3), 24; https://doi.org/10.3390/jemr18030024 - 18 Jun 2025
Viewed by 712
Abstract
The aim of this study was to test the Implicit Prosody Hypothesis using a reading corpus, i.e., a text without experimental manipulation labelled with eye-tracking parameters. For this purpose, a bilingual Croatian–English reading corpus was analysed. In prosodic terms, Croatian and English are at the opposite ends of the spectrum: English is considered a time-framed language, while Croatian is a syllable-framed language. This difference served as a kind of experimental control in this study on natural reading. The results show that readers’ eyes lingered more on stressed syllables than on the arrangement of stressed and unstressed syllables for both languages. This is especially pronounced for English, a language with greater differences in the duration of stressed and unstressed syllables. This study provides indirect evidence in favour of the Implicit Prosody Hypothesis, i.e., the idea that readers are guided by their inner voice with its suprasegmental features when reading silently. The differences between the languages can be traced back to the typological differences in stress in English and Croatian.

15 pages, 549 KB  
Article
Math for Everybody: A Sonification Module for Computer Algebra Systems Aimed at Visually Impaired People
by Ana M. Zambrano, Mateo N. Salvador, Felipe Grijalva, Henry Carvajal Mora and Nathaly Orozco Garzón
Technologies 2024, 12(8), 133; https://doi.org/10.3390/technologies12080133 - 12 Aug 2024
Viewed by 3359
Abstract
Computer Algebra Systems (CAS) currently lack an effective auditory representation, with most existing solutions relying on screen readers that provide limited functionality. This limitation prevents blind users from fully understanding and interpreting mathematical expressions, leading to confusion and self-doubt. This paper addresses the challenges blind individuals face when comprehending mathematical expressions within a CAS environment. We propose “Math for Everybody” (Math4e, version 1.0), a software module to reduce barriers for blind users in education. Math4e is a sonification module for CAS that generates a series of auditory tones, prosodic cues, and variations in audio parameters such as volume and speed. These resources are designed to eliminate ambiguity and facilitate the interpretation and understanding of mathematical expressions for blind users. To assess the effectiveness of Math4e, we conducted standardized tests employing the methodologies outlined in the Software Engineering Body of Knowledge (SWEBOK), the International Software Testing Qualifications Board (ISTQB), and ISO/IEC/IEEE 29119. The evaluation encompassed two scenarios: one involving simulated blind users and another with real blind users associated with the “Asociación de Invidentes Milton Vedado” foundation in Ecuador. Through the SAM methodology and verbal surveys (given the condition of the evaluated users), results of 90.56% for pleasure, 90.78% for arousal, and 91.56% for dominance were obtained, demonstrating significant acceptance of the system by the users. The outcomes underscored the users’ commendable ability to identify mathematical expressions accurately.

(This article belongs to the Section Assistive Technologies)

18 pages, 1861 KB  
Article
The Interplay between Syllabic Duration and Melody to Indicate Prosodic Functions in Brazilian Portuguese Story Retelling
by Plinio A. Barbosa and Luís H. G. Alvarenga
Languages 2024, 9(8), 268; https://doi.org/10.3390/languages9080268 - 1 Aug 2024
Viewed by 1808
Abstract
This paper investigates the relationship between syllabic duration and F0 contours for implementing three prosodic functions. Work on rhythm usually describes the evolution of syllable-sized durations throughout utterances, rarely making reference to melodic events. On the other hand, work on intonation usually describes linear sequences of melodic events with indirect references to duration. Although some scholars have explored the relationship between these two parameters for particular functions, to our knowledge, there has been no investigation on the systematic correlation between syllabic duration and F0 values throughout narrative sequences. Based on a corpus of story retelling with nine speakers of Brazilian Portuguese from two regions, our work investigated the interplay between syllabic duration and melody to signal three prosodic functions: terminal and non-terminal boundary marking and prominence. The examination of local syllabic duration maxima and four F0 descriptors revealed that these maxima act as landmarks for particular F0 shapes: for non-terminal boundaries, the great majority of shapes were increasing and increasing–decreasing patterns; for terminal boundaries, almost all shapes were decreasing F0 patterns; and for prominence marking, the great majority of shapes were high tones across the stressed syllable. Time series analyses revealed significant correlations between duration and specific F0 descriptors, pointing to a ruled interplay between F0 and syllabic duration patterns in Brazilian Portuguese story retelling.

(This article belongs to the Special Issue Phonetics and Phonology of Ibero-Romance Languages)

24 pages, 2631 KB  
Article
The ProA Online Tool for Prosody Assessment and Its Use for the Definition of Acoustic Models for Prosodic Evaluation of L2 Spanish Learners
by Juan-María Garrido and Daniel Ortega
Languages 2024, 9(1), 28; https://doi.org/10.3390/languages9010028 - 15 Jan 2024
Viewed by 2903
Abstract
Assessment of prosody is not usually included in the evaluation of oral expression skills of L2 Spanish learners. Among the factors that probably explain this are the lack of adequate materials, correctness models and tools to carry out this assessment. This paper describes one of the results of the ProA (Prosody Assessment) project: a web tool for the online assessment of Spanish prosody. The tool allows the online development of evaluation tests and rubrics, the completion of these tests and their remote scoring. An example of the tool’s use for research purposes is also presented. Three prosodic parameters (global energy, speech rate, F0 range) of a set of oral productions by two L2 Spanish learners, collected using the tests developed in the project, were evaluated by three L2 Spanish teachers using the web tool and the rubrics also developed in the ProA project. The resulting ratings were then compared with an acoustic analysis of the same parameters to determine to what extent evaluators’ judgements correlated with the prosodic parameters. The results may be of interest, for example, for the development of future automatic prosody assessment systems.

(This article belongs to the Special Issue Speech Analysis and Tools in L2 Pronunciation Acquisition)
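
The comparison step described in this abstract, checking how closely evaluators' rubric scores track a measured acoustic parameter, reduces to a correlation between two paired series. Below is a minimal sketch with a plain Pearson coefficient; all data values are invented for illustration and are not results from the ProA project.

```python
# Sketch: correlating evaluators' rubric scores with a measured
# acoustic parameter. All data values are invented for illustration;
# they are not results from the ProA project.
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

ratings = [2, 3, 3, 4, 5]          # one teacher's rubric scores
rates = [3.1, 3.9, 4.2, 4.8, 5.5]  # measured speech rate (syll/s)
print(round(pearson_r(ratings, rates), 3))  # 0.99
```

A coefficient near 1 would indicate that the evaluators' judgements closely follow the acoustic measure, which is what an automatic assessment system would need.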

18 pages, 4647 KB  
Article
The Dance of Pauses in Poetry Declamation
by Plinio A. Barbosa
Languages 2023, 8(1), 76; https://doi.org/10.3390/languages8010076 - 8 Mar 2023
Cited by 2 | Viewed by 3085
Abstract
In poetry declamation, the appropriate use of prosody to cause pleasure is essential. Among the prosodic parameters, pause is one of the most effective for engaging listeners and providing them with a pleasant experience. The declamation of three poems in two varieties of Portuguese by ten Brazilian Portuguese (BP) speakers and ten European Portuguese (EP) speakers, balanced for gender, was used as a corpus for evaluating the degree of pleasantness perceived by listeners from the same language variety. The distributions of pause duration and inter-pause interval (IPI) both varied greatly across subjects; they were the main source of variability and were strongly right-tailed. The evaluation revealed that pause duration predicts degree of pleasantness in EP, whereas IPI predicts it in BP. Reciters perform a kind of complex “dance”, in which sonority between pauses is favored in BP and pause duration in EP.

(This article belongs to the Special Issue Pauses in Speech)
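
Both quantities analyzed above are simple functions of a pause annotation: pause durations are the lengths of the pause intervals, and each inter-pause interval (IPI) is a stretch of speech bounded by pauses or by the recording edges. The sketch below uses a hypothetical annotation; a right tail shows up as the mean exceeding the median.

```python
# Sketch: pause durations and inter-pause intervals (IPIs) from a
# pause annotation. The interval format and values are hypothetical.
from statistics import mean, median

pauses = [(1.8, 2.1), (4.0, 4.9), (6.2, 6.5), (9.0, 10.6)]  # (start, end), s
recording = (0.0, 12.0)

pause_durations = [round(end - start, 2) for start, end in pauses]

# Speech stretches before, between, and after the pauses.
edges = [recording[0]] + [t for p in pauses for t in p] + [recording[1]]
ipis = [round(edges[i + 1] - edges[i], 2) for i in range(0, len(edges), 2)]

print(pause_durations)  # [0.3, 0.9, 0.3, 1.6]
print(ipis)             # [1.8, 1.9, 1.3, 2.5, 1.4]
# A right-tailed distribution has its mean above its median.
print(mean(pause_durations) > median(pause_durations))  # True
```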

14 pages, 1014 KB  
Article
Meta-Learning for Mandarin-Tibetan Cross-Lingual Speech Synthesis
by Weizhao Zhang and Hongwu Yang
Appl. Sci. 2022, 12(23), 12185; https://doi.org/10.3390/app122312185 - 28 Nov 2022
Cited by 1 | Viewed by 2220
Abstract
This paper proposes a meta-learning-based Mandarin-Tibetan cross-lingual text-to-speech (TTS) system that realizes both Mandarin and Tibetan speech synthesis within a single framework. First, we build two kinds of Tacotron2-based Mandarin-Tibetan cross-lingual baseline TTS: one with a shared encoder and one with separate encoders. Both baselines use a speaker classifier with a gradient reversal layer to disentangle speaker-specific information from the text encoder. At the same time, we design a prosody generator that extracts prosodic information from sentences to adequately exploit syntactic and semantic information. To further improve the synthesized speech quality of the Tacotron2-based baselines, we propose a meta-learning-based Mandarin-Tibetan cross-lingual TTS. Building on the separate-encoder baseline, we use an additional dynamic network to predict the parameters of the language-dependent text encoder, enabling better cross-lingual knowledge sharing in sequence-to-sequence TTS. Lastly, we synthesize Mandarin or Tibetan speech through this single acoustic model. The baseline experiments show that the separate-encoder TTS handles input in different languages better than the shared-encoder TTS. Further experiments show that the proposed meta-learning-based method effectively improves the naturalness and speaker similarity of the synthesized speech.

(This article belongs to the Section Computing and Artificial Intelligence)

28 pages, 6632 KB  
Article
Prosodic Transfer in Contact Varieties: Vocative Calls in Metropolitan and Basaá-Cameroonian French
by Fatima Hamlaoui, Marzena Żygis, Jonas Engelmann and Sergio I. Quiroz
Languages 2022, 7(4), 285; https://doi.org/10.3390/languages7040285 - 7 Nov 2022
Cited by 1 | Viewed by 6481
Abstract
This paper examines the production of vocative calls in (Northern) Metropolitan French (MF) and Cameroonian French (CF) as spoken by native speakers of a tone language, Basaá. While the results of our Discourse Completion Task confirm previous descriptions of MF, they also further our understanding of the relationship between pragmatics and prosody across different groups of French speakers. MF favors the vocative chant in routine contexts and a rising-falling contour in urgent contexts. In contrast, context has little influence on the choice of contour in CF: a melody consisting of the surface realization of lexical tones is produced in both contexts. Regarding acoustic parameters, context only exerts a significant effect on the loudness of vocative calls (RMS amplitude) and has little effect on their F0 height, F0 range and duration. The use of vocative calls in CF thus does not amount to target-like use of MF, the original standard target language. Our results provide novel evidence for the transfer of lexical tones onto the contact variety of an intonation language. They also corroborate previous studies involving the pragmatics-prosody interface: the more marked a prosodic pattern is (here, the vocative chant), the more difficult it is to acquire.

12 pages, 415 KB  
Article
Acoustic Identification of Sentence Accent in Speakers with Dysarthria: Cross-Population Validation and Severity Related Patterns
by Viviana Mendoza Ramos, Anja Lowit, Leen Van den Steen, Hector Arturo Kairuz Hernandez-Diaz, Maria Esperanza Hernandez-Diaz Huici, Marc De Bodt and Gwen Van Nuffelen
Brain Sci. 2021, 11(10), 1344; https://doi.org/10.3390/brainsci11101344 - 13 Oct 2021
Cited by 3 | Viewed by 3347
Abstract
Dysprosody is a hallmark of dysarthria, which can affect the intelligibility and naturalness of speech. This includes sentence accent, which helps to draw listeners’ attention to important information in the message. Although some studies have investigated this feature, we currently lack properly validated automated procedures that can distinguish between subtle performance differences observed across speakers with dysarthria. This study aims for cross-population validation of a set of acoustic features that have previously been shown to correlate with sentence accent. In addition, the impact of dysarthria severity levels on sentence accent production is investigated. Two groups of adults were analysed (Dutch and English speakers). Fifty-eight participants with dysarthria and 30 healthy control participants (HCP) produced sentences with varying accent positions. All speech samples were evaluated perceptually and analysed acoustically with an algorithm that extracts ten meaningful prosodic features and allows a classification between accented and unaccented syllables based on a linear combination of these parameters. The data were statistically analysed using discriminant analysis. Within the Dutch and English dysarthric population, the algorithm correctly identified 82.8 and 91.9% of the accented target syllables, respectively, indicating that the capacity to discriminate between accented and unaccented syllables in a sentence is consistent with perceptual impressions. Moreover, different strategies for accent production across dysarthria severity levels could be demonstrated, which is an important step toward a better understanding of the nature of the deficit and the automatic classification of dysarthria severity using prosodic features.

(This article belongs to the Special Issue Motor Speech Disorders and Prosody)
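
The classification rule described above, a linear combination of prosodic features separating accented from unaccented syllables, is in essence Fisher's linear discriminant. Below is a minimal two-feature sketch with invented training values; the published algorithm uses ten features and its own fitted weights.

```python
# Sketch: Fisher's linear discriminant separating accented from
# unaccented syllables using two prosodic features (peak F0 in Hz,
# syllable duration in s). Training values are invented; the published
# algorithm uses ten features and its own linear combination.

def fisher_classifier(pos, neg):
    """Return weights w and threshold t such that w . x > t => pos."""
    def mean2(rows):
        n = len(rows)
        return (sum(r[0] for r in rows) / n, sum(r[1] for r in rows) / n)

    def scatter(rows, mu):
        sxx = sum((r[0] - mu[0]) ** 2 for r in rows)
        syy = sum((r[1] - mu[1]) ** 2 for r in rows)
        sxy = sum((r[0] - mu[0]) * (r[1] - mu[1]) for r in rows)
        return sxx, sxy, syy

    m1, m0 = mean2(pos), mean2(neg)
    a1, b1, d1 = scatter(pos, m1)
    a0, b0, d0 = scatter(neg, m0)
    a, b, d = a1 + a0, b1 + b0, d1 + d0  # pooled 2x2 scatter [[a,b],[b,d]]
    det = a * d - b * b
    dx, dy = m1[0] - m0[0], m1[1] - m0[1]
    w = ((d * dx - b * dy) / det, (-b * dx + a * dy) / det)  # S^-1 (m1-m0)
    # Threshold halfway between the projected class means.
    t = (w[0] * (m1[0] + m0[0]) + w[1] * (m1[1] + m0[1])) / 2
    return w, t

accented = [(210, 0.25), (225, 0.30), (215, 0.28)]
unaccented = [(150, 0.12), (160, 0.15), (145, 0.10)]
w, t = fisher_classifier(accented, unaccented)

def score(x):
    return w[0] * x[0] + w[1] * x[1]

print(score((220, 0.27)) > t)  # True: classified as accented
```

In practice the discriminant would be fitted on many perceptually annotated syllables per population, as in the study; the closed form above only illustrates the linear-combination idea.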

23 pages, 2357 KB  
Article
Levodopa-Based Changes on Vocalic Speech Movements during Prosodic Prominence Marking
by Tabea Thies, Doris Mücke, Richard Dano and Michael T. Barbe
Brain Sci. 2021, 11(5), 594; https://doi.org/10.3390/brainsci11050594 - 4 May 2021
Cited by 12 | Viewed by 3424
Abstract
The present study investigates speech changes in Parkinson’s disease at the acoustic and articulatory levels with respect to prosodic prominence marking. To display movements of the underlying articulators, speech data from 16 patients with Parkinson’s disease were recorded using electromagnetic articulography. Speech tasks focused on strategies of prominence marking. Patients’ ability to encode prominence in the laryngeal and supra-laryngeal domains was tested in two conditions, without and with dopaminergic medication, to further examine the influence of motor performance on speech production. The data reveal that patients with Parkinson’s disease are able to highlight important information in both conditions. They maintain prominence relations across- and within-accentuation by adjusting prosodic markers, such as vowel duration and pitch modulation, while the acoustic vowel space remains the same. For differentiating across-accentuation, not only intensity but also all temporal and spatial parameters related to tongue body movements during vowel production are modulated to signal prominence. In response to levodopa intake, gross motor performance improved significantly, by 42%. This improvement was accompanied by an improvement in speech motor performance in terms of louder speech and shorter, larger and faster tongue body movements. The tongue body is more agile under levodopa, a fact that is not necessarily detectable at the acoustic level but is important for speech therapy.

(This article belongs to the Special Issue Motor Speech Disorders and Prosody)

16 pages, 1520 KB  
Article
A Distributed Approach to Speaker Count Problem in an Open-Set Scenario by Clustering Pitch Features
by Sakshi Pandey and Amit Banerjee
Information 2021, 12(4), 157; https://doi.org/10.3390/info12040157 - 9 Apr 2021
Viewed by 2644
Abstract
Counting the number of speakers in an audio sample can enable innovative applications, such as a real-time ranking system. Researchers have studied advanced machine learning approaches to the speaker count problem, but these solutions are not efficient in real-time environments, as they require pre-processing of a finite set of data samples. Another approach is to use unsupervised learning or audio processing techniques; research in this category is limited and does not consider large-scale open-set environments. In this paper, we propose a distributed clustering approach to the speaker count problem. The separability of speakers is computed using statistical pitch parameters. The proposed solution uses multiple microphones available in smartphones across a large geographical area to capture audio samples and extract statistical pitch features from them. These features are shared between the nodes to estimate the number of speakers in the neighborhood. One of the major challenges is to reduce the count error that arises from the proximity of users and of multiple microphones. We evaluate the algorithm’s performance using real smartphones in a multi-group arrangement by capturing parallel conversations between users in both indoor and outdoor scenarios. The average error count distance is 1.667 in the multi-group scenario, and the average error count distance in indoor environments is 16% better than in outdoor environments.

26 pages, 379 KB  
Article
Classifier Subset Selection for the Stacked Generalization Method Applied to Emotion Recognition in Speech
by Aitor Álvarez, Basilio Sierra, Andoni Arruti, Juan-Miguel López-Gil and Nestor Garay-Vitoria
Sensors 2016, 16(1), 21; https://doi.org/10.3390/s16010021 - 25 Dec 2015
Cited by 17 | Viewed by 7848
Abstract
In this paper, a new supervised classification paradigm, called classifier subset selection for stacked generalization (CSS stacking), is presented to deal with speech emotion recognition. The new approach consists of an improvement of a bi-level multi-classifier system known as stacking generalization by means of an integration of an estimation of distribution algorithm (EDA) in the first layer to select the optimal subset from the standard base classifiers. The good performance of the proposed new paradigm was demonstrated over different configurations and datasets. First, several CSS stacking classifiers were constructed on the RekEmozio dataset, using some specific standard base classifiers and a total of 123 spectral, quality and prosodic features computed using in-house feature extraction algorithms. These initial CSS stacking classifiers were compared to other multi-classifier systems and the employed standard classifiers built on the same set of speech features. Then, new CSS stacking classifiers were built on RekEmozio using a different set of both acoustic parameters (extended version of the Geneva Minimalistic Acoustic Parameter Set (eGeMAPS)) and standard classifiers and employing the best meta-classifier of the initial experiments. The performance of these two CSS stacking classifiers was evaluated and compared. Finally, the new paradigm was tested on the well-known Berlin Emotional Speech database. We compared the performance of single, standard stacking and CSS stacking systems using the same parametrization of the second phase. All of the classifications were performed at the categorical level, including the six primary emotions plus the neutral one.

(This article belongs to the Section Physical Sensors)
