
Advances and Applications of Audio and Speech Signal Processing

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: closed (30 September 2024) | Viewed by 3772

Special Issue Editors


Dr. Jie Zhang
Guest Editor
National Engineering Research Center for Speech and Language Information Processing (NERC-SLIP), University of Science and Technology of China, Hefei, China
Interests: voice signal processing; wireless acoustic sensor networks; automatic speech recognition; sound source localization

Dr. Douglas O'Shaughnessy
Guest Editor
Énergie Matériaux Télécommunications Research Centre, Institut National de la Recherche Scientifique, Quebec, QC J3X 1P7, Canada
Interests: DSP; speech embedding; speech processing; deep learning; speech recognition; speaker recognition

Special Issue Information

Dear Colleagues,

In this Special Issue, original research articles and reviews are welcome. Topics may include (but are not limited to) the following:

  1. Audio and speech modeling, speech coding and transmission.
  2. Single/multiple microphone signal processing for speech enhancement/separation, sound source localization/tracking, speech dereverberation, active noise control, and echo cancellation.
  3. Audio for multimedia and audio processing systems, e.g., audiovisual speech enhancement/recognition/localization.
  4. Bioacoustics and medical acoustics, e.g., using EEG/fMRI measurements to assist speech processing.
  5. The detection and classification of acoustic scenes and events, and the modeling, analysis and synthesis of acoustic environments.
  6. Music information retrieval, and music signal analysis, processing and synthesis.
  7. Speech quality and intelligibility measures, auditory modeling and hearing instruments.
  8. Speech production, speech perception and psychoacoustics.
  9. Speech synthesis and generation, and spatial stereo sound production.
  10. Automatic speech recognition.
  11. Speaker recognition and identity/privacy preservation.
  12. Advanced machine learning methods with application to audio and speech signal processing.

Dr. Jie Zhang
Dr. Douglas O'Shaughnessy
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • speech signal processing
  • audio signal processing
  • speech recognition
  • music signal analysis

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (3 papers)


Research

25 pages, 2085 KiB  
Article
How Much Does the Dynamic F0 Curve Affect the Expression of Emotion in Utterances?
by Tae-Jin Yoon
Appl. Sci. 2024, 14(23), 10972; https://doi.org/10.3390/app142310972 - 26 Nov 2024
Viewed by 978
Abstract
The modulation of vocal elements, such as pitch, loudness, and duration, plays a crucial role in conveying both linguistic information and the speaker’s emotional state. While acoustic features like fundamental frequency (F0) variability have been widely studied in emotional speech analysis, accurately classifying emotion remains challenging due to the complex and dynamic nature of vocal expressions. Traditional analytical methods often oversimplify these dynamics, potentially overlooking intricate patterns indicative of specific emotions. This study examines the influences of emotion and temporal variation on dynamic F0 contours in the analytical framework, utilizing a dataset valuable for its diverse emotional expressions. However, the analysis is constrained by the limited variety of sentences employed, which may affect the generalizability of the findings to broader linguistic contexts. We utilized the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), focusing on eight distinct emotional states performed by 24 professional actors. Sonorant segments were extracted, and F0 measurements were converted into semitones relative to a 100 Hz baseline to standardize pitch variations. By employing Generalized Additive Mixed Models (GAMMs), we modeled non-linear trajectories of F0 contours over time, accounting for fixed effects (emotions) and random effects (individual speaker variability). Our analysis revealed that incorporating emotion-specific, non-linear time effects and individual speaker differences significantly improved the model’s explanatory power, ultimately explaining up to 66.5% of the variance in the F0. The inclusion of random smooths for time within speakers captured individual temporal modulation patterns, providing a more accurate representation of emotional speech dynamics. The results demonstrate that dynamic modeling of F0 contours using GAMMs enhances the accuracy of emotion classification in speech. This approach captures the nuanced pitch patterns associated with different emotions and accounts for individual variability among speakers. The findings contribute to a deeper understanding of the vocal expression of emotions and offer valuable insights for advancing speech emotion recognition systems.
(This article belongs to the Special Issue Advances and Applications of Audio and Speech Signal Processing)
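To make the pitch normalization step above concrete: converting F0 from Hz to semitones relative to a 100 Hz baseline follows the standard formula st = 12 · log2(F0 / 100). The following is a minimal Python sketch of this conversion (an illustration, not the authors' code):

    import numpy as np

    def hz_to_semitones(f0_hz, baseline_hz=100.0):
        # Semitone distance from the baseline; +12 st corresponds to a doubling of F0.
        return 12.0 * np.log2(np.asarray(f0_hz, dtype=float) / baseline_hz)

    # Example: 100 Hz -> 0 st, 200 Hz -> 12 st, 150 Hz -> ~7.02 st
    print(hz_to_semitones([100.0, 200.0, 150.0]))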

23 pages, 421 KiB  
Article
On Beamforming with the Single-Sideband Transform
by Vitor Probst Curtarelli and Israel Cohen
Appl. Sci. 2024, 14(17), 7514; https://doi.org/10.3390/app14177514 - 25 Aug 2024
Viewed by 651
Abstract
In this paper, we examine the use of the Single-Sideband Transform (SSBT) for convolutive beamformers. We explore its unique properties and implications for beamformer design. Our study sheds light on the tradeoffs involved in using the SSBT in beamforming applications, offering insights into both its strengths and limitations. Despite the advantage of having real-valued coefficients, we show that the transform's handling of convolution presents challenges that impact fundamental beamforming principles. When compared to the Short-Time Fourier Transform (STFT), the SSBT displays lower robustness, especially in scenarios involving mismatch and modeling noise. Notably, we establish a direct equivalence between the SSBT and STFT when using identical transform parameters, enabling their seamless interchangeability and joint use in time–frequency signal enhancements. We validate our theoretical findings through realistic simulations using the Minimum-Power Distortionless Response beamformer. These simulations illustrate that although the STFT performs only marginally better than the SSBT under optimal conditions, it outperforms the SSBT significantly in non-ideal scenarios.
(This article belongs to the Special Issue Advances and Applications of Audio and Speech Signal Processing)
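For reference, the Minimum-Power Distortionless Response (MPDR) beamformer used in these simulations minimizes the output power w^H R w subject to the distortionless constraint w^H d = 1, which yields the classical solution w = R^{-1} d / (d^H R^{-1} d), where R is the covariance matrix of the microphone signals and d is the steering vector toward the desired source. A minimal NumPy sketch of this weight computation (illustrative only, not the paper's implementation) follows:

    import numpy as np

    def mpdr_weights(R, d, loading=1e-6):
        # MPDR solution w = R^{-1} d / (d^H R^{-1} d); small diagonal loading
        # is added to R for numerical robustness (an assumption, not from the paper).
        M = R.shape[0]
        R_inv_d = np.linalg.solve(R + loading * np.eye(M), d)
        return R_inv_d / (d.conj() @ R_inv_d)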

15 pages, 507 KiB  
Article
Automatic Age and Gender Recognition Using Ensemble Learning
by Ergün Yücesoy
Appl. Sci. 2024, 14(16), 6868; https://doi.org/10.3390/app14166868 - 6 Aug 2024
Cited by 2 | Viewed by 1675
Abstract
The use of speech-based recognition technologies in human–computer interactions is increasing daily. Age and gender recognition, one of these technologies, is a popular research topic used directly or indirectly in many applications. In this research, a new age and gender recognition approach based on the ensemble of different machine learning algorithms is proposed. In the study, five different classifiers, namely KNN, SVM, LR, RF, and E-TREE, are used as base-level classifiers and the majority voting and stacking methods are used to create the ensemble models. First, using MFCC features, five base-level classifiers are created and the performance of each model is evaluated. Then, starting from the one with the highest performance, these classifiers are combined and ensemble models are created. In the study, eight different ensemble models are created and the performances of each are examined separately. The experiments conducted with the Turkish subsection of the Mozilla Common Voice dataset show that the ensemble models increase the recognition accuracy, and the highest accuracy of 97.41% is achieved with the ensemble model created by stacking five classifiers (SVM, E-TREE, RF, KNN, and LR). According to this result, the proposed ensemble model achieves superior accuracy compared to similar studies in recognizing age and gender from speech signals.
(This article belongs to the Special Issue Advances and Applications of Audio and Speech Signal Processing)
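As a rough sketch of the stacking ensemble described in the abstract, the following Python example combines the five named base-level classifiers with scikit-learn's StackingClassifier. It assumes "E-TREE" refers to extra trees; the hyperparameters, the MFCC feature extraction, and the train/test variables are placeholders that would need to match the paper's setup:

    from sklearn.ensemble import (ExtraTreesClassifier, RandomForestClassifier,
                                  StackingClassifier)
    from sklearn.linear_model import LogisticRegression
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.svm import SVC

    # Base-level classifiers named in the abstract (hyperparameters assumed).
    base_learners = [
        ("svm", SVC(probability=True)),
        ("etree", ExtraTreesClassifier()),
        ("rf", RandomForestClassifier()),
        ("knn", KNeighborsClassifier()),
        ("lr", LogisticRegression(max_iter=1000)),
    ]

    # Stacking: a meta-classifier learns from the base-level predictions.
    stack = StackingClassifier(estimators=base_learners,
                               final_estimator=LogisticRegression(max_iter=1000))
    # stack.fit(X_train, y_train)   # X_*: MFCC feature matrices (hypothetical names)
    # stack.predict(X_test)         # y_*: combined age/gender class labels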
