Advances in Acoustic, Speech, and Signal Processing and Recognition

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Circuit and Signal Processing".

Deadline for manuscript submissions: 15 December 2026

Special Issue Editor


Prof. Dr. Homayoon Beigi
Guest Editor
1. Department of Electrical Engineering and Department of Mechanical Engineering, Columbia University, New York, NY 10027, USA
2. Recognition Technologies, Inc., South Salem, NY 10590, USA
Interests: speaker recognition; speech recognition; music recognition; machine learning; handwriting recognition; structural health monitoring; nonlinear control systems; learning adaptive control; face and object recognition; neural network architecture; neural network learning

Special Issue Information

Dear Colleagues,

This Special Issue covers the latest advancements in Acoustic, Speech, and Signal Processing and Recognition. Its main objective is to bring together recent techniques that significantly improve accuracy, efficiency, model size, processing speed, and other practical aspects related to improving the quality of the signals of interest, and that provide superior recognition techniques. Original research in preprocessing and feature extraction, as well as in back-end systems such as modeling and machine learning techniques, is welcome. End-to-End systems are also welcome, provided they are not treated as black-box processing. The Special Issue aims to explore modern techniques in the listed areas that rest on a definite theoretical and technical basis and that thoroughly discuss the justification for any improvements observed in experiments. Presentations of learning techniques through domain transfer are also highly encouraged.

I would like to invite you to submit your latest research results in the following or related areas. The objective is to include the most recent advancements in Acoustic, Speech, and Signal Processing and Recognition. This Special Issue concentrates on techniques designed to improve recognition through new modeling or new signal processing techniques. The overall aim is to compile a list of new approaches that address accuracy, efficiency, model size, processing speed, and other practical considerations. The following is a suggested list of topics that may be considered. However, if you believe that you have results on related topics that would fit the above objective, you are more than welcome to submit your original research.

  • Automatic Speech Recognition (ASR);
  • Speaker Recognition – ID, Verification, and Classification;
  • Speech Synthesis and Text-to-Speech (TTS);
  • Speech Enhancement and Denoising;
  • Acoustic Echo Cancellation (AEC) and Active Noise Control (ANC);
  • Noise Reduction and Suppression Techniques;
  • Source Localization, Microphone Arrays, Beamforming;
  • Far-Field Speech Processing and Recognition;
  • Feature Extraction (e.g., MFCC, LPC, PLP, Wavelet Transforms);
  • Modern Acoustic Modeling (e.g., SSM, Conformers, Transformers, GRU, LSTM, TDNN, PINN, new architectures);
  • Language Modeling for Speech Systems;
  • Keyword and Spoken Term Detection (e.g., low power, compact models);
  • Emotion and Paralinguistic Recognition in Speech;
  • Prosody and Intonation Analysis;
  • Transfer Learning Techniques (e.g., Inter-Domain) in Acoustic and Signal Recognition;
  • Music Recognition (e.g., mode recognition, voice separation, timbre transfer, automatic transcription);
  • Robust Recognition in Noisy Environments;
  • End-to-End Speech Recognition Systems;
  • Deep Learning for Acoustic Signal Processing;
  • New Machine Learning Architectures (including but not limited to new NN Architectures);
  • Large Language Models (LLMs) in Speech Recognition and Diarization;
  • Speaker Diarization, Audio Indexing, and Retrieval;
  • Privacy and Security in Speaker Data (e.g., Voice Spoofing, Obfuscation);
  • Multimodal Signal Processing (Audio-Visual);
  • Code-Switching and Multilingual Speech Recognition;
  • Accurate Word and Phone Alignment (timestamp estimation);
  • Postprocessing (e.g., Punctuation and Case Correction, Content-Based output selection/modification);
  • Real-Time and Low-Latency Signal Processing;
  • Applications in Hearing Aids and Assistive Technologies;
  • Medical Diagnosis through Speech Processing;
  • Children’s Speech Recognition;
  • Acoustic Event and Scene Classification;
  • Voice Conversion and Voice Cloning;
  • Signal Recognition, such as in Structural and Machine Health Monitoring, Earthquake Warning, Medical Signal Recognition, etc.

I look forward to receiving your contributions.

Prof. Dr. Homayoon Beigi
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 250 words) can be sent to the Editorial Office for assessment.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • speech recognition
  • speaker recognition
  • acoustic event detection
  • signal recognition
  • acoustic processing
  • acoustic modeling
  • acoustic feature extraction
  • speaker diarization
  • music mode recognition
  • automatic music transcription

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (2 papers)


Research

15 pages, 16882 KB  
Article
Audio-Sensitive Speech Emotion Recognition via Content-Independent Pretraining and Threshold-Based Fusion
by Zhaojie Luo, Huaming Xu and Shuqiong Wu
Electronics 2026, 15(11), 2313; https://doi.org/10.3390/electronics15112313 - 27 May 2026
Abstract
Speech emotion recognition (SER) has attracted increasing attention in human–computer interaction, mental health monitoring, and multimedia retrieval. However, many existing multimodal SER systems exhibit a strong bias toward the text modality: because utterance-level labels are often easily inferred from lexical content, models tend to under-utilize non-verbal acoustic cues, which can lead to erroneous predictions when crucial emotional information is predominantly conveyed by prosodic and spectral features. To alleviate this imbalance, we propose an audio-sensitive SER framework that explicitly enhances the contribution of the audio modality through a two-step strategy. First, we construct an Audio Sensitive Network (ASN) by pretraining on the parallel Emotional Speech Dataset (ESD), in which identical linguistic content is spoken with different emotions. This setting allows the ASN to learn speech content-independent emotional representations that emphasize paralinguistic information. Second, we introduce a threshold fusion scheme that integrates the ASN with existing SER classifiers. Specifically, we employ the Tree-structured Parzen Estimator (TPE) to optimize label-wise decision thresholds, enabling flexible calibration of the joint prediction space across modalities and models. We conduct experiments on both the IEMOCAP and ESD corpora, comparing multiple baseline classifiers with and without the proposed audio-sensitive enhancement. The results show consistent, albeit moderate, improvements in emotion recognition performance (e.g., up to +11.7% absolute accuracy on angry for MMAN on IEMOCAP), particularly for emotions that rely heavily on prosodic and spectral cues, thereby demonstrating the effectiveness of the proposed framework in boosting audio sensitivity within multimodal SER systems.
(This article belongs to the Special Issue Advances in Acoustic, Speech, and Signal Processing and Recognition)

43 pages, 13812 KB  
Article
A Novel Dual-Branch Bi-Mamba Architecture for Acoustic Cough Segmentation
by Turgay Koç
Electronics 2026, 15(9), 1930; https://doi.org/10.3390/electronics15091930 - 2 May 2026
Abstract
Precise temporal segmentation of acoustic cough signals is critical for digital health, yet existing literature predominantly focuses on simple event detection rather than exact boundary delineation. To bridge this gap, we introduce a comprehensive benchmarking framework specifically designed to systematically evaluate continuous boundary detection performance using modern deep learning architectures. Built upon this evaluation paradigm, we propose a novel Dual-Branch Bi-Mamba architecture that effectively integrates the local morphological feature extraction capabilities of a 2D U-Net with the long-range sequential modeling power of 1D Bidirectional State-Space Models (SSMs). Evaluated on the clinical DKPNet41 dataset, the proposed compact 0.54-million-parameter model achieved an F1-Score of 87.66% while reducing offset boundary error by over 50%. Operating 56× faster than real time on a standard CPU, this study establishes a reliable evaluation framework for precise boundary segmentation and provides a computationally efficient architectural solution for high-resolution automated acoustic signal processing.
(This article belongs to the Special Issue Advances in Acoustic, Speech, and Signal Processing and Recognition)