
Advances in Automatic Speech Recognition, Audio and Underwater Acoustic Signal Analysis

A special issue of Sensors (ISSN 1424-8220). This special issue belongs to the section "Intelligent Sensors".

Deadline for manuscript submissions: 30 June 2025 | Viewed by 2612

Special Issue Editors


Dr. Kele Xu
Guest Editor
National Key Laboratory of Parallel and Distributed Processing, National University of Defense Technology, Changsha, China
Interests: acoustic signal processing; machine learning; intelligent software systems

Prof. Dr. Rui Liu
Guest Editor
College of Computer Science, Inner Mongolia University, Hohhot 010031, China
Interests: acoustic signal processing; speech synthesis; human–machine conversation

Prof. Dr. Maoshen Jia
Guest Editor
School of Information Science and Technology, Beijing University of Technology, Beijing 100124, China
Interests: speech and audio coding; multichannel audio signal processing; array signal processing

Dr. Dawei Feng
Guest Editor
National Key Laboratory of Parallel and Distributed Processing, National University of Defense Technology, Changsha, China
Interests: distributed computing; machine learning; intelligent software systems

Special Issue Information

Dear Colleagues,

The intersection of technology and acoustics has ushered in a new era of innovation in signal processing. This Special Issue, "Advances in Automatic Speech Recognition, Audio, and Underwater Acoustic Signal Analysis", is dedicated to exploring the latest breakthroughs in these dynamic fields. With a focus on applying advanced algorithms, machine learning, and sensor technologies, we aim to present a comprehensive view of the current state of research and its potential impact on future developments.

We invite scholars, researchers, and industry experts to contribute their insights, fostering a multidisciplinary dialogue that propels the field forward. The themes include:

  • Automatic Speech Recognition;
  • Audio Signal Acquisition;
  • Audio Signal Processing;
  • Audio and Underwater Acoustic Signal Recognition and Classification;
  • Machine Learning and Deep Learning Algorithm Applications in Audio Signal Analysis;
  • Safety and Privacy;
  • Acceleration and Deployment of Audio Signal Processing Algorithms.

This Special Issue is inherently linked to Sensors, focusing on the critical role of sensor technology in capturing and processing acoustic signals. It explores the application of advanced algorithms and machine learning to enhance signal recognition and classification, emphasizing the importance of sensor data quality for effective audio analysis. By addressing safety, privacy, and algorithm deployment, the issue underscores the multidisciplinary innovation driven by sensor advancements in acoustic signal processing.

Dr. Kele Xu
Prof. Dr. Rui Liu
Prof. Dr. Maoshen Jia
Dr. Dawei Feng
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Sensors is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • automatic speech recognition
  • audio signal acquisition
  • audio signal processing
  • audio and underwater acoustic signal recognition and classification
  • machine learning and deep learning algorithm application in audio signal analysis
  • safety and privacy
  • acceleration and deployment of audio signal processing algorithms

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (2 papers)


Research

20 pages, 5438 KiB  
Article
Separation of Simultaneous Speakers with Acoustic Vector Sensor
by Józef Kotus and Grzegorz Szwoch
Sensors 2025, 25(5), 1509; https://doi.org/10.3390/s25051509 - 28 Feb 2025
Viewed by 381
Abstract
This paper presents a method of sound source separation in live audio signals, based on sound intensity analysis. Sound pressure signals recorded with an acoustic vector sensor are analyzed, and the spectral distribution of sound intensity in two dimensions is calculated. Spectral components of the analyzed signal are selected based on the calculated source direction, which leads to a spatial filtration of the sound. The experiments were performed with test signals convolved with impulse responses of a real sensor, recorded for varying sound source positions. The experiments evaluated the proposed method's ability to separate sound sources depending on their position, spectral content, and signal-to-noise ratio, especially when multiple sources are active at the same time. The proposed algorithm provided signal-to-distortion ratio (SDR) values of 10–12 dB and Short-Time Objective Intelligibility (STOI) values in the range 0.86–0.94, an increase of 0.15–0.30 compared with the unprocessed speech signal. The proposed method is intended for applications in automated speech recognition systems, speaker diarization, and separation in concurrent speech scenarios, using a small acoustic sensor.
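The intensity-based spatial filtering described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustrative implementation, not the authors' code: it assumes the acoustic vector sensor provides a pressure signal and two particle-velocity components, estimates a direction of arrival per time-frequency bin from the active sound intensity, and keeps only the bins whose direction lies near a chosen target. All function and parameter names are hypothetical.

```python
import numpy as np

def directional_mask_separation(p, vx, vy, target_deg, tol_deg=20.0,
                                n_fft=512, hop=256):
    """Keep time-frequency bins whose intensity-based DOA is within
    tol_deg of target_deg; reconstruct by overlap-add. Illustrative
    simplification of intensity-based spatial filtering."""
    win = np.hanning(n_fft)
    n_frames = 1 + (len(p) - n_fft) // hop
    out = np.zeros(len(p))
    norm = np.zeros(len(p))
    for i in range(n_frames):
        s = i * hop
        P = np.fft.rfft(win * p[s:s + n_fft])
        Vx = np.fft.rfft(win * vx[s:s + n_fft])
        Vy = np.fft.rfft(win * vy[s:s + n_fft])
        # Active intensity components per frequency bin
        Ix = np.real(P * np.conj(Vx))
        Iy = np.real(P * np.conj(Vy))
        doa = np.degrees(np.arctan2(Iy, Ix))
        # Angular distance to the target direction, wrapped to [-180, 180)
        diff = np.abs((doa - target_deg + 180.0) % 360.0 - 180.0)
        mask = (diff <= tol_deg).astype(float)
        frame = np.fft.irfft(P * mask, n_fft)
        out[s:s + n_fft] += win * frame
        norm[s:s + n_fft] += win ** 2
    return out / np.maximum(norm, 1e-8)
```

With two test tones arriving from 0° and 90° (velocity aligned with x and y, respectively), selecting `target_deg=0.0` passes the first tone and suppresses the second; the paper's actual method works with measured sensor impulse responses rather than this idealized plane-wave model.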

15 pages, 1603 KiB  
Article
Contrastive Speaker Representation Learning with Hard Negative Sampling for Speaker Recognition
by Changhwan Go, Young Han Lee, Taewoo Kim, Nam In Park and Chanjun Chun
Sensors 2024, 24(19), 6213; https://doi.org/10.3390/s24196213 - 25 Sep 2024
Viewed by 1655
Abstract
Speaker recognition is a technology that identifies the speaker in an input utterance by extracting speaker-distinguishable features from the speech signal. Speaker recognition is used for system security and authentication; therefore, it is crucial to extract unique features of the speaker to achieve high recognition rates. Representative methods for extracting these features include classification-based approaches, and contrastive learning approaches that learn the relationships between speaker representations and then use embeddings extracted from a specific layer of the model. This paper introduces a framework for developing robust speaker recognition models through contrastive learning. This approach aims to minimize the similarity to hard negative samples: those that are genuine negatives but have features extremely similar to the positives, leading to potential misidentification. Specifically, our proposed method trains the model by estimating hard negative samples within a mini-batch during contrastive learning, and then utilizes a cross-attention mechanism to determine speaker agreement for pairs of utterances. To demonstrate the effectiveness of our proposed method, we compared the performance of a deep learning model trained with a conventional loss function used in speaker recognition with that of a model trained using our proposed method, as measured by the equal error rate (EER), an objective performance metric. Our results indicate that, when trained on the VoxCeleb2 dataset, the proposed method achieved an EER of 0.98% on the VoxCeleb1-E dataset and 1.84% on the VoxCeleb1-H dataset.
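The hard-negative idea in the abstract above can be illustrated with a small NumPy sketch. This is a simplified InfoNCE-style loss, not the paper's exact objective: for each anchor embedding, the most similar in-batch negatives are treated as "hard" and counted a second time in the softmax denominator, which increases their contribution to the loss. The function name, the duplication-based weighting, and the temperature value are assumptions for illustration.

```python
import numpy as np

def hard_negative_contrastive_loss(emb_a, emb_b, temperature=0.07, top_k=2):
    """Contrastive loss over a mini-batch of paired speaker embeddings,
    where (emb_a[i], emb_b[i]) come from the same speaker. Off-diagonal
    pairs are negatives; the top_k most similar negatives per anchor
    ("hard" negatives) are emphasized by counting them twice."""
    # L2-normalize so dot products are cosine similarities
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    sim = a @ b.T / temperature            # (N, N) similarity matrix
    n = sim.shape[0]
    losses = []
    for i in range(n):
        neg = np.delete(sim[i], i)         # negatives: other speakers
        hard = np.sort(neg)[-top_k:]       # hardest (most similar) negatives
        # Partition: positive + all negatives + hard negatives again
        logits = np.concatenate(([sim[i, i]], neg, hard))
        m = logits.max()
        log_z = np.log(np.sum(np.exp(logits - m))) + m
        losses.append(log_z - sim[i, i])   # -log softmax of the positive
    return float(np.mean(losses))
```

On well-separated embeddings with correctly matched pairs the loss is near zero, while mismatched pairs drive it up; the cross-attention agreement step from the paper is a separate component not shown here.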
