Acoustics, Speech and Signal Processing

A special issue of Acoustics (ISSN 2624-599X).

Deadline for manuscript submissions: closed (31 March 2023) | Viewed by 26035

Special Issue Editors


Guest Editor
School of Computing, University of Nebraska-Lincoln, Lincoln, NE 68588-0115, USA
Interests: hardware systems security; internet of things; cyber physical systems; wireless and mobile networks

Guest Editor
School of Information Science and Technology, Zhejiang Sci-Tech University, Hangzhou 310018, China
Interests: aerospace systems

Guest Editor
Department of Electrical Engineering, National University of Computer and Emerging Sciences, Islamabad 44000, Pakistan
Interests: machine learning; signal processing; dictionary learning

Special Issue Information

Dear Colleagues,

Engineers and scientists have been studying the phenomenon of speech communication (even before Alexander Graham Bell’s revolutionary invention) with the intent of developing more efficient and effective human-to-human and human-to-machine communication systems. Although digital signal processing took center stage in speech communication research in the 1960s, it remains critical for exploiting the gains of decades of research. Artificial intelligence has greatly influenced speech processing during the last decade, with much research improving the performance of acoustic and speech processing systems. This Special Issue therefore focuses on recent state-of-the-art techniques for improving acoustic and speech signal processing using digital signal processing and/or artificial intelligence techniques. Papers submitted to this Special Issue will be judged on the importance of the problem, the originality of the solution, and the contribution to the existing body of knowledge. Topics of interest include but are not limited to the following:

  • Applied signal processing systems;
  • Speech and language processing;
  • Speech enhancement;
  • Deep learning/machine learning for signal processing;
  • Dictionary learning for signal processing;
  • Audio and acoustic signal processing;
  • Internet of Multimedia Things;
  • Information forensics, privacy, and security;
  • Multimedia signal processing;
  • Remote sensing and signal processing;
  • Sensor array and multichannel signal processing;
  • Signal processing for big data;
  • Signal processing for communication and networking;
  • Signal processing for cybersecurity;
  • Signal processing for education;
  • Signal processing for the entertainment industry.

Dr. Muhammad Naveed Aman
Dr. Anwar Ali
Dr. Asif Iqbal
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Acoustics is an international peer-reviewed open access quarterly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies is available on the MDPI website.

Published Papers (7 papers)


Research

29 pages, 3497 KiB  
Article
Temporal Howling Detector for Speech Reinforcement Systems
by Yehav Alkaher and Israel Cohen
Acoustics 2022, 4(4), 967-995; https://doi.org/10.3390/acoustics4040060 - 15 Nov 2022
Cited by 3 | Viewed by 3226
Abstract
In this paper, we address the problem of howling detection in speech reinforcement system applications for utilization in howling control mechanisms. A general speech reinforcement system acquires speech from a speaker’s microphone and delivers reinforced speech to other listeners in the same room, or another room, through loudspeakers. The amount of gain that can be applied to the acquired speech in the closed-loop system is constrained by electro-acoustic coupling in the system, manifested in howling noises appearing as a result of acoustic feedback. A howling detection algorithm aims to detect frequency howls in the system early, before the human ear notices them. The proposed algorithm includes two cascaded stages: Soft Howling Detection and Howling False-Alarm Detection. The Soft Howling Detection stage is based on the temporal magnitude-slope-deviation measure, identifying potential candidate frequency howls. Inspired by the temporal approach, the Howling False-Alarm Detection stage considers the magnitude behavior of speech-signal frequency components under different levels of acoustic feedback. A comprehensive howling detection performance evaluation process is designed, examining the proposed algorithm in terms of detection accuracy and the time it takes for detection, under a devised set of howling scenarios. The performance improvement of the proposed algorithm, with respect to a plain magnitude-slope-deviation-based method, is demonstrated by showing faster detection response times over a set of howling change-rate configurations. The proposed two-stage algorithm also provides a significant recall improvement, while mitigating the decrease in precision via the Howling False-Alarm Detection stage.
(This article belongs to the Special Issue Acoustics, Speech and Signal Processing)

10 pages, 358 KiB  
Article
Accelerated Conjugate Gradient for Second-Order Blind Signal Separation
by Hai Huyen Dam and Sven Nordholm
Acoustics 2022, 4(4), 948-957; https://doi.org/10.3390/acoustics4040058 - 11 Nov 2022
Viewed by 2223
Abstract
This paper proposes a new adaptive algorithm for the second-order blind signal separation (BSS) problem with convolutive mixtures by utilising a combination of an accelerated gradient and a conjugate gradient method. For each iteration of the adaptive algorithm, the search point and the search direction are obtained based on the current and the previous iterations. The algorithm efficiently calculates the step size for the accelerated conjugate gradient algorithm in each iteration. Simulation results show that the proposed accelerated conjugate gradient algorithm with optimal step size converges faster than the accelerated descent algorithm and the steepest descent algorithm with optimal step size while having lower computational complexity. In particular, the number of iterations required for convergence of the accelerated conjugate gradient algorithm is significantly lower than that of the accelerated descent algorithm and the steepest descent algorithm. In addition, the proposed system achieves improvement in terms of the signal-to-interference ratio and signal-to-noise ratio for the dominant speech outputs.
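
The abstract above compares conjugate gradient and steepest descent iterations that each use an optimal step size. As a rough illustration of that comparison only, the following Python sketch contrasts plain (not accelerated) conjugate gradient with steepest descent on a small symmetric positive-definite quadratic. It is not the authors' BSS algorithm; the matrix A, the vector b, the tolerance, and all function names are hypothetical choices made for this example.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def matvec(A, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [dot(row, v) for row in A]

def minimize_quadratic(A, b, conjugate, tol=1e-10, max_iter=1000):
    """Minimize 0.5*x^T A x - b^T x for symmetric positive-definite A.

    Uses the exact (optimal) step size along each search direction.
    conjugate=True  -> conjugate gradient directions
    conjugate=False -> steepest descent directions
    Returns (x, number_of_iterations).
    """
    x = [0.0] * len(b)
    r = list(b)          # residual b - A x for the starting point x = 0
    p = list(r)          # first search direction is the residual
    for it in range(1, max_iter + 1):
        Ap = matvec(A, p)
        alpha = dot(r, r) / dot(p, Ap)            # exact line search
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r_new = [ri - alpha * api for ri, api in zip(r, Ap)]
        if math.sqrt(dot(r_new, r_new)) < tol:
            return x, it
        # beta = 0 discards the previous direction (steepest descent).
        beta = dot(r_new, r_new) / dot(r, r) if conjugate else 0.0
        p = [rn + beta * pi for rn, pi in zip(r_new, p)]
        r = r_new
    return x, max_iter

# Hypothetical test problem (assumption, not from the paper).
A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
x_cg, iters_cg = minimize_quadratic(A, b, conjugate=True)
x_sd, iters_sd = minimize_quadratic(A, b, conjugate=False)
print("conjugate gradient iterations:", iters_cg)
print("steepest descent iterations:", iters_sd)
```

On a quadratic of dimension n, conjugate gradient needs at most n iterations in exact arithmetic, whereas steepest descent only approaches the minimizer geometrically, which is the kind of gap in iteration counts the abstract reports for its own, more elaborate, setting.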

18 pages, 3845 KiB  
Article
Horizontal and Vertical Voice Directivity Characteristics of Sung Vowels in Classical Singing
by Manuel Brandner, Matthias Frank and Alois Sontacchi
Acoustics 2022, 4(4), 849-866; https://doi.org/10.3390/acoustics4040051 - 1 Oct 2022
Cited by 5 | Viewed by 3290
Abstract
Singing voice directivity for five sustained German vowels /a:/, /e:/, /i:/, /o:/, /u:/ over a wide pitch range was investigated using a multichannel microphone array with high spatial resolution along the horizontal and vertical axes. A newly created dataset allows voice directivity in classical singing to be examined with high resolution in angle and frequency. Three voice production modes (phonation modes), namely modal, breathy, and pressed, that could affect the mouth opening used and the voice directivity were investigated. We present detailed results for singing voice directivity and introduce metrics to discuss the differences in the complex voice directivity patterns across the whole dataset in a more compact form. Differences were found between vowels, pitch, and gender (voice types with corresponding vocal range). Differences between the vowels /a:, e:, i:/ and /o:, u:/ and pitch can be addressed by simplified metrics up to about d2/D5/587 Hz, but we found that voice directivity generally depends strongly on pitch. Minor differences were found between voice production modes, and these were more pronounced for female singers. Voice directivity differs at low pitch between vowels, with front vowels being most directional. We found that which of the front vowels is most directional depends on the evaluated pitch. This seems to be related to the complex radiation pattern of the human voice, which involves a large inter-subjective variability strongly influenced by the shape of the torso, head, and mouth. All recorded classical sung vowels at high pitches exhibit similarly high directionality.

16 pages, 1843 KiB  
Article
Effectiveness of MP3 Coding Depends on the Music Genre: Evaluation Using Semantic Differential Scales
by Nikolaos M. Papadakis, Ioanna Aroni and Georgios E. Stavroulakis
Acoustics 2022, 4(3), 704-719; https://doi.org/10.3390/acoustics4030042 - 27 Aug 2022
Cited by 2 | Viewed by 4220
Abstract
MPEG-1 Layer 3 (MP3) is one of the most popular compression formats used for sound and especially for music. However, during the coding process, the MP3 algorithm negatively affects the spectral and dynamic characteristics of the audio file being compressed. The aim of this study is to evaluate the effect of the MP3 coding format for different music genres and different bitrates via listening tests in which the original uncompressed files and the compressed files are compared. For this purpose, five different music genres were selected (rock, jazz, electronic, classical and solo instrument), and the files were compressed at three different bitrates (96 kbps, 160 kbps and 320 kbps). The semantic differential method was used, and ten bipolar scales were selected for the listening tests (e.g., better–worse, more distortion–less distortion, etc.). The following are the most important findings of this study: classical music was negatively affected the most among the genres due to the MP3 compression (lowest ratings in 8 out of 10 bipolar scales), the solo instrument was least affected among the genres (highest rating in 7 out of 10 bipolar scales), and for higher bitrates, the differences in ratings were small for all music genres. The findings of this study could be used to optimize and adapt the standard, depending on the music genre and the musical piece that needs to be encoded.

19 pages, 2256 KiB  
Article
Double-Talk Detection-Aided Residual Echo Suppression via Spectrogram Masking and Refinement
by Eran Shachar, Israel Cohen and Baruch Berdugo
Acoustics 2022, 4(3), 637-655; https://doi.org/10.3390/acoustics4030039 - 25 Aug 2022
Viewed by 3238
Abstract
Acoustic echo in full-duplex telecommunication systems is a common problem that may cause desired-speech quality degradation during double-talk periods. This problem is especially challenging in low signal-to-echo ratio (SER) scenarios, such as hands-free conversations over mobile phones when the loudspeaker volume is high. This paper proposes a two-stage deep-learning approach to residual echo suppression focused on the low SER scenario. The first stage consists of a speech spectrogram masking model integrated with a double-talk detector (DTD). The second stage consists of a spectrogram refinement model optimized for speech quality by minimizing a perceptual evaluation of speech quality (PESQ) related loss function. The proposed integration of DTD with the masking model outperforms several other configurations based on previous studies. We conduct an ablation study that shows the contribution of each part of the proposed system. We evaluate the proposed system in several SERs and demonstrate its efficiency in the challenging setting of a very low SER. Finally, the proposed approach outperforms competing methods in several residual echo suppression metrics. We conclude that the proposed system is well-suited for the task of low SER residual echo suppression.

16 pages, 7680 KiB  
Article
Audio Denoising Coprocessor Based on RISC-V Custom Instruction Set Extension
by Jun Yuan, Qiang Zhao, Wei Wang, Xiangsheng Meng, Jun Li and Qin Li
Acoustics 2022, 4(3), 538-553; https://doi.org/10.3390/acoustics4030033 - 29 Jun 2022
Cited by 3 | Viewed by 3291
Abstract
As a typical active noise control algorithm, Filtered-x Least Mean Square (FxLMS) is widely used in the field of audio denoising. In this study, an audio denoising coprocessor based on a custom instruction set extension of the Reduced Instruction Set Computer-V (RISC-V) architecture was designed, and a software and hardware co-design was adopted; starting from the traditional pure hardware implementation, the accelerator design was optimized, and the accelerator was connected to the RISC-V core in the form of a coprocessor. Meanwhile, the corresponding custom instructions were designed, the compiling environment was established, and a library of coprocessor acceleration functions was built using embedded inline assembly. Finally, the active noise control (ANC) system was built and tested based on the Hbird E203-Core, and the test data were collected through an audio analyzer. The results showed that the audio denoising algorithm can be realized by combining a heterogeneous System on Chip (SoC) with a hardware accelerator, and the denoising effect was approximately 8 dB. The number of instructions consumed by specific operations was reduced by approximately 60% when using the custom instructions, and the operation acceleration effect was significant.
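
For readers unfamiliar with FxLMS, the following Python sketch shows the textbook single-channel update in software. It is a minimal illustration under stated assumptions, not the authors' coprocessor or their instruction set extension: the sign convention (error equals disturbance minus anti-noise), the toy primary and secondary paths, the perfectly known secondary-path estimate, the filter length, and the step size mu are all hypothetical.

```python
import math

def fir(coeffs, history):
    """FIR output: coefficients times the most recent samples (newest first)."""
    return sum(c * h for c, h in zip(coeffs, history))

def fxlms(reference, disturbance, sec_path, sec_path_est, num_taps=8, mu=0.01):
    """Single-channel FxLMS. Returns (error_signal, final_weights)."""
    w = [0.0] * num_taps                               # adaptive control filter
    x_hist = [0.0] * max(num_taps, len(sec_path_est))  # reference history
    y_hist = [0.0] * len(sec_path)                     # anti-noise history
    fx_hist = [0.0] * num_taps                         # filtered-reference history
    errors = []
    for x, d in zip(reference, disturbance):
        x_hist = [x] + x_hist[:-1]
        y = fir(w, x_hist)                  # controller output (anti-noise)
        y_hist = [y] + y_hist[:-1]
        e = d - fir(sec_path, y_hist)       # residual at the error microphone
        # "Filtered-x": pass the reference through the secondary-path estimate.
        fx = fir(sec_path_est, x_hist)
        fx_hist = [fx] + fx_hist[:-1]
        # LMS step along the filtered reference.
        w = [wi + mu * e * fxi for wi, fxi in zip(w, fx_hist)]
        errors.append(e)
    return errors, w

def power(segment):
    """Mean squared value of a list of samples."""
    return sum(v * v for v in segment) / len(segment)

# Toy scenario (all values are assumptions): a tonal noise source.
n = 4000
x = [math.sin(2 * math.pi * 0.05 * k) for k in range(n)]
primary = [0.0, 0.5, 0.3]      # hypothetical noise path to the error microphone
secondary = [0.0, 0.8]         # hypothetical loudspeaker-to-microphone path
hist = [0.0] * len(primary)
d = []
for s in x:
    hist = [s] + hist[:-1]
    d.append(fir(primary, hist))

errors, weights = fxlms(x, d, secondary, secondary)
disturbance_power = power(d[-500:])
residual_power = power(errors[-500:])
print("disturbance power:", disturbance_power)
print("residual power:   ", residual_power)
```

The inner loop is dominated by multiply-accumulate operations over short histories, which is the kind of workload a hardware accelerator or custom instruction would typically target.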

19 pages, 1601 KiB  
Article
Speech Enhancement Framework with Noise Suppression Using Block Principal Component Analysis
by Abdullah Zaini Alsheibi, Kimon P. Valavanis, Asif Iqbal and Muhammad Naveed Aman
Acoustics 2022, 4(2), 441-459; https://doi.org/10.3390/acoustics4020027 - 20 May 2022
Cited by 1 | Viewed by 3967
Abstract
With the advancement in voice-communication-based human–machine interface technology in smart home devices, the ability to decompose the received speech signal into a signal of interest and an interference component has emerged as a key requirement for their successful operation. These devices perform their tasks in real time based on the received commands, and their effectiveness is limited when there is a lot of ambient noise in the area in which they operate. Most real-time speech enhancement algorithms do not perform adequately in the presence of high amounts of noise (i.e., a low input signal-to-noise ratio). In this manuscript, we propose a speech enhancement framework to help these algorithms in situations when the noise level in the received signal is high. The proposed framework performs noise suppression in the frequency domain by generating an approximation of the noisy signal’s short-time Fourier transform, which is then used by the speech enhancement algorithms to recover the underlying clean signal. This approximation is performed by using the proposed block principal component analysis (Block-PCA) algorithm. To illustrate the efficacy of the proposed framework, we present a detailed performance evaluation under different noise levels and noise types. Moreover, the proposed method can be used in conjunction with any speech enhancement algorithm to improve its performance under moderate to high noise scenarios.
