Search Results (4)

Search Parameters:
Keywords = speech envelope estimation

12 pages, 2154 KiB  
Article
A Novel Computationally Efficient Approach for Exploring Neural Entrainment to Continuous Speech Stimuli Incorporating Cross-Correlation
by Luong Do Anh Quan, Le Thi Trang, Hyosung Joo, Dongseok Kim and Jihwan Woo
Appl. Sci. 2023, 13(17), 9839; https://doi.org/10.3390/app13179839 - 31 Aug 2023
Viewed by 1627
Abstract
A linear system identification technique has been widely used to track neural entrainment in response to continuous speech stimuli. Although the standard regularization approach based on ridge regression provides a straightforward way to estimate and interpret neural responses to continuous speech stimuli, the need for parameter tuning can lead to inconsistent results and costly computation. We developed a novel system identification method, the detrended cross-correlation function, which maps stimulus features to neural responses using reverse correlation and the derivative of convolution. This non-parametric approach (i.e., it requires no parameter tuning) maintains consistent results. Moreover, its training process is computationally more efficient than that of conventional ridge regression. The detrended cross-correlation function correctly captures the temporal response function to the speech envelope and the spectral–temporal receptive field to the speech spectrogram in univariate and multivariate forward models, respectively. The proposed model also processes electroencephalography (EEG) signals more efficiently than ridge regression. In conclusion, we suggest that the detrended cross-correlation function can be used as a comparable alternative for investigating EEG signals evoked by continuous speech (or other sounds).
(This article belongs to the Special Issue Modern Advances in Neurolinguistics and EEG Language Processing)
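The abstract contrasts ridge-regression TRF estimation, which needs a tuned regularization parameter, with a cross-correlation approach that needs none. The Python sketch below illustrates that contrast on synthetic data, assuming a plain lagged forward model. It implements ordinary reverse correlation, not the authors' detrended cross-correlation function, whose detrending and derivative-of-convolution steps are not specified here. The function names (`lagged_design`, `solve_linear`, `trf_ridge`, `trf_xcorr`) and the synthetic kernel are hypothetical.

```python
import random


def lagged_design(stimulus, num_lags):
    """Row t holds stimulus[t], stimulus[t-1], ..., stimulus[t-num_lags+1]; zeros before t=0."""
    return [[stimulus[t - k] if t - k >= 0 else 0.0 for k in range(num_lags)]
            for t in range(len(stimulus))]


def solve_linear(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def trf_ridge(stimulus, response, num_lags, lam):
    """Ridge TRF: w = (X^T X + lam * I)^-1 X^T y; lam is the parameter that must be tuned."""
    x = lagged_design(stimulus, num_lags)
    xtx = [[sum(row[i] * row[j] for row in x) + (lam if i == j else 0.0)
            for j in range(num_lags)] for i in range(num_lags)]
    xty = [sum(row[i] * y for row, y in zip(x, response)) for i in range(num_lags)]
    return solve_linear(xtx, xty)


def trf_xcorr(stimulus, response, num_lags):
    """Plain reverse correlation: r[k] = (1/N) * sum_t s[t] * y[t+k]; no tuning parameter."""
    n = len(stimulus)
    return [sum(stimulus[t] * response[t + k] for t in range(n - k)) / n
            for k in range(num_lags)]


# Synthetic check: a white-noise "envelope" filtered by a known (hypothetical) kernel.
rng = random.Random(0)
stim = [rng.gauss(0.0, 1.0) for _ in range(2000)]
kernel = [0.0, 0.5, 1.0, 0.5, 0.0]
resp = [sum(kernel[k] * stim[t - k] for k in range(len(kernel)) if t - k >= 0)
        for t in range(len(stim))]
w_ridge = trf_ridge(stim, resp, len(kernel), lam=1e-6)
w_xcorr = trf_xcorr(stim, resp, len(kernel))
```

With a white stimulus both estimates recover the kernel shape (the cross-correlation only approximately, scaled by the stimulus variance). With a temporally correlated stimulus such as a real speech envelope, plain reverse correlation is smeared by the stimulus autocorrelation, which is the kind of bias a corrected cross-correlation method has to address.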

17 pages, 1125 KiB  
Article
Investigations on the Optimal Estimation of Speech Envelopes for the Two-Stage Speech Enhancement
by Yanjue Song and Nilesh Madhu
Sensors 2023, 23(14), 6438; https://doi.org/10.3390/s23146438 - 16 Jul 2023
Cited by 2 | Viewed by 1723
Abstract
Using the source-filter model of speech production, clean speech signals can be decomposed into an excitation component and an envelope component that is related to the phoneme being uttered. Therefore, restoring the envelope of degraded speech during speech enhancement can improve the intelligibility and quality of the output. As the number of phonemes in spoken speech is limited, they can be adequately represented by a correspondingly limited number of envelopes. This can be exploited to improve the estimation of speech envelopes from a degraded signal in a data-driven manner. The improved envelopes are then used in a second stage to refine the final speech estimate. Envelopes are typically derived from linear prediction coefficients (LPCs) or from cepstral coefficients (CCs). The improved envelope is obtained either by mapping the degraded envelope onto pre-trained codebooks (classification approach) or by estimating it directly from the degraded envelope (regression approach). In this work, we first investigate the optimal features for envelope representation and codebook generation through a series of oracle tests. We demonstrate that CCs provide a better envelope representation than LPCs. Further, we demonstrate that a unified speech codebook is advantageous compared to the typical codebook in which speech and silence are manually split into separate entries. Next, we investigate low-complexity neural network architectures that map degraded envelopes to the optimal codebook entry in practical systems. We confirm that simple recurrent neural networks yield good performance with low complexity and a small number of parameters. We also demonstrate that, with a careful choice of features and architecture, a regression approach can further improve performance at a lower computational cost.
However, as the oracle tests also show, the benefit of the two-stage framework is now chiefly limited by the statistical noise floor estimate, leading to only limited improvement in extremely adverse conditions. This highlights the need for further research on joint estimation of speech and noise for optimum enhancement.
(This article belongs to the Special Issue Machine Learning and Signal Processing Based Acoustic Sensors)
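The classification approach described above represents a frame's envelope by cepstral coefficients and maps it to the closest pre-trained codebook entry. The following Python sketch shows those two steps under simplifying assumptions: a naive DFT, a truncated real cepstrum as the envelope feature, and squared Euclidean distance for the codebook search. The helper names and the codebook are hypothetical; the paper's feature extraction, codebook training and neural-network mapping may differ.

```python
import cmath
import math


def dft(frame):
    """Naive O(N^2) DFT; adequate for a short illustrative frame."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def cepstral_coefficients(frame, num_coeffs, eps=1e-12):
    """First num_coeffs real-cepstrum coefficients: inverse DFT of log|X[k]|."""
    log_mag = [math.log(abs(x) + eps) for x in dft(frame)]
    n = len(log_mag)
    # log|X[k]| is real and even for a real frame, so the inverse DFT reduces to a cosine sum.
    return [sum(log_mag[k] * math.cos(2 * math.pi * k * q / n) for k in range(n)) / n
            for q in range(num_coeffs)]


def log_envelope(ceps, n_bins):
    """Smoothed log-magnitude envelope rebuilt from a truncated (liftered) cepstrum."""
    return [ceps[0] + 2 * sum(ceps[q] * math.cos(2 * math.pi * k * q / n_bins)
                              for q in range(1, len(ceps)))
            for k in range(n_bins)]


def nearest_codebook_entry(ceps, codebook):
    """Classification step: index of the codebook entry closest in cepstral distance."""
    return min(range(len(codebook)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(ceps, codebook[i])))
```

For example, a scaled impulse `[2.0] + [0.0] * 15` has a flat spectrum, so its cepstrum is `log(2)` in the first coefficient and close to zero elsewhere, and its rebuilt log envelope is flat at `log(2)`. Keeping only the first few coefficients is what makes the envelope smooth; the number kept is a design choice not taken from the paper.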

18 pages, 1768 KiB  
Article
A Novel Method for Intelligibility Assessment of Nonlinearly Processed Speech in Spaces Characterized by Long Reverberation Times
by Adam Kurowski, Jozef Kotus, Piotr Odya and Bozena Kostek
Sensors 2022, 22(4), 1641; https://doi.org/10.3390/s22041641 - 19 Feb 2022
Cited by 4 | Viewed by 2364
Abstract
Objective assessment of speech intelligibility is a complex task that requires taking into account a number of factors, such as the different perception of individual speech sub-bands by the human auditory system or the different physical properties of each frequency band of a speech signal. Currently, the state-of-the-art method for assessing the quality of speech transmission is the speech transmission index (STI). It is a standardized way of objectively measuring the quality of, e.g., the acoustic adaptation of conference rooms or public address systems. The wide use of this measure and its implementation in numerous measurement devices make STI a popular choice when the speech-related quality of rooms has to be estimated. However, the STI measure has a significant drawback that excludes it from some use cases. For instance, if one wants to enhance speech intelligibility with a nonlinear digital processing algorithm, the STI method cannot measure the impact of such an algorithm, because it requires that the measurement signal not be altered nonlinearly. Consequently, the STI, a standard way of estimating speech transmission quality, cannot be used to test a nonlinear speech enhancement algorithm. In this work, we propose a method based on STI but modified so that it can be used to estimate the performance of nonlinear speech intelligibility enhancement methods. The proposed approach is based on a broadband comparison of the cumulated energy of the transmitted envelope modulation and the received modulation, so we call it broadband STI (bSTI). Its credibility for signals altered by the environment or nonlinearly modified by a DSP algorithm is checked by a comparative analysis of ten selected impulse responses for which a baseline STI value was known.
(This article belongs to the Special Issue Analytics and Applications of Audio and Image Sensing Techniques)
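The abstract describes bSTI as comparing transmitted and received envelope modulation. The Python sketch below shows the standard STI step that turns a modulation transfer value m into a transmission index (apparent SNR = 10·log10(m/(1−m)), clipped to ±15 dB, then scaled to 0..1), together with a simple estimator of modulation depth from an envelope. The function `broadband_modulation_ratio` is a hypothetical illustration of a "broadband" comparison that sums modulation energy across modulation frequencies; it is an assumption, not the authors' bSTI definition, and the band weighting of the full STI is omitted.

```python
import cmath
import math


def modulation_depth(envelope, mod_freq_hz, fs_hz):
    """Modulation index m of an intensity envelope at one modulation frequency.

    For e[t] = mean * (1 + m * cos(2*pi*f*t/fs)) over whole periods this returns m.
    """
    component = sum(e * cmath.exp(-2j * math.pi * mod_freq_hz * t / fs_hz)
                    for t, e in enumerate(envelope))
    return 2 * abs(component) / sum(envelope)


def transmission_index(m):
    """Standard STI mapping: apparent SNR from m, clipped to +/-15 dB, scaled to [0, 1]."""
    m = min(max(m, 1e-12), 1 - 1e-12)  # keep the logarithm finite
    snr_db = 10 * math.log10(m / (1 - m))
    snr_db = min(max(snr_db, -15.0), 15.0)
    return (snr_db + 15.0) / 30.0


def broadband_modulation_ratio(tx_env, rx_env, mod_freqs_hz, fs_hz):
    """HYPOTHETICAL broadband comparison: received vs. transmitted modulation energy,
    accumulated over all modulation frequencies (not necessarily the bSTI definition)."""
    tx = sum(modulation_depth(tx_env, f, fs_hz) ** 2 for f in mod_freqs_hz)
    rx = sum(modulation_depth(rx_env, f, fs_hz) ** 2 for f in mod_freqs_hz)
    return math.sqrt(rx / tx)


# Example: a 4 Hz modulation whose depth drops from 0.8 to 0.4 after transmission.
fs = 1000
tx_env = [1 + 0.8 * math.cos(2 * math.pi * 4 * t / fs) for t in range(fs)]
rx_env = [1 + 0.4 * math.cos(2 * math.pi * 4 * t / fs) for t in range(fs)]
ratio = broadband_modulation_ratio(tx_env, rx_env, [2, 4, 8], fs)
ti = transmission_index(ratio)
```

Halving the modulation depth gives a ratio of 0.5, i.e. an apparent SNR of 0 dB and a transmission index of 0.5. Because the comparison works on envelopes rather than on a prescribed test signal, this kind of measure can in principle be applied to arbitrarily processed speech, which is the property the abstract motivates.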

15 pages, 2096 KiB  
Article
A Novel Scheme for Single-Channel Speech Dereverberation
by Nikolaos Kilis and Nikolaos Mitianoudis
Acoustics 2019, 1(3), 711-725; https://doi.org/10.3390/acoustics1030042 - 5 Sep 2019
Cited by 8 | Viewed by 4462
Abstract
This paper presents a novel scheme for speech dereverberation. The core of our method is a two-stage single-channel speech enhancement scheme. In the first stage, a sparser representation of the linear prediction residual of the degraded speech is obtained by applying orthogonal matching pursuit on overcomplete bases trained by the K-SVD algorithm. Our method includes an estimation of the reverberation time and mixing time from a recorded hand clap or a simulated room impulse response, which are used to create a time-domain envelope. In the second stage, late reverberation is suppressed: its energy is estimated from this envelope and removed with spectral subtraction. Further speech enhancement is applied to minimize the background noise, based on optimal smoothing and minimum statistics. Experimental results indicate favorable quality compared to two state-of-the-art methods, especially in real reverberant environments with increased reverberation and background noise.
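The abstract outlines a pipeline of reverberation-time estimation, an exponential time-domain envelope, and late-reverberation removal by spectral subtraction. The Python sketch below illustrates one common way to realize those steps, assuming Schroeder backward integration with a line fit between −5 and −25 dB for T60, and a widely used exponential-decay model in which late-reverberation power equals the observed power T_l seconds earlier attenuated by exp(−2·Δ·T_l), with Δ = 3·ln(10)/T60. The function names, the `delay_frames` parameter (which could be set from the estimated mixing time) and the gain floor are hypothetical; the paper's exact estimator may differ.

```python
import math


def schroeder_t60(rir, fs_hz, upper_db=-5.0, lower_db=-25.0):
    """Estimate T60 from an impulse response (or hand clap) by Schroeder backward integration.

    A line is fitted to the energy decay curve between upper_db and lower_db
    and extrapolated to a 60 dB decay.
    """
    energy = [x * x for x in rir]
    edc = [0.0] * len(energy)
    running = 0.0
    for n in range(len(energy) - 1, -1, -1):  # integrate energy from the end backwards
        running += energy[n]
        edc[n] = running
    edc_db = [10 * math.log10(e / edc[0]) if e > 0 else -math.inf for e in edc]
    points = [(n / fs_hz, d) for n, d in enumerate(edc_db) if lower_db <= d <= upper_db]
    if len(points) < 2:
        raise ValueError("decay curve does not span the fitting range")
    mean_t = sum(t for t, _ in points) / len(points)
    mean_d = sum(d for _, d in points) / len(points)
    slope = (sum((t - mean_t) * (d - mean_d) for t, d in points)
             / sum((t - mean_t) ** 2 for t, _ in points))  # dB per second, negative
    return -60.0 / slope


def exponential_envelope(t60, fs_hz, length):
    """Amplitude envelope that decays by 60 dB over t60 seconds: exp(-3 ln(10) t / t60)."""
    delta = 3 * math.log(10) / t60
    return [math.exp(-delta * n / fs_hz) for n in range(length)]


def late_reverb_gains(psd_frames, frame_shift_s, t60, delay_frames, floor=0.1):
    """Spectral-subtraction gains that suppress the estimated late reverberation.

    Late-reverberation power in frame i is modelled as the observed power
    delay_frames earlier, attenuated by exp(-2 * delta * T_l).
    """
    delta = 3 * math.log(10) / t60
    decay = math.exp(-2 * delta * delay_frames * frame_shift_s)
    gains = []
    for i, frame in enumerate(psd_frames):
        if i < delay_frames:  # no history yet: pass the frame through unchanged
            gains.append([1.0] * len(frame))
            continue
        late = [decay * p for p in psd_frames[i - delay_frames]]
        gains.append([max(1.0 - late_p / p, floor) if p > 0 else floor
                      for late_p, p in zip(late, frame)])
    return gains


# Example: an idealized, noise-free decay with T60 = 0.5 s sampled at 8 kHz.
fs = 8000
rir = exponential_envelope(0.5, fs, fs)
t60_est = schroeder_t60(rir, fs)
gains = late_reverb_gains([[1.0, 2.0]] * 10, frame_shift_s=0.01, t60=t60_est, delay_frames=5)
```

On the ideal decay, the Schroeder fit returns T60 ≈ 0.5 s; with T_l = 0.05 s, a stationary input is then attenuated by a gain of about 1 − exp(−1.38) ≈ 0.75 in every bin. The gain floor limits musical-noise artifacts, a usual safeguard in spectral subtraction rather than a detail given in the abstract.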
