MDPI - Publisher of Open Access Journals

20 pages, 19399 KB

Open Access Article

Speech Inpainting Based on Multi-Layer Long Short-Term Memory Networks

by Haohan Shi, Xiyu Shi and Safak Dogan

Future Internet 2024, 16(2), 63; https://doi.org/10.3390/fi16020063 - 17 Feb 2024

Cited by 3 | Viewed by 2623

Audio inpainting plays an important role in addressing incomplete, damaged, or missing audio signals, contributing to improved quality of service and overall user experience in multimedia communications over the Internet and mobile networks. This paper presents an innovative solution for speech inpainting using Long Short-Term Memory (LSTM) networks, i.e., a restoring task where the missing parts of speech signals are recovered from the previous information in the time domain. The lost or corrupted speech signals are also referred to as gaps. We regard the speech inpainting task as a time-series prediction problem in this research work. To address this problem, we designed multi-layer LSTM networks and trained them on different speech datasets. Our study aims to investigate the inpainting performance of the proposed models on different datasets and with varying LSTM layers and explore the effect of multi-layer LSTM networks on the prediction of speech samples in terms of perceived audio quality. The inpainted speech quality is evaluated through the Mean Opinion Score (MOS) and a frequency analysis of the spectrogram. Our proposed multi-layer LSTM models are able to restore up to 1 s of gaps with high perceptual audio quality using the features captured from the time domain only. Specifically, for gap lengths under 500 ms, the MOS can reach up to 3~4, and for gap lengths ranging between 500 ms and 1 s, the MOS can reach up to 2~3. In the time domain, the proposed models can proficiently restore the envelope and trend of lost speech signals. In the frequency domain, the proposed models can restore spectrogram blocks with higher similarity to the original signals at frequencies less than 2.0 kHz and comparatively lower similarity at frequencies in the range of 2.0 kHz~8.0 kHz.

(This article belongs to the Special Issue Deep Learning and Natural Language Processing II)

8 pages, 219 KB

Open Access Editorial

Deep Learning Applications with Practical Measured Results in Electronics Industries

by Mong-Fong Horng, Hsu-Yang Kung, Chi-Hua Chen and Feng-Jang Hwang

Electronics 2020, 9(3), 501; https://doi.org/10.3390/electronics9030501 - 19 Mar 2020

Cited by 8 | Viewed by 3938

Abstract

This editorial introduces the Special Issue, entitled “Deep Learning Applications with Practical Measured Results in Electronics Industries”, of Electronics. Topics covered in this issue include four main parts: (I) environmental information analyses and predictions, (II) unmanned aerial vehicle (UAV) and object tracking applications, (III) measurement and denoising techniques, and (IV) recommendation systems and education systems. Four papers on environmental information analyses and predictions are as follows: (1) “A Data-Driven Short-Term Forecasting Model for Offshore Wind Speed Prediction Based on Computational Intelligence” by Panapakidis et al.; (2) “Multivariate Temporal Convolutional Network: A Deep Neural Networks Approach for Multivariate Time Series Forecasting” by Wan et al.; (3) “Modeling and Analysis of Adaptive Temperature Compensation for Humidity Sensors” by Xu et al.; (4) “An Image Compression Method for Video Surveillance System in Underground Mines Based on Residual Networks and Discrete Wavelet Transform” by Zhang et al. Three papers on UAV and object tracking applications are as follows: (1) “Trajectory Planning Algorithm of UAV Based on System Positioning Accuracy Constraints” by Zhou et al.; (2) “OTL-Classifier: Towards Imaging Processing for Future Unmanned Overhead Transmission Line Maintenance” by Zhang et al.; (3) “Model Update Strategies about Object Tracking: A State of the Art Review” by Wang et al. Five papers on measurement and denoising techniques are as follows: (1) “Characterization and Correction of the Geometric Errors in Using Confocal Microscope for Extended Topography Measurement. Part I: Models, Algorithms Development and Validation” by Wang et al.; (2) “Characterization and Correction of the Geometric Errors Using a Confocal Microscope for Extended Topography Measurement, Part II: Experimental Study and Uncertainty Evaluation” by Wang et al.; (3) “Deep Transfer HSI Classification Method Based on Information Measure and Optimal Neighborhood Noise Reduction” by Lin et al.; (4) “Quality Assessment of Tire Shearography Images via Ensemble Hybrid Faster Region-Based ConvNets” by Chang et al.; (5) “High-Resolution Image Inpainting Based on Multi-Scale Neural Network” by Sun et al. Two papers on recommendation systems and education systems are as follows: (1) “Deep Learning-Enhanced Framework for Performance Evaluation of a Recommending Interface with Varied Recommendation Position and Intensity Based on Eye-Tracking Equipment Data Processing” by Sulikowski et al. and (2) “Generative Adversarial Network Based Neural Audio Caption Model for Oral Evaluation” by Zhang et al.

(This article belongs to the Special Issue Deep Learning Applications with Practical Measured Results in Electronics Industries)

Search Results (2)
