Search Results (23)

Search Parameters:
Keywords = audio denoising

9 pages, 1717 KiB  
Proceeding Paper
Generative AI Respiratory and Cardiac Sound Separation Using Variational Autoencoders (VAEs)
by Arshad Jamal, R. Kanesaraj Ramasamy and Junaidi Abdullah
Comput. Sci. Math. Forum 2025, 10(1), 9; https://doi.org/10.3390/cmsf2025010009 - 1 Jul 2025
Viewed by 178
Abstract
The separation of respiratory and cardiac sounds is a significant challenge in biomedical signal processing due to their overlapping frequency and time characteristics. Traditional methods struggle with accurate extraction in noisy or diverse clinical environments. This study explores the application of machine learning, particularly convolutional neural networks (CNNs), to overcome these obstacles. Advanced machine learning models, denoising algorithms, and domain adaptation strategies address challenges such as frequency overlap, external noise, and limited labeled datasets. This study presents a robust methodology for detecting heart and lung diseases from audio signals using advanced preprocessing, feature extraction, and deep learning models. The approach integrates adaptive filtering and bandpass filtering as denoising techniques and variational autoencoders (VAEs) for feature extraction. The extracted features are input into a CNN, which classifies audio signals into different heart and lung conditions. The results highlight the potential of this combined approach for early and accurate disease detection, contributing to the development of reliable diagnostic tools for healthcare. Full article
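The denoising stage described here integrates adaptive and bandpass filtering; as a rough illustration of the bandpass idea only (not the authors' pipeline, and with made-up cutoff frequencies), a crude FFT-mask bandpass can be sketched in NumPy:

```python
import numpy as np

def bandpass_fft(signal, fs, low_hz, high_hz):
    """Crude bandpass: zero every FFT bin outside [low_hz, high_hz]."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    mask = (freqs >= low_hz) & (freqs <= high_hz)
    return np.fft.irfft(spectrum * mask, n=len(signal))

# Keep a 100 Hz tone while suppressing low-frequency drift and high-frequency hiss.
fs = 8000
t = np.arange(fs) / fs                      # 1 s of audio
clean = np.sin(2 * np.pi * 100 * t)
noisy = clean + 0.5 * np.sin(2 * np.pi * 5 * t) + 0.3 * np.sin(2 * np.pi * 3000 * t)
denoised = bandpass_fft(noisy, fs, low_hz=50, high_hz=1000)
```

A practical system would use a proper filter design (e.g., a Butterworth bandpass) rather than a hard spectral mask, which leaks on off-bin components.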

19 pages, 2225 KiB  
Article
A Bird Vocalization Classification Method Based on Bidirectional FBank with Enhanced Robustness
by Chizhou Peng, Yan Zhang, Jing Lu, Danjv Lv and Yanjiao Xiong
Appl. Sci. 2025, 15(9), 4913; https://doi.org/10.3390/app15094913 - 28 Apr 2025
Viewed by 397
Abstract
Recent advances in audio signal processing and pattern recognition have made the classification of bird vocalization a focus of bioacoustic research. However, the accurate classification of birdsongs is challenged by environmental noise and the limitations of traditional feature extraction methods. This study proposes the iWAVE-BiFBank method, an innovative approach combining improved wavelet adaptive denoising (iWAVE) and a bidirectional Mel-filter bank (BiFBank) for effective birdsong classification with enhanced robustness. The iWAVE method achieves adaptive optimization using the autocorrelation coefficient and peak-sum-ratio (PSR), overcoming the manual adjustments required by traditional methods and their incompleteness. BiFBank combines FBank and inverse FBank (iFBank) to enhance feature representation. This fusion addresses the shortcomings of FBank and introduces novel transformation methods and filter designs to iFBank, with a focus on high-frequency components. The iWAVE-BiFBank method creates a robust feature set, which can effectively reduce the noise of audio signals and capture both low- and high-frequency information. Experiments were conducted on a dataset of 16 species of birds, and the proposed method was verified with a random forest (RF) classifier. The results show that iWAVE-BiFBank achieves an accuracy of 94.00%, with other indicators, including the F1 score, exceeding 93.00%, outperforming all other tested methods. Overall, the proposed method effectively reduces audio noise, comprehensively captures the characteristics of bird vocalization, and provides improved classification performance. Full article

22 pages, 4759 KiB  
Article
An Improved Nonnegative Matrix Factorization Algorithm Combined with K-Means for Audio Noise Reduction
by Yan Liu, Haozhen Zhu, Yongtuo Cui, Xiaoyu Yu, Haibin Wu and Aili Wang
Electronics 2024, 13(20), 4132; https://doi.org/10.3390/electronics13204132 - 21 Oct 2024
Viewed by 1304
Abstract
Clustering algorithms have the characteristics of being simple and efficient and can complete calculations without a large number of datasets, making them suitable for application in noise reduction processing for audio module mass production testing. In order to solve the problems of the NMF algorithm easily getting stuck in local optimal solutions and difficult feature signal extraction, an improved NMF audio denoising algorithm combined with K-means initialization was designed. Firstly, the Euclidean distance formula of K-means has been improved to extract audio signal features from multiple dimensions. Combined with the initialization strategy of K-means decomposition, the initialization dictionary matrix of the NMF algorithm has been optimized to avoid getting stuck in local optimal solutions and effectively improve the robustness of the algorithm. Secondly, in the sparse coding part of the NMF algorithm, feature extraction expressions are added to solve the problem of noise residue and partial spectral signal loss in audio signals during the operation process. At the same time, the size of the coefficient matrix is limited to reduce operation time and improve the accuracy of feature extraction in high-precision audio signals. Then, comparative experiments were conducted using the NOIZEUS and NOISEX-92 datasets, as well as random noise audio signals. This algorithm improved the signal-to-noise ratio by 10–20 dB and reduced harmonic distortion by approximately −10 dB. Finally, a high-precision audio acquisition unit based on FPGA was designed, and practical applications have shown that it can effectively improve the signal-to-noise ratio of audio signals and reduce harmonic distortion. Full article
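The core idea of seeding NMF's dictionary with K-means centroids can be sketched as below. This is a generic sketch only: the paper's improved Euclidean distance formula and feature-extraction terms are not reproduced, and the matrix sizes and component count are arbitrary.

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Plain Lloyd's k-means on the columns of X; returns (k, n_features) centroids."""
    rng = np.random.default_rng(seed)
    cols = X.T
    centroids = cols[rng.choice(len(cols), size=k, replace=False)]
    for _ in range(iters):
        dists = ((cols[:, None, :] - centroids[None]) ** 2).sum(-1)
        labels = dists.argmin(1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = cols[labels == j].mean(0)
    return centroids

def nmf_kmeans_init(V, k, iters=200, eps=1e-9):
    """NMF V ~ W @ H, with the dictionary W seeded from k-means centroids
    instead of random values, then refined by multiplicative updates."""
    W = np.maximum(kmeans(V, k).T, eps)          # (n_features, k) initial dictionary
    H = np.maximum(np.linalg.pinv(W) @ V, eps)   # least-squares coding, clipped to >= 0
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)     # standard multiplicative updates
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

rng = np.random.default_rng(1)
V = np.abs(rng.normal(size=(64, 40)))  # stand-in for a magnitude spectrogram
W, H = nmf_kmeans_init(V, k=5)
err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

Seeding from centroids gives the factorization a data-driven starting point, which is the mechanism the abstract credits for avoiding poor local optima.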

15 pages, 9401 KiB  
Article
Sound Sensing: Generative and Discriminant Model-Based Approaches to Bolt Loosening Detection
by Liehai Cheng, Zhenli Zhang, Giuseppe Lacidogna, Xiao Wang, Mutian Jia and Zhitao Liu
Sensors 2024, 24(19), 6447; https://doi.org/10.3390/s24196447 - 5 Oct 2024
Cited by 2 | Viewed by 1285
Abstract
The detection of bolt looseness is crucial to ensure the integrity and safety of bolted connection structures. Percussion-based bolt looseness detection provides a simple and cost-effective approach. However, this method has some inherent shortcomings that limit its application. For example, it highly depends on the inspector's hearing and experience and is more easily affected by ambient noise. In this article, a whole set of signal processing procedures is proposed and a new kind of damage index vector is constructed to strengthen the reliability and robustness of this method. Firstly, a series of audio signal preprocessing algorithms, including denoising, segmenting, and smooth filtering, is performed on the raw audio signal. Then, the cumulative energy entropy (CEE) and mel frequency cepstrum coefficients (MFCCs) are utilized to extract damage index vectors, which are used as input vectors for generative and discriminative classifier models (Gaussian discriminant analysis and support vector machine), respectively. Finally, multiple repeated experiments are conducted to verify the effectiveness of the proposed method and its ability to detect bolt looseness from audio signals. The testing accuracies of the trained models approach 90% and 96.7%, respectively, under different combinations of torque levels. Full article

15 pages, 5957 KiB  
Article
Deformer: Denoising Transformer for Improved Audio Music Genre Classification
by Jigang Wang, Shuyu Li and Yunsick Sung
Appl. Sci. 2023, 13(23), 12673; https://doi.org/10.3390/app132312673 - 25 Nov 2023
Cited by 7 | Viewed by 3120
Abstract
Audio music genre classification is performed to categorize audio music into various genres. Traditional approaches based on convolutional recurrent neural networks do not consider long temporal information, and their sequential structures result in longer training times and convergence difficulties. To overcome these problems, a traditional transformer-based approach was introduced. However, this approach employs pre-training based on momentum contrast (MoCo), a technique that increases computational costs owing to its reliance on extracting many negative samples and its use of highly sensitive hyperparameters. Consequently, this complicates the training process and increases the risk of learning imbalances between positive and negative sample sets. In this paper, a method for audio music genre classification called Deformer is proposed. The Deformer learns deep representations of audio music data through a denoising process, eliminating the need for MoCo and additional hyperparameters, thus reducing computational costs. In the denoising process, it employs a prior decoder to reconstruct the audio patches, thereby enhancing the interpretability of the representations. By calculating the mean squared error loss between the reconstructed and real patches, Deformer can learn a more refined representation of the audio data. The performance of the proposed method was experimentally compared with that of two distinct baseline models: one based on S3T and one employing a residual neural network-bidirectional gated recurrent unit (ResNet-BiGRU). The Deformer achieved an 84.5% accuracy, surpassing both the ResNet-BiGRU-based (81%) and S3T-based (81.1%) models, highlighting its superior performance in audio classification. Full article

26 pages, 5971 KiB  
Article
An Urban Acoustic Rainfall Estimation Technique Using a CNN Inversion Approach for Potential Smart City Applications
by Mohammed I. I. Alkhatib, Amin Talei, Tak Kwin Chang, Valentijn R. N. Pauwels and Ming Fai Chow
Smart Cities 2023, 6(6), 3112-3137; https://doi.org/10.3390/smartcities6060139 - 16 Nov 2023
Cited by 6 | Viewed by 2364
Abstract
The need for robust rainfall estimation has increased with more frequent and intense floods due to human-induced land use and climate change, especially in urban areas. Besides the existing rainfall measurement systems, citizen science can offer unconventional methods to provide complementary rainfall data for enhancing spatial and temporal data coverage. This demand for accurate rainfall data is particularly crucial in the context of smart city innovations, where real-time weather information is essential for effective urban planning, flood management, and environmental sustainability. Therefore, this study provides proof-of-concept for a novel method of estimating rainfall intensity using its recorded audio in an urban area, which can be incorporated into a smart city as part of its real-time weather forecasting system. This study proposes a convolutional neural network (CNN) inversion model for acoustic rainfall intensity estimation. The developed CNN rainfall sensing model showed a significant improvement in performance over the traditional approach, which relies on the loudness feature as an input, especially for simulating rainfall intensities above 60 mm/h. Also, a CNN-based denoising framework was developed to attenuate unwanted noises in rainfall recordings, which achieved up to 98% accuracy on the validation and testing datasets. This study and its promising results are a step towards developing an acoustic rainfall sensing tool for citizen-science applications in smart cities. However, further investigation is necessary to upgrade this proof-of-concept for practical applications. Full article

20 pages, 8635 KiB  
Article
Hiding Full-Color Images into Audio with Visual Enhancement via Residual Networks
by Hwai-Tsu Hu and Tung-Tsun Lee
Cryptography 2023, 7(4), 47; https://doi.org/10.3390/cryptography7040047 - 29 Sep 2023
Cited by 1 | Viewed by 2174
Abstract
Watermarking is a viable approach for safeguarding the proprietary rights of digital media. This study introduces an innovative fast Fourier transform (FFT)-based phase modulation (PM) scheme that facilitates efficient and effective blind audio watermarking at a remarkable rate of 508.85 numeric values per second while still retaining the original quality. Such a payload capacity makes it possible to embed a full-color image of 64 × 64 pixels within an audio signal of just 24.15 s. To bolster the security of watermark images, we have also implemented the Arnold transform in conjunction with chaotic encryption. Our comprehensive analysis and evaluation confirm that the proposed FFT–PM scheme exhibits exceptional imperceptibility, rendering the hidden watermark virtually undetectable. Additionally, the FFT–PM scheme shows impressive robustness against common signal-processing attacks. To further enhance the visual rendition of the recovered color watermarks, we propose using residual neural networks to perform image denoising and super-resolution reconstruction after retrieving the watermarks. The utilization of the residual networks contributes to noticeable improvements in perceptual quality, resulting in higher levels of zero-normalized cross-correlation in cases where the watermarks are severely damaged. Full article

12 pages, 655 KiB  
Article
Filtering of Audio Signals Using Discrete Wavelet Transforms
by H. K. Nigam and H. M. Srivastava
Mathematics 2023, 11(19), 4117; https://doi.org/10.3390/math11194117 - 28 Sep 2023
Cited by 10 | Viewed by 2798
Abstract
Nonlinear diffusion has been proved to be an indispensable approach for the removal of noise in image processing. In this paper, we employ nonlinear diffusion for the purpose of denoising audio signals in order to have this approach also recognized as a powerful tool for audio signal processing. We apply nonlinear diffusion to wavelet coefficients obtained from different filters associated with orthogonal and biorthogonal wavelets. We use wavelet decomposition to keep signal components well-localized in time. We compare denoising results using nonlinear diffusion with wavelet shrinkage for different wavelet filters. Our experiments and results show that the denoising is much improved by using the nonlinear diffusion process. Full article
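The wavelet-shrinkage baseline the authors compare against can be sketched with a single-level Haar transform and soft thresholding. This is a minimal illustration with an arbitrary threshold; the paper itself applies nonlinear diffusion to coefficients from various orthogonal and biorthogonal filters.

```python
import numpy as np

def haar_dwt(x):
    """Single-level Haar DWT: returns (approximation, detail) coefficients."""
    x = x[: len(x) // 2 * 2]  # drop a trailing sample if length is odd
    return (x[0::2] + x[1::2]) / np.sqrt(2), (x[0::2] - x[1::2]) / np.sqrt(2)

def haar_idwt(a, d):
    """Inverse of haar_dwt."""
    out = np.empty(2 * len(a))
    out[0::2] = (a + d) / np.sqrt(2)
    out[1::2] = (a - d) / np.sqrt(2)
    return out

def wavelet_shrink(x, threshold):
    """Classic wavelet shrinkage: soft-threshold the detail coefficients."""
    a, d = haar_dwt(x)
    d = np.sign(d) * np.maximum(np.abs(d) - threshold, 0.0)
    return haar_idwt(a, d)

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 1024)
clean = np.sin(2 * np.pi * 4 * t)
noisy = clean + 0.1 * rng.normal(size=t.size)
denoised = wavelet_shrink(noisy, threshold=0.2)
```

Because the Haar transform is orthonormal, shrinking only the detail band removes most of the high-frequency noise while leaving the slowly varying signal in the approximation band intact.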
(This article belongs to the Section E4: Mathematical Physics)

19 pages, 882 KiB  
Article
Characterization of Deep Learning-Based Speech-Enhancement Techniques in Online Audio Processing Applications
by Caleb Rascon
Sensors 2023, 23(9), 4394; https://doi.org/10.3390/s23094394 - 29 Apr 2023
Cited by 12 | Viewed by 3684
Abstract
Deep learning-based speech-enhancement techniques have recently been an area of growing interest, since their impressive performance can potentially benefit a wide variety of digital voice communication systems. However, such performance has been evaluated mostly in offline audio-processing scenarios (i.e., feeding the model, in one go, a complete audio recording, which may extend several seconds). It is of significant interest to evaluate and characterize the current state-of-the-art in applications that process audio online (i.e., feeding the model a sequence of segments of audio data, concatenating the results at the output end). Although evaluations and comparisons between speech-enhancement techniques have been carried out before, as far as the author knows, the work presented here is the first that evaluates the performance of such techniques in relation to their online applicability. This means that this work measures how the output signal-to-interference ratio (as a separation metric), the response time, and memory usage (as online metrics) are impacted by the input length (the size of audio segments), in addition to the amount of noise, amount and number of interferences, and amount of reverberation. Three popular models were evaluated, given their availability on public repositories and online viability, MetricGAN+, Spectral Feature Mapping with Mimic Loss, and Demucs-Denoiser. The characterization was carried out using a systematic evaluation protocol based on the Speechbrain framework. Several intuitions are presented and discussed, and some recommendations for future work are proposed. Full article
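The online-processing setup being evaluated, feeding the model fixed-size segments and concatenating the outputs while tracking response time, can be sketched as below. The `enhance` function is a trivial stand-in (a moving average), not one of the evaluated models, and the segment length is an arbitrary choice.

```python
import time
import numpy as np

def enhance(segment):
    """Stand-in for a speech-enhancement model: a 3-tap moving average."""
    kernel = np.ones(3) / 3
    return np.convolve(segment, kernel, mode="same")

def process_online(audio, segment_len, fs):
    """Feed the model fixed-size segments and concatenate the outputs,
    checking whether each segment is processed faster than real time."""
    outputs, latencies = [], []
    for start in range(0, len(audio), segment_len):
        seg = audio[start:start + segment_len]
        t0 = time.perf_counter()
        outputs.append(enhance(seg))
        latencies.append(time.perf_counter() - t0)
    real_time = all(lat < segment_len / fs for lat in latencies)
    return np.concatenate(outputs), real_time

fs = 16000
audio = np.random.default_rng(0).normal(size=fs * 2)   # 2 s of test input
out, real_time = process_online(audio, segment_len=fs // 4, fs=fs)
```

The trade-off the paper measures is visible in this structure: shorter segments lower latency but give the model less context per call, while longer segments approach the offline setting.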

20 pages, 4667 KiB  
Article
Design and Implementation of Machine Tool Life Inspection System Based on Sound Sensing
by Tsung-Hsien Liu, Jun-Zhe Chi, Bo-Lin Wu, Yee-Shao Chen, Chung-Hsun Huang and Yuan-Sun Chu
Sensors 2023, 23(1), 284; https://doi.org/10.3390/s23010284 - 27 Dec 2022
Cited by 3 | Viewed by 2267
Abstract
The main causes of damage to industrial machinery are aging, corrosion, and the wear of parts, which affect the accuracy of machinery and product precision. Identifying problems early and predicting the life cycle of a machine for early maintenance can avoid costly plant failures. Compared with other sensing and monitoring instruments, sound sensors are inexpensive, portable, and require less computation. This paper proposes a machine tool life cycle model with noise reduction. The life cycle model uses Mel-Frequency Cepstral Coefficients (MFCC) to extract audio features. A Deep Neural Network (DNN) is used to learn the relationship between audio features and life cycle, and then determine the audio signal corresponding to the degree of aging. The noise reduction model simulates the actual environment by adding noise, extracts features with Power Normalized Cepstral Coefficients (PNCC), and uses a mask as the DNN's learning target to eliminate the effect of noise. The denoising model improves Short-Time Objective Intelligibility (STOI) by 6.8% and Perceptual Evaluation of Speech Quality (PESQ) by 3.9%. The life cycle model's accuracy before denoising is 76%; after adding the noise reduction system, it increases to 80%. Full article

19 pages, 4410 KiB  
Article
Cicada Species Recognition Based on Acoustic Signals
by Wan Teng Tey, Tee Connie, Kan Yeep Choo and Michael Kah Ong Goh
Algorithms 2022, 15(10), 358; https://doi.org/10.3390/a15100358 - 28 Sep 2022
Cited by 8 | Viewed by 3315
Abstract
Traditional methods used to identify and monitor insect species are time-consuming, costly, and fully dependent on the observer’s ability. This paper presents a deep learning-based cicada species recognition system using acoustic signals to classify the cicada species. The sound recordings of cicada species were collected from different online sources and pre-processed using denoising algorithms. An improved Härmä syllable segmentation method is introduced to segment the audio signals into syllables since the syllables play a key role in identifying the cicada species. After that, a visual representation of the audio signal was obtained using a spectrogram, which was fed to a convolutional neural network (CNN) to perform classification. The experimental results validated the robustness of the proposed method by achieving accuracies ranging from 66.67% to 100%. Full article
(This article belongs to the Special Issue Machine Learning for Time Series Analysis)

20 pages, 2415 KiB  
Article
Defending against FakeBob Adversarial Attacks in Speaker Verification Systems with Noise-Adding
by Zesheng Chen, Li-Chi Chang, Chao Chen, Guoping Wang and Zhuming Bi
Algorithms 2022, 15(8), 293; https://doi.org/10.3390/a15080293 - 17 Aug 2022
Cited by 7 | Viewed by 2754
Abstract
Speaker verification systems use human voices as an important biometric to identify legitimate users, thus adding a security layer to voice-controlled Internet-of-Things smart homes against illegal access. Recent studies have demonstrated that speaker verification systems are vulnerable to adversarial attacks such as FakeBob. The goal of this work is to design and implement a simple and lightweight defense system that is effective against FakeBob. We specifically study two opposite pre-processing operations on input audio in speaker verification systems: denoising, which attempts to remove or reduce perturbations, and noise-adding, which adds small noise to the input audio. Through experiments, we demonstrate that both methods are able to weaken the ability of FakeBob attacks significantly, with noise-adding achieving even better performance than denoising. Specifically, with denoising, the targeted attack success rate of FakeBob attacks can be reduced from 100% to 56.05% in GMM speaker verification systems, and from 95% to only 38.63% in i-vector speaker verification systems, respectively. With noise-adding, those numbers can be further lowered to 5.20% and 0.50%, respectively. As a proactive measure, we study several possible adaptive FakeBob attacks against the noise-adding method. Experiment results demonstrate that noise-adding can still provide a considerable level of protection against these countermeasures. Full article
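The noise-adding defense reduces to perturbing the input audio with small random noise before verification, so an attacker's carefully shaped perturbation is partially drowned out. A minimal sketch follows; the SNR level is an arbitrary illustration, not the paper's setting.

```python
import numpy as np

def add_defense_noise(audio, snr_db, rng):
    """Add Gaussian noise at a chosen signal-to-noise ratio (in dB)."""
    sig_power = np.mean(audio ** 2)
    noise_power = sig_power / (10 ** (snr_db / 10))
    return audio + rng.normal(scale=np.sqrt(noise_power), size=audio.shape)

rng = np.random.default_rng(0)
audio = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # 1 s of A4 at 16 kHz
defended = add_defense_noise(audio, snr_db=20, rng=rng)
```

The appeal of this defense, as the abstract notes, is that it needs no knowledge of the attack or the model: it is a single cheap operation applied to every input.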

16 pages, 7680 KiB  
Article
Audio Denoising Coprocessor Based on RISC-V Custom Instruction Set Extension
by Jun Yuan, Qiang Zhao, Wei Wang, Xiangsheng Meng, Jun Li and Qin Li
Acoustics 2022, 4(3), 538-553; https://doi.org/10.3390/acoustics4030033 - 29 Jun 2022
Cited by 4 | Viewed by 3958
Abstract
As a typical active noise control algorithm, Filtered-x Least Mean Square (FxLMS) is widely used in the field of audio denoising. In this study, an audio denoising coprocessor based on a Reduced Instruction Set Computer-V (RISC-V) custom instruction set extension was designed, and a software and hardware co-design was adopted; building on the traditional pure-hardware implementation, the accelerator was optimized and connected to the RISC-V core as a coprocessor. Meanwhile, the corresponding custom instructions were designed, the compiling environment was established, and a library of coprocessor acceleration instructions was built using embedded inline assembly. Finally, the active noise control (ANC) system was built and tested based on Hbird E203-Core, and the test data were collected through an audio analyzer. The results showed that the audio denoising algorithm can be realized by combining a heterogeneous System on Chip (SoC) with a hardware accelerator, with a denoising effect of approximately 8 dB. The number of instructions consumed by specific operations using the custom instructions was reduced by approximately 60%, a significant acceleration effect. Full article
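The FxLMS loop at the heart of the coprocessor can be modeled in NumPy as below. This is a software sketch only: the primary and secondary paths, tap count, and step size are toy assumptions, whereas the paper's version runs as a RISC-V hardware accelerator.

```python
import numpy as np

def fxlms(reference, disturbance, sec_path, sec_path_est, n_taps=32, mu=0.01):
    """Filtered-x LMS: adapt control filter w so the secondary-path output
    cancels the disturbance at the error microphone."""
    w = np.zeros(n_taps)
    x_buf = np.zeros(n_taps)            # reference history for the control filter
    fx_buf = np.zeros(n_taps)           # filtered-reference history for the update
    y_buf = np.zeros(len(sec_path))     # control output history through secondary path
    fx_hist = np.zeros(len(sec_path_est))
    errors = np.empty(len(reference))
    for n in range(len(reference)):
        x_buf = np.roll(x_buf, 1); x_buf[0] = reference[n]
        y = w @ x_buf                              # anti-noise sample
        y_buf = np.roll(y_buf, 1); y_buf[0] = y
        e = disturbance[n] - sec_path @ y_buf      # residual at the error mic
        fx_hist = np.roll(fx_hist, 1); fx_hist[0] = reference[n]
        fx = sec_path_est @ fx_hist                # reference filtered by path model
        fx_buf = np.roll(fx_buf, 1); fx_buf[0] = fx
        w += mu * e * fx_buf                       # LMS update on filtered reference
        errors[n] = e
    return errors

rng = np.random.default_rng(0)
n = 20000
x = rng.normal(size=n)                      # reference noise picked up upstream
primary = np.array([0.0, 0.5, 0.3, 0.1])    # assumed primary acoustic path
d = np.convolve(x, primary)[:n]             # disturbance at the error mic
s = np.array([1.0, 0.4])                    # assumed secondary path, known here
errors = fxlms(x, d, sec_path=s, sec_path_est=s, mu=0.005)
```

The per-sample body of this loop (two FIR filters plus a multiply-accumulate update) is exactly the kind of fixed, regular computation that benefits from the custom-instruction acceleration the paper describes.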
(This article belongs to the Special Issue Acoustics, Speech and Signal Processing)

14 pages, 4863 KiB  
Article
Mesh Denoising Based on Recurrent Neural Networks
by Yan Xing, Jieqing Tan, Peilin Hong, Yeyuan He and Min Hu
Symmetry 2022, 14(6), 1233; https://doi.org/10.3390/sym14061233 - 14 Jun 2022
Cited by 6 | Viewed by 2517
Abstract
Mesh denoising is a classical task in mesh processing. Many state-of-the-art methods are still unable to quickly and robustly denoise multifarious noisy 3D meshes, especially in the case of high noise. Recently, neural network-based models have played a leading role in natural language, audio, image, video, and 3D model processing. Inspired by these works, we propose a data-driven mesh denoising method based on recurrent neural networks, which learns the relationship between the feature descriptors and the ground-truth normals. The recurrent neural network has a feedback loop before entering the output layer. By means of the self-feedback of neurons, the output of a recurrent neural network is related not only to the current input but also to the output of the previous moments. To deal with meshes with various geometric features, we use k-means to cluster the faces of the mesh according to geometric similarity and train neural networks for each category individually in the offline learning stage. Each network model, acting similar to a normal regression function, will map the geometric feature descriptor of each facet extracted from the mesh to the denoised facet normal. Then, the denoised normals are used to calculate the new feature descriptors, which become the input of the next similar regression model. In this system, three normal regression modules are cascaded to generate the last facet normals. Lastly, the model’s vertex positions are updated according to the denoised normals. A large number of visual and numerical results have demonstrated that the proposed model outperforms the state-of-the-art methods in most cases. Full article
(This article belongs to the Special Issue Symmetry and Applications in Cognitive Robotics)

14 pages, 6264 KiB  
Article
Defending Adversarial Attacks against DNN Image Classification Models by a Noise-Fusion Method
by Lin Shi, Teyi Liao and Jianfeng He
Electronics 2022, 11(12), 1814; https://doi.org/10.3390/electronics11121814 - 8 Jun 2022
Cited by 10 | Viewed by 4309
Abstract
Adversarial attacks deceive deep neural network models by adding imperceptibly small but well-designed attack data to the model input. Those attacks cause serious problems. Various defense methods have been provided to defend against those attacks by: (1) providing adversarial training according to specific attacks; (2) denoising the input data; (3) preprocessing the input data; and (4) adding noise to various layers of models. Here we provide a simple but effective Noise-Fusion Method (NFM) to defend DNN image classification models against adversarial attacks. Without knowing any details about attacks or models, NFM adds noise not only to the model input at run time but also to the training data at training time. Two l∞-attacks, the Fast Gradient Sign Method (FGSM) and the Projected Gradient Descent (PGD), and one l1-attack, the Sparse L1 Descent (SLD), are applied to evaluate the defense effects of the NFM on various deep neural network models using the MNIST and CIFAR-10 datasets. Noises of various amplitudes and statistical distributions are applied to show the defense effects of the NFM under different noise conditions. The NFM is also compared with an adversarial training method on the MNIST and CIFAR-10 datasets. Results show that adding noise to the input images and the training images not only defends against all three adversarial attacks but also improves the robustness of the corresponding models. The results indicate possibly generalized defense effects of the NFM, which could extend to other adversarial attacks. They also show the potential application of the NFM to models with not only image input but also voice or audio input. Full article
