Anomalous Sound Detection by Fusing Spectral Enhancement and Frequency-Gated Attention
Abstract
1. Introduction
- SpecNet spectral enhancement branch. Building on existing Log-Mel spectrogram features and machine-ID self-supervised classification frameworks, we develop a dedicated spectral enhancement branch, SpecNet, that explicitly models spectral structure and enriches the discriminative information carried by the spectral representation.
- Frequency-Gated Attention (FGA) Module. The FGA module adaptively adjusts the weights of Log-Mel spectrograms and SpecNet spectral features across time–frequency units based on temporal context provided by TAgram. This highlights critical frequency bands and temporal intervals associated with anomalies, yielding more discriminative spectral representations.
- SoftTriple metric learning loss. The SoftTriple metric learning loss is jointly optimized with the Noisy-ArcMix classification loss. Its multi-center prototype constraints further compress the embeddings of samples sharing a machine ID while widening the gaps between different machine IDs, clarifying the category structure of the embedding space and improving the ability to distinguish machines with highly similar acoustic characteristics.
- Comprehensive evaluation. Systematic experiments on the DCASE 2020 Task 2 dataset show that the proposed FGASpecNet achieves 95.04% average AUC and 89.68% average pAUC, improvements of 0.97 and 0.72 percentage points over the baseline, validating the effectiveness of the proposed approach.
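Since the evaluation rests on AUC and pAUC, the two metrics can be sketched in a few lines of NumPy. This is a minimal illustration, not the official DCASE scoring code: DCASE 2020 Task 2 evaluates pAUC over the low false-positive-rate range [0, p] with p = 0.1, and the label/score arrays below are illustrative, with label 1 marking anomalous clips.

```python
import numpy as np

def auc(y_true, scores):
    """AUC via the Mann-Whitney U statistic: the fraction of
    (anomalous, normal) score pairs that are correctly ordered."""
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

def pauc(y_true, scores, p=0.1):
    """Partial AUC over the low false-positive-rate range [0, p],
    rescaled by 1/p so a perfect detector still scores 1.0."""
    thr = np.sort(np.unique(scores))[::-1]
    n_pos = (y_true == 1).sum()
    n_neg = (y_true == 0).sum()
    # sweep thresholds from high to low to trace the ROC curve
    fpr = np.array([0.0] + [((scores >= t) & (y_true == 0)).sum() / n_neg for t in thr])
    tpr = np.array([0.0] + [((scores >= t) & (y_true == 1)).sum() / n_pos for t in thr])
    # clip the curve at fpr = p, interpolating the last point
    keep = fpr <= p
    fx = np.append(fpr[keep], p)
    fy = np.append(tpr[keep], np.interp(p, fpr, tpr))
    # trapezoidal integration up to fpr = p, then rescale
    area = np.sum(np.diff(fx) * (fy[1:] + fy[:-1]) / 2.0)
    return area / p
```

A perfectly separating detector scores 1.0 on both metrics; pAUC is stricter in practice because it only credits detections achievable at very low false-positive rates.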
2. Materials and Methods
2.1. Acoustic Feature Enhancement
2.1.1. Tgram
2.1.2. Log-Mel
2.1.3. TAgram
2.1.4. Specgram
2.2. Frequency-Gated Attention
2.3. Joint Loss Optimization
2.3.1. Noisy-ArcMix
2.3.2. SoftTriple
2.4. Dataset
3. Results
3.1. Experimental Setup
3.1.1. Implementation Details
3.1.2. Anomaly Score Computation
3.1.3. Evaluation Metrics
3.2. Performance Comparison
3.3. Ablation Analysis
3.4. Hyperparameter Analysis
3.5. Representation Analysis and Visualization
3.6. Parameter Count Analysis
4. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Nomenclature
| Abbreviation | Definition |
|---|---|
| ASD | Anomalous sound detection |
| Machine ID | Machine identifier |
| Log-Mel | Log-Mel spectrogram |
| FGASpecNet | The proposed model, fusing spectral enhancement and frequency-gated attention |
| DCASE | Detection and Classification of Acoustic Scenes and Events |
| AUC | Area under the receiver operating characteristic (ROC) curve |
| pAUC | Partial area under the ROC curve |
| STFT | Short-time Fourier transform |
| FFT/IFFT | (Inverse) fast Fourier transform |
| 2D-FFT/2D-IFFT | Two-dimensional (inverse) fast Fourier transform |
| MIMII | Malfunctioning Industrial Machine Investigation and Inspection (dataset) |
| ToyADMOS | Toy Acoustic Anomaly Detection in Machine Operating Sounds (dataset) |
References
- Thoben, K.D.; Wiesner, S.; Wuest, T. “Industrie 4.0” and smart manufacturing—A review of research issues and application examples. Int. J. Autom. Technol. 2017, 11, 4–16.
- Zhou, J.; Li, P.; Zhou, Y.; Wang, B.; Zang, J.; Meng, L. Toward new-generation intelligent manufacturing. Engineering 2018, 4, 11–20.
- Tran, M.Q.; Doan, H.P.; Vu, V.Q.; Vu, L.T. Machine learning and IoT-based approach for tool condition monitoring: A review and future prospects. Measurement 2023, 207, 112351.
- Nunes, E.C. Anomalous sound detection with machine learning: A systematic review. arXiv 2021, arXiv:2102.07820.
- Koizumi, Y.; Kawaguchi, Y.; Imoto, K.; Nakamura, T.; Nikaido, Y.; Tanabe, R.; Purohit, H.; Suefusa, K.; Endo, T.; Yasuda, M.; et al. Description and discussion on DCASE2020 challenge task2: Unsupervised anomalous sound detection for machine condition monitoring. arXiv 2020, arXiv:2006.05822.
- Suefusa, K.; Nishida, T.; Purohit, H.; Tanabe, R.; Endo, T.; Kawaguchi, Y. Anomalous sound detection based on interpolation deep neural network. In Proceedings of the ICASSP 2020—2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Virtual, 4–8 May 2020; IEEE: New York, NY, USA, 2020; pp. 271–275.
- Kapka, S. ID-conditioned auto-encoder for unsupervised anomaly detection. arXiv 2020, arXiv:2007.05314.
- Alam, J.; Boulianne, G.; Gupta, V.; Fathan, A. An ensemble approach to unsupervised anomalous sound detection. In Proceedings of the 5th Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE), Tokyo, Japan, 2–4 November 2020; pp. 2–4.
- Hayashi, T.; Yoshimura, T.; Adachi, Y. Conformer-based id-aware autoencoder for unsupervised anomalous sound detection. In DCASE2020 Challenge; Technical Report; DCASE Community: Barcelona, Spain, 2020.
- Park, J.; Yoon, S.; Yoo, S. Unsupervised detection of anomalous machine sound using various spectral features and focused hypothesis test in the reverberant and noisy environment. In Proceedings of the Detection and Classification of Acoustic Scenes and Events, DCASE2020, Tokyo, Japan, 2–4 November 2020.
- Wang, Y.; Zhang, Q.; Zhang, W.; Zhang, Y. A lightweight framework for unsupervised anomalous sound detection based on selective learning of time-frequency domain features. Appl. Acoust. 2025, 228, 110308.
- Giri, R.; Tenneti, S.; Cheng, F.; Helwani, K.; Isik, U.; Krishnaswamy, A. Self-supervised classification for detecting anomalous sounds. In Proceedings of the Detection and Classification of Acoustic Scenes and Events, DCASE2020, Tokyo, Japan, 2–4 November 2020.
- Dohi, K.; Endo, T.; Purohit, H.; Tanabe, R.; Kawaguchi, Y. Flow-based self-supervised density estimation for anomalous sound detection. In Proceedings of the ICASSP 2021—2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Virtual, 6–11 June 2021; IEEE: New York, NY, USA, 2021; pp. 336–340.
- Chen, H.; Ran, L.; Sun, X.; Cai, C. SW-WAVENET: Learning representation from spectrogram and wavegram using wavenet for anomalous sound detection. In Proceedings of the ICASSP 2023—2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, 4–10 June 2023; IEEE: New York, NY, USA, 2023; pp. 1–5.
- Wu, J.; Yang, F.; Hu, W. Unsupervised anomalous sound detection for industrial monitoring based on ArcFace classifier and gaussian mixture model. Appl. Acoust. 2023, 203, 109188.
- Liu, Y.; Guan, J.; Zhu, Q.; Wang, W. Anomalous sound detection using spectral-temporal information fusion. In Proceedings of the ICASSP 2022—2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Virtual, 7–13 May 2022; IEEE: New York, NY, USA, 2022; pp. 816–820.
- Zeng, X.M.; Song, Y.; Zhuo, Z.; Zhou, Y.; Li, Y.-H.; Xue, H.; Dai, L.-R.; McLoughlin, I. Joint generative-contrastive representation learning for anomalous sound detection. In Proceedings of the ICASSP 2023—2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, 4–10 June 2023; IEEE: New York, NY, USA, 2023; pp. 1–5.
- Wilkinghoff, K. Sub-cluster AdaCos: Learning representations for anomalous sound detection. In Proceedings of the 2021 International Joint Conference on Neural Networks (IJCNN), Virtual, 18–22 July 2021; IEEE: New York, NY, USA, 2021; pp. 1–8.
- Choi, S.; Choi, J.W. Noisy-arcmix: Additive noisy angular margin loss combined with mixup for anomalous sound detection. In Proceedings of the ICASSP 2024—2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Seoul, Republic of Korea, 14–19 April 2024; IEEE: New York, NY, USA, 2024; pp. 516–520.
- Kong, D.; Yu, H.; Yuan, G. Multi-Spectral and Multi-Temporal Features Fusion with SE Network for Anomalous Sound Detection. IEEE Access 2024, 12, 167262–167277.
- Guan, J.; Xiao, F.; Liu, Y.; Zhu, Q.; Wang, W. Anomalous sound detection using audio representation with machine id based contrastive learning pretraining. In Proceedings of the ICASSP 2023—2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, 4–10 June 2023; IEEE: New York, NY, USA, 2023; pp. 1–5.
- Huang, S.; Fang, Z.; He, L. Noise supervised contrastive learning and feature-perturbed for anomalous sound detection. arXiv 2025, arXiv:2509.13853.
- Wilkinghoff, K.; Yang, H.; Ebbers, J.; Germain, F.G.; Wichern, G.; Le Roux, J. Keeping the Balance: Anomaly Score Calculation for Domain Generalization. In Proceedings of the ICASSP 2025—2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Hyderabad, India, 6–11 April 2025; IEEE: New York, NY, USA, 2025; pp. 1–5.
- Davis, S.; Mermelstein, P. Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Trans. Acoust. Speech Signal Process. 1980, 28, 357–366.
- Zhang, X.; Zhao, R.; Qiao, Y.; Wang, X.; Li, H. Adacos: Adaptively scaling cosine logits for effectively learning deep face representations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 10823–10832.
- Wilkinghoff, K.; Kurth, F. Why do angular margin losses work well for semi-supervised anomalous sound detection? IEEE/ACM Trans. Audio Speech Lang. Process. 2023, 32, 608–622.
- Qian, Q.; Shang, L.; Sun, B.; Hu, J.; Tacoma, T.; Li, H.; Jin, R. Softtriple loss: Deep metric learning without triplet sampling. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27–28 October 2019; pp. 6450–6458.
- Pham, L.; Phan, H.; Nguyen, T.; Palaniappan, R.; Mertins, A.; McLoughlin, I. Robust acoustic scene classification using a multi-spectrogram encoder-decoder framework. Digit. Signal Process. 2021, 110, 102943.
- Rippel, O.; Snoek, J.; Adams, R.P. Spectral representations for convolutional neural networks. Adv. Neural Inf. Process. Syst. 2015, 28, 2449–2457.
- Deng, J.; Guo, J.; Xue, N.; Zafeiriou, S. Arcface: Additive angular margin loss for deep face recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 4690–4699.
- Zhang, H.; Cisse, M.; Dauphin, Y.N.; Lopez-Paz, D. mixup: Beyond empirical risk minimization. arXiv 2017, arXiv:1710.09412.
- Purohit, H.; Tanabe, R.; Ichige, T.; Endo, T.; Nikaido, Y.; Suefusa, K.; Kawaguchi, Y. MIMII Dataset: Sound dataset for malfunctioning industrial machine investigation and inspection. arXiv 2019, arXiv:1909.09347.
- Koizumi, Y.; Saito, S.; Uematsu, H.; Harada, N.; Imoto, K. ToyADMOS: A dataset of miniature-machine operating sounds for anomalous sound detection. In Proceedings of the 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), New Paltz, NY, USA, 20–23 October 2019; IEEE: New York, NY, USA, 2019; pp. 313–317.
- Neri, M.; Carli, M. Low-complexity attention-based unsupervised anomalous sound detection exploiting separable convolutions and angular loss. IEEE Sens. Lett. 2024, 8, 6014404.
| Layer | c | k | s | p | n |
|---|---|---|---|---|---|
| Conv1D | 128 | 1024 | 512 | 512 | |
| LayerNorm | – | – | – | – | – |
| Leaky ReLU | – | – | – | – | – |
| Conv1D | 128 | 3 | 1 | 1 | |
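The first Conv1D's settings (k = 1024, s = 512, p = 512) make the resulting time axis line up with a Log-Mel spectrogram's frame axis. The check below uses the standard 1-D convolution output-length formula; the clip length and STFT settings (10 s at 16 kHz, n_fft = 1024, hop = 512, center padding) are assumptions typical of DCASE 2020 Task 2, not values stated in this excerpt.

```python
def conv1d_out_len(length, kernel, stride, padding):
    # standard 1-D convolution output length: floor((L + 2p - k) / s) + 1
    return (length + 2 * padding - kernel) // stride + 1

num_samples = 16_000 * 10   # assumed: a 10 s clip sampled at 16 kHz
tgram_frames = conv1d_out_len(num_samples, kernel=1024, stride=512, padding=512)

# a center-padded STFT with n_fft = 1024 and hop = 512 yields 1 + L // hop frames
logmel_frames = 1 + num_samples // 512
```

With these settings both representations have 313 frames, so they can be stacked channel-wise without resampling along time.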
| Method | Fan AUC | Fan pAUC | Pump AUC | Pump pAUC | Slider AUC | Slider pAUC | Valve AUC | Valve pAUC | ToyCar AUC | ToyCar pAUC | ToyConveyor AUC | ToyConveyor pAUC | Avg. AUC | Avg. pAUC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AE | 66.20 | 53.20 | 72.90 | 60.30 | 85.50 | 67.80 | 66.30 | 51.20 | 80.90 | 79.90 | 73.40 | 61.10 | 74.20 | 60.58 |
| IDNN | 67.71 | 52.90 | 73.76 | 61.07 | 86.45 | 67.58 | 84.09 | 64.94 | 78.69 | 69.22 | 71.07 | 59.70 | 76.96 | 62.57 |
| MobileNetV2 | 80.19 | 74.40 | 82.53 | 76.50 | 95.27 | 85.22 | 88.65 | 87.98 | 87.66 | 85.92 | 69.71 | 56.43 | 84.34 | 77.74 |
| Glow-Aff | 74.90 | 65.30 | 84.30 | 73.80 | 94.60 | 82.80 | 91.40 | 75.00 | 92.20 | 84.10 | 71.50 | 59.00 | 85.20 | 73.79 |
| STgram-MFN | 94.04 | 88.97 | 91.94 | 81.75 | 99.55 | 97.61 | 99.64 | 98.44 | 94.44 | 87.68 | 74.57 | 63.60 | 92.36 | 86.34 |
| TASTgram | 97.82 | 94.78 | 94.05 | 85.40 | 99.50 | 97.39 | 99.97 | 99.85 | 96.40 | 89.91 | 76.58 | 66.43 | 94.07 | 88.96 |
| FGASpecNet | 98.05 | 94.73 | 96.03 | 88.24 | 99.49 | 97.32 | 99.93 | 99.61 | 96.99 | 90.43 | 79.75 | 67.72 | 95.04 | 89.68 |
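As a quick arithmetic check, the Average columns are the unweighted means over the six machine types; for example, for the FGASpecNet row:

```python
# per-machine results for FGASpecNet, in the order Fan, Pump, Slider, Valve, ToyCar, ToyConveyor
fga_auc = [98.05, 96.03, 99.49, 99.93, 96.99, 79.75]
fga_pauc = [94.73, 88.24, 97.32, 99.61, 90.43, 67.72]

avg_auc = sum(fga_auc) / len(fga_auc)     # ≈ 95.04
avg_pauc = sum(fga_pauc) / len(fga_pauc)  # ≈ 89.68 (table rounds to two decimals)
```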
| SpecNet | FGA | SoftTriple | AUC | pAUC |
|---|---|---|---|---|
| | | | 94.07 | 88.96 |
| ✓ | | | 94.42 | 89.15 |
| | ✓ | | 94.57 | 88.95 |
| ✓ | ✓ | | 94.68 | 89.38 |
| | ✓ | ✓ | 93.90 | 88.80 |
| ✓ | ✓ | ✓ | 95.04 | 89.68 |
| | | | K | | | AUC | pAUC |
|---|---|---|---|---|---|---|---|
| 0.05 | 20 | 0.02 | 3 | 0 | 0.15 | 94.94 | 89.61 |
| 0.1 | 10 | 0.02 | 3 | 0 | 0.15 | 94.46 | 88.99 |
| 0.1 | 20 | 0.025 | 3 | 0 | 0.15 | 94.38 | 89.00 |
| 0.1 | 20 | 0.02 | 4 | 0 | 0.15 | 93.99 | 88.82 |
| 0.1 | 20 | 0.02 | 3 | 0.02 | 0.15 | 90.17 | 84.25 |
| 0.1 | 20 | 0.02 | 3 | 0 | 0.12 | 93.80 | 88.75 |
| 0.1 | 20 | 0.02 | 3 | 0 | 0.15 | 95.04 | 89.68 |
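For reference, the SoftTriple loss being tuned here can be sketched in a few lines of NumPy. K is the number of centers per class; lam (scale), gamma (center-assignment temperature), and delta (margin) follow the notation of Qian et al. The default values below are illustrative, not the paper's tuned settings.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def soft_triple_loss(z, labels, centers, lam=20.0, gamma=0.1, delta=0.01):
    """z: (N, D) L2-normalized embeddings; labels: (N,) class ids;
    centers: (C, K, D) L2-normalized prototypes, K centers per class."""
    sim = np.einsum('nd,ckd->nck', z, centers)        # cosine similarity to every center
    attn = softmax(sim / gamma, axis=2)               # soft assignment over the K centers
    s = (attn * sim).sum(axis=2)                      # (N, C) relaxed class similarity
    logits = lam * (s - delta * np.eye(s.shape[1])[labels])  # margin on the target class
    m = logits.max(axis=1, keepdims=True)             # stable log-sum-exp
    lse = m[:, 0] + np.log(np.exp(logits - m).sum(axis=1))
    return float(np.mean(lse - logits[np.arange(len(labels)), labels]))
```

Because each class owns K prototypes rather than one, embeddings of the same machine ID can cluster around multiple modes while still being pulled away from other IDs, which is the multi-center compression effect described in the contributions.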
| Method | Parameters | AUC | pAUC |
|---|---|---|---|
| IDNN | 46 k | 76.96 | 62.57 |
| MobileNetV2 | 1.1 M | 84.34 | 77.74 |
| Glow-Aff | 30 M | 85.20 | 73.90 |
| STgram-MFN | 1.1 M | 92.36 | 86.34 |
| TASTgram | 1.2 M | 94.07 | 88.96 |
| FGASpecNet | 1.3 M | 95.04 | 89.68 |
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Bi, Z.; Jiang, J.; Zhang, W.; Shan, M. Anomalous Sound Detection by Fusing Spectral Enhancement and Frequency-Gated Attention. Mathematics 2026, 14, 530. https://doi.org/10.3390/math14030530