Article

Improving Semi-Supervised Learning for Audio Classification with FixMatch

Sascha Grollmisch and Estefanía Cano
1 Fraunhofer IDMT, Industrial Media Applications (IMA), 98693 Ilmenau, Germany
2 TU Ilmenau, Institute for Media Technology, 98693 Ilmenau, Germany
3 Songquito UG, 91052 Erlangen, Germany
* Authors to whom correspondence should be addressed.
Academic Editors: Alexander Lerch and Peter Knees
Electronics 2021, 10(15), 1807; https://doi.org/10.3390/electronics10151807
Received: 16 June 2021 / Revised: 19 July 2021 / Accepted: 21 July 2021 / Published: 28 July 2021
(This article belongs to the Special Issue Machine Learning Applied to Music/Audio Signal Processing)
Including unlabeled data in the training process of neural networks using Semi-Supervised Learning (SSL) has shown impressive results in the image domain, where state-of-the-art results were obtained with only a fraction of the labeled data. Recent SSL methods have in common that they rely strongly on the augmentation of unannotated data, an aspect that remains largely unexplored for audio data. In this work, SSL using the state-of-the-art FixMatch approach is evaluated on three audio classification tasks covering music, industrial sounds, and acoustic scenes. The performance of FixMatch is compared to Convolutional Neural Networks (CNNs) trained from scratch, Transfer Learning, and SSL using the Mean Teacher approach. Additionally, a simple yet effective approach for selecting suitable augmentation methods for FixMatch is introduced. FixMatch with the proposed modifications always outperformed Mean Teacher and the CNNs trained from scratch. For the industrial sounds and music datasets, the CNN baseline performance using the full dataset was reached with less than 5% of the initial training data, demonstrating the potential of recent SSL methods for audio data. Transfer Learning outperformed FixMatch only for the most challenging dataset from acoustic scene classification, showing that there is still room for improvement.
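The abstract does not spell out how FixMatch uses augmented unlabeled data, so the following is only a minimal sketch of the unlabeled-data loss as FixMatch is generally described: a prediction on a weakly augmented input becomes a pseudo-label if it is confident enough, and the model is then trained to predict that label for a strongly augmented version of the same input. The function names, the example logits, and the 0.95 confidence threshold are assumptions for illustration and are not taken from the article's audio implementation.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of class logits."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]


def fixmatch_unlabeled_loss(weak_logits, strong_logits, threshold=0.95):
    """Masked cross-entropy over a batch of unlabeled examples.

    weak_logits[i] and strong_logits[i] are the model outputs for a weakly
    and a strongly augmented view of the same unlabeled example.
    The threshold value is an assumed hyperparameter.
    """
    if not weak_logits:
        return 0.0
    total = 0.0
    for weak, strong in zip(weak_logits, strong_logits):
        weak_probs = [math.exp(v) for v in log_softmax(weak)]
        confidence = max(weak_probs)
        if confidence < threshold:
            continue  # low-confidence examples contribute zero loss
        pseudo_label = weak_probs.index(confidence)
        # Cross-entropy of the strong-view prediction against the pseudo-label
        total += -log_softmax(strong)[pseudo_label]
    # Average over the whole batch, including masked-out examples
    return total / len(weak_logits)


# Hypothetical two-class logits for two unlabeled examples
weak = [[10.0, 0.0], [0.1, 0.0]]    # first is confident, second is not
strong = [[0.0, 0.0], [5.0, 0.0]]
loss = fixmatch_unlabeled_loss(weak, strong)
```

In this sketch only the first example passes the threshold, so the second one is ignored regardless of what the model predicts for its strongly augmented view. Which weak and strong augmentations are suitable for audio is the question the article addresses and is not modeled here.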
Keywords: semi-supervised learning; deep learning; industrial sound analysis; music information retrieval; acoustic scene classification
MDPI and ACS Style

Grollmisch, S.; Cano, E. Improving Semi-Supervised Learning for Audio Classification with FixMatch. Electronics 2021, 10, 1807. https://doi.org/10.3390/electronics10151807
