Efficient Music Genre Recognition Using ECAS-CNN: A Novel Channel-Aware Neural Network Architecture
Abstract
1. Introduction
1.1. Research Questions and Hypotheses
1.2. Background in the Field of Music Information Retrieval (MIR)
2. Related Work
3. Methodology
3.1. Mel-Spectrogram Model
3.2. ECA Module
3.3. Model Design
4. Experiments
4.1. Experimental Environment Platforms and Datasets
4.2. Evaluation Indicators
4.3. Results and Analysis
5. Conclusions
- (1) Integration of Channel Attention Mechanism: By integrating an effective channel attention mechanism into the traditional CNN architecture, ECAS-CNN enhances feature extraction capabilities, leading to superior classification performance.
- (2) High Performance on the GTZAN Dataset: The ECAS-CNN model achieved a high accuracy of 95.26% on the GTZAN dataset, along with a precision of 96.22% and a recall of 95.23%. These metrics collectively reflect the model’s outstanding classification ability.
- (3) Comparison with Advanced Models: When compared to other advanced models such as BiLSTM and 2D CNN, ECAS-CNN exhibited balanced and efficient performance across all key metrics, including accuracy, precision, recall, and F1-Score.
- (4) Effectiveness in Handling Complex Features: The results validate the effectiveness of ECAS-CNN in processing complex musical features and improving classification performance, particularly in reducing misclassification and enhancing the model’s generalization capability.
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Tzanetakis, G.; Cook, P. Musical Genre Classification of Audio Signals. IEEE Trans. Speech Audio Process. 2002, 10, 293–302.
- Yoshioka, T.; Nakatani, T.; Miyoshi, M.; Okuno, H.G. Rhythmic Similarity of Music Based on Dynamic Periodicity Analysis. IEEE Trans. Audio Speech Lang. Process. 2011, 19, 69–84.
- Kumar, M.K.; Sujanasri, K.; Neha, B.; Akshara, G.; Chugh, P.; Haindavi, P. Automated Music Genre Classification through Deep Learning Techniques. E3S Web Conf. 2023, 430, 01033.
- Choi, K.; Fazekas, G.; Sandler, M.; Cho, K. Convolutional Recurrent Neural Networks for Music Classification. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, LA, USA, 5–9 March 2017; pp. 2392–2396.
- Won, M.; Choi, K.; Lee, J. Evaluation of Deep Learning Models for Music Genre Classification. In Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), Virtual, 11–16 October 2020; pp. 484–491.
- Yang, Y.H.; Chen, H.H. Music Emotion Recognition; CRC Press: Boca Raton, FL, USA, 2011.
- Zhao, Z.; Xie, Z.; Fu, J.; Tian, X. Music Genre Classification: Machine Learning on GTZAN. Appl. Comput. Eng. 2024, 79, 219–233.
- Shah, M.; Pujara, N.; Mangaroliya, K.; Gohil, L.; Vyas, T.; Degadwala, S. Music Genre Classification Using Deep Learning. In Proceedings of the 2022 6th International Conference on Computing Methodologies and Communication (ICCMC), Erode, India, 29–30 March 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 974–978.
- Jahnavi, M.; Satapathy, A.; Lokesh, C.; Likhitha, P.B. A Comparative Performance Evaluation of Machine Learning Approaches for Spectrogram-based Music Genre Classification. In Proceedings of the 2023 IEEE 3rd International Conference on Technology, Engineering, Management for Societal Impact using Marketing, Entrepreneurship and Talent (TEMSMET), Mysuru, India, 10–11 February 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 1–7.
- Deng, X. Music Genre Classification and Recognition Using Improved Deep Convolutional Neural Network-DenseNet-II. In Proceedings of the 2024 Second International Conference on Data Science and Information System (ICDSIS), Singapore, 17–18 May 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 1–4.
- Pillai, R.; Sharma, N.; Upadhyay, D.; Dangi, S.; Gupta, R. Sonic Signatures: Sequential Model-driven Music Genre Classification with Mel Spectograms. In Proceedings of the 2024 Fourth International Conference on Advances in Electrical, Computing, Communication and Sustainable Technologies (ICAECT), Bhilai, India, 11–12 January 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 1–4.
- SuriyaPrakash, J.; Kiran, S. Obtain Better Accuracy Using Music Genre Classification System on GTZAN Dataset. In Proceedings of the 2022 IEEE North Karnataka Subsection Flagship International Conference (NKCon), Vijayapura, India, 20–21 November 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 1–5.
- Liu, Z.; Bian, T.; Yang, M. Locally Activated Gated Neural Network for Automatic Music Genre Classification. Appl. Sci. 2023, 13, 5010.
- Patil, S.A.; Pradeepini, G.; Komati, T.R. Novel Mathematical Model for the Classification of Music and Rhythmic Genre Using Deep Neural Network. J. Big Data 2023, 10, 108.
- Srivastava, N.; Ruhil, S.; Kaushal, G. Music Genre Classification Using Convolutional Recurrent Neural Networks. In Proceedings of the 2022 IEEE 6th Conference on Information and Communication Technology (CICT), Gwalior, India, 18–20 November 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 1–5.
- Chaudhury, M.; Karami, A.; Ghazanfar, M.A. Large-scale Music Genre Analysis and Classification Using Machine Learning with Apache Spark. Electronics 2022, 11, 2567.
- de Pinto, M.G.; Polignano, M.; Lops, P.; Semeraro, G. Emotions Understanding Model from Spoken Language Using Deep Neural Networks and Mel-frequency Cepstral Coefficients. In Proceedings of the 2020 IEEE Conference on Evolving and Adaptive Intelligent Systems (EAIS), Bari, Italy, 27–29 May 2020; IEEE: Piscataway, NJ, USA, 2020.
- Vaibhavi, M.; Krishna, P.R. Music Genre Classification Using Neural Networks with Data Augmentation: A Make in India Creation. J. Innov. Sci. Sustain. Technol. 2021, 1, 21–37.
- Yang, R.; Feng, L.; Wang, H.; Yao, J.; Luo, S. Parallel Recurrent Convolutional Neural Networks Based Music Genre Classification Method for Mobile Devices. IEEE Access 2020, 8, 19629–19637.
- Li, T. Optimizing the configuration of deep learning models for music genre classification. Heliyon 2024, 10, e24892.
- Wen, Z.; Chen, A.; Zhou, G.; Yi, J.; Peng, W. Parallel attention of representation global time–frequency correlation for music genre classification. Multimed. Tools Appl. 2024, 83, 10211–10231.
- Prabhakar, S.K.; Lee, S.W. Holistic Approaches to Music Genre Classification using Efficient Transfer and Deep Learning Techniques. Expert Syst. Appl. 2023, 211, 118636.
Layer | Type | Feature Maps | Output Size | Kernel Size | Stride | Activation |
---|---|---|---|---|---|---|
Input | MFCC | 1 | (130, 13, 1) | - | - | - |
1 | Conv | 128 | (130, 13, 128) | (3, 3) | (1, 1) | ReLU |
2 | ECA | 128 | (130, 13, 128) | - | - | - |
3 | Max Pool | 128 | (65, 7, 128) | (2, 2) | (2, 2) | - |
4 | Conv | 128 | (65, 7, 128) | (3, 3) | (1, 1) | ReLU |
5 | ECA | 128 | (65, 7, 128) | - | - | - |
6 | Max Pool | 128 | (17, 2, 256) | (2, 2) | (2, 2) | - |
7 | Conv | 256 | (17, 2, 256) | (3, 3) | (1, 1) | ReLU |
8 | ECA | 256 | (17, 2, 256) | - | - | - |
9 | Max Pool | 256 | (9, 1, 256) | (2, 2) | (2, 2) | - |
10 | Flatten | - | 2304 | - | - | - |
11 | FC | - | 128 | - | - | ReLU |
12 | Dropout | - | 128 | - | - | - |
13 | FC | - | 10 | - | - | Softmax |
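The ECA rows in the table (layers 2, 5, and 8) keep the feature-map size unchanged, which is consistent with how efficient channel attention is usually described: each channel is averaged to a single value, a small 1D convolution mixes neighbouring channel descriptors, and a sigmoid produces one scaling weight per channel. The following is a minimal pure-Python sketch of that idea, not the authors' implementation. The function names `eca_kernel_size` and `eca`, the adaptive kernel-size rule with `gamma=2` and `b=1` (taken from the original ECA-Net formulation), and the fixed illustrative kernel are all assumptions; this section does not state the kernel size used in ECAS-CNN, and in a trained model the kernel weights would be learned.

```python
import math


def eca_kernel_size(channels, gamma=2, b=1):
    """Adaptive 1D kernel size as in ECA-Net (assumed here, not stated in the paper section)."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 else t + 1  # force an odd kernel size


def eca(feature_map, kernel):
    """Apply channel attention to a nested-list feature map of shape [H][W][C].

    `kernel` is an odd-length list of 1D weights (hypothetical fixed values here).
    Returns the rescaled feature map and the per-channel attention weights.
    """
    h, w, c = len(feature_map), len(feature_map[0]), len(feature_map[0][0])

    # Squeeze: global average pooling gives one descriptor per channel.
    pooled = [
        sum(feature_map[i][j][ch] for i in range(h) for j in range(w)) / (h * w)
        for ch in range(c)
    ]

    # Local cross-channel interaction: zero-padded 1D cross-correlation over channels.
    pad = len(kernel) // 2
    padded = [0.0] * pad + pooled + [0.0] * pad
    weights = []
    for ch in range(c):
        z = sum(kernel[t] * padded[ch + t] for t in range(len(kernel)))
        weights.append(1.0 / (1.0 + math.exp(-z)))  # sigmoid gate in (0, 1)

    # Excite: rescale every spatial position channel by channel.
    scaled = [
        [[feature_map[i][j][ch] * weights[ch] for ch in range(c)] for j in range(w)]
        for i in range(h)
    ]
    return scaled, weights


# Toy 2 x 2 feature map with 4 channels and an illustrative identity-like kernel.
toy_map = [[[1.0, 2.0, 3.0, 4.0] for _ in range(2)] for _ in range(2)]
toy_scaled, toy_weights = eca(toy_map, kernel=[0.0, 1.0, 0.0])
k_128 = eca_kernel_size(128)
k_256 = eca_kernel_size(256)
```

Under the assumed rule, both the 128-channel and 256-channel stages would use a kernel of size 5, so the attention step adds only a handful of parameters per ECA layer.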
Component | Specification |
---|---|
CPU | 13th Gen Intel(R) Core(TM) i9-13900KF |
RAM | 32 GB |
GPU | NVIDIA GeForce RTX 4060 12 GB |
Programming Language | Python 3.8 |
Deep Learning Framework | TensorFlow 2.13.0 |
CUDA | 11.8 |
Predicted Value | True Value: Positive | True Value: Negative |
---|---|---|
Positive | TP | FP |
Negative | FN | TN |
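The four indicators reported in the result tables (accuracy, precision, recall, and F1-Score) follow from the TP, FP, FN, and TN counts in the confusion matrix above. The sketch below uses the standard textbook definitions. The function names, the one-vs-rest reduction for the multi-class case, and the toy matrix are illustrative assumptions; this section does not state how the authors averaged per-genre scores.

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard metrics from binary confusion-matrix counts (0.0 when undefined)."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def one_vs_rest_counts(cm, k):
    """Reduce a multi-class matrix to (tp, fp, fn, tn) for class k.

    Layout follows the table above: cm[predicted][true].
    """
    total = sum(sum(row) for row in cm)
    tp = cm[k][k]
    fp = sum(cm[k]) - tp                  # predicted k, true class differs
    fn = sum(row[k] for row in cm) - tp   # true k, predicted class differs
    tn = total - tp - fp - fn
    return tp, fp, fn, tn


# Binary example with made-up counts.
binary = classification_metrics(tp=90, fp=10, fn=5, tn=95)

# Three-class toy matrix (made-up counts), evaluated for class 0.
toy_cm = [
    [5, 1, 0],
    [0, 4, 1],
    [1, 0, 6],
]
class0_counts = one_vs_rest_counts(toy_cm, 0)
class0_metrics = classification_metrics(*class0_counts)
```

For a ten-genre task such as GTZAN, the same reduction would be applied once per genre and the per-genre scores then combined, for example by a macro average.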
Method | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%) |
---|---|---|---|---|
RCNN with Data Augmentation [18] | 69.49 | 97.94 | 88.84 | 92.7 |
4 Layers-2D CNN with Data Augmentation [18] | 81.55 | 83.50 | 89.01 | 86.17 |
BiLSTM [19] | 97.80 | 88 | 94 | 78 |
MFCC+STFT [20] | 95.20 | 95.23 | 95.20 | 95.20 |
CNN-5+DPA [21] | 91.40 | 90.60 | 96.00 | 93.20 |
Proposed ECAS-CNN | 95.26 | 96.22 | 95.23 | 95.41 |
Method | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%) |
---|---|---|---|---|
Deep Learning BAG | 92.38 | 91.83 | 92.55 | 92.03 |
TSVM | 85.05 | 84.68 | 85.12 | 84.80 |
WVG-ELNSC | 92.89 | 91.55 | 92.62 | 89.87 |
RA based TSM-SVM [22] | 91.84 | 90.56 | 91.23 | 89.51 |
Proposed ECAS-CNN | 94.28 | 95.11 | 94.62 | 95.23 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Ding, Y.; Zhang, H.; Huang, W.; Zhou, X.; Shi, Z. Efficient Music Genre Recognition Using ECAS-CNN: A Novel Channel-Aware Neural Network Architecture. Sensors 2024, 24, 7021. https://doi.org/10.3390/s24217021