Search Results (122)

Search Parameters:
Keywords = musical genres

17 pages, 2230 KiB  
Article
Enhancing Diffusion-Based Music Generation Performance with LoRA
by Seonpyo Kim, Geonhui Kim, Shoki Yagishita, Daewoon Han, Jeonghyeon Im and Yunsick Sung
Appl. Sci. 2025, 15(15), 8646; https://doi.org/10.3390/app15158646 - 5 Aug 2025
Viewed by 50
Abstract
Recent advancements in generative artificial intelligence have significantly progressed the field of text-to-music generation, enabling users to create music from natural language descriptions. Despite the success of various models, such as MusicLM, MusicGen, and AudioLDM, the current approaches struggle to capture fine-grained genre-specific characteristics, precisely control musical attributes, and handle underrepresented cultural data. This paper introduces a novel, lightweight fine-tuning method for the AudioLDM framework using low-rank adaptation (LoRA). By updating only selected attention and projection layers, the proposed method enables efficient adaptation to musical genres with limited data and computational cost. The proposed method enhances controllability over key musical parameters such as rhythm, emotion, and timbre. At the same time, it maintains the overall quality of music generation. This paper represents the first application of LoRA in AudioLDM, offering a scalable solution for fine-grained, genre-aware music generation and customization. The experimental results demonstrate that the proposed method improves the semantic alignment and statistical similarity compared with the baseline. The contrastive language–audio pretraining score increased by 0.0498, indicating enhanced text-music consistency. The kernel audio distance score decreased by 0.8349, reflecting improved similarity to real music distributions. The mean opinion score ranged from 3.5 to 3.8, confirming the perceptual quality of the generated music. Full article
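
The low-rank adaptation idea mentioned in this abstract can be illustrated with a minimal sketch. It follows the standard LoRA formulation (a frozen weight matrix plus a scaled low-rank update) and is not the authors' AudioLDM code; the dimensions, the `alpha` value, and the helper names `matvec` and `lora_forward` are assumptions chosen for illustration.

```python
import random

def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def lora_forward(W, A, B, x, alpha):
    """Return W x + (alpha / r) * B (A x), where r is the adapter rank."""
    r = len(A)                         # A has shape r x d_in
    base = matvec(W, x)                # frozen pretrained path
    update = matvec(B, matvec(A, x))   # trainable low-rank path
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]

random.seed(0)
d_in, d_out, r = 8, 6, 2               # hypothetical layer sizes and rank
W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]  # frozen
A = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]   # trainable
B = [[0.0] * r for _ in range(d_out)]  # trainable, zero-initialised
x = [random.gauss(0, 1) for _ in range(d_in)]

full_params = d_out * d_in             # parameters in the frozen layer
lora_params = r * (d_in + d_out)       # parameters actually trained
print(full_params, lora_params)
```

Because `B` starts at zero, the adapted layer initially reproduces the frozen layer exactly, and only `A` and `B` would be updated during fine-tuning.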

15 pages, 415 KiB  
Article
Enhancing MusicGen with Prompt Tuning
by Hohyeon Shin, Jeonghyeon Im and Yunsick Sung
Appl. Sci. 2025, 15(15), 8504; https://doi.org/10.3390/app15158504 - 31 Jul 2025
Viewed by 241
Abstract
Generative AI has been gaining attention across various creative domains. In particular, MusicGen stands out as a representative approach capable of generating music based on text or audio inputs. However, it has limitations in producing high-quality outputs for specific genres and fully reflecting user intentions. This paper proposes a prompt tuning technique that effectively adjusts the output quality of MusicGen without modifying its original parameters and optimizes its ability to generate music tailored to specific genres and styles. Experiments were conducted to compare the performance of the traditional MusicGen with the proposed method and evaluate the quality of generated music using the Contrastive Language-Audio Pretraining (CLAP) and Kullback–Leibler Divergence (KLD) scoring approaches. The results demonstrated that the proposed method significantly improved the output quality and musical coherence, particularly for specific genres and styles. Compared with the traditional model, the CLAP score was increased by 0.1270, and the KLD score was increased by 0.00403 on average. The effectiveness of prompt tuning in optimizing the performance of MusicGen validated the proposed method and highlighted its potential for advancing generative AI-based music generation tools. Full article
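
For reference, the Kullback–Leibler divergence named in this abstract can be computed for two discrete distributions as in the sketch below. The two label distributions are made-up illustrative values, and the paper's actual evaluation pipeline is not described in this listing.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two discrete distributions of equal length."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    # Terms with p_i == 0 contribute nothing; eps guards against q_i == 0.
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical class probabilities for reference audio and generated audio.
reference = [0.5, 0.5]
generated = [0.9, 0.1]
print(kl_divergence(reference, generated))
```

The divergence is zero only when the two distributions match, and it is not symmetric in its arguments.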

29 pages, 9956 KiB  
Article
Improving the Acoustics of the Church of Saints Marcellino and Pietro in Cremona (Italy) for Musical Performances
by Sofia Parrinelli, Riccardo Giampiccolo, Angelo Giuseppe Landi and Fabio Antonacci
Acoustics 2025, 7(3), 42; https://doi.org/10.3390/acoustics7030042 - 8 Jul 2025
Viewed by 462
Abstract
Churches are spaces designed with a unique acoustic identity, which is intimately connected to the oratory and musical needs of the historical period in which they were built. For instance, their typically long reverberation time is appropriate to specific uses, such as liturgical functions and choral music performances, but it may impair the repurposing of the space for other functions. Indeed, an acoustic environment suitable for choral or sacred music may not be compatible with other musical genres such as chamber music, solo performances, or small instrumental ensembles, which require greater clarity and frequency-balanced acoustic properties. In such cases, careful analysis of the environment and specific acoustic conditioning become essential steps to enable the space to be used for novel purposes, without compromising its artistic and historical integrity. In this work, we analyze and improve the acoustics of the church of Saints Marcellino and Pietro through space-time acoustic measurements and simulations. After developing and validating our model, we propose various solutions to optimize the church acoustics, transforming it into a functional concert hall while preserving its original identity and artistic grandeur. Full article

25 pages, 1224 KiB  
Article
Generative Jazz Chord Progressions: A Statistical Approach to Harmonic Creativity
by Adriano N. Raposo and Vasco N. G. J. Soares
Information 2025, 16(6), 504; https://doi.org/10.3390/info16060504 - 17 Jun 2025
Viewed by 1008
Abstract
Jazz music has long been a subject of interest in the field of generative music. Traditional jazz chord progressions follow established patterns that contribute to the genre’s distinct sound. However, the demand for more innovative and diverse harmonic structures has led to the exploration of alternative approaches in music generation. This paper addresses the challenge of generating novel and engaging jazz chord sequences that go beyond traditional chord progressions. It proposes an unconventional statistical approach, leveraging a corpus of 1382 jazz standards, which includes key information, song structure, and chord sequences by section. The proposed method generates chord sequences based on statistical patterns extracted from the corpus, considering a tonal context while introducing a degree of unpredictability that enhances the results with elements of surprise and interest. The goal is to move beyond conventional and well-known jazz chord progressions, exploring new and inspiring harmonic possibilities. The evaluation of the generated dataset, which matches the size of the learning corpus, demonstrates a strong statistical alignment between distributions across multiple analysis parameters while also revealing opportunities for further exploration of novel harmonic pathways. Full article
(This article belongs to the Special Issue Feature Papers in Information in 2024–2025)
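
One simple way to generate chords from statistical patterns in a corpus is a first-order Markov chain over chord symbols. The sketch below is only an illustration of that general idea: the abstract does not specify the authors' model, and the three-progression toy corpus and function names are hypothetical.

```python
import random
from collections import Counter, defaultdict

def fit_transitions(progressions):
    """Count how often each chord is followed by each other chord."""
    counts = defaultdict(Counter)
    for prog in progressions:
        for cur, nxt in zip(prog, prog[1:]):
            counts[cur][nxt] += 1
    return counts

def generate(counts, start, length, rng):
    """Sample a progression, choosing each next chord in proportion to its count."""
    seq = [start]
    while len(seq) < length:
        options = counts.get(seq[-1])
        if not options:                # no observed continuation
            break
        chords, weights = zip(*options.items())
        seq.append(rng.choices(chords, weights=weights, k=1)[0])
    return seq

# Hypothetical toy corpus; a real corpus would hold full jazz standards.
toy_corpus = [
    ["Dm7", "G7", "Cmaj7", "A7"],
    ["Dm7", "G7", "Cmaj7", "Cmaj7"],
    ["Em7", "A7", "Dm7", "G7"],
]
counts = fit_transitions(toy_corpus)
progression = generate(counts, "Dm7", 8, random.Random(42))
print(progression)
```

Sampling in proportion to observed counts keeps the output close to the corpus while still allowing less common transitions to appear.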

22 pages, 1588 KiB  
Article
An Eye-Tracking Study on Text Comprehension While Listening to Music: Preliminary Results
by Georgia Andreou and Maria Gkantaki
Appl. Sci. 2025, 15(7), 3939; https://doi.org/10.3390/app15073939 - 3 Apr 2025
Viewed by 2183
Abstract
The aim of the present study was to examine the effect of background music on text comprehension using eye-tracking technology. Ten Greek undergraduate students read four texts under the following four reading conditions: preferred music, non-preferred music, café noise, and in silence. Eye movements were tracked to assess visual patterns, while reading performance and attitudes were also evaluated. The results showed that fixation measures remained stable across conditions, suggesting that early visual processing is not significantly influenced by auditory distractions. However, reading performance significantly declined under non-preferred music, highlighting its disruptive impact on cognitive processing. Participants also reported greater difficulty and fatigue in this condition, consistent with an increased cognitive load. In contrast, preferred music and silence were associated with enhanced understanding, confidence, and immersion, while café noise had a moderate but manageable effect on reading outcomes. These findings underscore the importance of tailoring reading environments to individual preferences in order to optimize reading performance and engagement. Future research should focus on the effects of different musical attributes, such as tempo and genre, and use more complex reading tasks, in order to better understand how auditory stimuli interact with cognitive load and visual processing. Full article
(This article belongs to the Special Issue Latest Research on Eye Tracking Applications)

13 pages, 388 KiB  
Article
Hillsong’s Swansong? On the Decline of Hillsong Within the Contemporary Congregational Song Genre
by Daniel Thornton
Religions 2025, 16(4), 427; https://doi.org/10.3390/rel16040427 - 27 Mar 2025
Cited by 1 | Viewed by 2249
Abstract
Contemporary Congregational Songs (CCS) are used for gathered musical worship in churches of diverse traditions and denominations all over the world. Christian Copyright Licensing International (CCLI) has measured the use of CCS in licensed churches in various global regions for over 30 years. This article examines the trajectory of songs as they enter and exit the biannual CCLI top songs lists over a 10 year period from 2014–2023. Hillsong has been one of the most prominent producers of CCS, with dominant appearances in the CCLI top song lists for the last three decades. However, they have not released any new CCS since 2021. This article explores what has happened over the past few years to the void left by such a dominant producer of CCS, and what that might mean for the genre and its future. Full article

44 pages, 15045 KiB  
Perspective
Exploring the Creative Art of Sergei Kuriokhin—Avant-Garde Musician, Cultural Theorist, and Cineast: Four Sergei(s) and Two Memoir Interviews
by Sergei Chubraev
Arts 2025, 14(2), 23; https://doi.org/10.3390/arts14020023 - 1 Mar 2025
Viewed by 766
Abstract
This text explores the life and legacy of Sergei Kuriokhin, a multifaceted artist who profoundly impacted Soviet and post-Soviet culture. Known for his radical experimentation in music, theater, and film, Kuriokhin defied conventional genres through his groundbreaking project, ‘Pop Mechanics’, which blended jazz, classical music, rock, circus acts, and more. His provocative performances often included surreal elements and bizarre satire, challenging cultural norms and the boundaries of Soviet censorship. Kuriokhin’s influence extended into politics, where his satirical “Lenin was a Mushroom” program questioned historical and ideological narratives, stirring public debate. His charisma, intellectual depth, and penchant for the absurd made him a central figure in Leningrad’s avant-garde scene. Kuriokhin collaborated with prominent artists and philosophers, leaving an indelible mark on Russian art and political discourse. This work, presented through the reflections of his close associates, offers insights into his lasting impact on Russian culture, blending history with personal mythologies. Full article

11 pages, 877 KiB  
Article
Beyond Spectrograms: Rethinking Audio Classification from EnCodec’s Latent Space
by Jorge Perianez-Pascual, Juan D. Gutiérrez, Laura Escobar-Encinas, Álvaro Rubio-Largo and Roberto Rodriguez-Echeverria
Algorithms 2025, 18(2), 108; https://doi.org/10.3390/a18020108 - 16 Feb 2025
Viewed by 1575
Abstract
This paper presents a novel approach to audio classification leveraging the latent representation generated by Meta’s EnCodec neural audio codec. We hypothesize that the compressed latent space representation captures essential audio features more suitable for classification tasks than the traditional spectrogram-based approaches. To validate this, we train a vanilla convolutional neural network for music genre, speech/music, and environmental sound classification using EnCodec’s encoder output as input. We then compare its performance with that of the same network trained on a spectrogram-based representation. Our experiments demonstrate that this approach achieves comparable accuracy to state-of-the-art methods while exhibiting significantly faster convergence and reduced computational load during training. These findings suggest the potential of EnCodec’s latent representation for efficient, faster, and less expensive audio classification applications. We analyze the characteristics of EnCodec’s output and compare its performance against traditional spectrogram-based approaches, providing insights into this novel approach’s advantages. Full article

18 pages, 613 KiB  
Article
Multilingual Singing in Nigeria: Examining Roles, Meaning, and Function in Wazobia Gospel Music
by Adekunle Oyeniyi
Religions 2025, 16(1), 4; https://doi.org/10.3390/rel16010004 - 24 Dec 2024
Viewed by 1796
Abstract
This article presents an introductory historical, sociolinguistic, and ethnographic study of “Wazobia gospel music”, a twenty-first-century Nigerian congregational musical genre. The term ‘Wazobia’ signifies a fusion of the three regionally recognized local languages in Nigeria: Wa (Yorùbá), Zo (Hausa), and Bia (Igbo), words that mean ‘come’ in the respective languages. In the Nigerian context, the Wazobia concept could also symbolize the inclusion of more than one ethnicity or language. By dissecting three multilingual Nigerian congregational songs, I unveil the diverse perceptions of Wazobia gospel music and the associations of the musical genre in line with the influencing agencies, text, and performance practices. Furthermore, I provide a detailed description and analysis of the textual and sonic contents of Wazobia gospel music, emphasizing its roles, meanings, and functions in the context of Lagos congregations. I argue that Wazobia gospel music (multilingual singing in Nigerian churches) embodies multilayered roles in negotiating identity and creating hospitality. The complexity of studying congregational singing in cosmopolitan cities (like Lagos, Nigeria) due to multiple ethnolinguistic and musical expressions within local and transnational links is also addressed. To tackle this complexity, this article adopts an interdisciplinary approach, combining historical research, oral history, and hybrid ethnography. This approach ensures a thorough and in-depth understanding of Wazobia gospel music, a topic of significant importance in the study of Nigerian music, linguistics, and cultural studies. By employing frameworks of musical localization and signification, I incorporate the results of my ethnographic studies of three Protestant churches in Lagos, Nigeria, to illustrate Wazobia gospel music’s continued importance. The article conceptualizes multilingual singing and offers fresh perspectives on studying Nigerian Christian congregational music in the twenty-first century. Full article
(This article belongs to the Special Issue Multilingualism in Religious Musical Practice)

14 pages, 2940 KiB  
Communication
Potential Note Degree of Khong Wong Yai Based on Rhyme Structure and Pillar Tone as a Novel Approach for Musical Analysis Using Multivariate Statistics: A Case Study of the Composition Sadhukarn from Thailand, Laos, and Cambodia
by Sumetus Eambangyung
Stats 2024, 7(4), 1513-1526; https://doi.org/10.3390/stats7040089 - 20 Dec 2024
Viewed by 920
Abstract
Diverse multivariate statistics are powerful tools for musical analysis. A recent study identified relationships among different versions of the composition Sadhukarn from Thailand, Laos, and Cambodia using non-metric multidimensional scaling (NMDS) and cluster analysis. However, the datasets used for NMDS and cluster analysis require musical knowledge and complicated manual conversion of notations. This work aims to (i) evaluate a novel approach based on multivariate statistics of potential note degree of rhyme structure and pillar tone (Look Tok) for musical analysis of the 26 versions of the composition Sadhukarn from Thailand, Laos, and Cambodia; (ii) compare the multivariate results obtained by this novel approach with those obtained from the datasets of the published method using manual conversion; and (iii) investigate the impact of normalization on the results obtained by this new method. The results show that the novel approach established in this study successfully identifies the 26 Sadhukarn versions according to their countries of origin. The results obtained by the novel approach of the full version were comparable to those obtained by the manual conversion approach. The normalization process causes the loss of identity and uniqueness. In conclusion, the novel approach based on the full version can be considered a useful alternative approach for musical analysis based on multivariate statistics. In addition, it can be applied to other music genres, forms, and styles, as well as other musical instruments. Full article
(This article belongs to the Section Multivariate Analysis)

17 pages, 3902 KiB  
Article
Dual-Path Beat Tracking: Combining Temporal Convolutional Networks and Transformers in Parallel
by Nikhil Thapa and Joonwhoan Lee
Appl. Sci. 2024, 14(24), 11777; https://doi.org/10.3390/app142411777 - 17 Dec 2024
Viewed by 1869
Abstract
The Transformer, a deep learning architecture, has shown exceptional adaptability across fields, including music information retrieval (MIR). Transformers excel at capturing global, long-range dependencies in sequences, which is valuable for tracking rhythmic patterns over time. Temporal Convolutional Networks (TCNs), with their dilated convolutions, are effective at processing local, temporal patterns with reduced complexity. Combining these complementary characteristics (global sequence modeling from Transformers and local temporal detail from TCNs) enhances beat tracking while reducing the model’s overall complexity. To capture beat intervals of varying lengths and ensure optimal alignment of beat predictions, the model employs a Dynamic Bayesian Network (DBN), followed by Viterbi decoding for effective post-processing. This system is evaluated across diverse public datasets spanning various music genres and styles, achieving performance on par with current state-of-the-art methods yet with fewer trainable parameters. Additionally, we explore the interpretability of the model using Grad-CAM to visualize the model’s learned features, offering insights into how the TCN-Transformer hybrid captures rhythmic patterns in the data. Full article
(This article belongs to the Special Issue AI in Audio Analysis: Spectrogram-Based Recognition)
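
To make the role of dilated convolutions concrete, the sketch below computes how many input frames a stack of dilated 1D convolutions can see. It assumes one convolution per layer with a shared kernel size; the kernel sizes and dilation schedules are illustrative values, not the configuration used in the paper.

```python
def receptive_field(kernel_size, dilations):
    """Frames covered by a stack of 1D convolutions, one per dilation value.

    Each layer with dilation d widens the field by (kernel_size - 1) * d.
    """
    return 1 + (kernel_size - 1) * sum(dilations)

# Hypothetical schedules: dilations double at every layer.
small_stack = receptive_field(3, [1, 2, 4, 8])
deep_stack = receptive_field(5, [2 ** i for i in range(11)])
print(small_stack, deep_stack)
```

Doubling the dilation at each layer makes the receptive field grow exponentially with depth while the parameter count grows only linearly, which is why TCNs can cover long rhythmic spans at low cost.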

19 pages, 893 KiB  
Review
What Do We Know About the Energy Status and Diets of Pre-Professional and Professional Dancers: A Scoping Review
by Alessandra Rigoli, Emily Dang, Victoria Michael, Janelle Gifford and Alyse Davies
Nutrients 2024, 16(24), 4293; https://doi.org/10.3390/nu16244293 - 12 Dec 2024
Viewed by 2149
Abstract
Background/Objectives: Dancers require adequate nutrition support for growth and development during the pre-professional stage, as well as to fuel classes and rehearsals and to enhance performance for both pre-professional and professional dancers. The aim of this study is to understand the energy status and diet of pre-professional and professional dancers in the genres of ballet, contemporary, musical theatre, and opera. Methods: Electronic databases (n = 9) and grey literature were searched for primary studies with no time limit. Screening and data extraction were completed by two reviewers. Results: Twelve studies were included for pre-professional (n = 7) and professional (n = 5) dancers. The genres identified were ballet (n = 11) and contemporary (n = 1), with no studies on musical theatre or opera. Studies on pre-professional ballet and contemporary dancers indicated a negative energy balance and low energy availability. Pre-professional ballet dancers had lower energy intakes than professional dancers. Professional dancers had lower BMI and body fat percentages. Macronutrients were mostly reported using the acceptable macronutrient distribution range for carbohydrates (38–56%E), protein (12–17%E), and total fat (26–42%E). Iron and calcium were the main micronutrients of concern. Conclusions: Accredited sports dietitians are recommended to support pre-professional and professional dancers to optimize their diet for health and performance. Further investigation is needed to quantify and assess dancers’ dietary intake using sports nutrition guidelines for reference. Full article

12 pages, 1121 KiB  
Article
Destroying Vision, Destroying Hearing: Sergei Kuriokhin and Arkady Dragomoshchenko
by Evgeny Pavlov
Arts 2024, 13(6), 181; https://doi.org/10.3390/arts13060181 - 10 Dec 2024
Viewed by 900
Abstract
The article explores the unique friendship and creative synergy between two towering figures of late Soviet underground culture, the avant-garde jazz musician Sergei Kuriokhin and the poet Arkady Dragomoshchenko. Both outsiders in Leningrad, they shaped its literary and musical landscapes without aligning with any movements. Dragomoshchenko, a seminal poet, defied categorization, while Kuriokhin, a polymath, challenged conventions across music, performance, and politics. Their collaboration epitomized innovation, blending Dragomoshchenko’s cerebral poetry with Kuriokhin’s avant-garde music. Despite linguistic barriers, their connection transcended verbal communication, rooted in shared modes of nonlinear thinking and creative experimentation. Kuriokhin’s revolutionary Pop Mekhanika, a chaotic fusion of genres and sensory experiences, mirrored Dragomoshchenko’s relentless poetic evolution. Their friendship catalyzed pivotal encounters, such as with the American poet Lyn Hejinian, expanding their artistic horizons. Dragomoshchenko’s poetic vision, centred on perception’s fleeting nature and the boundaries of possibility, echoed Kuriokhin’s multisensory assaults on audience expectations. Through their unconventional artistry, Kuriokhin and Dragomoshchenko navigated the shifting cultural landscape of late Soviet society, embodying a spirit of defiance and exploration. Their enduring influence transcends their untimely deaths, leaving an indelible mark on Russian avant-garde culture. Full article

17 pages, 2262 KiB  
Article
Neural Mechanism of Musical Pleasure Induced by Prediction Errors: An EEG Study
by Fuyu Ueno and Sotaro Shimada
Brain Sci. 2024, 14(11), 1130; https://doi.org/10.3390/brainsci14111130 - 8 Nov 2024
Cited by 1 | Viewed by 1992
Abstract
Background/Objectives: Musical pleasure is considered to be induced by prediction errors (surprise), as suggested in neuroimaging studies. However, the role of temporal changes in musical features in reward processing remains unclear. Utilizing the Information Dynamics of Music (IDyOM) model, a statistical model that calculates musical surprise based on prediction errors in melody and harmony, we investigated whether brain activities associated with musical pleasure, particularly in the θ, β, and γ bands, are induced by prediction errors, similar to those observed during monetary rewards. Methods: We used the IDyOM model to calculate the information content (IC) of surprise for melody and harmony in 70 musical pieces across six genres; eight pieces with varying IC values were selected. Electroencephalographic data were recorded while participants listened to the pieces and continuously rated their subjective pleasure on a 1–4 scale. Time–frequency analysis of electroencephalographic data was conducted, followed by general linear model analysis to fit the power-value time course in each frequency band to the time courses of subjective pleasure and IC for melody and harmony. Results: Significant positive fits were observed in the β and γ bands in the frontal region with both subjective pleasure and IC for melody and harmony. No significant fit was observed in the θ band. Both subjective pleasure and IC are associated with increased β and γ band power in the frontal regions. Conclusions: β and γ oscillatory activities in the frontal regions are strongly associated with musical rewards induced by prediction errors, similar to brain activity observed during monetary rewards. Full article
(This article belongs to the Section Cognitive, Social and Affective Neuroscience)
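
Information content, the surprise measure referred to in this abstract, is the negative base-2 logarithm of the probability a model assigned to the event that actually occurred. The sketch below shows only that formula; the probabilities are made-up values, and IDyOM's own predictive models are far richer than this.

```python
import math

def information_content(probability):
    """Surprise in bits of an event that was predicted with this probability."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(probability)

# Hypothetical probabilities a model gave to the notes that were played.
predicted = [0.5, 0.125, 1.0]
surprise = [information_content(p) for p in predicted]
print(surprise)
```

Expected events carry little information, while unlikely events carry more, so a time course of these values tracks how surprising a piece is from moment to moment.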

16 pages, 1401 KiB  
Article
Efficient Music Genre Recognition Using ECAS-CNN: A Novel Channel-Aware Neural Network Architecture
by Yang Ding, Hongzheng Zhang, Wanmacairang Huang, Xiaoxiong Zhou and Zhihan Shi
Sensors 2024, 24(21), 7021; https://doi.org/10.3390/s24217021 - 31 Oct 2024
Cited by 1 | Viewed by 2311
Abstract
In the era of digital music proliferation, music genre classification has become a crucial task in music information retrieval. This paper proposes a novel channel-aware convolutional neural network (ECAS-CNN) designed to enhance the efficiency and accuracy of music genre recognition. By integrating an adaptive channel attention mechanism (ECA module) within the convolutional layers, the network significantly improves the extraction of key musical features. Extensive experiments were conducted on the GTZAN dataset, comparing the proposed ECAS-CNN with traditional convolutional neural networks. The results demonstrate that ECAS-CNN outperforms conventional methods across various performance metrics, including accuracy, precision, recall, and F1-score, particularly in handling complex musical features. This study validates the potential of ECAS-CNN in the domain of music genre classification and offers new insights for future research and applications. Full article
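
Channel attention of the ECA kind can be sketched in three steps: average each channel, run a small 1D convolution across the channel averages, and squash the result with a sigmoid to get per-channel weights. The code below is a plain-Python illustration under those assumptions; the kernel values and feature maps are hypothetical, and it is not the ECAS-CNN implementation.

```python
import math

def eca_weights(feature_maps, kernel):
    """Per-channel attention weights in (0, 1).

    feature_maps: list of channels, each a flat list of activations.
    kernel: odd-length list of 1D convolution weights shared by all channels.
    """
    pooled = [sum(ch) / len(ch) for ch in feature_maps]   # global average pooling
    pad = len(kernel) // 2
    padded = [0.0] * pad + pooled + [0.0] * pad           # zero padding at the ends
    conv = [sum(w * padded[i + j] for j, w in enumerate(kernel))
            for i in range(len(pooled))]                  # local cross-channel mixing
    return [1.0 / (1.0 + math.exp(-z)) for z in conv]     # sigmoid gate

def apply_attention(feature_maps, weights):
    """Rescale every activation in a channel by that channel's weight."""
    return [[w * v for v in ch] for ch, w in zip(feature_maps, weights)]

# Hypothetical 4-channel feature map with two activations per channel.
maps = [[1.0, 3.0], [0.0, 0.0], [-2.0, -2.0], [4.0, 0.0]]
weights = eca_weights(maps, kernel=[0.25, 0.5, 0.25])
rescaled = apply_attention(maps, weights)
print(weights)
```

Because the convolution only mixes neighbouring channels, the module adds very few parameters compared with a fully connected attention layer.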