Behavioral Sciences
  • Article
  • Open Access

4 December 2025

From Gaze to Music: AI-Powered Personalized Audiovisual Experiences for Children’s Aesthetic Education

1 Hainan University, Haikou 570228, China
2 Shijiazhuang University, Shijiazhuang 050035, China
3 Hainan Medical University, Haikou 571199, China
* Author to whom correspondence should be addressed.
This article belongs to the Section Cognition

Abstract

Engaging with exemplary artworks is a fundamental means of fostering children’s cognitive and emotional development while supporting multidimensional learning across perceptual domains. However, children at early stages of cognitive development often struggle to comprehend and internalize the complex visual narratives and abstract artistic concepts embedded in sophisticated artworks. This study presents a methodological framework designed to enhance children’s artwork comprehension by leveraging audiovisual cross-modal integration. Drawing on cross-modal correspondences between the visual and auditory perceptual systems, we developed a method that extracts and interprets musical elements from the gaze-behavior patterns recorded in prior pilot studies of artwork viewing. Recurrent Neural Networks (RNNs) then transform these visual–musical correspondences into cohesive, aesthetically pleasing musical compositions that remain semantically and emotionally congruent with the observed visual content. The efficacy and practical applicability of the proposed method were evaluated with 96 children, whose viewing behavior was assessed objectively using eye-tracking technology, complemented by qualitative evaluations from 16 parents and 5 experienced preschool educators. The results demonstrate statistically significant improvements in children’s sustained engagement (fixation duration: 58.82 ± 7.38 s vs. 41.29 ± 6.92 s, p < 0.001, Cohen’s d ≈ 1.29), attentional focus (AOI gaze frequency increased by 73%, p < 0.001), and parents’ subjective ratings (means of 4.56–4.81 out of 5) when viewing is supported by AI-generated, artwork-matched audiovisual experiences, potentially scaffolding deeper processing and informing future developments in aesthetic education.
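
The abstract names RNNs as the mechanism that turns gaze-derived musical elements into compositions, but does not specify the architecture. The following minimal sketch, assuming per-fixation gaze features (x, y, duration) and a small vocabulary of musical-event tokens, illustrates what such a gaze-to-music mapping could look like; the layer sizes, feature set, and token vocabulary are illustrative assumptions, not the authors’ implementation.

    # Hypothetical sketch of a gaze-to-music RNN (not the authors' model).
    # Input: one fixation sequence with (x, y, duration) per fixation.
    # Output: one musical-event token per fixation, decoded greedily.
    import torch
    import torch.nn as nn

    class GazeToMusicRNN(nn.Module):
        def __init__(self, n_gaze_features=3, hidden_size=64, n_music_tokens=128):
            super().__init__()
            # A plain RNN layer reads the fixation sequence in order.
            self.rnn = nn.RNN(n_gaze_features, hidden_size, batch_first=True)
            # A linear head maps each hidden state to logits over musical tokens
            # (e.g., pitch/duration events in an assumed vocabulary).
            self.head = nn.Linear(hidden_size, n_music_tokens)

        def forward(self, gaze_seq):
            # gaze_seq: (batch, n_fixations, n_gaze_features)
            hidden, _ = self.rnn(gaze_seq)
            return self.head(hidden)          # (batch, n_fixations, n_music_tokens)

    if __name__ == "__main__":
        model = GazeToMusicRNN()
        gaze = torch.rand(1, 20, 3)           # one synthetic scan path of 20 fixations
        tokens = model(gaze).argmax(dim=-1)   # greedy choice of one event per fixation
        print(tokens.shape)                   # torch.Size([1, 20])

In practice the token sequence would be rendered to audio (e.g., as MIDI events) and paired with the artwork, but that rendering step is outside the scope of this sketch.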
