Article

Enhancing Typhlo Music Therapy with Personalized Action Rules: A Data-Driven Approach

by Aileen Benedict 1,†, Zbigniew W. Ras 1,2,*,†, Pawel Cylulko 3 and Joanna Gladyszewska-Cylulko 4

1 Computer Science Department, University of North Carolina, Charlotte, NC 28223, USA
2 Institute of Computer Science, Polish-Japanese Academy of Information Technology, 02-008 Warsaw, Poland
3 Department of Music Therapy, Karol Lipinski Music Academy, 50-043 Wrocław, Poland
4 Institute of Pedagogy, University of Wroclaw, 50-527 Wrocław, Poland
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Information 2025, 16(8), 666; https://doi.org/10.3390/info16080666
Submission received: 5 March 2025 / Revised: 18 July 2025 / Accepted: 25 July 2025 / Published: 4 August 2025
(This article belongs to the Section Information Applications)

Abstract

In the context of typhlo music therapy, personalized interventions can significantly enhance the therapeutic experience for visually impaired children. Leveraging a data-driven approach, we incorporate action-rule discovery to provide insights into the factors of music that may benefit individual children. The system utilizes a comprehensive dataset developed in collaboration with an experienced music therapist, special educator, and clinical psychologist, encompassing meta-decision attributes, decision attributes, and musical features such as tempo, rhythm, and pitch. By extracting and analyzing these features, our methodology identifies key factors that influence therapeutic outcomes. Some themes discovered through action-rule discovery include the effect of harmonic richness and loudness on expression and communication. The main findings demonstrate the system’s ability to offer personalized, impactful, and actionable insights, leading to improved therapeutic experiences for children undergoing typhlo music therapy. Our conclusions highlight the system’s potential to transform music therapy by providing therapists with precise and effective tools to support their patients’ developmental progress. This work shows the significance of integrating advanced data analysis techniques in therapeutic settings, paving the way for future enhancements in personalized music therapy interventions.

1. Introduction

In the context of typhlo music therapy, personalized interventions can significantly enhance the therapeutic experience for visually impaired children. Music therapy has been acknowledged for its therapeutic benefits, including improved emotional expression, communication skills, and motor functions [1,2,3]. However, the effectiveness of these interventions can be enhanced through personalized approaches that cater to each child’s unique needs.
Recommendation systems have emerged as a powerful solution to the problem of information overload, helping users navigate vast amounts of data to find relevant and meaningful information [4]. These systems have been widely applied across various domains, including e-commerce [5], entertainment [6,7], and healthcare [8,9]. In healthcare, personalized recommendation systems have demonstrated their potential to aid decision-making processes for both patients and medical professionals [8]. However, the integration of such systems into music therapy, particularly for visually impaired children, remains underexplored, presenting a unique opportunity for our research to make a significant contribution.
In this study, we introduce a novel music recommendation system designed to aid music therapists in providing personalized therapeutic interventions for visually impaired children. Leveraging action-rule discovery, our system aims to generate actionable insights that can guide therapists in selecting music that aligns with the specific therapeutic goals for each child. Action rules are a type of decision rule used in machine learning and data mining to identify patterns and suggest changes that can lead to desired outcomes. By analyzing a comprehensive dataset developed in collaboration with an experienced music therapist, which includes meta-decision attributes, decision attributes, and various musical features such as tempo, loudness, and pitch, we aim to uncover the factors that influence therapeutic outcomes.
Our approach involves the extraction and analysis of musical features to generate action rules that provide insights into how different aspects of music can affect the therapy’s success. This data-driven methodology allows us to tailor music recommendations to the individual needs of each child, enhancing the overall effectiveness of typhlo music therapy. By empowering therapists with precise and personalized insights, our system has the potential to significantly improve the developmental progress and quality of life for visually impaired children undergoing music therapy. The focus of this study is not to generalize findings to all children or contexts but to explore the potential of action rules in creating personalized therapeutic interventions. By leveraging data-driven insights, we aim to provide tools that support therapists in adapting therapy to the unique and evolving needs of individual children.
The structure of this paper is organized as follows: Section 2 provides a background, focusing on relevant literature and positioning our study within existing research. Section 3 outlines the materials and methods, including details about the collaborative process, dataset structure, feature extraction techniques, and the methodology for action-rule discovery. Section 4 presents the findings from our data analysis, highlighting key patterns and action rules discovered. Section 5 offers a comprehensive discussion, covering the implications of our results for music therapy, a summary of findings, and potential directions for future research.

2. Background

2.1. Music Therapy for Visually Impaired Individuals

Music therapy has long been recognized for its ability to improve the emotional, cognitive, and social well-being of individuals [1,2,10]. Studies have demonstrated the positive effects of music therapy on enhancing motor skills [10,11], reducing anxiety [12,13], and improving social interactions [14]. For visually impaired children, music therapy can play a crucial role in their development by providing alternative means of expression, communication, and sensory stimulation [3,15,16].
One notable approach is the typhlo music therapy model, developed by [3,15,16]. This model is specifically designed to support visually impaired individuals by focusing on the tactile and auditory nature of music, rather than relying on visual cues. The typhlo music therapy model leverages the healing potential of music and emphasizes the rehabilitative and therapeutic dimensions of the human–music relationship. This approach distinguishes itself from other support interventions by structuring sessions and programs that do not require unimpaired visual perception, making it uniquely suited for individuals with serious visual impairments.
Developed in the early 1990s and refined over the past thirty years at the Karol Lipinski Academy of Music and the Maria Grzegorzewska Lower Silesian Special Educational Centre for Blind and Visually Impaired Children in Wroclaw, Poland, this model has been instrumental in stimulating the individual development of visually impaired persons. It helps improve impaired body functions and aids in their optimal adaptation to everyday life without sight. The model’s effectiveness is supported by the collective insights and experiences of professionals in clinical psychology, music therapy, music education, and special education, who have worked extensively with this approach. They have sought to standardize the basic terms, concepts, and definitions in the field of music therapy for visually impaired individuals, aligning them with the paradigms of special education.

2.2. Action Rules in Therapy

Action rules are a powerful tool in data mining and machine learning, used to suggest changes to achieve desired outcomes based on patterns identified in the data [17,18]. These rules have been applied in various domains, including medicine, where they help in reclassifying patients’ conditions and recommending appropriate treatments. In the context of therapy, action rules can provide insights into the factors that contribute to successful therapeutic interventions.
Action rules are defined as expressions having the format
A ∧ (B = B_1 → B_2) ⇒ (D = D_1 → D_2)
where A represents the conjunction of stable attribute values, (B = B_1 → B_2) represents the conjunction of changes of flexible attribute values, and (D = D_1 → D_2) represents the conjunction of changes of decision attribute values (assuming that there is more than one decision attribute).
Let’s take the following example:
(A = a_1) ∧ (G = g_2) ∧ (B = b_1 → b_2) ∧ (H = ? → h_2) ⇒ (D = d_1 → d_2)
Terms A = a_1 and G = g_2 are stable attribute values, meaning they are attributes that do not change over time (their values cannot change). The next parts, (B = b_1 → b_2) and (H = ? → h_2), show the changes in flexible attribute values. These are the attributes that can change over time. The final term, (D = d_1 → d_2), is our consequent, the target value we wish to change. In the above example, it is stated that attribute A should equal a_1, attribute G should equal g_2, attribute B should change from b_1 to b_2, and attribute H should change from any value to h_2. If all of these happen, then a change in our consequent D from d_1 to d_2 is expected.
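To make this notation concrete, the following minimal sketch (in Python) shows one possible way to encode the example rule and check whether a given data record satisfies its left-hand side; the dictionary structure, attribute names, and helper function are illustrative only and are not part of the system described later in this paper.

# Illustrative encoding of the example action rule
# (A = a1) ∧ (G = g2) ∧ (B: b1 -> b2) ∧ (H: ? -> h2) ==> (D: d1 -> d2).
# The structure and helper below are hypothetical, for explanation only.

action_rule = {
    "stable":   {"A": "a1", "G": "g2"},          # values that must already hold
    "flexible": {"B": ("b1", "b2"),              # change B from b1 to b2
                 "H": (None, "h2")},             # change H from any value to h2
    "decision": {"D": ("d1", "d2")},             # expected effect on the decision
}

def rule_applies(record, rule):
    """Return True if the record matches the rule's left-hand side:
    stable values hold and flexible attributes are at their 'from' state."""
    for attr, value in rule["stable"].items():
        if record.get(attr) != value:
            return False
    for attr, (from_val, _to_val) in rule["flexible"].items():
        if from_val is not None and record.get(attr) != from_val:
            return False
    return True

record = {"A": "a1", "G": "g2", "B": "b1", "H": "h3", "D": "d1"}
print(rule_applies(record, action_rule))  # True: the suggested changes to B and H apply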
In previous work, action rules have been employed to identify optimal treatment paths and to personalize medical recommendations. For instance, ref. [19] utilized action rules to extract knowledge from patient data, aiming to reduce hospital readmissions by recommending personalized treatment paths. Their approach involved grouping patients based on diagnosis similarities and providing actionable recommendations to physicians to optimize treatment outcomes and minimize readmissions. Similarly, ref. [20] applied action-rule discovery in the context of tinnitus retraining therapy (TRT), developing a clinical decision support system to guide clinicians in delivering personalized TRT. Their system used machine learning models and action rules to enhance the accuracy and explainability of diagnostic and treatment recommendations, achieving an average accuracy of 80% in preliminary testing. These studies highlight the potential of action rules to enhance decision-making processes in therapeutic settings by providing personalized and actionable insights.

2.3. Data-Driven Approaches in Therapeutic Settings

The integration of data-driven approaches in therapeutic settings has gained traction in recent years, driven by the increasing availability of clinical and behavioral data. Data analytics, machine learning, and data mining techniques are being used to uncover patterns and insights that can inform therapeutic practices. These approaches enable the development of personalized interventions that are more effective and tailored to individual needs.
Several studies have demonstrated the efficacy of data-driven approaches in therapy. For example, ref. [21] presented an overview of music recommendation systems, highlighting user-centric approaches that personalize recommendations by tracking the user’s context, including emotions and personality. This comprehensive review underscores the potential of music therapy in various medical contexts, such as reducing stress, improving mental health, and aiding in the treatment of chronic diseases. The work illustrates how personalized music therapy can benefit patients, particularly in aging societies and those with conditions like Alzheimer’s and Parkinson’s disease.
Similarly, ref. [22] developed an automatic music recommendation system designed to relieve stress by using real-time data from physiological sensors. Their system includes a small, portable device that fits on a finger and measures heartbeat signals. This device wirelessly sends data to a program that calculates the user’s stress level. By analyzing variations in heart rate, the system determines how stressed the user is while listening to music. The study found a strong link between the stress levels measured by the device and the music preferences of the participants, demonstrating that the system can effectively recommend music that helps reduce stress based on real-time physiological feedback.
These examples underscore the potential of data-driven methods to revolutionize therapeutic practices by providing personalized and actionable insights. By leveraging real-time data and machine learning techniques, these systems can tailor interventions to the specific needs of individuals, enhancing the overall effectiveness and efficiency of therapeutic processes.

3. Materials and Methods

3.1. Collaboration with the Domain Expert

This study greatly benefited from close collaboration with domain experts in music therapy, co-authors of this paper, whose expertise guided the design of our recommendation system for typhlo music therapy [16]. The insights they provided were instrumental in shaping a patient-centered approach, ensuring that the system addressed specific therapeutic goals for visually impaired children. Their understanding of therapeutic processes informed the selection and definition of key features for our dataset, helping to align our methodology with the unique needs of the target population. This collaboration ensured that the therapeutic interventions developed through our system were both personalized and effective, highlighting the importance of integrating domain knowledge into the research process.

3.2. Data Overview

This study utilizes a dataset specifically curated to explore the impact of music on visually impaired children during therapy sessions. The dataset contains 2184 rows, each representing one child in a single therapy session. The data spans a period from 12 January 2016 to 1 September 2020 and includes 76 unique children.
The dataset attributes are grouped into four categories: stable classification attributes, flexible classification attributes, decision attributes, and meta-decision attributes. Among the flexible classification attributes, the music played during the therapy session is a central feature, with a total of 34 unique songs included. The most frequently played song was L’Éléphant by Camille Saint-Saëns, appearing 166 times, or in 7.60% of sessions. The least frequently played songs were Waltz in A-flat major, Op. 39, No. 15 by Johannes Brahms and Bourrée in B minor, BWV 1002 by Johann Sebastian Bach, appearing only in 16 sessions (or 0.73%) each. 11.76% of songs were played fewer than 30 times, and 5.88% of songs were played fewer than 20 times.
On average, each child participated in 28.74 sessions, with a standard deviation of 16.15. The number of sessions per child ranged from 7 to 61. The children’s ages ranged from 7 to 11 years, with a mean age of 8.05 years. These values are based on the age of each child at their first recorded session. On average, children stayed for 3.09 years (SD = 1.14), ranging from 1.02 to 4.64 years. These values are based on the data currently available and may not reflect the full duration for children who are still actively participating in sessions. Of the children included, 44.74% were female and 55.26% were male. This data provides a comprehensive overview of therapeutic goals and outcomes, supporting the exploration of action rules tailored to music therapy for visually impaired children.
The data used in this study is fully anonymized and contains no identifying information. Furthermore, we do not directly interact with or intervene in the lives of any participants. This work is therapist-facing: the action rules generated are shared with the therapist, not with the therapist’s patients. The goal is to provide additional insights to support and reflect on the therapist’s own practice. The original data was collected independently by a licensed practitioner in a professional context, and the therapist is an active research collaborator rather than an external data provider. Importantly, the insights derived from our analysis are intended to inform and augment existing therapeutic practices, not to replace or directly guide patient care.

3.2.1. Stable Classification Attributes

Stable classification attributes describe characteristics of each child that remain constant throughout the study (see Table 1). These attributes include demographic information such as age, sex, place of permanent residence, and type of visual disability. For example, the attribute ’Place of Permanent Residence (PPR)’ classifies whether a child lives in a large town, medium town, small town, or village. Other stable attributes like ’IQ’ and ’Foster Care’ provide further context about each child’s background and circumstances. A complete list of stable classification attributes is provided in Table 1.

3.2.2. Flexible Classification Attributes

Flexible classification attributes capture aspects of a child’s development and circumstances that may change over time (see Table 2). These include variables such as ’Emotional and Social Development-Beginning (ESD-b)’ and ’Emotional and Social Development-End (ESD-e),’ which measure the child’s progress in social skills during therapy. Specifically, ’beginning’ refers to the child’s emotional and social state before the start of each therapy session, while ’end’ refers to their state after the session concludes. Similarly, ’Motor Development-Beginning (MD-b)’ and ’Motor Development-End (MD-e)’ track changes in physical capabilities over the course of each session. Additionally, this category includes information about the music pieces played during therapy sessions, which can influence the child’s responses. These music selections are then used to extract detailed audio features, which are discussed in a subsequent section. For a detailed list of flexible classification attributes, refer to Table 2.

3.2.3. Decision and Meta-Decision Attributes

Meta-decision attributes are the primary targets for each child, as detailed in Table 3. For example, we may aim to improve the meta-decision attribute of “physiological reactions,” which measures how comfortable and open a child feels, by shifting it from “small” to “moderate.” Each meta-decision attribute is defined by associated decision attributes, as shown in Table 4. For instance, the decision attributes for “physiological reactions” include sweating (values: high, small, none), breathing (values: rapid, moderate, slow), and reddening (values: much, small, none).
The seven meta-decision attributes and their related decision attributes are as follows:
  • Physiological reactions: sweating, breathing, and reddening.
  • Motorics: changing body position, tightening of muscles, movement of legs, movement of hands, and performing co-movements.
  • Concentration of attention: expressing the desire to listen to music for longer, suggesting a desire to shorten the time to listen to music, and attention to acoustic phenomena occurring outside the music being listened to.
  • Experience of music: humming the melody of the presented music, performing the rhythm of presented music, rocking for presented music, and responding to changes in music.
  • Communication: communicating with words, communicating by gesticulation, and communicating with a mimic.
  • Blindism: rocking, head shaking, hands waving (not in front of eyes), hands waving in front of eyes, and eye rubbing.
  • Expression: the expression or manifestation of emotions and feelings, using gestures, using mimicry, verbalizing (expressing something with words), and vocalizing (expressing something, not necessarily with words).
These stable and flexible attributes, along with the meta-decision and decision attributes, provide a comprehensive understanding of each child’s characteristics and responses during music therapy. By capturing both long-term factors and short-term responses, they allow us to analyze both fixed factors, such as age and visual impairment type, and dynamic factors, such as emotional development and reaction to music. This layered approach ensures that individual differences are accounted for, enabling a nuanced analysis of therapeutic progress. This holistic perspective supports the creation of targeted, data-driven insights for improving therapy outcomes.
To better understand how music affects these meta-decision attributes, we then extracted new attributes, such as the music’s tempo (speed) and musical keys, further discussed in Section 3.3. These extracted features provide a more detailed view of the musical elements influencing a child’s response, allowing us to explore correlations between specific musical properties and therapeutic progress. This comprehensive dataset enables us to generate meaningful and actionable insights to support the personalized music therapy interventions, enhancing the ability of therapists to tailor sessions to the unique needs and preferences of each child.

3.3. Music Feature Extraction

To understand how different aspects of music affect the therapeutic outcomes, we extracted various features from the music used in our study. This process involved several steps, including data retrieval, basic information extraction, and audio feature analysis.

3.3.1. Data Retrieval and Basic Information Extraction

Initially, we compiled a list of music tracks used in the therapy sessions. Each track’s basic information, including the therapist-assigned music ID, title, artist, and URL, was collected and organized. We then split the title into separate columns for the artist and song name. Using the pyyoutube library for Python (version 3.8), we retrieved additional details from YouTube. First, we extracted the YouTube video ID from the URL, which differs from the therapist-assigned music ID. Next, we used the YouTube API to fetch video details, including the title, duration, and published date. Using these IDs, we downloaded the MP3 files of the YouTube videos for the next step in audio feature extraction.
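As a rough illustration of this retrieval step, the sketch below parses the video ID from a watch URL and looks up basic metadata with pyyoutube; the API key, example URL, and the specific response fields shown are placeholders and may differ from the pipeline actually used in the study.

from urllib.parse import urlparse, parse_qs
from pyyoutube import Api

def youtube_id_from_url(url):
    """Extract the YouTube video ID from a standard watch URL."""
    return parse_qs(urlparse(url).query).get("v", [None])[0]

api = Api(api_key="YOUR_API_KEY")                     # placeholder API key
url = "https://www.youtube.com/watch?v=EXAMPLE_ID"    # hypothetical track URL

video_id = youtube_id_from_url(url)
response = api.get_video_by_id(video_id=video_id)
if response.items:
    video = response.items[0]
    # Field names follow the YouTube Data API response (snippet, contentDetails).
    print(video.snippet.title, video.contentDetails.duration, video.snippet.publishedAt)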

3.3.2. Audio Feature Extraction

The librosa library was employed to analyze these audio files and extract relevant features. These features provide insight into various aspects of a song’s properties, which can impact a child’s emotional and physiological responses during therapy.
One key feature examined was tempo, which determines the speed of the music and can influence a child’s motor responses and emotional states during therapy. Tempo was estimated using a dynamic programming beat tracking algorithm inspired by Ellis [23] and implemented in the librosa Python library. Beat tracking in this framework proceeds in three stages, as stated in the documentation [24]:
  • Measure onset strength,
  • Estimate tempo from onset autocorrelation,
  • Select peaks in onset strength consistent with the estimated tempo.
The method balances two competing objectives: (1) maximizing onset strength at each hypothesized beat, and (2) maintaining a regular inter-beat interval consistent with a target tempo [23]. These are combined into an objective function:
C(\{t_i\}) = \sum_{i=1}^{N} O(t_i) + \alpha \sum_{i=2}^{N} F(t_i - t_{i-1}, t_p)
where O(t) is the onset strength envelope at time t, α is a weighting parameter, and F(Δt, t_p) is a function that penalizes deviations from the target inter-beat interval t_p, defined as:
F(\Delta t, t_p) = -\log^{2}\left(\frac{\Delta t}{t_p}\right)
Dynamic programming is used to efficiently maximize this objective over all possible beat sequences, yielding a sequence of beat times that both align with high onset strengths and reflect a consistent rhythmic pattern. The onset strength envelope O(t) is derived from the audio signal via a short-time Fourier transform followed by mapping onto Mel frequency bands.
To intuitively describe this process, one can think of beat tracking as simulating how a listener might naturally tap along to the rhythm of a song. First, the algorithm identifies points in the audio where there are sudden increases in sound energy, such as the onset of a note or a drum hit. These are captured in an “onset strength envelope,” a curve that highlights moments where musical events are likely to occur. Next, the system analyzes the regularity of these onsets over time by comparing the envelope to delayed versions of itself, essentially checking how well the music aligns with a steady, repeating beat. This helps determine the overall tempo. Finally, using the estimated tempo, the algorithm selects beat positions that are both rhythmically regular and aligned with the strongest musical onsets.
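For illustration, the following minimal sketch shows how this tempo and beat estimate can be obtained with librosa; the file name is a placeholder and default parameters are assumed, so the exact settings used in the study may differ.

import librosa

y, sr = librosa.load("track.mp3")                        # placeholder file name
onset_env = librosa.onset.onset_strength(y=y, sr=sr)     # stage 1: onset strength envelope
tempo, beats = librosa.beat.beat_track(onset_envelope=onset_env, sr=sr)  # stages 2-3
beat_times = librosa.frames_to_time(beats, sr=sr)        # beat positions in seconds

print("Estimated tempo (BPM):", tempo)
print("First beat times (s):", beat_times[:4])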
Additionally, we extracted spectral features, including the spectral centroid, which indicates the brightness of a sound by measuring the center of mass of the frequency spectrum. Higher centroid values suggest greater high-frequency content, contributing to a sharper or more piercing auditory experience, while lower values signify a duller sound [25]. The spectral centroid at time frame t is defined as:
\mathrm{centroid}[t] = \frac{\sum_{k} S[k, t] \cdot \mathrm{freq}[k]}{\sum_{j} S[j, t]}
where S is the magnitude spectrogram and freq[k] is the frequency in Hz corresponding to the k-th FFT bin [24,26]. Each frame of the spectrogram is treated as a distribution of energy over frequency, and the centroid is computed as the weighted average of those frequencies.
Spectral bandwidth (or spectral spread) quantifies the spread of the frequency spectrum around the spectral centroid, effectively measuring the range of frequencies present in a sound [26]. Intuitively, it calculates a weighted average of how far each frequency is from the center. A narrow spectral bandwidth suggests that most of the sound’s energy is concentrated near the centroid, often indicating a purer tone. A wider bandwidth suggests a sound with more distributed energy across a range of frequencies, often associated with noise, complex timbres, or certain instruments. Spectral bandwidth at time frame t is computed as:
\mathrm{bandwidth}^{2}[t] = \sum_{k} \left( \mathrm{freq}[k] - \mathrm{centroid}[t] \right)^{2} \cdot \tilde{S}[k, t]
In this equation:
  • freq[k] is the frequency (in Hz) of bin k, i.e., what pitch this bin represents.
  • centroid[t] is the spectral centroid at time t, representing the center of spectral energy.
  • (freq[k] - centroid[t])² measures how far each bin is from the center, with squaring used to ensure positive values and emphasize larger deviations.
  • S̃[k, t] is the normalized spectral magnitude at bin k and time t, indicating how much energy is present relative to the total.
The result is a weighted average of frequency distances from the centroid, where bins with more energy and farther distances contribute more. A lower bandwidth indicates a focused or pure sound, while a higher bandwidth reflects more spectral complexity or noise. This aligns with the definition of spectral spread provided by Klapuri and Davy ([26], p. 136).
Spectral contrast was also analyzed as a feature that captures the difference between spectral peaks (high energy) and valleys (low energy) within each frequency sub-band [27]. Each frame of the spectrogram is divided into several sub-bands (e.g., octave bands), and the contrast is computed as the log-ratio between the average energy in the upper quantile and the lower quantile of energy in each band. This metric reflects the relative distribution of harmonic (peak) and non-harmonic (valley) components in the spectrum [24,27]. High spectral contrast values could indicate narrow-band, harmonic-rich content (e.g., clear instrument tones), while lower values suggest broad-band, noise-like signals. The spectral contrast for sub-band k is calculated as [27]:
\mathrm{SC}_{k} = \mathrm{Peak}_{k} - \mathrm{Valley}_{k}
where:
\mathrm{Peak}_{k} = \log \left( \frac{1}{\alpha N} \sum_{i=1}^{\alpha N} x_{k, i} \right)
\mathrm{Valley}_{k} = \log \left( \frac{1}{\alpha N} \sum_{i=1}^{\alpha N} x_{k, N-i+1} \right)
In the above formulation, N denotes the total number of spectral bins in the k-th sub-band, with x_{k,1}, x_{k,2}, …, x_{k,N} representing the sorted magnitude spectrum values in descending order. The parameter α is a small positive constant (e.g., α = 0.2), specifying the proportion of bins considered when computing the average. Specifically, αN represents the number of bins used to calculate both the spectral peak and valley. The term x_{k,i} refers to the i-th largest magnitude in the sub-band.
The spectral peak, Peak_k, is calculated as the logarithm of the average of the top αN highest-magnitude bins in sub-band k, capturing the energy of the most prominent spectral components. In contrast, the spectral valley, Valley_k, is computed as the logarithm of the average of the bottom αN lowest-magnitude bins, capturing the background energy or noise floor. A larger difference between peak and valley values indicates strong contrast between tonal and non-tonal components, which may suggest clearer harmonic content or instrumentation.
Intuitively, Peak_k measures the average strength of the strongest frequencies in sub-band k, likely corresponding to harmonic content. Valley_k measures the average strength of the weakest frequencies in that same band, often associated with noise or spectral gaps. SC_k then reflects the contrast between those two. The sharper the difference, the more tonal or structured the sound; the smaller the difference, the noisier or flatter the spectrum.
Spectral flatness, also known as the tonality coefficient, measures the extent to which a signal resembles noise versus being more tone-like [28]. It “indicates how flat the spectrum of a sound is” [26], and is defined as:
\mathrm{Flatness}[t] = \frac{\left( \prod_{k=1}^{N} S[k, t] \right)^{1/N}}{\frac{1}{N} \sum_{k=1}^{N} S[k, t]}
where S[k, t] is the magnitude or power at frequency bin k and time frame t, and N is the number of frequency bins. This formulation reflects the classical spectral flatness measure, which compares the geometric and arithmetic means of the magnitude spectrum values. The standard (non-logarithmic) form is implemented in popular libraries, such as librosa, and is consistent with definitions provided in prior works [26,28].
Spectral flatness serves as an estimate of how structured or predictable a signal is. A value near 1.0 indicates a flat, noise-like spectrum (e.g., white noise), while lower values suggest a more tonal signal. Extensions such as the Generalized Spectral Flatness Measure (GSFM) incorporate information-theoretic corrections for non-Gaussian linear processes [28], but for this study, we use the standard non-logarithmic form available in librosa to characterize tonal versus noise-like qualities in audio.
Another crucial feature was zero-crossing rate (ZCR), which measures the frequency at which a signal changes polarity (crosses the zero axis). Higher ZCR values are typically found in noisier or percussive sounds, while lower values are associated with smoother, more harmonic tones [25,29]. For a frame t of length N, ZCR is computed as:
Z_{t} = \frac{1}{2} \sum_{n=1}^{N} \left| \mathrm{sign}(x[n]) - \mathrm{sign}(x[n-1]) \right|
where sign(x) returns 1 for positive values and 0 for negative values. A higher ZCR indicates rapid signal fluctuations, characteristic of noise or high-frequency content, while a lower ZCR suggests smoother, more harmonic signals.
Additionally, we computed root mean square (RMS) energy, which provides a measure of the loudness or intensity of a song over time. RMS energy is calculated as the square root of the mean squared amplitude values within a frame:
\mathrm{RMS}[t] = \sqrt{ \frac{1}{N} \sum_{n=1}^{N} x_{t}[n]^{2} }
where x_t[n] is the audio signal in frame t of length N. This feature captures short-term energy dynamics and is particularly useful for identifying quiet versus energetic passages [29].
Furthermore, we analyzed chroma features, which represent the relative prominence of the 12 pitch classes (C, C#, D, etc.), allowing us to assess the harmonic and melodic content of the music. In other words, they are the distribution of spectral energy across the twelve semitone pitch classes of the musical octave (C, C#, D, …, B), collapsing across octaves to emphasize harmonic and melodic content [24,30]. This representation leverages the perceptual similarity of notes separated by one or more octaves (e.g., middle C and high C), and is useful for tasks like key estimation, chord recognition, and harmonic analysis.
To compute the chroma features, each frame of the short-time Fourier transform (STFT) is mapped to a 12-dimensional chroma vector using a predefined filter bank or projection matrix. Each STFT bin is associated with a chroma class, and the energy from that bin contributes to the corresponding pitch class:
\mathrm{Chroma}_{c}[t] = \sum_{k \in K_{c}} \left| X[k, t] \right|
where Chroma_c[t] is the energy assigned to pitch class c at time frame t, X[k, t] is the complex STFT coefficient at frequency bin k, and K_c is the set of bins associated with chroma c. In practice, the implementation normalizes and smooths this energy to reduce octave bias and emphasize relative pitch content.
By aggregating spectral energy into chroma bins, this feature abstracts away absolute pitch and emphasizes tonal structure, which is critical in analyzing melodic and harmonic qualities relevant to music therapy outcomes.
By systematically extracting these features, we aimed to understand how various musical elements contribute to the effectiveness of music therapy for visually impaired children. These insights facilitate a deeper analysis of the relationship between specific musical properties and therapeutic progress.
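As an illustration of how the features described in this section can be extracted in practice, the following sketch uses librosa; the file name is a placeholder, and averaging each per-frame feature over time is a simplifying assumption rather than the study's exact aggregation.

import numpy as np
import librosa

y, sr = librosa.load("track.mp3")   # placeholder file name

# Frame-level features, averaged over time for a compact per-track summary.
features = {
    "spectral_centroid":  np.mean(librosa.feature.spectral_centroid(y=y, sr=sr)),
    "spectral_bandwidth": np.mean(librosa.feature.spectral_bandwidth(y=y, sr=sr)),
    "spectral_contrast":  np.mean(librosa.feature.spectral_contrast(y=y, sr=sr)),
    "spectral_flatness":  np.mean(librosa.feature.spectral_flatness(y=y)),
    "zero_crossing_rate": np.mean(librosa.feature.zero_crossing_rate(y)),
    "rms_energy":         np.mean(librosa.feature.rms(y=y)),
}

# Chroma: mean energy per pitch class (C, C#, ..., B), collapsed across octaves.
chroma = librosa.feature.chroma_stft(y=y, sr=sr)          # shape (12, n_frames)
pitch_classes = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
for name, value in zip(pitch_classes, chroma.mean(axis=1)):
    features[f"chroma_{name}"] = float(value)

print(features)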

3.3.3. Extracting Musical Keys from Chroma Features

To better understand the harmonic context of the music used in therapy sessions, we extracted musical keys from the chroma features. The chroma features represent the 12 different pitch classes of the musical octave and provide valuable information about the harmonic and melodic characteristics of the music.
The first method we used to analyze the chroma features is identifying the most frequently pressed pitch class. This method involves finding the pitch class with the highest value in the chroma vector, which indicates the note that appears most often in the music track.
  • Method: The chroma vector is converted from a string representation to a list of floats. The pitch class with the highest value is then identified using the np.argmax function.
  • Limitations: While this method provides insight into the most prominent note in the music, it does not necessarily reflect the overall key of the piece, as it does not consider the harmonic context.
To more accurately estimate the musical key, we implemented a method that compares the chroma vector against predefined templates for major and minor scales. This approach considers the overall harmonic structure rather than just the most frequently pressed pitch.
  • Method:
    Major and Minor Scale Templates: We defined templates for all 24 major and minor scales (12 major and 12 minor). Each template is a chroma vector with 1s indicating the presence of notes in the scale and 0s otherwise.
    Correlation Calculation: For each chroma vector, we calculate the correlation with each major and minor scale template using the np.correlate function. The scale with the highest correlation is considered the estimated key. Since there are relative scales, such as the C major scale and A minor scale being equivalent, we record multiple scales with the highest correlation if they are equal.
  • Benefits: This method provides a more accurate estimation of the musical key by taking into account the overall distribution of pitch classes in the chroma vector.
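A simplified sketch of this template-matching procedure is given below; the binary scale templates and the single-lag use of np.correlate follow the description above, while the example chroma vector and the normalization details are illustrative assumptions rather than the exact implementation.

import numpy as np

NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
MAJOR = [1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1]   # pitch classes of a major scale (root = C)
MINOR = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0]   # pitch classes of a natural minor scale (root = C)

def scale_templates():
    """Build binary chroma templates for the 12 major and 12 minor keys."""
    templates = {}
    for shift, root in enumerate(NOTES):
        templates[f"{root} major"] = np.roll(MAJOR, shift)
        templates[f"{root} minor"] = np.roll(MINOR, shift)
    return templates

def estimate_keys(chroma_vector):
    """Return the scale(s) whose template correlates best with the mean chroma vector."""
    chroma = np.asarray(chroma_vector, dtype=float)
    scores = {name: float(np.correlate(chroma, template)[0])   # single-lag correlation
              for name, template in scale_templates().items()}
    best = max(scores.values())
    return [name for name, score in scores.items() if np.isclose(score, best)]

# Hypothetical mean chroma vector (C through B) with C, E, and A prominent.
chroma_vec = [0.6, 0.1, 0.3, 0.1, 0.7, 0.3, 0.1, 0.4, 0.1, 0.5, 0.1, 0.3]
print(estimate_keys(chroma_vec))                                   # ['C major', 'A minor'] (relative keys)
print("Most frequent pitch:", NOTES[int(np.argmax(chroma_vec))])   # 'E'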
Examples
The following example illustrates the application of these methods to a music track:
  • Title: Für Elise
  • Estimated Scale: C major or A minor
  • Most Frequent Pitch: E
“Für Elise” is traditionally composed in the key of A minor. The most frequent pitch, E, is correctly identified as it is a dominant note in the piece. The extraction method estimated the scale as C major or A minor, which is accurate given their relative relationship. This correct estimation highlights the method’s ability to account for harmonic complexities and transitions present in the piece. The note E’s prominence in the melodic and harmonic structure of “Für Elise” might have reinforced its identification as the most frequent pitch. Despite the intricate harmonic elements, the method successfully captured the essential musical features, demonstrating its effectiveness in scale and pitch analysis.
  • Title: Moonlight Sonata
  • Estimated Scale: E major or C# minor
  • Most Frequent Pitch: G#
As another example, “Moonlight Sonata” was predicted to be in E major or C# minor scale, and the most frequent pitch was identified as G#. “Moonlight Sonata” is in C# minor, so this prediction aligns with the known key, further validating the method’s accuracy in estimating scales and identifying predominant pitches.
By combining both methods, we gain a comprehensive understanding of the music’s harmonic structure, which can influence the therapeutic outcomes.

3.4. Data Cleaning and Preparation

3.4.1. Handling Missing Values

In the initial data preprocessing stage, several steps were taken to clean and prepare the dataset for analysis. Handling missing values was a key part of this process. For attributes such as “Sib” (number of siblings), empty or unknown values (denoted by “?”) were replaced with “−1” to signify an unknown number of siblings, while valid values ranged from 0 to 4. Similarly, for attributes like “PDe” (which takes values “A” or “N”), any missing or null entries were replaced with “U” to indicate an unknown status. The same approach was applied to the “PHYSREACT” and “ESD-e” attributes, where null values were substituted with “U” to maintain consistency in the dataset.
During the music feature extraction process, missing or unavailable audio files were handled by setting the corresponding feature values to None. This ensured that any unavailable tracks did not introduce inconsistencies in the dataset. Additionally, all extracted features were systematically reviewed and cleaned to maintain accuracy and reliability.

3.4.2. Discretization of Continuous Audio Features

To facilitate classification analysis, continuous numerical audio features were discretized into categorical bins, allowing complex audio characteristics to be transformed into meaningful, interpretable categories.
To ensure consistency in the analysis of chroma features, we implemented a quantile-based binning approach. First, we calculated global quantile thresholds across all chroma features combined, defining uniform binning criteria to ensure that variations in tonal presence were captured in a standardized manner. Specifically, the 33rd and 66th percentiles were identified as the thresholds for categorizing chroma values into “Low,” “Medium,” and “High,” with threshold values determined at 0.23 and 0.34, respectively. These thresholds were then consistently applied to each individual chroma feature, ensuring that each bin represented the same range of values across different tonal components. By maintaining a consistent discretization method, this approach facilitated meaningful comparisons between chroma features and preserved interpretability in subsequent analyses.
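The following sketch illustrates this global quantile binning; the column names and example values are placeholders, and the thresholds are recomputed from the toy data rather than fixed at the 0.23 and 0.34 values reported above.

import numpy as np
import pandas as pd

# Toy data frame; column names and values are illustrative only.
df = pd.DataFrame({
    "chroma_C": [0.15, 0.30, 0.45, 0.22],
    "chroma_E": [0.40, 0.10, 0.33, 0.28],
})
chroma_cols = [c for c in df.columns if c.startswith("chroma_")]

# Global thresholds computed over all chroma values combined (33rd and 66th percentiles).
all_values = df[chroma_cols].to_numpy().ravel()
low_cut, high_cut = np.percentile(all_values, [33, 66])

def to_bin(value):
    """Map a chroma value to Low / Medium / High using the shared global cut points."""
    if value <= low_cut:
        return "Low"
    if value <= high_cut:
        return "Medium"
    return "High"

for col in chroma_cols:
    df[f"{col}_quantile_binned"] = df[col].apply(to_bin)

print(df)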
The overall intensity of a song was categorized based on its RMS energy. Using dataset-wide quantile thresholds, loudness was divided into three levels: low, which falls below the 33rd percentile and corresponds to quiet or soft sounds; medium, which falls between the 33rd and 66th percentiles and represents moderate or normal sounds; and high, which is above the 66th percentile and corresponds to loud or intense sounds. Since RMS energy directly translates to “Loudness,” this categorization aligns with terminology familiar to music therapists.
Several spectral features were also categorized using the same 33rd and 66th percentile thresholds. The spectral centroid, which captures the frequency distribution, was renamed to “Brightness,” where higher values indicate a sharper or more treble-heavy sound. Low brightness is associated with dark, bass-heavy sounds, medium brightness represents balanced, mid-range sounds, and high brightness corresponds to bright, high-frequency sounds. Spectral bandwidth, renamed to “Timbre Complexity,” measures the spread of frequencies, with higher values indicating richer harmonic content. Low complexity corresponds to simple, pure sounds, medium complexity includes moderately complex sounds with some harmonics, and high complexity is associated with rich, layered frequencies, often perceived as complex or noisy.
Spectral contrast was renamed to “Harmonic Richness,” which measures the amplitude differences between spectral peaks and valleys, reflecting the diversity of harmonic content. Low richness is associated with smooth, uniform sounds such as white noise or simple tones, medium richness includes sounds with some harmonic variation, and high richness represents complex, harmonically rich audio. Spectral flatness was renamed to “Tonal Noise,” differentiating between tonal and noisy sounds. Low flatness values indicate tonal sounds with clear pitch, medium flatness represents a mix of tonal and noise-like qualities, and high flatness corresponds to noisy, unpitched sounds.
The zero-crossing rate (ZCR), which identifies percussiveness and the frequency of polarity changes in a signal, was renamed to “Sharpness.” Low sharpness is associated with smooth, less percussive sounds, moderate sharpness includes sounds with some transient qualities, and high sharpness corresponds to sharp, transient-rich sounds, often perceived as harsh or noisy.
Tempo values were categorized based on standard musical tempo ranges: andante (slow) for tempos between 56 and 108 BPM, moderato (medium) for tempos between 108 and 120 BPM, and allegro (fast) for tempos between 120 and 156 BPM. These categories allowed for a structured examination of tempo variations and their potential effects in therapy sessions.
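A minimal sketch of this tempo mapping is shown below; how boundary values (e.g., exactly 108 or 120 BPM) are assigned is an assumption, since the ranges above share their endpoints.

def tempo_category(bpm):
    """Map an estimated tempo in BPM to the categories used in this study."""
    if 56 <= bpm < 108:
        return "andante"    # slow
    if 108 <= bpm < 120:
        return "moderato"   # medium
    if 120 <= bpm <= 156:
        return "allegro"    # fast
    return "out of range"   # tempo outside the ranges considered here

print(tempo_category(72), tempo_category(112), tempo_category(140))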
The categorical names were chosen to be intuitive for music therapists, aligning with familiar terms that provide a meaningful and interpretable representation of the audio characteristics in the dataset.

3.4.3. Generate Classification Rules

After addressing missing values and discretizing continuous features, we ensured that all features were set to nominal (categorical) data types using the WEKA tool (version 3.8.4). Since our goal was to generate classification rules using the Rough Sets Exploration System (RSES), we filtered the dataset to include only the relevant columns: all flexible attributes and the specific decision attributes associated with the current meta-decision attribute (e.g., sweating, breathing, and reddening for “PHYSREACT”). The resulting dataset was exported as an ARFF file, which RSES can directly ingest.
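The sketch below shows one way such a filtered nominal dataset could be written out as ARFF; the attribute names and values are illustrative, and in the study itself the export was performed with WEKA rather than custom code.

import pandas as pd

# Toy filtered subset: flexible attributes plus one decision attribute (illustrative).
df = pd.DataFrame({
    "Loudness":                 ["Low", "High", "Medium"],
    "chroma_E_quantile_binned": ["Low", "High", "Medium"],
    "Sweating":                 ["none", "high", "small"],
})

def to_arff(frame, relation="typhlo_session"):
    """Serialize a data frame of nominal attributes into ARFF text."""
    lines = [f"@RELATION {relation}", ""]
    for col in frame.columns:
        values = ",".join(sorted(frame[col].astype(str).unique()))
        lines.append(f"@ATTRIBUTE {col} {{{values}}}")   # nominal attribute declaration
    lines += ["", "@DATA"]
    lines += [",".join(map(str, row)) for row in frame.itertuples(index=False)]
    return "\n".join(lines)

with open("physreact_subset.arff", "w") as f:
    f.write(to_arff(df))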
We then used RSES to generate classification rules based on rough set theory. RSES performs key data analysis tasks such as identifying reducts, minimal subsets of attributes that preserve classification ability, and constructing rules that associate specific combinations of attribute values with outcome classes (classification rules). In our context, the input to RSES includes both musical and child-specific features (e.g., tempo, spectral contrast, IQ, motor development) as flexible attributes, and therapy-related decision outcomes as target labels.
For example, if a particular combination of musical features (e.g., tempo, spectral contrast) and child characteristics (e.g., IQ, motor development) frequently shows higher levels in motorics, this pattern is captured as a classification rule. These classification rules serve as the foundation for the action-rule generation process that follows.

3.5. Action-Rule Generation

In this study, we utilize an attribute correlation-based vertical partitioning method for generating action rules, which we introduced in our previous works [31,32]. This method is chosen for its ability to handle datasets with numerous flexible attributes efficiently. The process is illustrated in Figure 1 and involves the following steps:
  • Calculate Attribute Correlations: We begin by calculating the correlations between the flexible classification attributes in the dataset. Pearson’s correlation coefficient is used for continuous attributes, Cramer’s V for categorical attributes, and the Correlation Ratio for mixed attribute types. These associations are calculated using the Dython Python library.
  • Perform Agglomerative Clustering: Using the calculated correlations, we create a distance matrix, defined as:
    Distance Matrix = 1 - (|associations| + 1) / 2
    We then perform agglomerative clustering with single linkage to generate a dendrogram, which helps us identify clusters of flexible attributes. An example dendrogram is provided in Figure 2.
  • Generate Action Rules for Clusters: For each cluster of attributes derived from the dendrogram, we generate action rules. This process can be executed in parallel for efficiency, utilizing the Python Ray library. First, classification rules are generated using the Rough Sets Exploration System (RSES) tool. These classification rules identify patterns and relationships between attributes and decision outcomes. Our custom program then uses these classification rules as input to generate action rules, which suggest changes in flexible attributes that can lead to desired transitions between decision classes. This process is illustrated further in Figure 3 and discussed in Section 3.5.3.
  • Combine Rules from Clusters: We create all possible combinations of action rules from each cluster, considering their support. The rule combination process enhances efficiency by introducing depth-controlled exploration and applying confidence and support thresholds for pruning. This ensures only the most relevant and supported action rules are retained.
  • Evaluate and Select Optimal Partition: Finally, we evaluate the sets of action rules generated at each dendrogram level by calculating their F-scores. The level with the highest F-score is selected as the optimal partition, containing the final set of action rules. While F-score is used for this study, other metrics like lightness, coverage, or the number of rules can also be considered for determining the best level.
While the primary goal of this paper is to apply a previously validated action-rule discovery method to the context of typhlo music therapy, our earlier work [31] provides a comparative analysis of this method against random and unstructured partitioning approaches. Specifically, our correlation-based vertical partitioning method consistently produced more robust and interpretable action rules in a single iteration. It achieved higher lightness and precision while reducing redundancy and variability across rule sets. By clustering attributes based on their correlations, this method enhances the relevance and consistency of the resulting rules, offering significant advantages when actionable insights are critical. For these reasons, we adopt this structured approach in the current work without further re-evaluating alternative extraction methods.
More detail for each of the steps is provided below.

3.5.1. Calculate Attribute Correlations

To calculate the correlations between flexible classification attributes, we rely on the Dython Python library, which provides a comprehensive toolkit for computing associations between different attribute types. In our prior work [31], we implemented a flexible correlation framework capable of handling continuous, categorical, and mixed attribute types:
  • Pearson’s Correlation Coefficient for continuous attributes.
  • Cramer’s V for categorical attributes.
  • Correlation Ratio for mixed types of attributes.
While the system supports all three measures, for the present study all features were discretized into categorical values as described in Section 3.4.2. As a result, only Cramer’s V was used to calculate attribute correlations for the clustering process.
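A minimal sketch of this association computation with Dython is shown below; the toy data frame is illustrative, and recent versions of the library return the association matrix under the "corr" key of the result dictionary.

import pandas as pd
from dython.nominal import associations

# Toy discretized attributes; column names and values are illustrative only.
df = pd.DataFrame({
    "Loudness":          ["Low", "High", "Medium", "High"],
    "Brightness":        ["Low", "High", "High", "Medium"],
    "Harmonic_Richness": ["Medium", "High", "Low", "High"],
})

# With every attribute nominal, Cramer's V is used for all attribute pairs.
result = associations(df, nominal_columns="all", plot=False)
assoc_matrix = result["corr"]    # symmetric DataFrame of pairwise associations
print(assoc_matrix)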

3.5.2. Perform Agglomerative Clustering

The distance matrix, computed using the correlation values, is defined by the equation:
Distance Matrix = 1 - (|associations| + 1) / 2
Agglomerative clustering is then performed using the SciPy library, employing single linkage to create a dendrogram. This dendrogram helps identify clusters of flexible attributes, which are then analyzed at various levels to find the optimal partition.
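The sketch below illustrates the corresponding distance computation and single-linkage clustering with SciPy; the toy association matrix, the distance definition taken from the formula above, and the cut threshold are illustrative assumptions.

import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

# Toy pairwise association matrix (e.g., Cramer's V values); illustrative only.
cols = ["Loudness", "Brightness", "Harmonic_Richness", "Tempo"]
assoc = pd.DataFrame(
    [[1.00, 0.80, 0.30, 0.10],
     [0.80, 1.00, 0.25, 0.15],
     [0.30, 0.25, 1.00, 0.70],
     [0.10, 0.15, 0.70, 1.00]],
    index=cols, columns=cols)

distance = 1 - (np.abs(assoc.to_numpy()) + 1) / 2   # distance definition from above
np.fill_diagonal(distance, 0.0)                     # zero self-distance

Z = linkage(squareform(distance, checks=False), method="single")  # single-linkage dendrogram

# Cutting the dendrogram at a chosen height yields clusters of flexible attributes.
labels = fcluster(Z, t=0.2, criterion="distance")   # threshold is illustrative
print(dict(zip(cols, labels)))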
Clustering features prior to action-rule generation offers several advantages. By grouping highly correlated attributes into clusters, the system reduces redundancy and improves interpretability of the resulting rules. Rather than evaluating all possible combinations of flexible attributes (which can lead to an exponential number of irrelevant or low-quality rules), clustering focuses rule generation within semantically coherent subsets of features. This not only enhances computational efficiency but also ensures that the resulting action rules are more aligned with real-world therapeutic patterns. Our vertical partitioning method thus helps isolate meaningful changes across similar attributes and minimizes noise from unrelated feature interactions.

3.5.3. Generate Action Rules for Clusters

An action rule describes how a set of feature changes can lead to a desired outcome, such as improvement in developmental metrics. In this step, we aim to generate action rules for each cluster of attributes at different dendrogram levels to reveal insights into which changes in flexible attributes can influence therapeutic outcomes (see Figure 1).
First, RSES is used to extract classification rules from the dataset, as discussed in Section 3.4.3. These rules are first grouped into two sets based on the decision outcome of interest: those associated with the initial (less desirable) state and those representing the target (desired) state. For instance, if we are examining transitions in motor development, one group might contain rules corresponding to “moderate” outcomes and the other to “good” outcomes.
Next, we iterate over all combinations of rules from these two groups to identify pairs that are compatible. Compatibility is defined by matching stable attributes, that is, attributes that do not change between the two rules. This ensures that the resulting action rules isolate meaningful changes in flexible attributes. Pairs that meet predefined support thresholds and exhibit consistent stable attributes are retained for further analysis.
For each compatible rule pair, we then compare the flexible attributes and identify differences that could explain the shift in decision outcome. If at least one flexible attribute differs between the rules and other thresholds (e.g., confidence, support) are met, we construct an action rule capturing the suggested change. These action rules highlight how a transition from one decision state to another might be achieved by modifying a small set of attributes, while holding others constant.
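The following simplified sketch illustrates this pairing logic; the rule representation, the set of stable attributes, and the support threshold are illustrative placeholders rather than the actual RSES output format or the system's full implementation.

# Simplified sketch of pairing two classification rules into an action rule.
STABLE = {"Sex", "TVD", "IQ"}          # attributes that cannot change (illustrative)

rule_from = {"conditions": {"Sex": "M", "TVD": "B", "IQ": "A", "MH": "N"},
             "decision": ("MOTORICS", "M"), "support": 20}
rule_to   = {"conditions": {"Sex": "M", "TVD": "B", "IQ": "A", "MH": "F"},
             "decision": ("MOTORICS", "G"), "support": 18}

def build_action_rule(r1, r2, min_support=10):
    """Pair two classification rules with matching stable parts into an action rule."""
    if min(r1["support"], r2["support"]) < min_support:
        return None
    stable_1 = {a: v for a, v in r1["conditions"].items() if a in STABLE}
    stable_2 = {a: v for a, v in r2["conditions"].items() if a in STABLE}
    if stable_1 != stable_2:                       # incompatible stable attributes
        return None
    changes = {a: (r1["conditions"][a], r2["conditions"][a])
               for a in r1["conditions"]
               if a not in STABLE and a in r2["conditions"]
               and r1["conditions"][a] != r2["conditions"][a]}
    if not changes:                                # need at least one flexible change
        return None
    return {"stable": stable_1, "changes": changes,
            "effect": (r1["decision"][1], r2["decision"][1])}

print(build_action_rule(rule_from, rule_to))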
To manage the computational demands of evaluating all rule combinations across multiple clusters and decision outcomes, this process is parallelized using the Ray library. This allows the system to efficiently scale across larger datasets and attribute sets, significantly reducing runtime.

3.5.4. Combine Rules from Clusters

The rules from each cluster are then combined by considering their support and confidence. This involves a depth-controlled approach, with thresholds applied to ensure only the most robust rules are retained. The process systematically explores combinations of rules, merging those that meet the established criteria.

3.5.5. Evaluate and Select Optimal Partition

The action rules generated for each dendrogram level are evaluated based on their F-scores, which are calculated as the harmonic mean of precision and recall. The level with the highest F-score is selected as the optimal partition. This process ensures a thorough exploration of potential rule combinations, optimizing the quality and efficiency of the resultant rule set.
By focusing on the attribute correlation-based vertical partitioning method, this study aims to generate high-quality action rules that can provide actionable insights for personalized music therapy interventions in typhlo music therapy.

4. Results

The dataset analyzed includes attributes such as child demographic details (e.g., ‘Sex’, ‘Sib’ for number of siblings, and ‘IQ’ level), therapy-specific attributes (e.g., ‘Motorics’, ‘PDe’, and ‘HD’ for health conditions), as well as music-related attributes (e.g., chroma features like ‘chroma_E’ and ‘chroma_A#’, tempo, and other tonal characteristics). These music attributes are particularly important, as they provide insight into how the tonal and structural elements of music may influence therapeutic outcomes.
In the following section, we will explore some of the action rules generated through our analysis. These rules provide insight into how changes in both demographic and music-specific attributes can lead to improved therapeutic outcomes. By interpreting these rules, we aim to better understand the key factors influencing progress in areas such as motorics, attention, and communication during music therapy. A summary of all the rules discussed here is shown in Table 5.

4.1. Validating Movement as a Driver of Motoric Improvement

Example 1.
In one of the instances of generated action rules aimed at improving the motorics attribute, we examined a rule with a confidence of 0.888 and a support of 16. This rule provides valuable insight into how certain demographic, cognitive, and health-related factors can influence motoric development in visually impaired children undergoing music therapy. The rule, shown in Equation (3), is interpreted as follows: If a male child (Sex = M) with no other dysfunctions and illnesses (DI = N) starts the therapy with above-average speech skills (US-b = C), follows the usual school program (SP = N), has 2 siblings (Sib = 2), has an average IQ (IQ = A), and is blind (TVD = B), and if their prenatal development was normal (PDe = N), with no other dysfunctions (OD = N), no intellectual disabilities (ID = N), and known family information (ICF = K), they are in a family foster care setting (FC = F), and have no physical disabilities (PD = N), no hearing disabilities (HD = N), and they start with no movement of hands (MH = N) but this changes to frequent movement (MH = F), then it is likely that their motorics can improve from moderate (MOTORICS = M) to good (MOTORICS = G).
(DI = N) ∧ (US-b = C) ∧ (Sex = M) ∧ (SP = N) ∧ (Sib = 2) ∧ (IQ = A) ∧ (TVD = B) ∧ (PDe = N) ∧ (OD = N) ∧ (ID = N) ∧ (ICF = K) ∧ (PD = N) ∧ (FC = F) ∧ (HD = N) ∧ (MH = N → F) ⇒ (MOTORICS = M → G)    (3)
Example 2.
In another example, we see a rule shown in Equation (4) with a confidence of 0.623 and a support of 16. This reads as follows: If a male child (Sex = M) was born naturally (Childbirth = N), has low vision (TVD = L), no physical disabilities (PD = N), no hearing disabilities (HD = N), lives in a partial family setting (Family = P), with no other dysfunctions and illnesses (DI = N) and no intellectual disabilities (ID = N), and if their prenatal development was normal (PDe = N), and they live in a village (PPR = V), have 2 siblings (Sib = 2), with above-average IQ (IQ = C), and they begin therapy with correct emotional and social development (ESD-b = C), no other dysfunctions (OD = N), are in a family foster care situation (FC = F), are in the usual school program (SP = N), with known family information (ICF = K), and if their muscle tightening changes from none (TM = N) to frequent (TM = F), this indicates that their motorics can improve from poor (MOTORICS = P) to moderate (MOTORICS = M).
(Sex = M) ∧ (Childbirth = N) ∧ (TVD = L) ∧ (PD = N) ∧ (HD = N) ∧ (Family = P) ∧ (DI = N) ∧ (ID = N) ∧ (PDe = N) ∧ (PPR = V) ∧ (Sib = 2) ∧ (IQ = C) ∧ (ESD-b = C) ∧ (OD = N) ∧ (FC = F) ∧ (SP = N) ∧ (ICF = K) ∧ (TM = N → F) ⇒ (MOTORICS = P → M)    (4)
In the first example, the rule suggests that if hand movement increases from none to frequent, the child’s motorics are likely to improve from moderate to good. While this might seem intuitive, as increased physical activity is naturally associated with better motor development, it offers data-backed confirmation of this relationship, giving therapists a quantifiable basis for focusing on activities that promote movement. Similarly, in the second example, the rule indicates that if muscle tightening progresses from none to frequent, the child’s motorics can improve from poor to moderate. This aligns with established expectations that increased muscle engagement is linked to improved motoric function. Once again, this rule reinforces a practical, intuitive understanding with empirical support from the data, highlighting the importance of fostering movement during therapy.

4.2. Tonal Preferences and Their Impact on Attention

Example 3.
In one of the instances of generated action rules aimed at improving the concentration and attention (CONC_ATT) attribute, we examined a rule with a confidence of 0.714 and a support of 10. This rule provides valuable insight into how certain demographic, cognitive, and health-related factors, as well as music features, can influence concentration and attention in visually impaired children undergoing music therapy. The rule, shown in Equation (5), is interpreted as follows: If a child has no prenatal development issues (PDe = N), known family information (ICF = K), no intellectual disabilities (ID = N), no other dysfunctions and illnesses (DI = N), and no other disabilities (OD = N), has low vision (TVD = L), an average IQ (IQ = A), is in family foster care (FC = F), has no hearing disabilities (HD = N), follows the usual school program (SP = N), has no physical disabilities (PD = N), was delivered by Caesarean section (Childbirth = A), has average speech abilities at the beginning of therapy (US-b = A), comes from a reconstructed family (Family = R), in which both parents are present but only one is biological, and the chroma E music feature increases from low to high (chroma_E_quantile_binned = Low -> High), then it is likely that their concentration and attention (CONC_ATT) can improve from none (N) to small (S).
$(PDe = N) \wedge (ICF = K) \wedge (ID = N) \wedge (DI = N) \wedge (OD = N) \wedge (TVD = L) \wedge (IQ = A) \wedge (FC = F) \wedge (HD = N) \wedge (SP = N) \wedge (PD = N) \wedge (Childbirth = A) \wedge (US\text{-}b = A) \wedge (Family = F) \wedge (ESD\text{-}e = C \rightarrow N) \wedge (\text{chroma\_E\_quantile\_binned} = \text{Low} \rightarrow \text{High}) \Rightarrow (CONC\_ATT = N \rightarrow S)$ (5)
Example 4.
In another instance of generated action rules aimed at understanding changes in concentration and attention (CONC_ATT), we examined a rule with a confidence of 0.533 and a support of 8. This rule provides insight into how demographic, cognitive, and family-related factors, along with a specific music feature, can influence a shift in concentration and attention for visually impaired children in music therapy. The rule, shown in Equation (6), is interpreted as follows: If a male child (Sex = M) follows the usual school program (SP = N), has no other dysfunctions and illnesses (DI = N), no prenatal development issues (PDe = N), average speech abilities at the beginning of therapy (US-b = A), no other disabilities (OD = N), is in family foster care (FC = F), has no hearing disabilities (HD = N), no physical disabilities (PD = N), is blind (TVD = B), comes from a reconstructed family, in which both parents are present but only one is biological (Family = R), has correct emotional and social development at the beginning of therapy (ESD-b = C), no intellectual disabilities (ID = N), normal motor development at the beginning of therapy (MD-b = N), known family information (ICF = K), and an average IQ (IQ = A), and the music feature chroma A# changes from medium to low (chroma_A#_quantile_binned = Medium -> Low), then it is likely that their concentration and attention (CONC_ATT) can increase from small (S) to moderate (M).
$(SP = N) \wedge (DI = N) \wedge (PDe = N) \wedge (US\text{-}b = A) \wedge (OD = N) \wedge (FC = F) \wedge (HD = N) \wedge (PD = N) \wedge (Sex = M) \wedge (TVD = B) \wedge (Family = R) \wedge (ESD\text{-}b = C) \wedge (ID = N) \wedge (MD\text{-}b = N) \wedge (ICF = K) \wedge (IQ = A) \wedge (\text{chroma\_A\#\_quantile\_binned} = \text{Medium} \rightarrow \text{Low}) \Rightarrow (CONC\_ATT = S \rightarrow M)$ (6)
Chroma features describe the distribution of energy across the 12 pitch classes of the chromatic scale (C, C#, D, D#, E, F, F#, G, G#, A, A#, B). They capture harmonic content by measuring how dominant each pitch class is in a piece of music, regardless of the octave in which the notes occur.
For example, if a piece of music is primarily in the key of C major, the chroma feature for “C” will have a high value, while notes like “F#” might have a very low value. Chroma features are especially useful for understanding the harmonic or tonal structure of music, which is a fundamental aspect of how music feels to listeners.
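For readers who wish to reproduce this kind of feature, chroma vectors can be extracted with librosa, the library cited for feature extraction in this work, and then discretized into the Low/Medium/High labels used by the rules. The quantile-based binning, the three-bin split, and the file names below are illustrative assumptions rather than the study's exact pipeline.

```python
import librosa
import numpy as np
import pandas as pd

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def mean_chroma(path: str) -> dict:
    """Average chroma energy per pitch class for one recording."""
    y, sr = librosa.load(path, sr=None)
    chroma = librosa.feature.chroma_stft(y=y, sr=sr)        # shape: (12, frames)
    return dict(zip(PITCH_CLASSES, chroma.mean(axis=1)))

# One row per piece of music used in therapy (m01..m34 in Table 2); file names are hypothetical.
pieces = [f"m{i:02d}.wav" for i in range(1, 35)]
table = pd.DataFrame([mean_chroma(p) for p in pieces], index=pieces)

# Quantile-based discretization into the three labels referenced by the action rules.
binned = table.apply(lambda col: pd.qcut(col, q=3, labels=["Low", "Medium", "High"]))
binned.columns = [f"chroma_{c}_quantile_binned" for c in binned.columns]
print(binned.loc["m01.wav", "chroma_E_quantile_binned"])
```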
Based on these rules, it is possible that some children may respond better to music in certain keys or tonalities. For instance, they might find music in major keys more engaging or comfortable, which could lead to increased concentration or focus. Conversely, they might find certain keys or harmonic structures (e.g., minor or diminished chords) less appealing, causing a decrease in attention.
Chroma features might indirectly indicate how well certain types of music resonate with a child’s emotional or cognitive state. If certain children respond better when particular chroma values shift, for example when chroma E becomes more prominent or chroma A# less prominent (as seen in the rules above), it could suggest that certain tonalities help them focus more effectively, while other tonalities are less effective at maintaining attention.
Based on the rule shown in Equation (5), when chroma E increases from low to high, concentration and attention improve (CONC_ATT = N -> S). This might suggest that music with a stronger presence of E (or music in keys that heavily feature E) helps improve focus for these particular children. In the rule shown in Equation (6), when chroma A# decreases from medium to low, concentration and attention also improve (CONC_ATT = S -> M). Again, this could imply that, for some children, music with less emphasis on A# (or related keys) leads to increased focus.
This raises several important hypotheses for further exploration. First, it is possible that some children may inherently prefer music in specific keys, leading to improved focus and engagement when music emphasizes certain chroma features. For instance, children may concentrate better when exposed to music in major keys or keys that emphasize particular tonalities like E, while music in other keys may result in decreased attention. Second, chroma features might indirectly reflect the emotional or cognitive engagement of children during therapy, indicating that certain tonalities are more effective at eliciting a therapeutic response.
These insights offer promising directions for personalizing music therapy based on the tonal preferences of individual children. By tailoring music selections to emphasize chroma features that are associated with better concentration and attention, therapists may be able to enhance the therapeutic outcomes of their sessions. Future research could investigate whether consistent patterns of preference for specific tonalities exist among children in therapy, potentially leading to more effective and personalized therapeutic interventions.

4.3. Exploring the Role of Tonal and Mixed Sounds

Example 5.
In one of the action rules generated to improve concentration and attention (CONC_ATT), we examined a rule with a confidence of 1 and a support of 6. This rule sheds light on how changes in music attributes, particularly the transition from tonal to mixed sound environments, can influence therapeutic outcomes. The rule, shown in Equation (7), is interpreted as follows: If a child has no hearing disabilities (HD = N), is in family foster care (FC = F), has an average IQ (IQ = A), one sibling (Sib = 1), no physical disabilities (PD = N), known family information (ICF = K), no other dysfunctions and illnesses (DI = N), follows the usual school program (SP = N), has average speech abilities at the beginning of therapy (US-b = A), comes from a full family with both parents present (Family = F), was delivered by Caesarean section (Childbirth = A), has no intellectual disabilities (ID = N), and the tonal quality of the music changes from purely tonal to mixed (Tonal_Noise = Tonal -> Mixed), then their concentration and attention (CONC_ATT) can improve from moderate (M) to high (H).
$(HD = N) \wedge (FC = F) \wedge (IQ = A) \wedge (Sib = 1) \wedge (PD = N) \wedge (ICF = K) \wedge (DI = N) \wedge (SP = N) \wedge (US\text{-}b = A) \wedge (Family = F) \wedge (Childbirth = A) \wedge (ID = N) \wedge (\text{Tonal\_Noise} = \text{Tonal} \rightarrow \text{Mixed}) \Rightarrow (CONC\_ATT = M \rightarrow H)$ (7)
Example 6.
In one of the instances of generated action rules aimed at improving motoric abilities (MOTORICS), we examined a rule with a confidence of 0.6 and a support of 6. This rule provides valuable insight into how various demographic, cognitive, and music-related factors can influence motoric improvement in visually impaired children undergoing music therapy. The rule, shown in Equation (8), is interpreted as follows: If a child has no physical disabilities (PD = N), is male (Sex = M), has no intellectual disabilities (ID = N), no hearing disabilities (HD = N), an average IQ (IQ = A), no other disabilities (OD = N), is in family foster care (FC = F), follows the usual school program (SP = N), has average speech abilities at the beginning of therapy (US-b = A), no prenatal development issues (PDe = N), no other dysfunctions and illnesses (DI = N), known family information (ICF = K), maintains correct emotional and social development through the end of therapy (ESD-e = C -> C), and if the timbre complexity of the music decreases from high to low (Timbre_Complexity = High -> Low), the tonal quality changes from tonal to noisy (Tonal_Noise = Tonal -> Noisy), and hand movement increases from none to frequent (MH = N -> F), then it is likely that their motoric abilities (MOTORICS) can improve from poor (P) to good (G).
$(PD = N) \wedge (Sex = M) \wedge (ID = N) \wedge (HD = N) \wedge (IQ = A) \wedge (OD = N) \wedge (FC = F) \wedge (SP = N) \wedge (US\text{-}b = A) \wedge (PDe = N) \wedge (DI = N) \wedge (ICF = K) \wedge (ESD\text{-}e = C \rightarrow C) \wedge (\text{Timbre\_Complexity} = \text{High} \rightarrow \text{Low}) \wedge (\text{Tonal\_Noise} = \text{Tonal} \rightarrow \text{Noisy}) \wedge (MH = N \rightarrow F) \Rightarrow (MOTORICS = P \rightarrow G)$ (8)
The shift from Tonal to Mixed in the Tonal_Noise feature, shown in Equation (7), suggests that moving from purely tonal, structured music to music that incorporates a mix of tonal and noisier elements can result in increased concentration and attention. This may indicate that children find purely tonal music less stimulating or engaging over time. A soundscape, which refers to the auditory environment encompassing all sounds present, can influence the cognitive and emotional engagement of the listener. While therapists may not have control over the entire soundscape—such as background noise or environmental sounds—they can intentionally integrate music with mixed tonal and noisier elements into therapy sessions. For example, therapists might alternate between pieces of tonal music and tracks with layered, complex textures or even include subtle, rhythmic environmental sounds like soft percussive effects or natural ambiances to create a more engaging auditory experience.
In the context of music therapy, this shift could be particularly useful for children who need a balance between predictability (in tonal music) and stimulation (from noisier or more abstract sound elements). The rule highlights the importance of adapting the auditory environment during therapy to maintain or improve attention levels, reinforcing the potential for using more dynamic and varied music selections to enhance therapeutic outcomes.
Additionally, in a related rule, shown in Equation (8), the transition from Tonal to Noisy soundscapes, combined with a decrease in Timbre Complexity from high to low, is associated with improvements in motoric abilities. This shift may suggest that, similar to attention, motoric development can be positively influenced by varying the auditory complexity of the music. As the music shifts to noisier elements and simpler timbres, it could create a different kind of sensory engagement that encourages physical movement. This is further reinforced by the increase in hand movement from none to frequent during the therapy sessions, indicating that such auditory adjustments might stimulate physical responses. The rule underscores the role of music complexity in not only maintaining cognitive engagement but also in fostering physical activity, which is crucial for motoric improvement.
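The Tonal_Noise and Timbre_Complexity labels can plausibly be approximated from standard spectral descriptors: spectral flatness is close to 0 for tonal material and closer to 1 for noise-like material, while spectral contrast tends to be higher for richer timbres. The sketch below is a minimal illustration under those assumptions; the thresholds, the label boundaries, and the file name are hypothetical and not the values used in the study.

```python
import librosa

def tonal_noise_label(y, sr, lo=0.2, hi=0.5) -> str:
    """Classify a recording as Tonal, Mixed, or Noisy from mean spectral flatness."""
    flatness = float(librosa.feature.spectral_flatness(y=y).mean())
    if flatness < lo:
        return "Tonal"
    if flatness > hi:
        return "Noisy"
    return "Mixed"

def timbre_complexity_label(y, sr, threshold=20.0) -> str:
    """Higher average spectral contrast is taken here as a proxy for richer timbre."""
    contrast = librosa.feature.spectral_contrast(y=y, sr=sr)
    return "High" if contrast.mean() > threshold else "Low"

y, sr = librosa.load("m07.wav", sr=None)   # hypothetical therapy piece
print(tonal_noise_label(y, sr), timbre_complexity_label(y, sr))
```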

4.4. Effects of Increased Harmonic Richness on Expression and Communication

Example 7.
In one of the instances of generated action rules aimed at improving the expression (EXPRESSION) attribute, we examined a rule with a confidence of 1 and a support of 6. This rule provides valuable insight into how various demographic, developmental, and music-related factors can influence the expressive abilities of visually impaired children undergoing music therapy. The rule, shown in Equation (9), is interpreted as follows: If a child has known family information (ICF = K), two siblings (Sib = 2), no physical disabilities (PD = N), is 10 years old (Age = 10), had no prenatal development issues (PDe = N), has no intellectual disabilities (ID = N), is in family foster care (FC = F), is in fourth grade (Grade = 4), had delayed motor development at the beginning of therapy (MD-b = D), has no hearing disabilities (HD = N), was born naturally (Childbirth = N), has correct emotional and social development at the beginning of therapy (ESD-b = C), follows the usual school program (SP = N), has no other disabilities (OD = N), an average IQ (IQ = A), no other dysfunctions and illnesses (DI = N), is male (Sex = M), and if the harmonic richness of the music increases from medium to high (Harmonic_Richness = Medium -> High), then it is likely that their expression (EXPRESSION) can improve from poor (P) to moderate (M).
$(ICF = K) \wedge (Sib = 2) \wedge (PD = N) \wedge (Age = 10) \wedge (PDe = N) \wedge (ID = N) \wedge (FC = F) \wedge (Grade = 4) \wedge (MD\text{-}b = D) \wedge (HD = N) \wedge (Childbirth = N) \wedge (ESD\text{-}b = C) \wedge (SP = N) \wedge (OD = N) \wedge (IQ = A) \wedge (DI = N) \wedge (Sex = M) \wedge (\text{Harmonic\_Richness} = \text{Medium} \rightarrow \text{High}) \Rightarrow (EXPRESSION = P \rightarrow M)$ (9)
Example 8.
In one of the instances of generated action rules aimed at improving communication (COMMUNICATION), we examined a rule with a confidence of 0.5 and a support of 4. This rule provides valuable insight into how various demographic, developmental, and music-related factors can influence communication abilities in visually impaired children undergoing music therapy. The rule, shown in Equation (10), is interpreted as follows: If a child has correct emotional and social development at the beginning of therapy (ESD-b = C), was born naturally (Childbirth = N), has no other disabilities (OD = N), is in family foster care (FC = F), has known family information (ICF = K), one sibling (Sib = 1), delayed motor development at the beginning of therapy (MD-b = D), a physical disability (PD = Y), no other dysfunctions and illnesses (DI = N), no hearing disabilities (HD = N), is female (Sex = F), and if the chroma features change, with chroma A# increasing from low to high (chroma_A#_quantile_binned = Low -> High) and chroma F from low to medium (chroma_F_quantile_binned = Low -> Medium), along with a reduction in loudness (Loudness = High -> Medium) and an increase in harmonic richness (Harmonic_Richness = Medium -> High), then it is likely that their communication (COMMUNICATION) can improve from very poor (V) to moderate (M).
$(ESD\text{-}b = C) \wedge (Childbirth = N) \wedge (OD = N) \wedge (FC = F) \wedge (ICF = K) \wedge (Sib = 1) \wedge (MD\text{-}b = D) \wedge (PD = Y) \wedge (DI = N) \wedge (HD = N) \wedge (Sex = F) \wedge (\text{chroma\_F\_quantile\_binned} = \text{Low} \rightarrow \text{Medium}) \wedge (\text{chroma\_A\#\_quantile\_binned} = \text{Low} \rightarrow \text{High}) \wedge (\text{Loudness} = \text{High} \rightarrow \text{Medium}) \wedge (\text{Harmonic\_Richness} = \text{Medium} \rightarrow \text{High}) \Rightarrow (COMMUNICATION = V \rightarrow M)$ (10)
Harmonic richness is a measure of the complexity and diversity of harmonics present in a piece of music. It refers to the layering of frequencies and overtones that make up the overall texture and depth of sound. In the context of music therapy, increasing harmonic richness implies that the music shifts from simpler, less complex harmonic structures to more layered and intricate soundscapes.
The action rules suggest that an increase in harmonic richness can positively impact both expression and communication abilities in children undergoing music therapy. For example, the rule related to expression, shown in Equation (9), indicates that a transition from medium to high harmonic richness is associated with an improvement in expressive abilities, moving from poor to moderate expression. Similarly, for communication, shown in Equation (10), an increase in harmonic richness along with other musical adjustments contributes to a change from very poor to moderate communication skills.
One possible reason why increased harmonic richness is beneficial could be that it provides more stimulating and engaging auditory environments. Richer harmonic content can evoke a wider range of emotional responses, potentially encouraging children to express themselves more freely and communicate more effectively during sessions. This idea is supported by findings from [33], which suggest that harmonic structure plays a role in shaping emotional perceptions in music, even across different cultures. The study demonstrates that variations in harmonic content can evoke specific emotional responses, creating a deeper connection between the listener and the music. Such findings suggest that increased harmonic richness in music therapy could help children engage more deeply, facilitating the development of expressive skills as they connect more closely with the music being played.
These findings highlight the importance of customizing music therapy sessions with the appropriate level of harmonic richness to target specific therapeutic goals. Music therapists can use pieces with higher harmonic richness when aiming to enhance a child’s expressive or communication abilities, offering a tailored approach based on each child’s needs. The rules suggest that increasing harmonic richness is not a universal solution, but rather one that needs to be applied in specific contexts and tailored to the child’s individual progress. Understanding how changes in musical complexity impact cognitive and communicative skills could help therapists optimize their strategies and select music that best supports therapeutic goals for an individual child.
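As a rough illustration of how a harmonic richness label might be computed, the sketch below measures how evenly energy is spread across pitch classes in the harmonic component of a recording. The paper does not specify the formula behind its Harmonic_Richness feature, so the entropy-based proxy, the cut-off values, and the file handling here are assumptions made only to make the concept concrete.

```python
import librosa
import numpy as np

def harmonic_richness_label(path: str, lo=2.0, hi=3.0) -> str:
    """Label a recording Low/Medium/High by the entropy of its mean chroma vector.

    A flat chroma distribution (many active pitch classes) yields high entropy,
    which we treat here as a stand-in for richer harmonic content.
    """
    y, sr = librosa.load(path, sr=None)
    harmonic = librosa.effects.harmonic(y)                    # keep the harmonic component
    chroma = librosa.feature.chroma_cqt(y=harmonic, sr=sr).mean(axis=1)
    p = chroma / chroma.sum()
    entropy = float(-(p * np.log2(p + 1e-12)).sum())          # range 0 .. log2(12) ~ 3.58
    if entropy < lo:
        return "Low"
    return "Medium" if entropy < hi else "High"
```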

4.5. Loudness Adjustment and Communication Improvement

Example 9.
In one of the instances of generated action rules aimed at improving communication (COMMUNICATION), we examined a rule with a confidence of 1 and a support of 4. This rule provides valuable information on how demographic, cognitive, and music-related factors can influence communication abilities in visually impaired children undergoing music therapy. The rule, shown in Equation (11), is interpreted as follows: If a child is male (Sex = M), is in family foster care (FC = F), has no intellectual disabilities (ID = N), no other disabilities (OD = N), follows the usual school program (SP = N), has no prenatal development issues (PDe = N), no physical disabilities (PD = N), known family information (ICF = K), no other dysfunctions and illnesses (DI = N), is blind (TVD = B), has two siblings (Sib = 2), no hearing disabilities (HD = N), an average IQ (IQ = A), maintains correct emotional and social development through the end of therapy (ESD-e = C -> C), begins to communicate with facial expressions, moving from none to rare (CM = N -> R), and if various chroma features change, including an increase in chroma A# from low to high (chroma_A#_quantile_binned = Low -> High) and adjustments in other chroma values (chroma_F_quantile_binned = Low -> Medium, chroma_D_quantile_binned = High -> Medium, chroma_B_quantile_binned = High -> Low, chroma_F#_quantile_binned = Medium -> Low), along with a reduction in loudness (Loudness = High -> Medium) and an increase in brightness (Brightness = Medium -> High), then it is likely that their communication (COMMUNICATION) can improve from poor (P) to moderate (M).
$(Sex = M) \wedge (FC = F) \wedge (ID = N) \wedge (OD = N) \wedge (SP = N) \wedge (PDe = N) \wedge (PD = N) \wedge (ICF = K) \wedge (DI = N) \wedge (TVD = B) \wedge (Sib = 2) \wedge (HD = N) \wedge (IQ = A) \wedge (ESD\text{-}e = C \rightarrow C) \wedge (CM = N \rightarrow R) \wedge (\text{chroma\_F\_quantile\_binned} = \text{Low} \rightarrow \text{Medium}) \wedge (\text{chroma\_D\_quantile\_binned} = \text{High} \rightarrow \text{Medium}) \wedge (\text{chroma\_A\#\_quantile\_binned} = \text{Low} \rightarrow \text{High}) \wedge (\text{chroma\_B\_quantile\_binned} = \text{High} \rightarrow \text{Low}) \wedge (\text{Loudness} = \text{High} \rightarrow \text{Medium}) \wedge (\text{Brightness} = \text{Medium} \rightarrow \text{High}) \wedge (\text{chroma\_F\#\_quantile\_binned} = \text{Medium} \rightarrow \text{Low}) \Rightarrow (COMMUNICATION = P \rightarrow M)$ (11)
The transition from high to medium loudness, as seen in this rule, suggests that reducing the overall volume of the music during therapy can lead to improved communication abilities in visually impaired children. Loudness is a critical element in a child’s auditory environment; when the music is too loud, it may be overwhelming, potentially masking other sounds or making it difficult for the child to focus on social interactions or express themselves. High loudness levels might create an environment where the child feels less able to communicate effectively, either because the music is too dominant or because the high volume becomes a source of sensory overload.
By adjusting the loudness from high to medium, the therapy environment becomes less intense, potentially allowing children to feel more comfortable and confident in expressing themselves. A more moderate volume level may create an auditory space where the child’s own vocalizations and communication attempts are more noticeable and easier to make. This balance can help facilitate more engagement in the therapy sessions, allowing the child to be an active participant rather than feeling drowned out by the surrounding music.
The connection between loudness reduction and communication improvement may also highlight the importance of creating a sensory environment that supports rather than overwhelms. By carefully modulating the loudness, therapists can create sessions that are stimulating but not overpowering, allowing children to practice and improve their communication skills in a setting where their voices are not competing with excessively loud background music.
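A loudness label of the kind referenced in these rules can be approximated from the mean RMS level of a recording expressed in decibels. The sketch below shows one such mapping; the decibel cut-offs are illustrative assumptions, and in practice therapists also control perceived loudness directly through playback volume.

```python
import librosa

def loudness_label(path: str, low_db=-30.0, high_db=-15.0) -> str:
    """Map a recording's mean RMS level (in dB relative to full scale) to Low/Medium/High."""
    y, _ = librosa.load(path, sr=None)
    rms = librosa.feature.rms(y=y)
    mean_db = float(librosa.amplitude_to_db(rms, ref=1.0).mean())
    if mean_db < low_db:
        return "Low"
    return "Medium" if mean_db < high_db else "High"
```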
Table 5. Summary of example action rules. Each entry lists the rule’s stable attributes, the flexible attribute changes it recommends, the resulting decision change, and its support and confidence.
Rule (3). Stable: DI = N, US-b = C, Sex = M, SP = N, Sib = 2, IQ = A, TVD = B, PDe = N, OD = N, ID = N, ICF = K, PD = N, FC = F, HD = N. Flexible change: Movement of Hands (MH) = None → Frequent. Decision change: MOTORICS = Moderate → Good. Support: 16. Confidence: 0.888.
Rule (4). Stable: Sex = M, Childbirth = N, TVD = L, PD = N, HD = N, Family = P, DI = N, ID = N, PDe = N, PPR = V, Sib = 2, IQ = C, ESD-b = C, OD = N, FC = F, SP = N, ICF = K. Flexible change: Tightening of Muscles (TM) = None → Frequent. Decision change: MOTORICS = Poor → Moderate. Support: 16. Confidence: 0.623.
Rule (5). Stable: PDe = N, ICF = K, ID = N, DI = N, OD = N, TVD = L, IQ = A, FC = F, HD = N, SP = N, PD = N, Childbirth = A, US-b = A, Family = F, ESD-e = C → N. Flexible change: chroma_E_quantile_binned = Low → High. Decision change: CONC_ATT = None → Small. Support: 10. Confidence: 0.714.
Rule (6). Stable: SP = N, DI = N, PDe = N, US-b = A, OD = N, FC = F, HD = N, PD = N, Sex = M, TVD = B, Family = R, ESD-b = C, ID = N, MD-b = N, ICF = K, IQ = A. Flexible change: chroma_A#_quantile_binned = Medium → Low. Decision change: CONC_ATT = Small → Moderate. Support: 8. Confidence: 0.533.
Rule (7). Stable: HD = N, FC = F, IQ = A, Sib = 1, PD = N, ICF = K, DI = N, SP = N, US-b = A, Family = F, Childbirth = A, ID = N. Flexible change: Tonal_Noise = Tonal → Mixed. Decision change: CONC_ATT = Moderate → High. Support: 6. Confidence: 1.000.
Rule (8). Stable: PD = N, Sex = M, ID = N, HD = N, IQ = A, OD = N, FC = F, SP = N, US-b = A, PDe = N, DI = N, ICF = K, ESD-e = C → C. Flexible changes: Timbre_Complexity = High → Low, Tonal_Noise = Tonal → Noisy, Movement of Hands (MH) = None → Frequent. Decision change: MOTORICS = Poor → Good. Support: 6. Confidence: 0.600.
Rule (9). Stable: ICF = K, Sib = 2, PD = N, Age = 10, PDe = N, ID = N, FC = F, Grade = 4, MD-b = D, HD = N, Childbirth = N, ESD-b = C, SP = N, OD = N, IQ = A, DI = N, Sex = M. Flexible change: Harmonic_Richness = Medium → High. Decision change: EXPRESSION = Poor → Moderate. Support: 6. Confidence: 1.000.
Rule (10). Stable: ESD-b = C, Childbirth = N, OD = N, FC = F, ICF = K, Sib = 1, MD-b = D, PD = Y, DI = N, HD = N, Sex = F. Flexible changes: chroma_F_quantile_binned = Low → Medium, chroma_A#_quantile_binned = Low → High, Loudness = High → Medium, Harmonic_Richness = Medium → High. Decision change: COMMUNICATION = Very Poor → Moderate. Support: 4. Confidence: 0.500.
Rule (11). Stable: Sex = M, FC = F, ID = N, OD = N, SP = N, PDe = N, PD = N, ICF = K, DI = N, TVD = B, Sib = 2, HD = N, IQ = A, ESD-e = C → C, CM = N → R. Flexible changes: chroma_F_quantile_binned = Low → Medium, chroma_D_quantile_binned = High → Medium, chroma_A#_quantile_binned = Low → High, chroma_B_quantile_binned = High → Low, chroma_F#_quantile_binned = Medium → Low, Loudness = High → Medium, Brightness = Medium → High. Decision change: COMMUNICATION = Poor → Moderate. Support: 4. Confidence: 1.000.

5. Discussion

The results of this study highlight the potential of using data-driven insights, particularly action rules, to personalize Typhlo music therapy for visually impaired children. By extracting key musical features and linking them to therapeutic outcomes, this research demonstrates the value of tailoring music selections to meet the unique needs of each child.

5.1. Impact of Action Rules on Therapeutic Outcomes

Our approach revealed several promising action rules that provide therapists with concrete strategies to improve therapeutic outcomes. For example, the identified rules suggest that increasing the tempo or adjusting the harmonic content of the music can positively influence concentration and attention (CONC_ATT) levels. By focusing on specific attributes such as chroma features or tonal balance, therapists can better align musical interventions with each child’s preferences and therapeutic needs.
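As a small illustration of how such musical attributes are quantified, the sketch below estimates a recording's tempo with librosa's beat tracker, which implements the dynamic-programming approach cited for beat tracking in this work [23]. The file name is hypothetical, and the estimated tempo would still need to be discretized before it could appear in an action rule.

```python
import librosa
import numpy as np

y, sr = librosa.load("m12.wav", sr=None)                 # hypothetical therapy piece
tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr) # global tempo estimate + beat positions
beat_times = librosa.frames_to_time(beat_frames, sr=sr)
print(f"Estimated tempo: {np.atleast_1d(tempo)[0]:.1f} BPM over {len(beat_times)} beats")
```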

5.2. Comparison with Existing Literature

While previous studies have demonstrated the effectiveness of music therapy in improving emotional, cognitive, and motor functions for visually impaired children [16], our study is among the first to apply a systematic action-rule-based approach to personalize these interventions. The ability to directly translate musical characteristics into actionable changes for therapy sessions represents a novel contribution to the field. This aligns with prior research highlighting the potential of personalized recommendation systems in healthcare settings, such as stress reduction through music [22], but extends the concept to a niche area: Typhlo music therapy for visually impaired children.

5.3. Strengths and Limitations

One of the strengths of this study is the detailed collaboration with domain experts, co-authors of this paper, which ensured that the selected musical features and therapeutic targets were aligned with real-world clinical practice. This collaboration was crucial in refining our dataset, selecting relevant attributes, and interpreting the generated action rules. Additionally, the use of a comprehensive dataset that includes both stable and flexible attributes provided a holistic understanding of each child’s characteristics and responses.
However, there are limitations to our approach. For example, while our method effectively captured changes in attributes like tempo, loudness, and harmonic content, it may not account for all the nuances of a child’s unique preferences or their day-to-day mood variations during therapy.

5.4. Implications for Practice

The insights gained from this study have practical implications for music therapists. By using action rules to guide music selection, therapists can make more informed decisions that are tailored to the evolving needs of each child. For example, knowing that a transition from tonal to mixed sound environments can enhance concentration may encourage therapists to vary the music used in sessions to maintain engagement. Additionally, the ability to focus on attributes like tempo or specific chroma features allows for a more targeted approach to therapy, potentially accelerating progress in key areas such as motorics or communication.
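In practice, such guidance can be operationalized by matching a child's stable attributes against the stable part of each discovered rule and surfacing the suggested musical changes. The sketch below is a minimal, hypothetical illustration of that lookup, using an abridged encoding of rule (7) from Table 5; it is not the authors' recommendation system, and a full implementation would cover all stable attributes and the complete rule set.

```python
def applicable_rules(child: dict, rules: list) -> list:
    """Return the rules whose stable part matches the child, best confidence and support first."""
    matches = [r for r in rules
               if all(child.get(attr) == value for attr, value in r["stable"].items())]
    return sorted(matches, key=lambda r: (r["confidence"], r["support"]), reverse=True)

rules = [
    {"stable": {"HD": "N", "FC": "F", "IQ": "A", "Sib": 1, "ID": "N"},   # abridged rule (7)
     "change": {"Tonal_Noise": ("Tonal", "Mixed")},
     "decision": ("CONC_ATT", ("M", "H")),
     "support": 6, "confidence": 1.0},
]
child = {"HD": "N", "FC": "F", "IQ": "A", "Sib": 1, "ID": "N", "Sex": "M"}

for r in applicable_rules(child, rules):
    attr, (frm, to) = next(iter(r["change"].items()))
    print(f"Try changing {attr} from {frm} to {to} "
          f"(expected {r['decision'][0]}: {r['decision'][1][0]} -> {r['decision'][1][1]})")
```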

5.5. Future Research Directions

While our findings highlight the utility of action rules in identifying key factors like tempo, loudness, and harmonic content, we recognize that this work is only a first step. Future studies should aim to incorporate a broader range of features—such as emotional responses, cultural context, or therapist observations—and examine their effects over longer periods.
Future research could explore the integration of real-time data to enhance the responsiveness of the recommendation system. Incorporating physiological measures such as heart rate or galvanic skin response could provide a deeper understanding of how children respond to different musical elements in real time, further personalizing the intervention. Additionally, expanding the dataset to include more diverse musical genres and cultural contexts could broaden the applicability of the findings, allowing for more culturally sensitive therapy sessions. Given the subjective nature of music perception, future research could also investigate how to model individual preferences dynamically, using a combination of self-reported preferences, therapist observations, and physiological data to better capture these nuances.
Moreover, a deeper exploration of individual musical preferences could significantly enhance the personalization of therapy. While this study focused on attributes like tempo, tonality, and harmonic content, future work could delve into understanding each child’s unique musical tastes—such as their preferred genres, rhythms, or instrumental timbres. By identifying patterns in what specific children enjoy or respond to best, therapists could tailor sessions not only to the therapeutic goals but also to the music that resonates most with each child.

5.6. Conclusions

This study demonstrates the potential of action rules as a powerful tool for enhancing Typhlo music therapy through personalized music selections. By bridging the gap between data analysis and clinical practice, we provide a framework that can be adapted and refined to support the diverse needs of visually impaired children, contributing to more effective and engaging therapeutic experiences.
Parametrizing music is undeniably complex, as individual preferences and perceptions vary greatly. However, by identifying patterns in how specific features, such as tempo or harmonic content, influence therapy outcomes, we can begin to establish a framework for personalization. This framework acknowledges variability while offering therapists actionable insights tailored to individual children.
Beyond Typhlo music therapy, the approach of using action rules holds promise for broader applications in other therapeutic contexts. The ability to derive actionable insights from complex data can be extended to various therapeutic settings, such as cognitive–behavioral therapy, speech and language interventions, or physical therapy programs. By tailoring interventions to individual needs and preferences through data-driven insights, action rules can support more targeted and effective therapeutic outcomes across diverse populations and needs.

Author Contributions

A.B.: Conceptualization, Software, Investigation, Formal analysis, Validation, Visualization, Writing—original draft, Writing—review and editing. Z.W.R.: Conceptualization, Supervision, Data curation, Resources, Writing—review and editing. P.C.: Conceptualization, Data curation, Resources, Writing—review and editing. J.G.-C.: Conceptualization, Writing—review and editing. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data used in this study were provided by Pawel Cylulko, a co-author of this paper and Professor at the Karol Lipinski Music Academy in Wroclaw, Poland. The data are anonymized.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Alvin, J.; Andrews, J. Music Therapy; Hutchinson London: London, UK, 1975. [Google Scholar]
  2. Bunt, L.; Stige, B. Music Therapy: An Art Beyond Words; Routledge: London, UK, 2014. [Google Scholar]
  3. Cylulko, P. Typhlo music therapy interventions supporting the motor development of a child with visual disability. Interdiscip. Context Spec. Pedagog. 2018, 22, 147–159. [Google Scholar] [CrossRef]
  4. Aggarwal, C.C. Recommender Systems; Springer: Berlin/Heidelberg, Germany, 2016; Volume 1. [Google Scholar]
  5. Schafer, J.B.; Konstan, J.; Riedl, J. Recommender Systems in E-Commerce. In Proceedings of the 1st ACM Conference on Electronic Commerce, ACM, Denver, CO, USA, 3–5 November 1999; pp. 158–166. [Google Scholar] [CrossRef]
  6. Christensen, I.A.; Schiaffino, S. Entertainment Recommender Systems for Group of Users. Expert Syst. Appl. 2011, 38, 14127–14135. [Google Scholar] [CrossRef]
  7. Blanco-Fernandez, Y.; Pazos-Arias, J.J.; Gil-Solla, A.; Ramos-Cabrer, M.; Lopez-Nores, M. Providing Entertainment by Content-Based Filtering and Semantic Reasoning in Intelligent Recommender Systems. IEEE Trans. Consum. Electron. 2008, 54, 727–735. [Google Scholar] [CrossRef]
  8. Ivanova, M.; Raś, Z.W. Recommendation Systems in Healthcare. In Recommender Systems for Medicine and Music; Raś, Z.W., Wieczorkowska, A., Tsumoto, S., Eds.; Studies in Computational Intelligence; Springer: Cham, Switzerland, 2021; Volume 946, pp. 1–11. [Google Scholar] [CrossRef]
  9. Tran, T.N.T.; Felfernig, A.; Trattner, C.; Holzinger, A. Recommender Systems in the Healthcare Domain: State-of-the-Art and Research Issues. J. Intell. Inf. Syst. 2021, 57, 171–201. [Google Scholar] [CrossRef]
  10. Ragone, G.; Good, J.; Howland, K. How technology applied to music-therapy and sound-based activities addresses motor and social skills in autistic children. Multimodal Technol. Interact. 2021, 5, 11. [Google Scholar] [CrossRef]
  11. Weller, C.M.; Baker, F.A. The role of music therapy in physical rehabilitation: A systematic literature review. Nord. J. Music Ther. 2011, 20, 43–61. [Google Scholar] [CrossRef]
  12. Lu, G.; Jia, R.; Liang, D.; Yu, J.; Wu, Z.; Chen, C. Effects of music therapy on anxiety: A meta-analysis of randomized controlled trials. Psychiatry Res. 2021, 304, 114137. [Google Scholar] [CrossRef] [PubMed]
  13. Wong, H.; Lopez-Nahas, V.; Molassiotis, A. Effects of music therapy on anxiety in ventilator-dependent patients. Heart Lung 2001, 30, 376–387. [Google Scholar] [CrossRef] [PubMed]
  14. Nayak, S.; Wheeler, B.L.; Shiflett, S.C.; Agostinelli, S. Effect of music therapy on mood and social interaction among individuals with acute traumatic brain injury and stroke. Rehabil. Psychol. 2000, 45, 274–283. [Google Scholar] [CrossRef]
  15. Cylulko, P. Therapy and upbringing of visually impaired children by the use of music. In Essays on Education Through Art Time Passing and Time Enduring; Samoraj, M., Ed.; University of Warsaw: Warsaw, Poland, 2002; pp. 256–260. [Google Scholar]
  16. Cylulko, P.; Gladyszewska-Cylulko, J. A Model of Typhlo Music Therapy in Educational and Rehabilitation Work with Visually Impaired Persons. In Recommender Systems for Medicine and Music; Ras, Z.W., Wieczorkowska, A., Tsumoto, S., Eds.; Studies in Computational Intelligence; Springer: Cham, Switzerland, 2021; Volume 946, pp. 223–236. [Google Scholar] [CrossRef]
  17. Ras, Z.W.; Wieczorkowska, A. Action-rules: How to increase profit of a company. In Proceedings of the European Conference on Principles of Data Mining and Knowledge Discovery, Lyon, France, 13–16 September 2000; Springer: Berlin/Heidelberg, Germany, 2000; pp. 587–592. [Google Scholar] [CrossRef]
  18. Ras, Z.; Dardzinska, A. From Data to Classification Rules and Actions. Int. J. Intell. Syst. 2011, 26, 572–590. [Google Scholar] [CrossRef]
  19. Mardini, M.T.; Raś, Z.W. Extraction of actionable knowledge to reduce hospital readmissions through patients personalization. Inf. Sci. 2019, 485, 1–17. [Google Scholar] [CrossRef]
  20. Tarnowska, K.A.; Raś, Z.W.; Jastreboff, P.J. A Data-Driven Approach to Clinical Decision Support in Tinnitus Retraining Therapy. Front. Neuroinform. 2022, 16, 934433. [Google Scholar] [CrossRef] [PubMed]
  21. Kleć, M.; Wieczorkowska, A. Music and Healthcare Recommendation Systems. In Recommender Systems for Medicine and Music; Ras, Z.W., Wieczorkowska, A., Tsumoto, S., Eds.; Studies in Computational Intelligence; Springer: Cham, Switzerland, 2021; Volume 946, pp. 187–195. [Google Scholar] [CrossRef]
  22. Shin, I.H.; Cha, J.; Cheon, G.W.; Lee, C.; Lee, S.Y.; Yoon, H.J.; Kim, H.C. Automatic Stress-Relieving Music Recommendation System Based on Photoplethysmography-Derived Heart Rate Variability Analysis. In Proceedings of the 2014 36th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Chicago, IL, USA, 26–30 August 2014; pp. 6402–6405. [Google Scholar] [CrossRef]
  23. Ellis, D.P.W. Beat Tracking by Dynamic Programming. J. New Music Res. 2007, 36, 51–60. [Google Scholar] [CrossRef]
  24. McFee, B.; McVicar, M.; Faronbi, D.; Roman, I.; Gover, M.; Balke, S.; Seyfarth, S.; Malek, A.; Raffel, C.; Lostanlen, V.; et al. librosa/librosa: 0.11.0. 2025. Available online: https://zenodo.org/records/15006942 (accessed on 31 May 2024).
  25. Constantinescu, C.; Brad, R. An Overview on Sound Features in Time and Frequency Domain. Int. J. Adv. Stat. IT C Econ. Life Sci. 2023, 13, 45–58. [Google Scholar] [CrossRef]
  26. Klapuri, A.; Davy, M. Signal Processing Methods for Music Transcription; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2007. [Google Scholar]
  27. Jiang, D.N.; Lu, L.; Zhang, H.J.; Tao, J.H.; Cai, L.H. Music Type Classification by Spectral Contrast Feature. In Proceedings of the IEEE International Conference on Multimedia and Expo, Lausanne, Switzerland, 26–29 August 2002; Volume 1, pp. 113–116. [Google Scholar] [CrossRef]
  28. Dubnov, S. Generalization of Spectral Flatness Measure for Non-Gaussian Linear Processes. IEEE Signal Process. Lett. 2004, 11, 698–701. [Google Scholar] [CrossRef]
  29. Tzanetakis, G.; Cook, P. Musical Genre Classification of Audio Signals. IEEE Trans. Speech Audio Process. 2002, 10, 293–302. [Google Scholar] [CrossRef]
  30. Ellis, D.P.W. Chroma Feature Analysis and Synthesis. 2007. Available online: https://www.ee.columbia.edu/~dpwe/resources/matlab/chroma-ansyn/ (accessed on 31 May 2024).
  31. Benedict, A.C.; Ras, Z.W. Distributed Action-Rule Discovery Based on Attribute Correlation and Vertical Data Partitioning. Appl. Sci. 2024, 14, 1270. [Google Scholar] [CrossRef]
  32. Benedict, A.; Ras, Z.W. Towards Scalable Action Rule Discovery: A Structured Vertical Partitioning Method. J. Intell. Inf. Syst. 2025, 1–31. [Google Scholar] [CrossRef]
  33. Athanasopoulos, G.; Eerola, T.; Lahdelma, I.; Kaliakatsos-Papakostas, M. Harmonic organisation conveys both universal and culture-specific cues for emotional expression in music. PLoS ONE 2021, 16, e0244964. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Overview of the correlation-based partitioning process for action-rule discovery. After extracting classification rules, flexible features are clustered using agglomerative clustering based on correlation. The purple block corresponds to the dendrogram level selection process (an example dendrogram provided in Figure 2). Here, at each dendrogram level, groups of related features are used to generate action rules in parallel. The green block represents the RSES-based action-rule generation step (expanded in Figure 3). Rule sets across levels are then compared to identify and return the most effective set.
Figure 2. Example dendrogram showing feature clusters derived from agglomerative clustering. Dashed red lines represent different cut levels used to produce feature groupings for action-rule generation. Each level corresponds to a different number of clusters.
Figure 3. Flowchart illustrating the generation of action rules using RSES classification rule outputs. Pairs of rules are evaluated for compatibility based on stable attributes and decision outcomes. If a valid flexible attribute change is detected and thresholds are met, an action rule is generated.
Table 1. Stable classification attributes.
Age: Child’s age of 7, 8, 9, or above (A)
Sex: Male (M), Female (F)
Place of Permanent Residence (PPR): Large town (L), Medium town (M), Small town (S), Village (V)
Information about Child’s Family (ICF): Known (K), Unknown (U)
Family: Full (F), Partial (P), Reconstructed (R), Missing (M)
IQ: Average (A), Below Average (B), Above Average (C)
Foster Care (FC): Family (F), Institutional (I)
Type of Visual Disability (TVD): Blind (B), Residual Sight (R), Low Vision (L)
Other Dysfunctions (OD): Yes (Y), No (N)
Hearing Disability (HD): Yes (Y), No (N)
Intellectual Disability (ID): Yes (Y), No (N)
Physical Disability (PD): Yes (Y), No (N)
Other Dysfunctions and Illnesses (DI): Yes (Y), No (N)
Siblings (Sib): None (0), 1, 2, 3, More (M), No Data (?)
Prenatal Development (PDe): Normal (N), Abnormal (A), No Data (?)
Childbirth: Natural (N), Caesarean (A), No Data (?)
Grade in School (Grade): 1, 2, 3, 4, 5, 6
School Program (SP): Usual Core Curriculum (N), For Students with Moderate and Severe Mental Retardation (U)
Table 2. Flexible classification attributes.
Emotional and Social Development-Beginning (ESD-b): Correct (C), Not Correct (N)
Emotional and Social Development-End (ESD-e): Correct (C), Not Correct (N)
Motor Development-Beginning (MD-b): Normal (N), Delayed (D), Accelerated (A)
Motor Development-End (MD-e): Normal (N), Delayed (D), Accelerated (A)
Using Speech-Beginning (US-b): Average (A), Below Average (B), Above Average (C)
Using Speech-End (US-e): Average (A), Below Average (B), Above Average (C)
Music: The specific pieces of music the child is listening to (m01 to m34)
Table 3. Meta-decision attributes. Desired values as identified by the therapist are marked in bold.
Physiological Reactions (PHYS REACT): Big (B), Moderate (M), Small (S), None (N)
Motorics: Good (G), Moderate (M), Poor (P), Very Poor (V)
Concentration of Attention (CONC ATT): High (H), Moderate (M), Small (S), None (N)
Experience of Music (EXP MUSIC): Big (B), Moderate (M), Small (S), Very Weak (V)
Communication: Good (G), Moderate (M), Poor (P), Very Poor (V)
Blindism: None (N), Small (S), Moderate (M), Big (B)
Expression: Rich (R), Moderate (M), Poor (P), Very Poor (V)
Table 4. Decision attributes. Desired values as identified by the therapist are marked in bold.
Sweating: High (H), Small (S), None (N)
Breathing: Rapid (R), Moderate (M), Slow (S)
Reddening: Big Red (B), Small Red (S), None (N)
Changing Body Position (BODY POS): Frequent (F), Occasional (O), None (N)
Tightening of Muscles (TM): Frequent (F), Occasional (O), None (N)
Movement of Legs (ML): Frequent (F), Occasional (O), None (N)
Movement of Hands (MH): Frequent (F), Occasional (O), None (N)
Performing Co-Movements (PCM): None (N), Occasional (O), Frequent (F)
Expressing the desire to listen to music for longer (LLM): Unambiguous (U), Ambiguous (A), None (N)
Suggesting a desire to shorten the time to listen to music (SLM): Unambiguous (U), Ambiguous (A), None (N)
Attention to acoustic phenomena outside the music (AAP): None (N), Occasional (O), Frequent (F)
Humming the melody of the presented music (NM): Frequent (F), Occasional (O), None (N)
Performing the rhythm of presented music (PR): Frequent (F), Occasional (O), None (N)
Rocking to presented music (RPM): Frequent (F), Occasional (O), None (N)
Responding to changes in music (RCM): Frequent (F), Occasional (O), None (N)
Communicating with words (CW): Often (O), Rare (R), None (N)
Communication by gesticulation (CG): Often (O), Rare (R), None (N)
Communicating with mimicry (facial expressions) (CM): Often (O), Rare (R), None (N)
Rocking: None (N), Occasional (O), Frequent (F)
Head Shaking: None (N), Occasional (O), Frequent (F)
Hands Waving (not in Front of Eyes): None (N), Occasional (O), Frequent (F)
Hands Waving in Front of Eyes (HWFE): None (N), Occasional (O), Frequent (F)
Eye Rubbing: None (N), Occasional (O), Frequent (F)
Expression/manifestation of emotions and feelings (EEF): Rich (R), Poor (P), None (N)
Using gestures: Rich (R), Poor (P), None (N)
Using mimicry: Rich (R), Poor (P), None (N)
Vocalizing: Rich (R), Poor (P), None (N)
Verbalization: Rich (R), Poor (P), None (N)
