1. Introduction
Does music contain an informational structure that is interpreted uniquely by each individual? If such a structure exists, could it be analyzed from ontological perspectives to reveal natural or cultural characteristics? Understanding whether such informational structures exist is particularly relevant in the context of music therapy, where therapeutic effects are observed but not yet fully explained. Despite extensive empirical evidence supporting its benefits, the mechanisms underlying the therapeutic efficacy of music remain insufficiently characterized, particularly from a signal-structural and informational perspective. Most existing studies focus on subjective assessments, genre classifications, or emotional descriptors, leaving open the question of whether therapeutic music possesses objective, detectable structural properties.
Looking for empirical evidence on the therapeutic effect of music, researchers found positive outcomes for improving the well-being of patients suffering from dementia, autism spectrum disorder [1], stress, and related illnesses [2]. Several reviews have described these therapeutic effects [3,4]. Interesting empirical evidence has been found regarding melodies with an increased therapeutic effect compared to others; one such example is the Mozart Sonata for Two Pianos in D major, K448 [5].
To explain why certain melodies exert such effects, researchers have investigated the neural processes of music perception. Existing research describes this as a highly complex process involving brain areas responsible for interpreting sound (the superior olivary complex and inferior colliculus), as well as motor and pre-motor regions [6,7,8]. When listening to music, the auditory cortex decodes sound streams on a “segment-by-segment” rather than a “moment-by-moment” basis. It is also possible that the neural circuits for memory operate using firing-rate codes; the neurons in these circuits have the ability to synchronize with the fundamental frequency (carrier frequency) or the temporal structure (envelope) of a sound [9]. Pitch processing occurs in a hierarchical manner, and the auditory stream is perceived as a melody over a period of time [10]. These findings led to the following idea, suggesting that therapeutic effects may be tied to specific informational codes embedded in musical structure:
The frequency and temporal structure of auditory information appear to contain a code that the auditory cortex has the ability to decode. The sound stream length is important for the brain to process its pitch and decode it as a melody.
If perception depends on how the brain decodes information, then therapeutic outcomes may also depend on individual differences. Continuing to investigate the mechanism by which music exerts a therapeutic effect, we identified research on personalized music playlists, which were able to enhance the efficacy of music-based interventions in clinical settings [6]. Listening to pleasurable music activated brain regions associated with pleasure and reward-seeking behaviour. As such, the type of music used in therapy should align with the therapeutic context, the specific method, the disorder or disease, and, crucially, the musical preferences of the listeners. Music preferences can be influenced by many factors, such as demographics [11], personality traits [12], or social influences.
In parallel with clinical and neurobiological research, recent advances in computational music analysis have increasingly focused on quantifying musical features in order to predict measurable physiological or behavioural outcomes [13]. Within the field of Music Information Retrieval (MIR), music is commonly represented through predefined acoustic descriptors capturing rhythm, spectral content, and temporal organization. Machine learning and neural network-based approaches have been successfully employed to map musical features to outcomes such as movement synchronization, arousal, or stress reduction. In these supervised frameworks, therapeutic effect is operationally defined through measurable responses, and model performance is evaluated using predictive metrics such as explained variance. While such approaches demonstrate strong predictive power within controlled experimental settings, they are inherently outcome-driven and depend on large labelled datasets. Moreover, the learned feature representations are often difficult to interpret, limiting insight into the intrinsic properties of the music itself [14].
While personalized music playlist approaches have demonstrated benefits in enhancing therapeutic engagement and efficacy, they rely predominantly on subjective criteria such as self-reported preferences, demographic correlations, genre labels, or coarse emotional tags. These strategies provide limited insight into the intrinsic properties of the music itself and offer little capacity for systematic screening or comparison across musical pieces regarding their possible therapeutic impact. In particular, current personalization frameworks do not address whether certain musical sequences share objective structural characteristics that may contribute to their therapeutic potential, independent of individual taste or cultural context. This gap highlighted the need for analytical tools capable of identifying and comparing music based on its underlying informational structure, thereby complementing preference-driven approaches with an objective, structure-based perspective.
Recent advances in signal processing and non-linear analysis suggest that complex auditory stimuli can be characterized beyond conventional spectral descriptors, revealing persistent patterns related to structure, coherence, and organization. In this context, music may be approached not merely as a cultural artefact or aesthetic experience, but as a structured temporal signal whose internal organization can be quantitatively analysed. This perspective opens up the possibility that music associated with therapeutic effects may share common informational features that are independent of style, instrumentation, or cultural origin.
Several studies have hinted at this direction by reporting correlations between therapeutic outcomes and specific signal characteristics, such as 1/f noise behaviour [15], fractal properties [16], or reduced complexity relative to highly dynamic musical forms. However, a systematic and reproducible method for identifying and discriminating such structures across diverse musical corpora is still lacking. Addressing this gap requires a methodology capable of capturing global structural similarities rather than relying on local or surface-level features.
The present work adopted a complementary perspective. Rather than predicting therapeutic outcomes directly, it investigated whether music empirically associated with therapeutic effects exhibits identifiable intrinsic informational structures, independent of listener-specific or experimental conditions. This distinction motivates the development of an unsupervised, structure-oriented analytical method. This work assumed that therapeutic music is not defined primarily by genre or preference, but by persistent informational structures that are detectable in non-linear signal space.
In this work, we introduce the Discriminating Music Sequences method (DiMuSe), a signal-based analytical framework designed to detect and compare structural similarities between music sequences in a reduced, non-linear feature space. The method combines robust signal descriptors with dimensionality reduction techniques, enabling the identification of persistent clustering patterns across large and heterogeneous datasets. Importantly, DiMuSe does not presuppose any specific musical genre, therapeutic label, or emotional annotation, relying instead on the intrinsic informational structure of the audio signals.
The central working hypothesis of this study is that music commonly associated with therapeutic effects exhibits stable structural regularities that can be operationally detected using appropriate signal representations. From this perspective, therapeutic efficacy is not attributed solely to subjective preference or contextual factors, but is hypothesized to correlate with persistent informational configurations embedded in the temporal organization of the signal. Ontological considerations are therefore introduced not as metaphysical claims, but as an operational assumption: if form carries information, then recurring therapeutic effects should correspond to detectable and reproducible signal structures.
To evaluate this hypothesis, DiMuSe is applied to multiple datasets, including curated collections of music labelled as therapeutic, control datasets of non-therapeutic music, and reference signals such as natural sounds and synthetic noise. The method’s robustness is assessed through repeated clustering analyses, external validation using independent studies, and comparative evaluation against known signal classes. The results demonstrate consistent separation between therapeutic and non-therapeutic music, as well as notable proximity between therapeutic music and natural sound patterns, suggesting the presence of shared structural characteristics.
The paper proposed the following research question:
RQ: Can objective, quantifiable ‘fingerprints’ related to therapeutic efficacy be extracted from the informational structure of music?
This study does not claim to establish causal mechanisms between signal structure and therapeutic outcomes. Rather, it provides a methodological contribution: a reproducible framework for identifying and comparing informational structures in music signals, offering new avenues for research on music therapy, auditory neuroscience, and complex systems analysis. By shifting the focus from genre-based classification to structural coherence, DiMuSe lays the groundwork for future investigations into how organized temporal patterns may support cognitive and physiological regulation.
2. Structured Information in Non-Linear Time Series like Music
This section introduces the concept of music’s informational structure, exploring it through the ontological perspective of information. It further examines the distinction between linear and non-linear systems, offering a rationale for selecting Principal Component Analysis (PCA) as the core research method.
2.1. Ontology, Phenomenology, and the Informational Structure of Music
The present study adopted an informational perspective on music signals, treating them as structured temporal processes rather than solely as aesthetic or cultural artefacts. In this framework, information was not reduced to semantic content or symbolic meaning, but was understood operationally as the organization and persistence of patterns within a signal across time and scale. Such an approach was well established in signal processing, complex systems theory, and non-linear dynamics, where structure is inferred from statistical regularities, correlations, and reduced-dimensional representations.
Music signals exhibit multiple layers of organization, ranging from local spectral features to long-range temporal correlations. While conventional analyses often emphasize short-time descriptors, recent work [17] has shown that global structural properties—such as fractal scaling, coherence, and reduced complexity [18]—may play a significant role in how auditory stimuli are perceived and processed. These properties are particularly relevant in contexts where music is associated with regulation, relaxation, or therapeutic effects, suggesting that structure, rather than stylistic elements alone, may be a key factor.
From an informational standpoint, form is treated as a carrier of organization that can be detected independently of semantic interpretation. This does not imply any intrinsic therapeutic quality of specific musical forms, but rather motivates the hypothesis that signals associated with similar functional effects may share common structural configurations. The role of analysis, therefore, is not to assign meaning to music, but to identify reproducible patterns that distinguish one class of signals from another in a data-driven manner.
In this work, informational structure is operationalized through a set of signal descriptors that are designed to capture both variability and organization in the temporal domain. Dimensionality reduction techniques are employed to reveal dominant modes of variation and to facilitate comparison across heterogeneous datasets. Importantly, these methods are used as exploratory operators: they do not impose predefined categories, but allow clusters and proximities to emerge from the data itself.
Ontology refers to the study of existence and the underlying structure of reality. These philosophical perspectives are important for understanding music as information, not just as sound. This subsection investigates the nature of information from a structural phenomenological standpoint, focusing on how information manifests and is experienced within conscious awareness. It does not aim to provide a comprehensive philosophical analysis of music, but rather to establish a conceptual basis for treating music as an information-carrying structure that can be analysed using computational methods.
Ontological considerations are introduced here in a minimal and pragmatic sense. The assumption that form carries information is not treated as a metaphysical claim, but as a working hypothesis guiding the analysis. If therapeutic effects are recurrent and robust across individuals and contexts, it is reasonable to expect that the corresponding signals will exhibit detectable regularities. The aim of this section is therefore to clarify the conceptual basis for treating music as an informational object that is amenable to structural analysis, providing the foundation for the DiMuSe methodology presented in the following sections.
2.2. Linear vs. Non-Linear Systems
Linear systems are fundamental in mathematics, engineering, and physics because of their predictable behaviour and the simplicity of analysing them. A linear system respects the principles of superposition and homogeneity [19]. This means that the output is proportional to the input-causing factors; a direct implication is that very small influences can be neglected as input factors, as their effects are correspondingly small.
From an application perspective, this means that such systems can be decomposed into parts, where each part can be investigated individually, and summing the effects of the parts yields the effect of the entire system.
The superposition principle states that if a system responds to multiple inputs, the total response is equal to the sum of the individual responses to each input applied separately. Mathematically, if a system produces an output $y_1(t)$ for an input $x_1(t)$ and an output $y_2(t)$ for an input $x_2(t)$, then the response to a combined input $x_1(t) + x_2(t)$ is given by

$$T\{x_1(t) + x_2(t)\} = y_1(t) + y_2(t).$$

This property ensures that linear systems exhibit additivity, meaning the system’s response to multiple stimuli is simply the sum of the responses to each stimulus individually.

The principle of homogeneity states that if the input to a linear system is scaled by a constant factor, the output will be scaled by the same factor. If an input $x(t)$ produces an output $y(t)$, then for a scaled input $a\,x(t)$, where $a$ is a constant, the system will produce

$$T\{a\,x(t)\} = a\,y(t).$$

This principle ensures that linear systems exhibit scalability, meaning that multiplying the input by a factor results in a proportionally scaled output.

A system is considered linear if it satisfies both the superposition and homogeneity principles. Therefore, for any inputs $x_1(t)$ and $x_2(t)$, and any scalar constants $a$ and $b$, a linear system obeys

$$T\{a\,x_1(t) + b\,x_2(t)\} = a\,y_1(t) + b\,y_2(t).$$
These properties are fundamental in analysing mechanical, predictable systems, such as electrical circuits, vibrations in structures, and signal processing. In contrast, non-linear systems, including biological organisms or chaotic phenomena, do not obey these principles, leading to complex and unpredictable behaviours. As a consequence, the response of a non-linear system cannot be analysed as a sum of the responses of its components [20]. The relationship between its components, together with the environment in which the system functions, has to be considered for such an analysis.
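The combined linearity condition can be verified numerically. The following Python sketch (an illustration with toy systems, not taken from the paper) shows that a moving-average filter satisfies superposition and homogeneity, while a pointwise squaring operation does not:

```python
import numpy as np

def moving_average(x, w=3):
    """A linear system: 3-point moving-average filter."""
    return np.convolve(x, np.ones(w) / w, mode="same")

def squarer(x):
    """A non-linear system: pointwise squaring."""
    return x ** 2

rng = np.random.default_rng(0)
x1, x2 = rng.normal(size=100), rng.normal(size=100)
a, b = 2.0, -0.5

# Linearity check: T(a*x1 + b*x2) must equal a*T(x1) + b*T(x2)
linear_ok = np.allclose(
    moving_average(a * x1 + b * x2),
    a * moving_average(x1) + b * moving_average(x2),
)
nonlinear_ok = np.allclose(
    squarer(a * x1 + b * x2),
    a * squarer(x1) + b * squarer(x2),
)
```

The squaring system fails because $(a x_1 + b x_2)^2 \neq a x_1^2 + b x_2^2$ in general, which is precisely the decomposition failure described above.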
A non-linear system is generally considered a black box, where no clear linear relationship can be established between the system’s input parameters and its output response. Unlike linear systems, which obey principles such as superposition and homogeneity, non-linear systems exhibit complex behaviours, including bifurcations, chaos, and emergent properties.
Since exact analytical solutions are often unavailable, data-driven identification methods are commonly used to model such systems. As music can be considered a non-linear system, such methods can be directly applied for analysing musical sequences. One effective approach would be the Principal Component Analysis (PCA) method for detecting patterns in the system’s behaviour.
Non-linear system identification is typically performed through experimental analysis, where the system’s response to a set of known inputs is recorded. A reliable methodology consists of the following steps:
Introduce input signals with known properties into the system.
Measure and record the system’s output responses.
Observe changes and correlations in system behaviour.
Machine learning techniques provide powerful tools for identifying non-linear system properties. Unsupervised neural networks offer a way to extract structure from unknown systems without requiring predefined output labels. These networks analyse patterns within the observed data and detect recurring features.
PCA is a mathematical technique that is used to reduce the dimensionality of data while retaining significant variance in the data set. It helps to identify the most relevant features of a non-linear system by
Transforming correlated variables into a set of linearly uncorrelated components.
Extracting dominant modes of behavior from system response data.
Improving computational efficiency by reducing complexity.
The mathematical representation of PCA is given by

$$Z = XW,$$

where $X$ represents the original dataset, $W$ is the transformation matrix composed of eigenvectors, and $Z$ denotes the new transformed feature space.
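The generic transformation can be sketched in a few lines of Python (an illustration of the standard PCA projection, not the paper's MATLAB code), making the roles of $X$, $W$, and $Z$ concrete:

```python
import numpy as np

def pca_transform(X, n_components=2):
    """Compute Z = X W, where W holds the leading eigenvectors of the covariance of X."""
    Xc = X - X.mean(axis=0)               # centre each variable
    C = np.cov(Xc, rowvar=False)          # covariance matrix of the descriptors
    eigvals, eigvecs = np.linalg.eigh(C)  # eigenvalues returned in ascending order
    order = np.argsort(eigvals)[::-1]     # re-sort by explained variance, descending
    W = eigvecs[:, order[:n_components]]  # transformation matrix of eigenvectors
    return Xc @ W                         # transformed feature space Z

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 6))              # 50 samples, 6 descriptors
Z = pca_transform(X, n_components=2)      # reduced representation, shape (50, 2)
```

The resulting components are linearly uncorrelated by construction, which is the property exploited later for clustering in the reduced space.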
To analyze unknown input data, the following approach could be taken:
1. Compare previously recorded responses from empirical tests.
2. Use PCA to extract dominant patterns and features.
3. Apply unsupervised clustering techniques to identify clusters of similar items.
4. Compare results with existing models to estimate system characteristics.
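Steps 2 and 3 of this recipe can be sketched in Python (a toy illustration with hypothetical descriptor data and a minimal k-means, not the paper's implementation):

```python
import numpy as np

def kmeans(Z, k=2, iters=50):
    """Minimal k-means: assign points to the nearest centroid, then update centroids."""
    centroids = Z[[0, -1]].astype(float)  # toy initialization from two distant samples
    for _ in range(iters):
        d = np.linalg.norm(Z[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = Z[labels == j].mean(axis=0)
    return labels

# Hypothetical descriptor matrix: two well-separated synthetic groups of "signals"
rng = np.random.default_rng(2)
X = np.vstack([
    rng.normal(loc=0.0, scale=0.3, size=(20, 5)),
    rng.normal(loc=3.0, scale=0.3, size=(20, 5)),
])

# Step 2: PCA reduction to the two dominant components (via SVD)
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[:2].T

# Step 3: unsupervised clustering in the reduced space
labels = kmeans(Z, k=2)
```

With clearly separated groups, the clustering recovers the two underlying classes without any labels, which is the behaviour the DiMuSe pipeline relies on.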
Such a technique has been used in the current paper and has been described in the following subsections.
2.3. Existing Methods to Discriminate Non-Linear Time Series like Music
At the time of writing this paper, extracting information from music tracks was an established field known as Music Information Retrieval (MIR), which has been widely used for various applications, including context-based music retrieval (CB-MIR), artist identification, genre classification, query by humming, emotion recognition, instrument recognition, and music annotation [21]. One major challenge in MIR is the classification of musical genres, where defining genre boundaries remains an open problem [22].
More recently, advances have shifted MIR applications toward machine learning (ML) approaches, where predefined datasets are used to train models for automatic classification and retrieval tasks. One example is the MediaEval Database for Emotional Analysis of Music (DEAM), which has been utilized for training deep learning networks aimed at mapping emotional content in music [6].
Beyond conventional MIR techniques, complex systems analysis provides additional insights into the discrimination of nonlinear time series such as musical structures. Advances in chaos theory have facilitated the identification of intricate patterns in dynamic systems. One such application comes from geophysics, where nonlinear signal-processing tools—including Principal Component Analysis (PCA)—were used to discriminate signals generated by geo-dynamic structures and to cluster earthquake precursor signals [23]. This geophysical application directly inspired the present research: just as geophysical data contain hidden precursors of major events, therapeutic music may contain informational structures that precede and trigger therapeutic effects. Building on this precedent, our study adapts these nonlinear analysis methods to music, aiming to uncover intrinsic patterns that differentiate therapeutic melodies from other sequences.
Given the complexity of non-linear systems, where input–output relationships cannot be decomposed into simple linear components, traditional statistical techniques struggled to reveal underlying patterns. To address this challenge, this paper leveraged Principal Component Analysis (PCA), a technique capable of identifying dominant structural variations within nonlinear datasets. By reducing dimensionality while preserving relevant features, PCA provided a systematic approach for detecting hidden patterns within music sequences.
The paper proposed the DiMuSe method, which is based on PCA, for identifying informational structures in musical sequences. By applying non-linear time series analysis techniques to music data, the aim was to develop a framework capable of detecting intrinsic patterns that may differentiate melodies with therapeutic properties from other sequences.
The researchers considered the proposed DiMuSe method a different epistemological tool, and Table 1 explains the difference between the neural network approach (used in [13]) and the proposed method.
3. DiMuSe Method
Building on the rationale from Section 2, where non-linear analysis and Principal Component Analysis (PCA) were identified as suitable tools for uncovering hidden informational structures in music, this section introduces the Discriminating Music Sequences method (DiMuSe). While existing approaches such as Music Information Retrieval (MIR) and machine learning–based genre detection have advanced the field, they often depend on predefined labels or handcrafted features. Such reliance makes it difficult to reveal the deeper, intrinsic patterns that may underlie the therapeutic effects of music. To address this limitation, the DiMuSe method was developed. It applies scalar evaluators drawn from diverse scientific domains, offering an unsupervised approach to analyzing musical sequences and enabling clusters to emerge naturally from their informational structure rather than from externally imposed categories.
3.1. Methodological Overview
The DiMuSe (Discriminating Music Sequences) method is designed as a data-driven framework for identifying structural similarities between music signals based on their informational organization. Rather than relying on genre labels, emotional annotations, or subjective assessments, the method operates directly on signal-derived descriptors, allowing patterns to emerge from the intrinsic properties of the data.
The methodological pipeline consists of three main stages: signal representation, dimensionality reduction, and structural comparison. Each stage is chosen to progressively reduce complexity while preserving information relevant to global organization and temporal structure.
In the first stage, audio signals are transformed into a set of numerical descriptors that capture both spectral and temporal characteristics. These descriptors are selected to balance sensitivity to variability with robustness to local fluctuations, enabling meaningful comparison across heterogeneous musical samples. All signals are processed using identical preprocessing steps to ensure consistency and reproducibility.
In the second stage, dimensionality reduction is applied to the resulting feature space in order to reveal dominant modes of variation. Principal Component Analysis (PCA) is employed as an exploratory operator, allowing correlated descriptors to be projected onto a reduced set of orthogonal components. Importantly, here, PCA is not used for classification, but as a means to expose latent structural relationships and facilitate visualization and clustering in a lower-dimensional space.
The third stage involves the identification and analysis of clustering patterns within the reduced space. By examining the relative positions and groupings of different signal classes, DiMuSe enables the detection of persistent proximities and separations between musical corpora. Stability is assessed through repeated analyses and comparisons across datasets, ensuring that observed patterns are not artefacts of specific selections or parameter choices.
Throughout the method, no a priori assumptions are made regarding the therapeutic value of specific musical genres or styles. Instead, therapeutic relevance is evaluated post hoc by examining whether signals labelled as therapeutic exhibit consistent structural grouping distinct from control datasets. This approach allows the method to remain neutral with respect to interpretation while providing quantitative evidence for structural regularities.
The DiMuSe framework is thus intended as an exploratory and comparative tool rather than a predictive or diagnostic model. Its primary contribution lies in offering a reproducible methodology for mapping informational structures in music signals, creating a foundation for subsequent investigations into the relationship between signal organization and functional effects in music therapy and related domains.
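The stability assessment mentioned above can be sketched as repeated analyses on resampled data. The following Python illustration (synthetic descriptors and a bootstrap check chosen for this sketch, not the paper's exact procedure) verifies that a dominant mode of variation persists across resamples:

```python
import numpy as np

def leading_component(X):
    """Unit-norm direction of maximal variance (first principal component)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[0]

# Hypothetical descriptor matrix with one dominant mode plus weak noise
rng = np.random.default_rng(3)
scores = rng.normal(size=(200, 1))
direction = np.array([1.0, 0.5, -0.3, 0.2])
X = scores * direction + 0.05 * rng.normal(size=(200, 4))

# Repeat the analysis on bootstrap resamples and compare leading directions
v_full = leading_component(X)
similarities = [
    abs(v_full @ leading_component(X[rng.choice(len(X), size=len(X))]))
    for _ in range(20)
]
stable = min(similarities) > 0.95  # cosine similarity near 1 on every resample
```

A pattern that survives such resampling is unlikely to be an artefact of a particular selection of samples, which is the criterion DiMuSe applies across its datasets.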
3.2. Constructing the Principal Components Vector for DiMuSe
Evaluating structured sound sequences such as music—characterized by beat, harmony, timbre, and melody—using only conventional statistical evaluators proved insufficient, as these approaches do not preserve the sequential order of signals. Since harmony is established through ordered sequences, losing this structural information would compromise the analysis of musical features.
The novelty of the proposed method lies in the selection of the scalar evaluators.
The scalar evaluators employed in the DiMuSe method were not designed as music-specific descriptors, but as general-purpose measures for characterizing nonlinear time-series signals. Their selection was motivated by the objective of identifying intrinsic informational and dynamical properties of sound sequences, rather than extracting stylistic or semantic musical features tied to genre, instrumentation, or timbral identity.
From this perspective, musical signals were treated as instances of nonlinear temporal processes, similar in nature to other complex signals encountered in physics, biology, or geophysics. Consequently, the evaluators used in this study originate from statistical analysis, fractal geometry, nonlinear physics, and complex systems theory, and are applicable beyond the musical domain to a wide range of non-musical or non-auditory signals. This domain-agnostic design allows DiMuSe to function as a structural screening tool capable of revealing similarities among signals based on their underlying informational organization rather than on music-specific acoustic conventions.
Rather than relying solely on statistical measures, this paper analysed sound sequences from multiple scientific perspectives. To classify music sequences objectively, the researchers explored scalar evaluators available for PCA across various scientific domains. The primary validation criterion was that these evaluators should be sensitive to temporal variations in time series data.
In contrast to commonly used music information retrieval features—such as Mel-frequency cepstral coefficients, spectral centroid, or chroma vectors—which are primarily designed as short-term, perceptually motivated summaries and therefore exhibit limited sensitivity to nonlinear dynamical properties, the present study intentionally adopted scalar descriptors selected for their responsiveness to nonlinear signal behaviour. Specifically, the DiMuSe evaluators were chosen first for their ability to respond to nonlinear time-series dynamics, and additionally for their capacity to capture long-range temporal dependencies, multiscale organization, and history-dependent structure. This design choice reflects the aim of investigating whether the data sequences exhibit distinctive informational patterns at a structural level that precedes perceptual categorization, cultural interpretation, or genre-based description.
However, in the future, the researchers intend to select a list of evaluators that are specific for sound sequences and add those as scalar evaluators alongside the DiMuSe evaluators to study their impact.
The scientific fields from which the suitable evaluators were sourced included
Statistics—standard deviation, autocorrelation, entropy measures.
Fractal Geometry—measures of self-similarity and complexity.
Nonlinear Physics—chaos theory and dynamical system indicators.
Complex Systems—network-based metrics and emergent pattern analysis.
The evaluators and their interpretation in the context of sound sequences are listed in Table 2.
By integrating scalar evaluators from these diverse fields, the DiMuSe method aimed to detect intrinsic patterns in musical sequences, thereby enabling the clustering of melodies based on their informational structure.
3.2.1. Statistical Evaluators
Function med
The function med computes the arithmetic mean of the input signal and serves as a first-order statistical descriptor of its central tendency.
Theoretical background. Given a discrete signal $x = \{x_1, x_2, \ldots, x_N\}$, the mean value is defined as

$$\mathrm{med} = \bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i.$$

The mean characterizes the average level of the signal and is commonly used to describe its global offset.
DiMuSe implementation. In the MATLAB implementation, the mean is computed directly using the built-in function mean(A), applied to the original input signal prior to normalization. The returned scalar value corresponds to the estimator med.
Function sigma
The function sigma computes the sample standard deviation of the input signal and quantifies the dispersion of values around the mean.
Theoretical background. For a discrete signal $x = \{x_1, \ldots, x_N\}$ with mean $\bar{x}$, the sample standard deviation is defined as

$$\sigma = \sqrt{\frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})^2}.$$

This formulation includes Bessel’s correction and provides an unbiased estimate of variance for finite samples.
DiMuSe implementation. In the MATLAB (https://ww2.mathworks.cn/products/matlab.html) code, the standard deviation is computed using the default std(A) function, which implements the sample standard deviation with $N-1$ normalization. The resulting scalar value is returned as the estimator sigma.
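The two estimators can be mirrored in a few lines of Python (NumPy's default normalization differs from MATLAB's std, so ddof=1 is set explicitly to reproduce the $N-1$ normalization):

```python
import numpy as np

def med(a):
    """Arithmetic mean, equivalent to MATLAB's mean(A)."""
    return float(np.mean(a))

def sigma(a):
    """Sample standard deviation with Bessel's correction, like MATLAB's std(A)."""
    return float(np.std(a, ddof=1))  # ddof=1 gives the N-1 normalization

a = np.array([1.0, 2.0, 3.0, 4.0])
m, s = med(a), sigma(a)  # m = 2.5; s = sqrt(5/3)
```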
Functions M3 and M4
The functions M3 and M4 compute root-transformed central moment descriptors that characterize the global shape of the amplitude distribution of a signal. These evaluators are derived from the third- and fourth-order central moments but are modified to provide scale-consistent scalar measures suitable for multivariate analysis.
Theoretical background. Given a discrete signal $x = \{x_1, \ldots, x_N\}$ with mean $\bar{x}$, the third- and fourth-order central moments are defined as

$$m_3 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^3, \qquad m_4 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^4.$$
These moments capture asymmetry and tail heaviness of the distribution, respectively, but their magnitudes scale nonlinearly with signal amplitude.
DiMuSe implementation. In the MATLAB implementation, the central moments are first accumulated explicitly and normalized by the signal length. To obtain dimensionally comparable scalar descriptors, their absolute values are taken and root-transformed:

$$M_3 = |m_3|^{1/3}, \qquad M_4 = |m_4|^{1/4}.$$
These transformations ensure that both descriptors have the same physical dimension as the original signal amplitude and reduce sensitivity to extreme outliers. The resulting scalars quantify the magnitude of distributional asymmetry (M3) and tail heaviness (M4), without encoding their sign.
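The transformation can be sketched as follows (a Python approximation of the described accumulation; the exact loop structure of the DiMuSe MATLAB code is not reproduced):

```python
import numpy as np

def m3_m4(a):
    """Root-transformed central-moment descriptors: accumulate the
    third/fourth central moments normalized by the signal length,
    take absolute values, and apply cube/fourth roots so that both
    share the dimension of the signal amplitude."""
    a = np.asarray(a, dtype=float)
    c = a - a.mean()
    mu3 = np.mean(c ** 3)          # third central moment (signed)
    mu4 = np.mean(c ** 4)          # fourth central moment (>= 0)
    return abs(mu3) ** (1.0 / 3.0), abs(mu4) ** (1.0 / 4.0)

# A symmetric signal has zero asymmetry, so M3 vanishes:
M3, M4 = m3_m4([-1.0, 0.0, 1.0])
print(M3, M4)
```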
3.2.2. Fractal Evaluators
The following fractal evaluators [
24] provided a quantitative characterization of fractal dimensions in signal analysis.
Function dnfft
The function dnfft implements a fractal estimator related to the Smoothing Dimension, intended to quantify multiscale roughness in nonlinear signals. The description below distinguishes the general principle of the estimator from the exact computational steps used in the DiMuSe MATLAB implementation.
Theoretical background. Let $x(t)$ denote the original signal and let $x'(t)$ be its first derivative. The smoothing-dimension principle evaluates how the signal energy changes as the effective smoothing scale increases. In frequency-domain form, this can be expressed by considering the energy of the derivative after progressively restricting the bandwidth (equivalently, varying an effective cutoff frequency). Under fractal scaling assumptions, the resulting energy measure follows a power law with respect to the cutoff scale, and the slope in log–log coordinates defines a smoothing exponent.
DiMuSe implementation. The code computes the derivative and then uses the cumulative sum of the derivative power spectrum to implicitly represent a family of increasing cutoff frequencies: (i) the signal is truncated (trunc); (ii) differentiated numerically (diff); (iii) mean-centered; (iv) transformed by FFT, with the one-sided power spectrum formed as $P_k = |X_k|^2$. The cumulative energy curve is then computed as $C_k = \sum_{j \le k} P_j$, and the analyzed quantity is $E_k = \sqrt{C_k}$, which corresponds to the Euclidean norm of the derivative content up to an implicit cutoff determined by bin index k.
Both the normalized frequency index and the cumulative energy are scaled to $[0,1]$, and a linear regression is performed in log–log space over a fixed index range that excludes the lowest-frequency bins, where l is the signal length after differentiation. The regression slope is taken as the smoothing exponent, and the corresponding fractal dimension estimate is computed from it.
The MATLAB implementation returns three scalar values (packed into a vector): the estimated smoothing exponent (the slope of the log–log regression), the regression fit error returned by the function reg, and the corresponding fractal dimension estimate derived from the slope.
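The overall principle can be illustrated with the following Python sketch. The truncation step, the exact regression index range, and the slope-to-dimension mapping of the MATLAB code are not reproduced here; the mid-band bounds lo_frac and hi_frac are illustrative assumptions:

```python
import numpy as np

def smoothing_exponent(x, lo_frac=0.05, hi_frac=0.45):
    """Differentiate, FFT, form the one-sided power spectrum, take the
    square root of its cumulative sum (the norm of the derivative
    content up to each cutoff bin), and fit a line in log-log space
    over a mid-range of bins that excludes the lowest frequencies."""
    d = np.diff(np.asarray(x, dtype=float))
    d = d - d.mean()
    power = np.abs(np.fft.rfft(d)) ** 2      # one-sided power spectrum
    energy = np.sqrt(np.cumsum(power))       # E_k = sqrt(sum_{j<=k} P_j)
    k = np.arange(1, len(energy) + 1)
    lo = max(int(lo_frac * len(energy)), 1)
    hi = int(hi_frac * len(energy))
    slope, _ = np.polyfit(np.log(k[lo:hi]), np.log(energy[lo:hi]), 1)
    return slope

# For a random walk the derivative is white noise, so the cumulative
# power grows roughly linearly and the exponent is close to 0.5:
rng = np.random.default_rng(0)
walk = np.cumsum(rng.standard_normal(4096))
print(smoothing_exponent(walk))
```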
Function dnpfft
The function dnpfft estimates the smoothing dimension over an extended frequency domain. This evaluator is conceptually related to dnfft, but instead of explicitly applying a bank of low-pass filters with selectable cutoff frequencies, it derives an equivalent multiscale relationship directly from the cumulative spectrum of the signal derivative.
Theoretical principle. After computing the first derivative of the input signal, the method analyzes how the accumulated spectral energy grows with increasing normalized frequency. Under fractal scaling assumptions, this growth follows a power law in log–log coordinates, whose slope is used as a smoothing exponent.
DiMuSe implementation (as used in this study). Given a signal x, the code performs (i) truncation (function trunc), (ii) mean removal, and (iii) numerical differentiation (diff), followed by an FFT. The squared magnitude spectrum of the derivative is accumulated using a cumulative sum, producing a monotone energy curve. The regression is then performed in log–log space on the pairs $(\log f_k, \log E_k)$, where both frequency and cumulative energy are normalized to $[0,1]$.
Importantly, the “extended domain” aspect is implemented by fitting the regression over a broad range of FFT bins, excluding only the very-low-frequency and boundary bins. Specifically, the regression runs from a fixed lower index to an upper index determined by l, the signal length.
The function returns three scalar outputs packed into a vector: the estimated slope (smoothing exponent) from the log–log regression, the regression fit error returned by reg (implementation-dependent), and the corresponding fractal dimension estimate derived from the slope.
Function dipfft
The function dipfft estimates a related fractal exponent that incorporates the energy accumulation of the derivative spectrum. In contrast to dnpfft, which accumulates squared spectral magnitude and then takes a square root, dipfft accumulates the magnitude spectrum directly, yielding a different scaling sensitivity.
Theoretical principle. The method assumes that the cumulative spectral magnitude of the signal derivative follows a power-law scaling with normalized frequency. The slope of the log–log regression provides an exponent that can be mapped to a fractal dimension estimate.
DiMuSe implementation. The effective parameters are fixed internally: the minimum regression index is set to 20 and the maximum is determined by l, the signal length (the input arguments are overwritten). The processing steps are truncation, mean removal, numerical differentiation, FFT of the derivative, magnitude extraction on the positive-frequency half-spectrum, cumulative sum, and normalization to $[0,1]$. A linear regression is then performed in log–log space from index 20 up to the maximum index.
The function returns three scalar outputs: the estimated slope from the log–log regression, the regression fit error returned by reg, and the corresponding fractal dimension estimate derived from the slope.
Function dntimp
The function dntimp provides a time-domain variant of the smoothing-dimension concept. Instead of forming cumulative quantities from the FFT spectrum directly, it constructs a sequence of progressively smoothed signals by truncating the derivative spectrum and measuring the resulting time-domain energy. The scaling of this energy across smoothing levels yields a fractal exponent.
Theoretical principle. If a signal exhibits multiscale (fractal-like) structure, then its energy after progressive low-pass smoothing follows a power-law scaling with the effective bandwidth. Estimating the slope of this relationship in log–log space provides a smoothing exponent that can be mapped to a fractal dimension.
DiMuSe implementation. Given a signal x, the code performs truncation, numerical differentiation, mean removal, and FFT of the derivative. It then generates nrpoints progressively smoothed versions of the derivative by retaining only the lowest FFT bins (and mirroring them to preserve a real-valued inverse FFT), where the initial bandwidth is set by nmin = 32 and the bandwidth increases geometrically across the nrpoints steps. For each step, the inverse FFT is computed and the time-domain energy is measured as the sum of squared samples, $E = \sum_i y_i^2$. The resulting bandwidth–energy pairs are normalized to $[0,1]$ and fitted by linear regression in log–log space.
The function returns three scalar outputs: the estimated slope from the log–log regression, the regression fit error returned by reg, and the corresponding fractal dimension estimate derived from the slope.
3.2.3. Complex Systems Evaluators
Function hhcor
The function hhcor implements the height–height correlation estimator, which quantifies scale-dependent roughness by analyzing how the root-mean-square (RMS) of signal differences grows with the time lag. The description below distinguishes the theoretical definition from the exact procedure used in the MATLAB implementation adopted in DiMuSe.
Theoretical background. Given a signal $h(t)$, the height–height correlation function is defined through the scaling relation $G(\tau) = \left\langle [h(t+\tau) - h(t)]^2 \right\rangle^{1/2} \sim \tau^{\alpha}$, where $\tau$ is the lag and $\alpha$ is the roughness exponent. Under standard fractal assumptions, the exponent is related to the fractal dimension by $D = 2 - \alpha$.
DiMuSe implementation. In the MATLAB code (hhcor(a,pas)), the lag is implemented as an integer sample shift n. For each lag n, the code forms the two aligned subsequences $a_{1}, \dots, a_{N-n}$ and $a_{n+1}, \dots, a_{N}$, computes the mean squared difference, and then takes the RMS: $G(n) = \sqrt{\frac{1}{N-n}\sum_{i=1}^{N-n}(a_{i+n} - a_i)^2}$. The regression is performed in log–log space using the pairs $(\log n, \log G(n))$. In particular, the implementation sweeps the lag with step pas and estimates $\alpha$ as the slope of the linear regression returned by reg.
The function returns three scalar values (packed into a vector): the estimated exponent $\alpha$ (slope of the log–log regression), the regression fit error returned by reg, and the derived fractal dimension $D = 2 - \alpha$.
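The estimator can be sketched in Python as follows (the lag sweep is illustrative, and the mapping $D = 2 - \alpha$ is the standard relation assumed above, not a reproduction of the DiMuSe code):

```python
import numpy as np

def hhcor(a, lags):
    """Height-height correlation sketch: for each integer lag n, take
    the RMS of the differences a[i+n] - a[i]; the slope of the
    log-log fit is the roughness exponent alpha, mapped to a fractal
    dimension via the standard relation D = 2 - alpha."""
    a = np.asarray(a, dtype=float)
    g = np.array([np.sqrt(np.mean((a[n:] - a[:-n]) ** 2)) for n in lags])
    alpha, _ = np.polyfit(np.log(lags), np.log(g), 1)
    return alpha, 2.0 - alpha

# Brownian motion has roughness exponent alpha ~ 0.5 (so D ~ 1.5):
rng = np.random.default_rng(1)
bm = np.cumsum(rng.standard_normal(8192))
alpha, D = hhcor(bm, lags=[1, 2, 4, 8, 16, 32, 64])
print(alpha, D)
```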
Function hurst
The function hurst estimates the Hurst exponent using the classical rescaled range (R/S) method, which quantifies long-range dependence in a time series. The description below distinguishes the theoretical definition from the exact MATLAB implementation used in DiMuSe.
Theoretical background. For a discrete signal, the R/S method evaluates, for each window length $n$, the rescaled range $R(n)/S(n)$, where $R(n)$ is the range of the cumulative deviations from the mean within the window and $S(n)$ is the standard deviation within the same window. For many self-affine processes, the expected scaling follows $\mathrm{E}[R(n)/S(n)] \sim n^{H}$, and the Hurst exponent H is obtained as the slope of a linear fit in log–log space.
DiMuSe implementation. In the MATLAB code (hurst(a,nnn)), the analysis is applied to the first difference of the input signal (a = diff(a)). The signal is then partitioned into non-overlapping segments of length i, and the rescaled range $R/S$ is computed for each segment. For each window length i, the implementation averages R and S across all segments and stores the mean ratio $\langle R/S \rangle$. The window length is swept over a range determined by N, the length of the differenced signal. A linear regression in log–log coordinates is then performed by reg1 on the pairs $(\log i, \log \langle R/S \rangle)$, and the slope is taken as the estimated Hurst exponent.
The MATLAB function returns three scalar values (packed into a vector): the estimated Hurst exponent H (slope of the log–log regression), the regression fit quality returned by reg1 (implementation-dependent), and the derived fractal dimension estimate computed in the code as $D = 2 - H$.
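The described procedure can be sketched in Python as follows (the window-length sweep and the exact averaging of R and S in the DiMuSe code are not reproduced; the per-window ratio is averaged here, and the window lengths are supplied by the caller):

```python
import numpy as np

def hurst_rs(a, window_lengths):
    """R/S sketch: difference the input, split it into non-overlapping
    windows of length i, compute the range of cumulative mean-deviations
    R and the in-window standard deviation S, average the ratio across
    windows, and regress log<R/S> on log i. D = 2 - H as in the text."""
    d = np.diff(np.asarray(a, dtype=float))
    log_i, log_rs = [], []
    for i in window_lengths:
        ratios = []
        for s in range(len(d) // i):
            w = d[s * i:(s + 1) * i]
            z = np.cumsum(w - w.mean())      # cumulative deviations
            S = w.std(ddof=1)
            if S > 0:
                ratios.append((z.max() - z.min()) / S)
        log_i.append(np.log(i))
        log_rs.append(np.log(np.mean(ratios)))
    H, _ = np.polyfit(log_i, log_rs, 1)
    return H, 2.0 - H

# The differenced random walk is white noise, so H should be near 0.5
# (small-window R/S estimates are known to bias slightly upward):
rng = np.random.default_rng(2)
H, D = hurst_rs(np.cumsum(rng.standard_normal(8192)), [16, 32, 64, 128, 256])
print(H, D)
```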
3.2.4. Physics Evaluators
Function ttt
The function ttt computes the (normalized) power spectrum entropy, which summarizes how uniformly the signal’s spectral power is distributed across frequency bins. The description below distinguishes the general definition from the exact computation used in the DiMuSe MATLAB implementation.
Theoretical background. Given a discrete power spectrum that is normalized to form a probability mass function $q$ (i.e., $q_i \ge 0$ and $\sum_i q_i = 1$), the Shannon entropy is $S = -\sum_i q_i \ln q_i$. Low entropy indicates that spectral energy is concentrated in a small number of bins (e.g., near-periodic signals), while high entropy indicates a more uniform distribution of spectral energy (e.g., broadband noise).
DiMuSe implementation. In the MATLAB code, the normalized spectral distribution is provided as the vector qq. The entropy is computed as $S = -\sum_i qq_i \ln qq_i$, where terms with $qq_i = 0$ are skipped (equivalently, $0 \ln 0 := 0$). The implementation then returns a normalized entropy by dividing by $\ln(nn)$, where nn is the number of frequency bins used in the distribution: $S_{\mathrm{norm}} = S / \ln(nn)$. With this normalization, $S_{\mathrm{norm}} \in [0, 1]$, where 0 corresponds to power concentrated in a single bin and 1 corresponds to a uniform distribution across the nn bins. The MATLAB output ttt corresponds to $S_{\mathrm{norm}}$.
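A direct Python sketch of this computation (an approximation of the described MATLAB steps):

```python
import numpy as np

def ttt(qq):
    """Normalized power-spectrum entropy: Shannon entropy of the
    normalized spectral distribution qq (zero bins skipped), divided
    by log(nn) so that the result lies in [0, 1]."""
    qq = np.asarray(qq, dtype=float)
    nz = qq[qq > 0]                      # skip q_i = 0 terms
    entropy = -np.sum(nz * np.log(nz))
    return float(entropy / np.log(len(qq)))

print(ttt(np.ones(8) / 8))   # uniform spectrum: close to 1
spike = np.zeros(8); spike[0] = 1.0
print(ttt(spike))            # single-bin spectrum: 0
```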
Function sum_q
The function sum_q returns a scalar spectral magnitude summary computed from the discrete Fourier transform (DFT) of the normalized signal. The description below distinguishes the general spectral interpretation from the specific MATLAB implementation used in DiMuSe.
Theoretical background. A frequency-domain representation of a discrete signal can be obtained via the DFT, and global scalar summaries of the spectrum can be used to characterize the overall spectral magnitude level. Such summaries reflect how much spectral content is present in the analyzed frequency range, but they are distinct from physical energy measures unless squared magnitudes (power) and a Parseval-consistent normalization are used.
DiMuSe implementation. In the MATLAB code, the input signal A is first centered and normalized to unit standard deviation. The one-sided magnitude spectrum is then computed and scaled by a factor involving the signal length n. The scalar returned by this evaluator is the sum of the scaled one-sided FFT magnitudes. In addition, the normalized spectral distribution used for entropy estimation is computed as $qq_k = q_k / \sum_j q_j$, ensuring $\sum_k qq_k = 1$.
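A Python sketch of this summary (the exact per-bin scaling of the DiMuSe code is not given above; the common 2/n one-sided factor is assumed here):

```python
import numpy as np

def sum_q(A):
    """Center the signal, scale to unit standard deviation, take the
    one-sided FFT magnitudes, and return their sum together with the
    normalized distribution qq used by the entropy estimator."""
    A = np.asarray(A, dtype=float)
    A = (A - A.mean()) / A.std()
    q = np.abs(np.fft.rfft(A)) * 2.0 / len(A)   # scaled one-sided magnitudes
    qq = q / q.sum()                            # normalized distribution
    return float(q.sum()), qq

# A pure sine concentrates all spectral magnitude in one bin:
t = np.arange(1024)
total, qq = sum_q(np.sin(2 * np.pi * 8 * t / 1024))
print(total)       # close to sqrt(2) for a unit-variance sine
print(qq.sum())    # 1 by construction
```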
3.3. Validation of the DiMuSe Method Using Generated Signals
Before applying the method to real-world music, it was essential to test its reliability in controlled conditions. To achieve this, Matlab-generated arrays were created using rules that produced persistent, anti-persistent, fractal, and cumulative signals. This validation step ensured that the method could detect meaningful clusters even when applied to synthetic data, providing confidence that the same analytical approach could later reveal hidden patterns in actual therapeutic melodies.
3.3.1. Array Generation Rules
The first step involved generating a natural (Gaussian) random array consisting of 8192 scalars, referred to as originalArray. This array served as the foundation for creating additional signals, except for fractal signals.
Persistent Array Generation Function
A 10-iteration loop was implemented, where each iteration produced a new array based on the previous one, following the transformation rule:
The resulting arrays were labeled as pers1, pers2, …, pers10.
Anti-Persistent Array Generation Function
An initial 10-element array of percentages, persArray = [10%, 20%, …, 100%], was defined.
From the original generated array, a 10-iteration loop was conducted, where each iteration generated a new array using the following transformation rule:
Each newly generated array progressively lost a certain amount of information from the initial Gaussian array due to the applied transformation.
The resulting arrays were labeled as antipers1, antipers2, …, antipers10.
Fractal Array Generation Function
To generate fractal sequences, a 10-element array of predefined fractal dimensions, fractalDimArray = [1.1, 1.2, …, 2.0], was initially constructed.
Without directly utilizing the original Gaussian array as input, 10 additional arrays were generated corresponding to the predefined fractal dimensions in
fractalDimArray using the
Takayasu method [
25]. These transformations ensured the generated signals exhibited fractal-like characteristics. The resulting arrays were labeled as df1.1, df1.2, …, df2.0.
Cumulative Sum Array Generation Function
A single array was generated, representing the cumulative sum over all elements in the initial generated array. This process is equivalent to computing the integral of the original array, providing insight into its cumulative behavior. The resulting array was labeled cumsum.
Differential Array Generation Function
To capture local variations in the signal, an array was generated that contained the difference between each pair of consecutive elements from the initial Gaussian array. This transformation effectively extracts high-frequency components, highlighting short-term fluctuations within the dataset. The resulting array was labeled diff.
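These last two generation rules reduce to one-line array transforms; a Python sketch (the 8192-sample Gaussian array mirrors the description, while the seed is illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)               # illustrative seed
originalArray = rng.standard_normal(8192)    # natural (Gaussian) array

# Cumulative-sum array: discrete integral of the original signal.
cumsum_array = np.cumsum(originalArray)

# Differential array: consecutive differences, emphasizing
# high-frequency, short-term fluctuations.
diff_array = np.diff(originalArray)

print(cumsum_array.shape, diff_array.shape)  # (8192,) (8191,)
```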
3.3.2. Running DiMuSe on the Generated Arrays
Applying the DiMuSe method in Matlab on the 32 previously generated arrays produced a .csv file containing 32 vectors, with each vector consisting of 24 scalar values.
This file was used to compute the three Principal Component (PC) scalars for each vector, utilizing different combinations of generated array types. The results were subsequently visualized in scatter plots on the most representative PCA axes, PC1 and PC2. The clustering was created using the k-Means Orange Data Mining widget with 10 reruns and 300 maximum iterations. The optimal number of clusters was assessed using three indicators [26]:
k, the number of clusters for the k-Means unsupervised clustering method;
Silhouette score (SIL), where values vary between −1 and +1, with the best score being between 0 and 1;
Davies–Bouldin Index (DBI), where values vary between 0 and ∞, best score being closer to 0.
The Silhouette Score (SIL) [27] quantifies how similar a point is to its own cluster compared to other clusters. The values are interpreted as follows:
+1 means excellent cluster assignment.
0 means overlapping clusters.
<0 means misclassified points.
The Davies–Bouldin Index (DBI) [28] measures how compact clusters are relative to how far apart they are, with lower values (closer to 0) indicating better clustering.
The results are presented in Table 3.
The best clustering option identified was k = 5, which matches the number of classes. The clusters were clearly divided in the plot, and the SIL and DBI indicators were well within the optimal value ranges.
These results validate that the DiMuSe method successfully clusters time-series signals based on the selected estimators applied in the PCA analysis.
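The two cluster-quality indicators can be computed directly from their definitions; the following Python sketch mirrors what the Orange widgets report (it is not part of the DiMuSe code):

```python
import numpy as np

def silhouette(X, labels):
    """Mean silhouette score: for each point, a is the mean distance
    to its own cluster and b the smallest mean distance to another
    cluster; the per-point score is (b - a) / max(a, b)."""
    X, labels = np.asarray(X, float), np.asarray(labels)
    D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    idx = np.arange(len(X))
    scores = []
    for i in idx:
        own = (labels == labels[i]) & (idx != i)
        a = D[i, own].mean()
        b = min(D[i, labels == c].mean() for c in set(labels) if c != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))

def davies_bouldin(X, labels):
    """DBI: average, over clusters, of the worst (S_i + S_j) / d(c_i, c_j)
    ratio, where S is the mean distance to the cluster centroid."""
    X, labels = np.asarray(X, float), np.asarray(labels)
    ks = sorted(set(labels))
    cents = np.array([X[labels == k].mean(axis=0) for k in ks])
    S = np.array([np.linalg.norm(X[labels == k] - c, axis=1).mean()
                  for k, c in zip(ks, cents)])
    worst = [max((S[i] + S[j]) / np.linalg.norm(cents[i] - cents[j])
                 for j in range(len(ks)) if j != i)
             for i in range(len(ks))]
    return float(np.mean(worst))

# Two well-separated clusters: SIL near +1, DBI near 0.
X = [[0, 0], [0, 1], [10, 10], [10, 11]]
y = [0, 0, 1, 1]
print(silhouette(X, y), davies_bouldin(X, y))
```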
3.4. Compiling a List of Sound Sequences for Clustering with the DiMuSe Method
Having confirmed that DiMuSe could distinguish between artificially generated signals, the next step was to apply it to actual music and sound recordings. To test its scope, a diverse dataset was assembled, covering multiple genres (classical, religious, therapeutic, folk, acapella, noise, and natural recordings). This diversity was intentional: by contrasting empirically validated therapeutic music with everyday or culturally significant sounds, the analysis could determine whether therapeutic pieces express unique informational structures that differentiate them from other categories.
A diverse dataset of sound sequences and melodies spanning various musical genres was created. The following genres were selected:
For musical pieces, approximately one minute of each selected song was recorded from the online platform YouTube. The noises and environmental sounds were sourced from the royalty-free Pixabay platform.
This process resulted in the creation of 32 MP3 files, organized into six folders corresponding to the defined genres.
3.4.1. Selected Sound Sequences
Noise (Colored)
Table 4 contains colored noise sound sequences generated using specific algorithms, classified as Noise in the scatter plot.
Noise (Analog)
Table 5 contains recorded noise sound sequences from anthropic or natural sources, classified as Noise in the scatter plot.
Vocal (Acapella)
Table 6 contains recorded vocal music sequences, classified as Vocal in the scatter plot.
Therapy
Table 7 contains music pieces empirically proven to have therapeutic effect, classified as Therapy in the scatter plot.
Classical
Table 8 contains music pieces from classical music, classified as Classical in the scatter plot.
Traditional
Table 9 contains music pieces from traditional folk music, both instrumental-only and combined vocal and instrumental, classified as Traditional in the scatter plot.
Religious
Table 10 contains music pieces from the five major religions we identified, classified as Religious in the scatter plot.
3.5. Running DiMuSe on the Selected Sound Sequences
DiMuSe was implemented in Matlab to generate a vector for each sound sequence, followed by PCA reduction and visualization using Orange Data Mining. This process made it possible to map musical sequences into a three-dimensional space where clustering could emerge naturally. The goal was not only to confirm whether genres separated meaningfully, but also to test whether validated therapeutic music clustered together, indicating shared structural patterns.
Prior to vectorization and evaluation by the DiMuSe framework, all sound sequences underwent a standardized pre-processing pipeline implemented in Matlab, which was designed to ensure signal comparability while preserving intrinsic structural and dynamical properties. For each audio file, the silent part was trimmed from the beginning and end of the sound sequences. Then, they were converted to single-channel (mono) time-domain signals by averaging the stereo channels when necessary, thereby eliminating spatial effects that were not relevant for structural analysis. The resulting sequences were normalized to unit standard deviation and mean-centered. To reduce data dimensionality while preserving global temporal structure, the effective sampling frequency was subsequently reduced by a factor of 100 using decimation, followed by a final centering step.
Following segmentation, the audio signals were normalized by removing the mean value and scaling by their standard deviation, thereby reducing the influence of recording gain and overall loudness. The normalized signal was subsequently transformed to its absolute value representation, emphasizing amplitude variations independently of signal polarity. To stabilize short-term fluctuations and highlight slower structural variations, a moving-average smoothing filter was applied. The resulting signal was then mean-centered again and temporally decimated, substantially reducing data dimensionality while preserving global temporal organization. A final mean-removal step was applied after decimation to eliminate residual offsets. The resulting pre-processed signal constituted a continuous, normalized nonlinear time series suitable for the computation of scalar evaluators and subsequent principal component analysis.
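A condensed Python sketch of this pipeline (silence trimming is omitted; the smoothing window length of 32 and the plain-stride decimation are assumptions, since the exact Matlab parameters are not listed above):

```python
import numpy as np

def preprocess(audio, decim=100, win=32):
    """Stereo-to-mono averaging, z-scoring, absolute-value envelope,
    moving-average smoothing, centering, decimation, and a final
    mean removal, mirroring the described steps."""
    x = np.asarray(audio, dtype=float)
    if x.ndim == 2:
        x = x.mean(axis=1)                    # average stereo channels
    x = (x - x.mean()) / x.std()              # unit variance, zero mean
    x = np.abs(x)                             # amplitude representation
    x = np.convolve(x, np.ones(win) / win, mode="valid")  # smooth
    x = (x - x.mean())[::decim]               # center, then decimate
    return x - x.mean()                       # remove residual offset

rng = np.random.default_rng(3)
out = preprocess(rng.standard_normal((20000, 2)))
print(out.shape, float(out.mean()))
```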
The functionalities are described in Table 11.
The first step in executing the DiMuSe method was to generate evaluation vectors for each selected sound sequence using the Matlab function DimuseVectorise. This function produced the file Dimuse-MusicSequence-Mar25.csv, containing 32 rows of data.
Each row included
To achieve a representation similar to that used for the generated signals, a .csv file combining all melody classes was created. The DimusePCA function was applied to this dataset, extracting the three Principal Component (PC) scalars for each vector, with a conserved data variance of 71.52%.
The resulting file was imported into Orange Data Mining, where the k-Means unsupervised clustering method was applied, with 20 reruns and 300 maximum iterations. Similar to the method used for the Generated Signals (
Table 3), the SIL and DBI values were used, along with visual evaluation for choosing the optimal clustering
k value.
Table 12 contains the PCA plots for the combined sound sequence classes.
The SIL and DBI index combinations revealed that the clustering was quite stable in all configurations investigated in Table 12. The standard deviation was calculated for each of the values, and the results in Table 13 suggested that the optimal clustering factor was k = 4.
An optimal clustering would mean that most of the therapeutic class sounds are contained in a single cluster, with none or only a minimal number of sound sequences from the other classes. The k = 4 resulting cluster contained all therapeutic pieces alongside a small number of natural or noise-based sequences (e.g., pink noise and ocean sounds) that are already known in the literature to promote relaxation. Two sequences from the Religious class were also grouped in this cluster.
This outcome strengthened the interpretation that DiMuSe can indeed isolate structural properties associated with therapeutic effectiveness, while also highlighting possible overlaps with certain naturally soothing sounds.
3.6. Validating the Base Set as the Optimal Sound Sequences Configuration
In this section, the researchers tested other combinations of sound sequences in the PCA, in order to see whether another cluster configuration would be more appropriate. The optimal configuration would include most of the therapeutic class sounds. Such a configuration was already identified in Table 12 for k = 4, and the intent was to find another similar cluster configuration by altering the base set of sound sequences.
As a first option, the sound sequences from Table 14 were added to the base set of sounds before performing the PCA.
The clustering used the same k-Means configuration values as Table 12, and the results are displayed in Table 15.
After this step, the base set sound sequences configuration identified remained as the optimal choice, with k = 4 clustering.
Another sound configuration was tested by removing one sound from each of the classes, with the results displayed in Table 16. The removed sounds were:
Classical class, Bach.
Noise class, noise_blender.
Religious class, Aramaic.
Therapy class, heart.
Traditional class, folk-bg-instr.
Vocal class, Arabic.
Again, the base set sound sequences configuration identified remained as the optimal choice, with k = 4 clustering. The cluster configuration was saved under the name Base Set 4.
The next step in the DiMuSe process was to identify an indicator that could reveal, for a new sound sequence, whether it should be recommended for its therapeutic effect.
The process consisted of vectorizing the sound sequence and then projecting it onto the Base Set 4 PCA plot. This projection reveals how similar the new sound sequence is to the therapeutic cluster C2.
3.7. Correlation of DiMuSe Results Against Other Research
The clustering results reveal distinct groupings of sound sequences based on their intrinsic informational structures. However, to determine whether these classifications align with empirical observations of therapeutic music, it was necessary to compare our findings with previous studies. By evaluating established research on the effectiveness of specific musical pieces in stress reduction and relaxation, this paper assessed the accuracy and practical relevance of the DiMuSe method.
At the time of writing, two relevant studies were identified in which researchers conducted clinical tests to evaluate the immediate stress-relieving effects of specific songs, with the song titles specified in the papers. Both studies concluded with a ranking of the analyzed songs. Using the same songs, a therapeutic effect ranking method was defined on the DiMuSe results as follows.
- 1.
From each song, a 1 min segment was extracted. Using the DiMuSe functions (Table 11), the music segments were vectorized and projected onto the DiMuSe PCA plane generated from the base set. The scope of this step was to visualize how the different song segments were distributed in the DiMuSe Base Set 4 clusters.
- 2.
For each of the Base Set 4 resulting point cloud clusters, a centroid point was calculated as the average of the coordinate values on each of the PCA axes PC1, PC2, and PC3 over all the cluster points, creating a central point for each cluster: C1, C2, C3, and C4.
- 3.
As the C2 cluster contained all of the songs considered to have therapeutic effect, the centroid point C2_center was used as the reference for creating the therapeutic effect ranking. The distance between each song sequence point and the C2_center point was calculated in the PC1, PC2, PC3 space as the Euclidean distance $\mathrm{songDistance} = \sqrt{(PC1 - PC1_{C2})^2 + (PC2 - PC2_{C2})^2 + (PC3 - PC3_{C2})^2}$.
- 4.
A song ranking (songRank) was calculated for each song by taking the maximum value among the songDistance values of all the songs under evaluation, adding 1 to it, and subtracting the song’s own songDistance. This yields a ranking value for each song, with a minimum rank of 1. The resulting ranking values were sorted in ascending order and displayed in a graphical chart.
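Steps 2 to 4 above reduce to a few vector operations; a Python sketch (the point coordinates are illustrative):

```python
import numpy as np

def song_ranks(points, c2_center):
    """songDistance: Euclidean distance from each projected song to the
    therapeutic-cluster centroid in (PC1, PC2, PC3) space.
    songRank = max(songDistance) + 1 - songDistance, so the closest
    song receives the highest rank and the farthest receives rank 1."""
    d = np.linalg.norm(np.asarray(points, float)
                       - np.asarray(c2_center, float), axis=1)
    return d.max() + 1.0 - d

# Two illustrative songs: one at the centroid, one at distance 5.
ranks = song_ranks([[0, 0, 0], [3, 4, 0]], [0, 0, 0])
print(ranks)  # [6. 1.]
```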
3.7.1. The Korean Population Study
In the first study [30], the researchers selected five of the most effective stress-relieving songs identified in prior research, using a trained neural network to assess their impact on Korean participants. Among the songs analyzed in the study, “Cozy Arirang” exhibited the strongest therapeutic effect, as displayed in Table 17.
Applying the DiMuSe method to the songs used in the Korean paper [30], by selecting one-minute sound sequences, revealed that the song “01-Cozy Arirang” had the highest DiMuSe-calculated therapeutic score. This matched the Korean paper’s result, identifying the same song as having the highest therapeutic effect. The remaining songs’ therapeutic rankings could not be compared against the DiMuSe-calculated ranking, as this information was not available.
3.7.2. The Mindlab Study
In the second study [31], a team of musicians and music therapy practitioners from the Mindlab Research Institute (referred to in this paper as Mindlab) created a song named “Weightless” that was measured to have a strong stress-decreasing therapeutic effect. The researchers measured the therapeutic effect through biometric sensors and through users’ subjective ratings, comparing the effect of listening to 17 different songs with the effect of a massage session. The Mindlab research concluded that the song “Weightless” had a stronger therapeutic effect than the other songs and than the massage session.
The same 17 songs were analyzed through this paper’s proposed ranking method, with the results displayed in Table 18.
The researchers investigated whether there was a correlation between the identified ranking and the Mindlab paper’s reported rankings. Spearman’s Rank Correlation $\rho$ and Kendall’s $\tau$ values were computed from the ranking values in Table 19, using Google Sheets.
Considering all 17 songs from the Mindlab paper, the following indicators were calculated.
- 1.
DiMuSe ranking vs. Mindlab Biometric ranking: and .
- 2.
DiMuSe ranking vs. Mindlab Subjective ranking: and .
- 3.
Mindlab Biometric ranking vs. Mindlab Subjective ranking: and .
The ranking indicators reported a weak correlation between the DiMuSe ranking and the Mindlab reported rankings. However, the correlation between the two Mindlab rankings themselves was moderate.
When comparing only the first 10 songs from the DiMuSe rating, the indicators changed:
- 1.
DiMuSe ranking vs. Mindlab Biometric ranking: and .
- 2.
DiMuSe ranking vs. Mindlab Subjective ranking: and .
- 3.
Mindlab Biometric ranking vs. Mindlab Subjective ranking: and .
For the first 10 melodies ranked by DiMuSe, the correlation between the DiMuSe ranking and the Mindlab rankings was moderate, with a higher correlation against the Mindlab Biometric score ranking.
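For reference, the two rank-correlation indicators can also be computed without a spreadsheet; a minimal Python sketch (tie-free rankings are assumed, i.e., Kendall’s tau-a):

```python
import numpy as np

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank vectors
    (no tie correction; song rankings are assumed tie-free)."""
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    return float(np.corrcoef(rx, ry)[0, 1])

def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant pairs) over all pairs."""
    n, s = len(x), 0
    for i in range(n):
        for j in range(i + 1, n):
            s += np.sign(x[i] - x[j]) * np.sign(y[i] - y[j])
    return float(s / (n * (n - 1) / 2))

a = [1, 2, 3, 4, 5]
print(spearman_rho(a, a), kendall_tau(a, a))              # identical: 1, 1
print(spearman_rho(a, a[::-1]), kendall_tau(a, a[::-1]))  # reversed: -1, -1
```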
3.7.3. Using the “Weightless” Song as a Reference
The “Weightless” creators mentioned during an interview that, by design, the song was intended to have an increasing therapeutic effect during its first part, gradually decreasing towards the end. In this paper, the researchers further tested whether the DiMuSe method could detect such behaviour, using the following protocol:
- 1.
Create a new projection plane named Second Set by adding to the PCA analysis all the songs used by the Mindlab research except “Weightless”, i.e., another 16 one-minute song sequences.
- 2.
Split the “Weightless” song into up-to-1-min sequences (named chronologically from W01 to W08), convert them into DiMuSe vectors, and project them onto both the Base Set and Second Set PCA planes.
- 3.
Interpret the projected plots.
For all plots, the k-Means clustering method was used with k = 4, with the SIL and DBI values indicating stable clusters.
Comparing the Table 20 projections on the Base Set and the Second Set, it was observed that even when the clusters were modified, the projected points representing the “Weightless” sequences were grouped in clusters containing therapy class sound sequences, with the exception of segment W01.
The DiMuSe-calculated therapeutic effect reported in Table 21 was computed between the “Weightless” segment points and the clusters containing therapy class sound sequences, for the Base Set and the Second Set. For the Base Set, cluster C1 was considered. For the Second Set, clusters C2 and C3 contained therapy class sound sequences, so both were considered in the evaluation.
The results suggested that the DiMuSe calculated therapeutic effect score could capture the composers’ intention to gradually build a relaxing effect when the segments were projected on the Base Set. This case study illustrated that DiMuSe has the potential not only to detect the song with the highest therapeutic properties in a group, but also to track intra-song variations in effectiveness.
4. Conclusions and Future Work
The aim of this paper was to adapt a mathematical framework that was originally used for discriminating geophysical nonlinear signals to the analysis of music sequences, with the goal of identifying potential therapeutic effects. To this end, we introduced the novel DiMuSe method, which applies 24 scalar evaluators from diverse scientific domains—including statistics, fractal geometry, nonlinear physics, and complex systems—to transform each music sequence into a multidimensional vector.
By applying Principal Component Analysis (PCA), these vectors were reduced to three principal components (PC1, PC2, and PC3) and projected into a three-dimensional space. This procedure enabled the identification of clusters of sound sequences that shared similar informational structures. Importantly, the therapeutic music sequences consistently grouped together, suggesting that DiMuSe can reveal structural signatures linked to therapeutic efficacy. The method’s use of centroid points and distance-based ranking provided a systematic way to evaluate therapeutic similarity.
This study demonstrated, for the first time, that a compact set of interdisciplinary, non-music-specific nonlinear scalar evaluators—drawn from statistics, fractal geometry, nonlinear physics, and complex systems—can consistently separate music empirically associated with therapeutic effects from heterogeneous sound categories in an unsupervised feature space. Unlike conventional Music Information Retrieval approaches, which rely on stylistic, perceptual, or genre-dependent descriptors, the DiMuSe method identifies clustering patterns based solely on the intrinsic informational structure of sound sequences. The repeated observation that therapeutic music forms a coherent cluster across multiple configurations, while non-therapeutic genres disperse more broadly, provides empirical support for the hypothesis that therapeutic relevance correlates with persistent structural regularities rather than stylistic attributes.
An important and nontrivial outcome of the analysis is the proximity of certain natural and noise-based signals—most notably pink noise and ocean sounds—to the therapeutic music cluster. This convergence is consistent with independent literature reporting the regulatory and relaxation effects of such stimuli and strengthens the interpretation that DiMuSe captures structure related to regulation and coherence, rather than culturally learned musical categories. The inclusion of a small subset of religious vocal pieces within this cluster further suggests that slow temporal organization, reduced complexity, and long-range correlations may transcend genre boundaries and function as shared informational markers.
Limitations. At the same time, the study’s limitations have significant implications for the interpretation of these results. Most notably, DiMuSe has not been validated in clinical or experimental therapeutic settings, and therefore no causal claims can be made regarding its therapeutic efficacy. The present findings should be interpreted as identifying structural correlates of music commonly labeled as therapeutic, rather than as predictors of individual therapeutic outcomes. Furthermore, the method deliberately excludes personalization factors such as listener preference, cultural background, or personal musical history—variables known to exert strong influence on therapeutic engagement and effectiveness.
This exclusion represents both a limitation and a boundary condition of the current work. If individual preference dominates therapeutic response, it is conceivable that subjective factors could override or mask the influence of intrinsic informational structure. However, the observed structural clustering despite diverse cultural origins and genres suggests that informational structure may act as a baseline constraint within which personalization operates, rather than being entirely overwhelmed by it. In this sense, DiMuSe should be understood as identifying a structural potential for regulation or relaxation, which may be necessary but not sufficient for therapeutic impact at the individual level.
From a theoretical standpoint, the contribution of this work lies in reframing therapeutic music analysis away from outcome-driven prediction and toward structure-oriented exploration. DiMuSe provides a reproducible, interpretable framework for detecting latent informational patterns in music signals without reliance on labeled datasets or black-box models. This positions the method as a complementary tool to both clinical music therapy research and supervised machine learning approaches, offering a means of systematic screening and comparison prior to personalized or experimental validation.
Future Work. Future research will focus on extending DiMuSe from a purely structural screening tool toward experimentally grounded and personalized applications while preserving its interpretability.
First, clinical and non-clinical validation studies are required. Controlled experiments involving physiological and behavioral measurements—such as heart-rate variability, galvanic skin response, or cortisol levels—should be conducted while participants are exposed to music ranked by DiMuSe. This would allow direct testing of whether proximity to the therapeutic cluster correlates with measurable regulatory responses.
Second, the integration of biometric feedback will be pursued through two complementary pathways. In one approach, biometric signals such as EEG, ECG, or respiration patterns would be analyzed independently and used as outcome variables or labels, enabling supervised learning that maps DiMuSe-derived structural features to physiological responses. In an alternative approach, selected biometric descriptors—such as EEG spectral entropy, coherence, or long-range temporal correlations—could be incorporated directly into the PCA feature space alongside audio-derived scalars, allowing joint clustering of stimulus–response pairs. Comparing these two strategies would clarify whether biometric data are better suited as explanatory targets or as structural extensions of the signal space.
Third, personalization strategies will be explicitly addressed by modeling deviations from the DiMuSe baseline. Rather than replacing structural analysis with preference-based selection, future work will investigate how individual responses diverge from structurally similar stimuli. This may involve computing subject-specific distance metrics within the PCA space or adapting clustering thresholds based on listener sensitivity profiles, cultural background, or therapeutic context.
Finally, future iterations of the method will explore the inclusion of sound-specific evaluators—such as rhythm stability or tonal coherence—alongside the current domain-agnostic scalars, in order to assess how perceptual features interact with the deeper informational structure. This hybrid approach could further clarify the relationship between signal organization, perception, and therapeutic regulation.
By combining unsupervised structural analysis with biometric validation and personalization layers, future developments of DiMuSe aim to bridge the gap between theoretical signal analysis and applied, data-driven music therapy practice.