1. Introduction
Sleep disorders are a significant public-health concern affecting millions of people worldwide and contributing to serious health problems, including cardiovascular disease, diabetes, and cognitive impairment. Beyond this clinical burden, sleep disorders also sustain a steadily growing international market for diagnostic and therapeutic interventions. They encompass multiple conditions, ranging from insomnia and narcolepsy to periodic limb movement disorder and REM sleep behavior disorder, each of which displays distinct neurophysiological patterns and clinical outcomes [
1]. Multiple factors combine to cause sleep disorders, including genetic factors, neurochemical imbalances, irregular sleep patterns, excessive screen time, and existing medical and mental health conditions [
2]. The rising global prevalence of these disorders creates urgent demand for automated diagnostic systems that provide objective measurements to complement traditional polysomnography-based evaluation.
Epidemiological evidence indicates that the prevalence and presentation of sleep disorders vary significantly with age. Insomnia prevalence increases with advancing age, affecting up to 50% of elderly populations, while narcolepsy onset typically peaks in adolescence (10–20 years) [
3]. PLM prevalence rises sharply after age 50, and RBD predominantly affects males over 60 years of age and is increasingly recognized as a prodromal marker of neurodegenerative disease [
4]. These age-related variations underscore the importance of developing classification models that are robust across diverse demographic profiles.
Traditionally, polysomnography (PSG) was the only method for accurately diagnosing sleep disorders, a process that involved the manual analysis of electroencephalography (EEG) with the help of trained sleep specialists, who identified particular sleep stages [
5]. Manual scoring of EEG signals, however, was time-consuming, subjective, and prone to inter-rater variability—hence, the need for automated classification systems [
6]. It was demonstrated that the nonstationary and complex characteristics of EEG signals limited the performance of traditional machine learning (ML) models that utilized handcrafted features [
7]. Additionally, EEG has other clinical uses, such as the detection of seizures and brain–computer interface systems [
8].
Recent progress in deep learning (DL), especially convolutional neural networks (CNNs), has provided considerable benefits for the automated analysis of EEG by enabling hierarchical feature learning directly from raw signals [
9]. CNN-based architectures have demonstrated strong performance in modeling EEG temporal patterns for automated sleep analysis tasks. While many prior studies focused on sleep stage classification, the present work instead targets sleep disorder classification, which requires distinguishing diagnostic groups rather than epoch-level sleep stages [
10]. However, individual deep learning models may struggle to fully capture complex transitional EEG dynamics and long-range temporal correlations that characterize different sleep disorder categories [
11]. To address these weaknesses, ensemble learning schemes have been proposed, in which multiple models are combined to exploit their complementary strengths and to improve generalization and robustness [
12]. At the same time, fractal analysis has proven useful for characterizing the scale-invariant temporal properties of EEG signals; MF-DFA, in particular, has been effective at quantifying long-range correlations in physiological signals [
13].
Although CNNs, ensemble learning, and fractal features have each been successful individually, their combination in a single network for classifying sleep disorders has scarcely been studied. Previous studies applied either DL or fractal analysis separately and did not exploit the possibility of integrating learned spatial-temporal representations with explicit fractal-based temporal characteristics. Moreover, although hybrid feature-integration methods have been shown to improve model robustness, the combination of learned CNN features with fractal properties for EEG classification has not been well examined. To close these gaps, a CNN-based framework was used to extract temporal features, and MF-DFA was integrated to extract fractal properties, allowing the model to capture multiscale temporal dynamics and long-range correlations that CNNs alone cannot represent. This combined methodology provides a comprehensive representation of EEG signals with enhanced classification performance across a variety of sleep disorder categories, and its person-independent design supports generalization to diverse clinical settings.
The major contributions of this study are summarized as follows:
A CNN-based approach was proposed to classify sleep disorders, including insomnia, narcolepsy, REM sleep behavior disorder, and periodic limb movement disorder, using MF-DFA-extracted features.
To analyze the impact of fractal analysis, the proposed framework was trained and evaluated on the CAP Sleep dataset, and its performance was systematically compared across settings that incorporated fractal features and those without them.
Cross-dataset analysis was conducted on the ISRUC dataset to demonstrate the robustness of the model across various clinical environments.
The rest of the paper is structured as follows:
Section 2 is the literature review.
Section 3 describes the proposed CNN-based approach with fractal features extraction and model architecture, along with the datasets and experimental setup.
Section 4 presents the results, and finally,
Section 5 provides the conclusion.
2. Literature Survey
The automation of EEG-based systems has emerged as a dynamic research field, as a wide range of traditional methods struggle to model multiscale temporal dynamics and scale-invariant structure simultaneously. Fractal EEG analysis captures long-range temporal correlations, while deep learning architectures capture localized waveform morphology. Accordingly, recent research that has explored the fusion of fractal analysis with deep learning to improve sleep-related EEG classification serves as the basis for the following literature review.
Several studies have tackled the challenge of EEG-based classification across two research areas: sleep studies and neurological research. Ding et al. [
14] developed a Dynamic Graph Attention Network for emotion recognition, achieving 94.00% accuracy on SEED but failing to generalize across different subjects. Rosenblum et al. [
15] developed a fractal slope-based method to identify sleep cycles, achieving 91–98% REM detection accuracy across 205 healthy adults, although the method did not demonstrate complete subject-independent reliability. Madhav et al. [
16] assessed three classification methods, which included ML, DL, and Neural Tangent Kernel methods for multiclass sleep disorder diagnosis using the CAP dataset. Shazid et al. [
17] compared traditional machine learning methods with hybrid CNN–BiLSTM systems on a synthetic dataset, achieving 92% accuracy, although their study used non-physiological data, which limited clinical applicability.
The study by Hu et al. [
18] developed ST-GATv2 for automated sleep-stage classification, achieving 89.0% accuracy on the MASS-SS3 dataset. Wang et al. [
19] developed MGANet, which reached 87.3% accuracy on the SHHS dataset. Chen et al. [
20] developed TS-AGCMM, which achieved 89.1% accuracy by accurately identifying N2 sleep but failed to distinguish N1 sleep. Hazra and Ghosh [
21] used ExPANet for depression detection by analyzing EEG features using fractal and complexity measurements, achieving an F1-score of 89.1%. Duan et al. [
22] used a GAT–LSTM combination to achieve 99.80% accuracy on the Bonn dataset for epilepsy detection, and Wang et al. [
23] developed Seizure-NGCLNet, which used weakly supervised methods to detect seizures, achieving an AUC of 0.988, but both methods struggled with cross-dataset generalization.
It is important to note that most of these studies focus on sleep stage classification. In contrast, the present study addresses the more clinically oriented task of sleep disorder diagnosis, which involves distinguishing pathological conditions rather than sleep architecture stages. Satapathy et al. [
24] used ISRUC data to test their dual-channel 1D CNN, which achieved sleep-specific EEG classification accuracy of 78–79%. Wadichar et al. [
3] developed a hierarchical CNN-LSTM model on the CAP dataset, achieving 93.31% accuracy in CAP Phase B testing. However, the study faced challenges because it used single-center data and did not include nonlinear features. Urtnasan et al. [
25] proved that single-lead ECG-based classification could achieve F1-scores of 95–99% through their CNN-GRU system. Kolhar et al. [
26] developed a PSO-optimized LSTM with SHAP-based interpretability, achieving 97% accuracy on CAP data, but the system lacked external validation.
Table 1 presents complete details of the studies, including their research methods, obtained results, and study limitations.
Recent progress in EEG-based classification has increasingly built on transformer-based and self-supervised learning systems. The AttnSleep [
27] and SleepTransformer [
28] models performed sleep stage classification by modeling long-range sequential dependencies via self-attention mechanisms. Self-supervised pre-training approaches that use contrastive learning on unlabeled EEG segments have proven effective in reducing the need for extensive annotated datasets, according to [
29]. These methods offer benefits for sequential modeling and label efficiency, yet they require extensive training data and substantial computing power. Moreover, they do not explicitly model the scale-invariant, multifractal behavior of sleep EEG. In contrast, the proposed MF-DFA–CNN framework captures the essential long-range temporal relationships through fractal analysis, which conventional transformers and self-supervised systems fail to represent.
Synthesis Overview
The reviewed literature reveals several converging trends and persistent gaps. Methodologically, the majority of studies used single-model frameworks employing CNNs, LSTMs, or graph attention networks to process either temporal or spatial EEG dynamics, but rarely both simultaneously. Graph-based attention methods improved spatial dependency modeling [
14,
18,
20] while convolutional architectures excelled at temporal pattern extraction [
3,
16,
17,
24]. How modern deep learning systems can be combined with fractal and nonlinear EEG descriptors remains underexplored, as existing studies [
15,
16] used shallow classifiers that could not handle large datasets.
The standard benchmarks, including MASS-SS3, ISRUC, and SHHS, enabled reproducible testing across multiple datasets. Yet these studies consistently showed limited subject-independent generalization and difficulty identifying N1-stage patterns. Reported accuracies were high, ranging from 85% to 99%, but most findings came from laboratory settings without sufficient external validation. Three main obstacles persisted: difficulties maintaining class balance, reliance on patient-specific modeling [
23], and the lack of techniques for combining features from multiple domains.
The present study uses MF-DFA-based fractal descriptors together with CNN-based temporal features to create a system that tracks both nonlinear dynamics and temporal patterns, thereby fulfilling the needs of EEG multiscale representation and cross-dataset performance. Although several reviewed studies focus on sleep stage classification, they are included here due to their methodological relevance to EEG-based modeling frameworks. The present study, however, specifically addresses sleep disorder diagnosis, which differs from sleep stage classification in its clinical objective and label structure.
3. Methodology
This study presents an end-to-end pipeline for automatic sleep disorder classification using single-channel EEG recordings with multi-domain feature extraction and hybrid deep learning to discriminate among sleep disorders, as shown in
Figure 1. The framework was designed for cross-dataset generalization, applying a unified strategy to the CAP Sleep Dataset (5 classes: Healthy, Insomnia, Narcolepsy, PLM, RBD) and the ISRUC Sleep Dataset (3 classes: Healthy, PLM, RBD). Raw EEG signals underwent filtering, normalization, and segmentation into fixed-length windows, followed by the extraction of statistical multifractal (MF-DFA) and Wavelet Scattering Network (WSN) features. A hybrid two-branch architecture integrated two complementary modalities: (1) raw EEG temporal features extracted by a one-dimensional EfficientNet-style CNN backbone (producing a 1024-dimensional embedding via dual global pooling), and (2) handcrafted multi-domain descriptors including MF-DFA generalized Hurst exponents, spectral band powers, and statistical features (34 features total) processed through a three-layer fully connected fractal branch (producing a 256-dimensional embedding). The two embeddings are concatenated (1280 dimensions) and fused via a learned attention gate (Dense(1280→320) → ReLU → Dense(320→1280) → Sigmoid, applied element-wise), which adaptively weights the contribution of each modality before the multi-layer classification head produces class probabilities. The complete architecture is described in
Section 3.4.4.
3.1. CAP Sleep Dataset
The CAP Sleep Dataset contains five diagnostic groups: Healthy, Insomnia, Narcolepsy, PLM, and RBD. We used balanced EEG segments of 2 s duration (1024 samples at 512 Hz). To ensure class balance, 9000 segments per group were retained, yielding 45,000 samples in total. Partitioning was performed at the recording (subject) level; all segments from a given recording were assigned exclusively to one of the 70% training, 15% validation, or 15% test partitions, ensuring subject-disjoint splits with no data leakage across partitions. Stratified sampling was applied at the subject level to maintain approximate class balance within each partition. The EEG frequency bands relevant to sleep disorder classification and their characteristic ranges are as follows: delta (0.5–4 Hz), which dominates during deep NREM sleep (N3); theta (4–8 Hz), prominent during light sleep (N1, N2) and associated with drowsiness; alpha (8–13 Hz), observed during relaxed wakefulness and suppressed during sleep onset; beta (13–30 Hz), associated with wakefulness and cortical arousal; and gamma (30–100 Hz), linked to cognitive processing and occasionally observed during REM sleep. Different sleep disorders exhibit characteristic EEG patterns within these bands: insomnia patients frequently show elevated beta-band activity reflecting hyperarousal, narcolepsy is characterized by rapid transitions into REM with increased theta activity, PLM is associated with periodic cortical arousals visible in delta–theta transitions, and RBD exhibits increased phasic EMG activity during REM with altered alpha–theta dynamics [
1,
5]. The balanced class distribution of the test set across all five diagnostic categories is shown in
Figure 2. All segments were filtered and normalized. When required, sampling with replacement was applied to maintain equal class counts. Our data preprocessing and balancing approach is inspired by [
3], which also used the CAP Sleep Dataset together with balanced, pre-segmented CSV files.
The five classes in the CAP dataset (Healthy, Insomnia, Narcolepsy, PLM, and RBD) correspond to clinical sleep disorder diagnoses, not to individual sleep stages (e.g., N1, N2, N3, REM, Wake) as defined by the American Academy of Sleep Medicine (AASM). Each recording in the CAP dataset is labeled according to the patient’s diagnosed sleep condition, and the classification task addressed in this study is the identification of sleep disorder categories from EEG segments, rather than the epoch-by-epoch staging of sleep architecture. This distinction is maintained consistently throughout the manuscript.
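For illustration, the subject-disjoint 70/15/15 partitioning described above can be sketched as follows; the function and variable names are illustrative, not taken from the study's code.

```python
import random

def subject_disjoint_split(segments_by_subject, train=0.70, val=0.15, seed=42):
    """Assign whole recordings (subjects) to train/val/test partitions.

    segments_by_subject: dict mapping subject id -> list of segment indices.
    Every segment of a subject lands in exactly one partition (no leakage).
    """
    subjects = sorted(segments_by_subject)
    random.Random(seed).shuffle(subjects)
    n = len(subjects)
    n_train = int(round(train * n))
    n_val = int(round(val * n))
    parts = {
        "train": subjects[:n_train],
        "val": subjects[n_train:n_train + n_val],
        "test": subjects[n_train + n_val:],
    }
    # Flatten subject lists back into segment-index lists per partition.
    return {name: [seg for s in subs for seg in segments_by_subject[s]]
            for name, subs in parts.items()}
```

Stratification by diagnostic class, as used in the study, would apply the same split per class group before merging.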
3.2. ISRUC Sleep Dataset and Cross-Dataset Transfer Protocol
The ISRUC dataset consists of polysomnographic recordings from 10 subjects spanning three diagnostic categories: Healthy, PLM, and RBD. EEG channels were automatically identified (C3, C4, F3, F4, O1, O2, Cz, Pz). When multiple relevant channels were available, they were averaged to produce a single composite EEG signal per subject, ensuring compatibility with the single-channel pipeline used for CAP. The signals were divided into 1024-sample epochs at 256 Hz (4 s duration) using a 50% overlap sliding window, and flat-line segments (standard deviation < 10−6) were discarded. Minority classes were augmented via additive Gaussian noise (σ = 0.01), random amplitude scaling (factor 0.9–1.1), and temporal shifts (±50 samples).
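The minority-class augmentations listed above (additive Gaussian noise with σ = 0.01, amplitude scaling by 0.9–1.1, temporal shifts of ±50 samples) can be sketched as below; the circular-shift implementation of the temporal shift and the function name are assumptions for illustration.

```python
import numpy as np

def augment_epoch(x, rng, sigma=0.01, scale_range=(0.9, 1.1), max_shift=50):
    """Apply the three augmentations described above to one EEG epoch.

    x: 1-D array (e.g., 1024 samples). Returns a new augmented array.
    """
    y = x + rng.normal(0.0, sigma, size=x.shape)     # additive Gaussian noise
    y = y * rng.uniform(*scale_range)                # random amplitude scaling
    shift = rng.integers(-max_shift, max_shift + 1)  # temporal shift (circular here)
    return np.roll(y, shift)
```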
To evaluate cross-dataset generalization, we adopted a transfer learning protocol rather than training a new model from scratch. The CAP-trained model’s CNN backbone and fractal feature branch were frozen, preserving the feature representations learned during CAP training. Only the final classification head was replaced (from 5 outputs to 3) and fine-tuned. A small labeled ISRUC subset comprising no more than 50 segments per class was used for this head-only fine-tuning over 35 epochs; these adaptation samples were drawn exclusively from the training partition. The rationale for this protocol is threefold: (1) the CAP and ISRUC datasets differ in sampling rate (512 Hz vs. 256 Hz), recording equipment, and available diagnostic classes (5 vs. 3), necessitating minimal supervised adaptation of the decision boundary; (2) the frozen backbone ensures that the discriminative power of the transferred features, not ISRUC-specific overfitting, drives classification; and (3) the small number of adaptation samples (≤150 total) prevents the model from memorizing ISRUC-specific patterns, preserving the integrity of the cross-dataset evaluation. Data partitioning was performed at the subject level: subjects used for fine-tuning were excluded from the test set, ensuring that no subject appeared in both partitions. The held-out ISRUC test set comprised 360 segments (120 per class) from subjects not included in the fine-tuning. The detailed configuration and stratified splits for both datasets are presented in
Table 2.
3.3. Preprocessing and Normalization
The complete preprocessing pipeline is summarized as follows. First, a 4th-order Butterworth band-pass filter with cutoff frequencies of 0.3 Hz and 100 Hz was applied to remove baseline drift and high-frequency noise while retaining physiologically relevant frequency content up to the gamma band. A notch filter centered at 50 Hz (Q = 30) was subsequently applied to suppress power-line interference. For the ISRUC dataset, signals were resampled from their native sampling rate to 256 Hz using polyphase anti-aliasing filtering to ensure uniform temporal resolution; CAP signals were retained at 512 Hz.
For artifact rejection, the standard deviation of each epoch was calculated and segments with a standard deviation below 10−6 were discarded, as this threshold identified both flat-line and disconnected-electrode artifacts. Because the pipeline was designed to run fully automatically without human involvement, independent component analysis (ICA) and manual artifact rejection were avoided. Z-score normalization was then applied to each retained epoch using Equation (1), and any remaining NaN or infinite values were replaced with zeros to ensure numerical stability during feature extraction and model training.
For the CAP dataset, EEG signals were divided into 2-s segments of 1024 samples at 512 Hz. For the ISRUC dataset, continuous recordings were divided into 4-s segments of 1024 samples at 256 Hz using a sliding window with 50% overlap. Stratified sampling was used to achieve class balance, and underrepresented classes in the ISRUC dataset were augmented with additive Gaussian noise (σ = 0.01), random amplitude scaling (0.9–1.1), and temporal shifts (±50 samples).
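As a sketch of the filtering and normalization steps in this section (4th-order Butterworth band-pass at 0.3–100 Hz, 50 Hz notch with Q = 30, per-epoch z-score), using SciPy; the zero-phase (forward-backward) filtering is an assumption, since the filtering direction is not stated in the text.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, iirnotch, filtfilt

def preprocess_epoch(x, fs):
    """Band-pass (0.3-100 Hz, order 4), 50 Hz notch (Q=30), then z-score."""
    sos = butter(4, [0.3, 100.0], btype="band", fs=fs, output="sos")
    x = sosfiltfilt(sos, x)                      # zero-phase band-pass
    b, a = iirnotch(50.0, Q=30.0, fs=fs)
    x = filtfilt(b, a, x)                        # suppress power-line interference
    x = (x - np.mean(x)) / (np.std(x) + 1e-12)   # z-score normalization
    return np.nan_to_num(x)                      # replace NaN/Inf with zeros
```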
3.4. Feature Extraction Pipeline
A multi-domain feature extraction framework was designed to capture complementary EEG characteristics across time, frequency, and nonlinear dynamics. For each preprocessed EEG segment, spectral wavelet, multifractal, and scattering-based descriptors were computed and concatenated into a single feature vector. For CAP, an initial ~200-dimensional feature set was produced; after removing zero-variance features, mutual information-based SelectKBest was fitted exclusively on the training partition to retain the top 120 features most dependent on the class label; the identical feature mask was then applied to the validation and test sets without re-fitting, thereby preventing any test-label information leakage. For ISRUC, the pipeline produced a fixed 120-dimensional feature vector directly to maintain compatibility across datasets. It should be noted that the 120-dimensional selected feature vector served as the basis for feature importance analysis, while a compact subset of 34 key descriptors (13 MF-DFA descriptors, 8 spectral band powers, 1 Higuchi fractal dimension, and 12 statistical measures) was used as the direct input to the fractal branch of the hybrid deep learning model (
Table 3). This distinction ensures that the fractal branch receives only the most relevant and non-redundant descriptors for effective fusion with CNN-based temporal embeddings. All features were subsequently normalized using a robust scaler. Robust scaling is defined as in Equation (2):
to reduce sensitivity to outliers. NaN/Inf values were replaced with zeros before model training.
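The robust scaling of Equation (2) (subtract the per-feature median, divide by the interquartile range) can be sketched as follows, fitted on the training partition only to mirror the leakage-free protocol described above; the helper names are illustrative.

```python
import numpy as np

def fit_robust_scaler(X_train):
    """Compute per-feature median and IQR on the training partition only."""
    med = np.median(X_train, axis=0)
    q75, q25 = np.percentile(X_train, [75, 25], axis=0)
    iqr = np.where((q75 - q25) == 0, 1.0, q75 - q25)  # guard zero-variance features
    return med, iqr

def apply_robust_scaler(X, med, iqr):
    """Apply the training-set statistics to any partition (Equation (2))."""
    return np.nan_to_num((X - med) / iqr)             # NaN/Inf replaced with zeros
```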
3.4.1. Statistical and Time Domain Features
Approximately 25 time-domain features were extracted to describe the amplitude distribution and temporal variability. These included mean and median; measures of dispersion and shape; percentile markers and the interquartile range; energy measures; measures of temporal variation such as line length and zero-crossing rate; the Hjorth parameters (activity, mobility, complexity); and peak-density measures reflecting transient activity.
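A few of these time-domain descriptors (Hjorth parameters, line length, zero-crossing rate) can be sketched as follows; the exact definitions used in the study may differ slightly.

```python
import numpy as np

def time_domain_features(x):
    """A subset of the time-domain descriptors listed above."""
    dx = np.diff(x)
    ddx = np.diff(dx)
    activity = np.var(x)                                   # Hjorth activity
    mobility = np.sqrt(np.var(dx) / (np.var(x) + 1e-12))   # Hjorth mobility
    complexity = (np.sqrt(np.var(ddx) / (np.var(dx) + 1e-12))
                  / (mobility + 1e-12))                    # Hjorth complexity
    return {
        "mean": np.mean(x),
        "iqr": np.subtract(*np.percentile(x, [75, 25])),
        "line_length": np.sum(np.abs(dx)),                 # total variation proxy
        "zero_cross_rate": np.mean(np.abs(np.diff(np.sign(x))) > 0),
        "hjorth_activity": activity,
        "hjorth_mobility": mobility,
        "hjorth_complexity": complexity,
    }
```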
3.4.2. Spectral Features
Spectral features were computed from Welch’s power spectral density (PSD) estimate using a 256-sample window with 50% overlap. Absolute and relative band powers were extracted for the delta (0.5–4 Hz), theta (4–8 Hz), alpha (8–13 Hz), beta (13–30 Hz), and gamma (30–100 Hz) bands. Band-power ratios included theta/alpha, delta/beta, alpha/beta, theta/beta, and delta/gamma.
Other descriptors included spectral edge frequencies of 50, 75, 90, 95, spectral entropy (seen in Equation (3)), spectral moments (peak frequency, mean/centroid frequency, variance), and spectral shape (roll-off, bandwidth, flatness). Spectral entropy was calculated as in Equation (3):
where
denotes the normalized power spectral density.
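A minimal sketch of the Welch-based band powers and spectral entropy described above, using SciPy; summing PSD bins within each band is an implementation assumption.

```python
import numpy as np
from scipy.signal import welch

BANDS = {"delta": (0.5, 4), "theta": (4, 8), "alpha": (8, 13),
         "beta": (13, 30), "gamma": (30, 100)}

def spectral_features(x, fs):
    """Relative band powers and spectral entropy (Equation (3)) from
    Welch's PSD with a 256-sample window and 50% overlap."""
    f, psd = welch(x, fs=fs, nperseg=256, noverlap=128)
    total = np.sum(psd) + 1e-24
    feats = {}
    for name, (lo, hi) in BANDS.items():
        m = (f >= lo) & (f < hi)
        feats[f"{name}_rel"] = np.sum(psd[m]) / total
    p = psd / total                                      # normalized PSD
    feats["spectral_entropy"] = -np.sum(p * np.log2(p + 1e-24))
    feats["theta_alpha_ratio"] = feats["theta_rel"] / (feats["alpha_rel"] + 1e-24)
    return feats
```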
3.4.3. Multifractal Detrended Fluctuation Analysis (MF-DFA)
MF-DFA was used to describe the multiscale temporal variation of sleep EEG. Long-range temporal correlations in sleep EEG contain intricate variations across sleep disorder categories, which are difficult to model with conventional linear signal processing or purely data-driven deep learning models. MF-DFA offers a mathematically sound approach to the quantification of such multifractal behavior and was incorporated alongside the existing temporal measures in the proposed classification framework.
MF-DFA was applied to each EEG segment to obtain generalized Hurst exponents that characterize the level of correlation and temporal complexity in the signal. With an EEG time series x(t) of length
N, the first step of the analysis was to remove the mean of the signal and build a cumulative profile transforming the signal into a random-walk-like process to which fluctuation analysis was applicable, as shown in Equation (4):
where
is the mean of the EEG signal. This process eliminates constant trends and preconditions the signal for multiscale analysis.
The profile Y(
i) obtained was then cut into non-overlapping segments of equal length
s, with each segment being defined as shown in Equation (5):
Because the overall signal length is generally not an integer multiple of the scale s, the segmentation was repeated starting from the opposite end of the signal, so that 2Ns segments were obtained and no portion of the EEG epoch was lost. The scale parameter s was set to range between 10 and 100 to capture temporal changes across multiple levels, which was especially needed to differentiate short transient trends from the long-term correlations characteristic of various sleep disorders.
In every segment
v, least-squares linear regression was used to eliminate local trends due to slow drifts or artifacts. Detrended variance of every segment was calculated as shown in Equations (6) and (7):
for
v = 1, 2, …,
Ns, and
where
yv (i) is the fitted local trend in segment
v and
v =
Ns + 1, …, 2
Ns. The detrending stage isolated the intrinsic fluctuations of the EEG signal by removing polynomial trends that might otherwise bias the scaling behavior.
The
q-order fluctuation function was calculated as an average of the detrended variances to measure the size of fluctuations across all segments, as defined in Equation (8):
In the case of
q = 0, numerical instability was avoided by a logarithmic averaging, as shown in Equation (9):
The parameter q regulates the sensitivity of the analysis to fluctuations of different magnitudes: positive values emphasize large fluctuations, while negative values emphasize small ones. For q = 2, the analysis reduces to standard detrended fluctuation analysis (DFA).
The dependence of the fluctuation function on the scale s followed power-law behavior, as shown in Equation (10):
The generalized Hurst exponent h(q) was approximated by the slope of the log–log relationship between Fq(s) and s. The moment order q was varied from −5 to +5 (step = 0.5), and the scale s ranged from 10 to 100. The full multifractal spectrum h(q) was computed for all classes and is shown in
Figure 3B, which reveals that inter-class separation is strongest at positive q values, particularly at q = 1 where the curves for narcolepsy (h(1) = 0.58 ± 0.11) and Healthy (h(1) = 0.72 ± 0.08) are most divergent (see
Section 4.2.1 for all class-wise values). Among all q values, h(1) yielded the highest between-class effect size (Cohen’s d > 1.0 for Healthy vs. Narcolepsy) while maintaining the lowest within-class variance, making it the most stable and discriminative scalar summary of the multifractal spectrum. Accordingly, h(1) was selected as the primary MF-DFA feature for each EEG segment. This choice is consistent with established MF-DFA practice, in which h(q = 1) corresponds to the standard Hurst exponent characterizing the overall long-range correlation strength [
13,
15]. These MF-DFA representations were concatenated with CNN-based temporal embeddings and fused via an attention gating mechanism, enabling the model to jointly leverage scale-invariant fractal dynamics and learned temporal features for enhanced sleep disorder classification. The selection of h(1) over alternative multifractal descriptors—including spectrum width Δh, multi-q subsets, and the full h(q) spectrum—was empirically validated through a controlled ablation study reported in
Section 4.2.2. The two-dimensional t-SNE projection illustrating the class-wise structure and separation achieved by combined MF-DFA and CNN-based features is shown in
Figure 3.
MF-DFA parameter values were selected following established MF-DFA literature [
13,
15] and were verified empirically to maximize separation between sleep disorder groups. The complete set of MF-DFA parameter settings used in this study is summarized in
Table 4.
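The MF-DFA steps of Equations (4)–(10) can be sketched as follows; the scale and q grids default to the ranges stated above (s = 10–100, q ∈ [−5, 5]), and the function name and coarser default grids here are illustrative.

```python
import numpy as np

def mfdfa_hurst(x, scales=range(10, 101, 10), q_values=(-5, -2, 1, 2, 5)):
    """Minimal MF-DFA: cumulative profile (Eq. (4)), two-sided segmentation,
    linear detrending (Eqs. (6)-(7)), q-order fluctuation functions
    (Eqs. (8)-(9)), and h(q) from log-log slopes (Eq. (10))."""
    Y = np.cumsum(x - np.mean(x))                    # random-walk-like profile
    N = len(Y)
    logF = {q: [] for q in q_values}
    for s in scales:
        Ns = N // s
        # Segment from both ends so no part of the epoch is lost (2*Ns total).
        starts = [v * s for v in range(Ns)] + [N - (v + 1) * s for v in range(Ns)]
        t = np.arange(s)
        F2 = []
        for st in starts:
            seg = Y[st:st + s]
            coef = np.polyfit(t, seg, 1)             # local linear trend
            F2.append(np.mean((seg - np.polyval(coef, t)) ** 2))
        F2 = np.asarray(F2)
        for q in q_values:
            if q == 0:
                Fq = np.exp(0.5 * np.mean(np.log(F2)))        # Eq. (9)
            else:
                Fq = np.mean(F2 ** (q / 2.0)) ** (1.0 / q)    # Eq. (8)
            logF[q].append(np.log(Fq))
    logs = np.log(list(scales))
    # h(q): slope of log F_q(s) versus log s (Eq. (10)).
    return {q: np.polyfit(logs, logF[q], 1)[0] for q in q_values}
```

For uncorrelated noise, h(2) should be close to 0.5; persistent long-range correlations push it toward 1.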
3.4.4. CNN-Based Temporal Feature Extraction
Raw EEG signals were fed through a one-dimensional CNN to learn discriminative temporal features for sleep disorder classification. EEG signals are nonstationary and oscillatory, with transient waveforms and abrupt temporal changes that vary across sleep disorders and pathological processes. Convolutional operations over the temporal dimension allow local temporal patterns to be learned automatically from raw EEG data, removing the need for handcrafted features.
Considering a one-dimensional EEG signal
, the convolutional feature maps were calculated as shown in Equation (11):
where
wk,i,l represents the convolution kernel,
is the bias term, and C represents the input channels. Then, nonlinear activation functions were put into place to improve representational capacity, which was expressed as shown in Equation (12):
This allowed complex temporal EEG dynamics to be properly modeled.
In order to enhance training stability and generalization, batch normalization was used, which is defined as shown in Equation (13):
with
and
as the batch-wise mean and variance, respectively.
The deeper temporal representations were obtained through the convolutional layers that were stacked with residual links and formulated as shown in Equation (14):
This preserved low-level temporal information and facilitated gradient propagation.
Global average pooling was used for temporal aggregation to obtain fixed-length feature vectors, which were computed as shown in Equation (15):
where
T is the length of the time dimension following convolutional processing. The resulting CNN-based features captured oscillatory patterns, transient EEG events, and short- to medium-range temporal dependencies useful for characterizing sleep disorders. These features were further supplemented with multifractal features obtained via MF-DFA, since CNNs mostly capture localized temporal variations, whereas MF-DFA explicitly quantifies long-range temporal correlations and multiscale dynamics that are inaccessible to convolutional operations alone.
The complete architecture of the proposed hybrid CNN + MF-DFA model is summarized in
Table 3. The CNN branch (Modality 1) processes the raw 1024-sample EEG segment through a multiscale convolutional front-end (kernel sizes 7 and 31 for fine and coarse temporal resolution, respectively), three stacked residual blocks with Squeeze-and-Excitation (SE) channel recalibration (reduction ratio = 16), and dual global pooling (average + max, concatenated), producing a 1024-dimensional temporal embedding. The fractal branch (Modality 2) takes the 34-dimensional handcrafted feature vector (13 MF-DFA descriptors, 8 spectral features, 1 Higuchi fractal dimension, 12 statistical descriptors) through three Dense → BatchNorm → ReLU → Dropout layers, producing a 256-dimensional fractal embedding. Both embeddings are concatenated into a 1280-dimensional vector and passed through an attention gate: a two-layer MLP (1280 → 320 → 1280) with Sigmoid activation generates element-wise attention weights that modulate the fused representation, allowing the network to balance temporal and fractal contributions adaptively. The attended representation feeds a three-layer classification head (1280 → 512 → 256 → C, where C = 5 for CAP or 3 for ISRUC) with Softmax output.
All Conv1D layers used bias = False when followed by BatchNorm. The multiscale front-end (Stages 1a–1b) captured both fine-grained waveform morphology (kernel = 7) and broader oscillatory patterns (kernel = 31). The SE blocks performed channel-wise recalibration with a reduction ratio of 16. The fractal branch input comprised the 34 features described above: MF-DFA generalized Hurst exponents and derived descriptors (13), spectral band powers and shape features (8), the Higuchi fractal dimension (1), and statistical descriptors (12).
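The attention-gated fusion described above (concatenate the 1024-d CNN embedding with the 256-d fractal embedding, gate through a Dense(1280→320) → ReLU → Dense(320→1280) → Sigmoid MLP, then reweight element-wise) can be sketched as a single forward pass; the random weights here are placeholders standing in for trained parameters.

```python
import numpy as np

def attention_gate_fusion(cnn_emb, fractal_emb, W1, b1, W2, b2):
    """Fuse a 1024-d temporal embedding with a 256-d fractal embedding
    via the Dense(1280->320) -> ReLU -> Dense(320->1280) -> Sigmoid gate."""
    z = np.concatenate([cnn_emb, fractal_emb])       # 1280-d fused vector
    h = np.maximum(0.0, W1 @ z + b1)                 # Dense + ReLU (320-d)
    gate = 1.0 / (1.0 + np.exp(-(W2 @ h + b2)))      # Dense + Sigmoid (1280-d)
    return gate * z                                  # element-wise reweighting

# Placeholder weights; in the trained model these are learned parameters.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(scale=0.02, size=(320, 1280)), np.zeros(320)
W2, b2 = rng.normal(scale=0.02, size=(1280, 320)), np.zeros(1280)
fused = attention_gate_fusion(rng.normal(size=1024), rng.normal(size=256),
                              W1, b1, W2, b2)
```

The gated 1280-d vector then feeds the 1280 → 512 → 256 → C classification head with Softmax output.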
3.4.5. Wavelet Scattering Network (WSN) Features
For CAP, wavelet scattering features were computed to obtain deformation-stable and translation-invariant representations. The scattering transform employed J = 6 and Q = 8 wavelets per octave, yielding first- and second-order coefficients that represented the energy distribution and cross-scale modulation. Summary statistics were calculated per order and scaled to produce a ~55-dimensional vector, which was concatenated with the other features and padded/clipped to a consistent dimension.
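The per-order summarization and pad/clip step can be sketched as follows; `summarize_scattering` and the specific statistics are illustrative assumptions, and the scattering coefficients themselves (e.g., from a J = 6, Q = 8 transform) are assumed to be computed upstream:

```python
import numpy as np

def summarize_scattering(coeffs_by_order, target_dim=55):
    """Summarize scattering coefficients per order and pad/clip to a
    fixed-length vector (hypothetical helper; statistics illustrative)."""
    feats = []
    for coeffs in coeffs_by_order:        # each order: (n_paths, n_times)
        c = np.asarray(coeffs, dtype=float)
        # per-order summary statistics over paths and time
        feats.extend([c.mean(), c.std(), np.median(c),
                      c.min(), c.max(), np.log1p(np.abs(c).mean())])
    v = np.asarray(feats)
    # standard-scale, then zero-pad or clip to the target dimension
    v = (v - v.mean()) / (v.std() + 1e-12)
    if v.size < target_dim:
        v = np.pad(v, (0, target_dim - v.size))
    return v[:target_dim]
```

The fixed `target_dim` ensures the scattering block can be concatenated with the other feature groups regardless of how many scattering paths survive filtering.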
3.5. Software and Hardware Configuration
This section describes the computational environment employed to implement, train, and evaluate the proposed sleep disorder classification framework. All experiments were conducted on a GPU-enabled system using Python-based deep learning libraries. While the CPU handled tasks such as feature extraction and data preprocessing, the GPU was leveraged to accelerate model training and optimization, ensuring efficient execution of computationally intensive operations.
The framework’s computational efficiency was evaluated on the hardware in
Table 5, with average inference times of ~12 ms per 2-s EEG epoch on an NVIDIA RTX 3090 and ~85 ms on a CPU (Intel Core i7), supporting real-time processing. The model contains 4.2 million parameters and occupies approximately 17 MB, making it appropriate for edge and clinical applications. Three essential elements remain before practical implementation: prospective clinical validation, regulatory compliance, and integration with existing clinical systems.
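The relationship between the parameter count and the reported model footprint follows from float32 storage; a quick arithmetic check under that assumption:

```python
n_params = 4.2e6            # reported parameter count
bytes_per_param = 4         # float32 weights assumed
size_mb = n_params * bytes_per_param / 1e6
# ≈ 16.8 MB, consistent with the reported ~17 MB model size
```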
4. Results and Discussion
4.1. Experimental Configuration
The proposed CNN framework with MF-DFA feature integration was tested on two independent polysomnographic datasets for automatic sleep disorder detection. The CAP Sleep Dataset encompasses five diagnostic classes: Healthy, Insomnia, Narcolepsy, PLM, and RBD, whereas the ISRUC Sleep Dataset includes three classes: Healthy, PLM, and RBD. All metrics were calculated on stratified held-out test sets.
The multi-domain features extracted by the framework included statistical measures, MF-DFA parameters, and nonlinear complexity descriptors. Training parameters were dataset-specific: CAP used a batch size of 32, five classes, a learning rate of 2 × 10⁻⁵, a weight decay of 1 × 10⁻⁵, a dropout rate of 0.35, and 250 training epochs, while ISRUC used a batch size of 32, three classes, a learning rate of 2 × 10⁻⁵, a weight decay of 1 × 10⁻⁵, and a dropout rate of 0.35, trained for 35 epochs. Both models were trained on NVIDIA GPUs.
Table 6 provides a comprehensive summary of all hyperparameters and dataset configurations for both the CAP and ISRUC datasets.
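The dataset-specific settings above can be collected in a single configuration mapping; the dictionary layout below is an illustrative convention, not the authors' actual code:

```python
# Hypothetical training-configuration mapping mirroring Table 6
CONFIGS = {
    "CAP":   dict(batch_size=32, n_classes=5, lr=2e-5,
                  weight_decay=1e-5, dropout=0.35, epochs=250),
    "ISRUC": dict(batch_size=32, n_classes=3, lr=2e-5,
                  weight_decay=1e-5, dropout=0.35, epochs=35),
}
```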
4.2. CAP Sleep Dataset Performance
On CAP, the framework achieved a test accuracy of 86.38%, with macro-averaged precision of 88.05%, recall of 88.15%, and F1-score of 88.02%. Training leveled off at epoch 240 with a validation accuracy of 88.35% and an F1-score of 88.24%. This performance is close to the inter-rater agreement range (0.70–0.85) reported for manual polysomnographic scoring. The training and validation accuracy curves across 250 epochs demonstrated stable convergence of the model, as shown in
Figure 4.
Overall classification metrics on the CAP Sleep Dataset test set achieved an accuracy of 86.38%, as shown in
Table 7. Per-class analysis results in
Table 8 show that Narcolepsy achieved a 92.96% F1-score, 90.91% precision, and 95.11% recall; Insomnia a 92.53% F1-score, 91.27% precision, and 93.82% recall; and PLM 85.04% precision, 78.31% recall, and an 81.54% F1-score. The lower F1-scores for PLM (81.54%) and RBD (83.22%) reflect the inherent EEG microarchitectural overlap among motor-related sleep disorders.
In addition to the metrics reported above, to further account for class imbalance, the following aggregate metrics were computed on the CAP test set: macro-averaged F1-score = 88.02% and Cohen’s Kappa = 0.8519. The Matthews Correlation Coefficient (MCC) was 0.8298, confirming strong multiclass discriminative performance beyond accuracy. The area under the receiver operating characteristic curve (AUC) for each class ranged from 0.964 (PLM, RBD) to 0.992 (Narcolepsy).
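Cohen's Kappa and the multiclass MCC reported above can be computed directly from a confusion matrix; a minimal NumPy sketch (using the Gorodkin generalization of MCC to the multiclass case) is:

```python
import numpy as np

def kappa_mcc(cm):
    """Cohen's kappa and multiclass MCC from a confusion matrix
    (rows = true classes, columns = predicted classes)."""
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    po = np.trace(cm) / n                     # observed agreement
    row, col = cm.sum(axis=1), cm.sum(axis=0)
    pe = (row @ col) / n**2                   # chance agreement
    kappa = (po - pe) / (1 - pe)
    # Gorodkin multiclass MCC
    num = np.trace(cm) * n - row @ col
    den = np.sqrt((n**2 - row @ row) * (n**2 - col @ col))
    mcc = num / den
    return kappa, mcc
```

For a binary matrix this reduces to the familiar two-class MCC formula, which makes it easy to sanity-check by hand.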
Analysis of the confusion matrix revealed that Healthy achieved a true positive rate of 93.16%, with the main confusions being 2.84% to PLM and 2.13% to RBD, as shown in
Figure 5. Insomnia reached a recall of 93.82%, and Narcolepsy the highest recall at 95.11%. The bidirectional PLM–RBD confusion rates (7.16% and 6.84%) are consistent with epidemiological findings of 30–60% PLM prevalence in patients with RBD, suggesting shared motor manifestations that should be verified by electromyography.
4.2.1. Interpretability of MF-DFA Features Across Sleep Disorder Classes
The discriminative power of MF-DFA features was evaluated via the generalized Hurst exponent h(q = 1) on the CAP test set: Healthy = 0.72 ± 0.08, Insomnia = 0.65 ± 0.09, Narcolepsy = 0.58 ± 0.11, PLM = 0.69 ± 0.10, and RBD = 0.67 ± 0.10. The lower h(1) in Narcolepsy reflects fragmented sleep, while the higher values in Healthy indicate more stable EEG dynamics, consistent with the class separation observed in the t-SNE (
Figure 3A).
Figure 3B shows the full multifractal spectrum h(q) (q = −5 to +5), revealing stronger inter-class separation at positive q values and supporting true multifractal EEG dynamics. Spectral widths further differentiated classes: Healthy and PLM showed broad spectra (higher complexity), while Narcolepsy and Insomnia showed narrow spectra (lower complexity), demonstrating that MF-DFA captures physiologically meaningful, class-specific temporal patterns complementary to CNN features.
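A minimal NumPy sketch of the MF-DFA estimate of h(q) is shown below. The scale range, first-order detrending, and forward-only windowing are simplifying assumptions rather than the paper's exact settings:

```python
import numpy as np

def mfdfa_hurst(x, qs=(1, 2), scales=None, order=1):
    """Minimal MF-DFA: profile -> windowed polynomial detrending ->
    q-order fluctuation functions -> log-log slopes = h(q)."""
    x = np.asarray(x, dtype=float)
    y = np.cumsum(x - x.mean())                          # profile
    if scales is None:                                   # ~16..256 samples
        scales = np.unique(np.logspace(4, 8, 12, base=2).astype(int))
    Fq = np.zeros((len(qs), len(scales)))
    for j, s in enumerate(scales):
        n_seg = len(y) // s
        segs = y[:n_seg * s].reshape(n_seg, s)
        t = np.arange(s)
        f2 = np.empty(n_seg)
        for i, seg in enumerate(segs):                   # local detrending
            coef = np.polyfit(t, seg, order)
            f2[i] = np.mean((seg - np.polyval(coef, t)) ** 2)
        for k, q in enumerate(qs):
            if q == 0:                                   # log-average form
                Fq[k, j] = np.exp(0.5 * np.mean(np.log(f2)))
            else:
                Fq[k, j] = np.mean(f2 ** (q / 2)) ** (1 / q)
    logs = np.log(scales)
    return np.array([np.polyfit(logs, np.log(Fq[k]), 1)[0]
                     for k in range(len(qs))])
```

For uncorrelated noise the estimated exponent sits near 0.5, while persistent long-range correlations (as in the Healthy class here) push h toward higher values.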
The training dynamics of the model without MF-DFA features are shown in
Figure 6. The training loss decreased from approximately 12 to 1 over 250 epochs, while the validation loss stabilized near 1, indicating mild overfitting with preserved generalization. The model achieved a training accuracy of approximately 85.12% and a validation accuracy of 82.14%, while the F1-score improved from 0.55 to 0.8616, indicating a balance between precision and recall. The learning rate varied from 0.001 to 0.004 during training. Despite the minor overfitting, the validation results confirmed successful feature extraction and provided a reference point for assessing the added value of MF-DFA features.
The graph in
Figure 7 displays the per-class performance when MF-DFA features are not applied. Narcolepsy and Insomnia achieved performance above 90% on all evaluation criteria, whereas PLM and RBD proved more challenging, with F1-scores of approximately 80%, reflecting the difficulty of distinguishing these classes.
We employed McNemar’s test to evaluate the statistical significance of the MF-DFA contribution by comparing CAP test set predictions from the complete model with MF-DFA features against those from the model without them. The test produced χ2 = 14.73 with p = 0.0001, indicating that the performance enhancement from MF-DFA features was statistically significant at the p < 0.001 threshold. The macro F1-scores of the two configurations were 88.02% and 86.16%, a difference exceeding the 95% bootstrap confidence interval margin, confirming that the fractal features made a substantive contribution.
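The continuity-corrected McNemar statistic can be computed from the paired predictions of the two models; a small sketch, with `mcnemar_chi2` as a hypothetical helper:

```python
import numpy as np

def mcnemar_chi2(y_true, pred_a, pred_b):
    """Continuity-corrected McNemar chi-square on paired predictions."""
    a_ok = np.asarray(pred_a) == np.asarray(y_true)
    b_ok = np.asarray(pred_b) == np.asarray(y_true)
    b = int(np.sum(a_ok & ~b_ok))   # A right, B wrong (discordant)
    c = int(np.sum(~a_ok & b_ok))   # A wrong, B right (discordant)
    return (abs(b - c) - 1) ** 2 / (b + c)
```

Only the discordant pairs enter the statistic, which is why the test isolates the marginal benefit of the MF-DFA features rather than overall accuracy.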
On the CAP test set, the model achieved a macro F1-score of 88.02%, a weighted F1-score of 86.40%, and a Cohen’s Kappa of 0.8519, demonstrating a strong capacity to handle class imbalance. The multiclass classifier also achieved an MCC of 0.8298, indicating effective discrimination among classes, while class-wise AUC values ranged from 0.964 for PLM and RBD to 0.992 for Narcolepsy, as shown in
Figure 8.
4.2.2. Ablation Study: MF-DFA Feature Selection
Table 9 presents a systematic ablation across six MF-DFA feature configurations, isolating the contribution of each fractal descriptor while keeping all other pipeline components fixed. Configuration A, supplying only the generalized Hurst exponent h(1), achieved the highest accuracy (86.38 ± 0.42%) and F1-macro (88.02 ± 0.38%), confirming its role as the single most discriminative fractal feature. Removing all MF-DFA features (Configuration F) reduced accuracy to 82.14 ± 0.81% and F1-macro to 86.16 ± 0.74%, a 4.24% accuracy deficit and a 1.86% F1 deficit, quantifying the net contribution of fractal augmentation to the hybrid architecture. This degradation is statistically significant, consistent with McNemar’s test result reported in
Section 4.2 (χ2 = 14.73, p < 0.001). Substituting the multifractal spectrum width Δh as the sole fractal input (Configuration B) recovered only part of this gap (83.72% accuracy), indicating that Δh conflates positive- and negative-moment scaling behavior into a single summary statistic, thereby discarding the fine-grained temporal-correlation information encoded in h(1). Configurations C and D supplied richer moment-order subsets of three and 11 generalized Hurst exponents, respectively, yet both underperformed Configuration A (85.53% and 84.91%). The decline from C to D is consistent with a curse-of-dimensionality effect: the compact three-layer fractal branch (34 → 512 → 512 → 256) has limited capacity, and supplying 11 correlated exponents introduces redundant dimensions that dilute the gradient signal during training. Configuration E appended Δh to h(1); its accuracy (85.86 ± 0.45%) marginally exceeded the multi-q configurations but remained 0.52 points below Configuration A, confirming that spectrum width adds partially redundant rather than complementary information. These results, combined with the effect-size analysis in
Section 4.2.1, where h(1) exhibited the largest between-class Cohen’s d (>1.0 for Healthy vs. Narcolepsy) and the lowest within-class coefficient of variation (CV < 0.09), provide converging empirical and statistical justification for selecting h(1) as the sole MF-DFA descriptor in the proposed pipeline.
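Using the class means and standard deviations of h(1) reported in Section 4.2.1, the Healthy-vs.-Narcolepsy effect size can be checked with a pooled-SD Cohen's d (equal group sizes assumed for the pooling):

```python
import numpy as np

def cohens_d(m1, s1, m2, s2):
    """Cohen's d from group means and SDs, equal-n pooled SD."""
    pooled = np.sqrt((s1**2 + s2**2) / 2)
    return (m1 - m2) / pooled

# Healthy (0.72 +/- 0.08) vs. Narcolepsy (0.58 +/- 0.11)
d = cohens_d(0.72, 0.08, 0.58, 0.11)   # ≈ 1.46, consistent with d > 1.0
```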
4.3. Cross-Dataset Transfer Evaluation on the ISRUC Sleep Dataset
To assess the transferability of the representations learned on the CAP dataset, the CNN backbone and fractal branch were frozen, and only the three-class classification head was fine-tuned on a small labeled ISRUC subset (≤50 segments per class, drawn from subjects disjoint from the test set). This protocol bridges the domain gap arising from differences in sampling rate (512 Hz vs. 256 Hz), recording equipment, the number of diagnostic classes (5 vs. 3), and patient demographics, while ensuring that high-level feature representations, rather than ISRUC-specific memorization, drive classification.
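The freeze-and-fine-tune protocol can be sketched as follows; `adapt_for_transfer` and the attribute name `head` are hypothetical, but the mechanics (disable gradients everywhere, then attach a fresh trainable head sized for the target dataset's classes) mirror the described procedure:

```python
import torch.nn as nn

def adapt_for_transfer(model, in_features, n_classes):
    """Freeze every parameter, then attach a fresh trainable
    classification head for the target dataset (e.g., 3 ISRUC classes).
    Hypothetical helper mirroring the CAP -> ISRUC protocol."""
    for p in model.parameters():
        p.requires_grad = False            # frozen backbone + fractal branch
    model.head = nn.Linear(in_features, n_classes)  # new trainable head
    return [p for p in model.parameters() if p.requires_grad]
```

Passing only the returned parameter list to the optimizer guarantees that gradient updates touch the classification head alone.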
On the held-out ISRUC test set (360 segments, 120 per class, from subjects unseen during fine-tuning), the framework achieved an accuracy of 87.50%, macro-precision of 87.90%, and macro-recall of 87.50%. The macro-averaged F1-score was 87.41%, weighted F1-score 87.55%, and Cohen’s Kappa 0.8125. The Matthews Correlation Coefficient (MCC) reached 0.8137. Per-class AUC values shown in
Figure 9 were 0.967 (Healthy), 0.955 (PLM), and 0.999 (RBD), demonstrating robust class separation despite disparities in class distributions. The fact that a model trained on an entirely different dataset (CAP) achieved 87.50% accuracy on ISRUC with only classification-head fine-tuning on ≤150 labeled segments provides strong evidence that the MF-DFA and CNN features support cross-dataset generalization under limited fine-tuning. The adaptation converged at epoch 27, achieving the highest validation F1-score of 90.02%, as shown in
Figure 10.
The ISRUC cross-dataset evaluation was conducted on 360 test segments (120 per class: Healthy, PLM, RBD), derived from polysomnographic recordings of 10 subjects. To assess statistical reliability, the test set was evaluated using 1000-iteration bootstrap resampling with replacement. The resulting 95% confidence intervals for the overall accuracy were [84.72%, 90.28%], for macro F1-score [84.15%, 90.67%], and for Cohen’s Kappa [0.7688, 0.8563]. These intervals indicate that the observed cross-dataset performance was statistically robust and not attributable to sampling variability. It should be noted, however, that the relatively small sample size of 10 subjects limits generalizability claims, and validation in larger, more diverse cohorts is recommended in future work.
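The percentile-bootstrap confidence intervals can be reproduced with a short NumPy routine; `bootstrap_ci` is an illustrative helper, with the fixed seed and percentile method as assumptions:

```python
import numpy as np

def bootstrap_ci(y_true, y_pred, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for accuracy, resampling the test
    set with replacement (as done for the ISRUC evaluation)."""
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    n = len(y_true)
    accs = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, n)          # resample with replacement
        accs[i] = np.mean(y_true[idx] == y_pred[idx])
    lo, hi = np.quantile(accs, [alpha / 2, 1 - alpha / 2])
    return lo, hi
```

The same resampled index sets can be reused to bootstrap macro F1 and Cohen's Kappa so that the three intervals are computed on identical replicates.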
Overall classification metrics on the ISRUC Sleep Dataset test set achieved an accuracy of 87.50%, as shown in
Table 10. Class-wise results indicated RBD with a 97.91% F1-score, 98.32% precision, and 97.50% recall; Healthy with an 84.50% F1-score, 78.99% precision, and 90.83% recall; and PLM with a 79.82% F1-score, 86.41% precision, and 74.17% recall, as shown in
Table 11. The strong performance on RBD suggests that ISRUC recordings contain more discriminative EEG signatures for this disorder than those in CAP.
The confusion matrix showed Healthy with 90.83% recall and 9.2% misclassified as PLM; PLM with 74.2% recall and 24.2% misclassified as Healthy; and RBD with 97.5% recall and 2.5% misclassified as PLM. The asymmetric Healthy–PLM confusion (9.2% versus 24.2%) indicated that PLM segments often resembled the EEG features of Healthy controls during non-movement periods, as shown in
Figure 11.
The ROC analysis revealed how different thresholds affected the ability to distinguish between classes. The ISRUC database showed one-vs-rest AUC results of 0.967 for Healthy, 0.955 for PLM, and 0.999 for RBD, which demonstrated high detection capabilities and low false identification rates, particularly for RBD. For the CAP dataset, the one-vs-rest ROC curves are shown in
Figure 8, with multiclass AUC values ranging from 0.964 to 0.992, demonstrating effective discrimination between disorders. The hybrid CNN-MF-DFA model outperformed standard CNNs by capturing both short-term and long-range EEG patterns for sleep disorder classification across multiple datasets.
5. Conclusions
This study presented a hybrid EEG-based framework for automated sleep disorder classification that integrated CNN-based temporal feature learning with MF-DFA-derived multifractal descriptors to capture both short-term dynamics and long-range temporal correlations characteristic of distinct disorders. Evaluated on the CAP Sleep Dataset across five diagnostic categories, the model achieved 86.38% accuracy, 88.02% macro F1-score, and a Cohen’s Kappa of 0.8519, with narcolepsy showing the highest performance (F1 = 92.96%) due to its distinctive fractal properties. An ablation study supported by McNemar’s test (
p < 0.001) confirmed the statistical significance of MF-DFA features, as their removal reduced validation accuracy to 82.14% and degraded per-class performance, underscoring their discriminative contribution. A systematic feature-selection ablation shown in
Table 9 further demonstrated that replacing h(1) with alternative multifractal descriptors consistently degraded performance, while complete removal of MF-DFA features caused a 4.24% accuracy drop, confirming that the fractal contribution is feature-specific rather than merely additive. Cross-dataset transfer evaluation on the ISRUC Sleep Dataset, where the CAP-trained feature-extraction backbone was frozen and only the classification head was fine-tuned on ≤50 labeled segments per class from disjoint subjects, yielded 87.50% accuracy (95% CI: 84.72–90.28%), demonstrating that the learned representations generalize across differing acquisition protocols, sampling rates, and patient populations. From a deployment perspective, the framework achieved an average inference time of approximately 12 ms per 2-s epoch on an NVIDIA RTX 3090 GPU and 85 ms in CPU-only mode, with a compact size of 4.2 million parameters (~17 MB), supporting real-time and edge-based applications. Despite strong overall performance, discrimination between PLM and RBD remained comparatively lower, reflecting inherent EEG microarchitectural overlap. The study did not include age-stratified analysis due to limited demographic metadata, even though age is known to influence sleep architecture and the prevalence of sleep disorders. Furthermore, prospective clinical validation, regulatory considerations, and integration with clinical systems are necessary before real-world deployment. Future research should incorporate multimodal signals, larger demographically diverse cohorts, age-stratified evaluation, and explainable AI techniques to enhance generalizability, interpretability, and clinical applicability.