1. Introduction
Automatic classification of chaotic and regular dynamics is one of the core issues in the study of nonlinear dynamical systems. Accurately distinguishing between chaos and regular dynamics not only enhances our understanding of the intrinsic mechanisms of complex systems but also provides a theoretical foundation for system modeling, prediction, and control. As a classical criterion for differentiating regular from chaotic dynamics, the maximum Lyapunov exponent (MLE) faces significant challenges in practical applications [
1]. Its estimation relies on phase space reconstruction and local linearization, making it highly sensitive to observation noise, data length, and embedding parameters. Moreover, robust computation is challenging when the underlying equations are unknown. The 0–1 test has attracted attention due to its model-free nature and ease of implementation [
2]. However, when system trajectories exhibit long transients, intermittent chaos, or quasiperiodic motion, the output of the 0–1 test often falls between 0 and 1, making definitive classification difficult. Additionally, for quasiperiodic or weakly chaotic signals, MLE estimates may conflict with the 0–1 test output, leading to ambiguous classification [
3].
In recent years, the rapid development of machine learning and deep learning technologies has provided new insights into the classification of dynamical behaviors. In 2020, researchers trained a neural network to classify chaotic and regular trajectories on the Chirikov standard map, demonstrating the promise of deep learning in low-dimensional conservative systems [
4]. Wenkack Liedji et al. proposed a delay reservoir computer using a single nonlinear node, which achieved efficient differentiation between hyperchaotic, chaotic, and regular dynamics, demonstrating the unique advantages of nonlinear dynamical systems in pattern recognition [
5]. Subsequently, Jimenez-Lopez et al. explored the performance of traditional machine learning methods, such as support vector machines (SVMs), in classifying trajectories of Hamiltonian systems, underscoring the feasibility of data-driven approaches for identifying complex dynamics [
6]. However, these early data-driven studies primarily employed system-specific models. While effective for particular systems, they inadequately capture the multi-scale and high-dimensional characteristics of dynamical systems, limiting their generalization to more complex scenarios. Therefore, optimizing feature representation within a data-driven framework and enhancing the model’s capacity to capture intrinsic dynamical features have emerged as central challenges for advancing the field.
To address these challenges, the research community has progressively developed two complementary technical approaches. The first involves converting time series into image-based representations and applying deep learning models for classification. For example, a 2023 study transformed time series into recurrence plots and employed ResNet to classify periodic, white-noise, and chaotic signals [
7]. In 2024, several studies employed deep learning models to identify chaotic systems based on attractor projections and phase diagrams [
8,
9]. The robustness of such models under noise interference was further investigated in [
10]. The second approach focuses on enhancing representational depth by fusing multi-dimensional information. For example, in 2025, Bajkova et al. proposed a spectral-entropy classifier based on the power spectrum of orbital radial time series, establishing a robust framework for distinguishing regular and chaotic dynamics in Galactic globular clusters [
11]. These two approaches have significantly advanced data-driven methodologies for dynamical classification. In particular, the latter—by directly integrating spectral features with the temporal structure of time series—has provided critical insights for developing more accurate and generalizable dynamical classification models.
However, existing research still has key gaps: First, the vast majority of methods focus on modeling and classifying integer-order dynamical systems but have not been systematically extended to fractional-order cases. Integer-order differential equations rely on local derivatives and are well suited for memoryless processes, whereas fractional-order differential equations—defined via non-local operators—can more accurately capture complex physical phenomena exhibiting memory effects and long-range temporal correlations. Therefore, developing a reliable method to distinguish dynamical behaviors generated by integer- versus fractional-order systems is crucial. Such a capability is essential for aligning mathematical models with the intrinsic nature of complex dynamical processes, thereby deepening our understanding of natural and engineering systems and enabling more reliable solutions to real-world problems. Second, trajectories exhibiting long-period orbits, intermittency, transient chaos, or quasi-periodic dynamics often display dynamical characteristics closely resembling those of genuinely chaotic systems, rendering both traditional approaches and current deep learning models susceptible to misclassification [
12].
To address the aforementioned challenges, this paper proposes a multi-branch deep learning model. The model integrates the global representational power of the Transformer, the temporal modeling capacity of a memory-augmented network, the local feature extraction ability of a multi-scale convolutional neural network, and a time–frequency analysis module based on the short-time Fourier transform (STFT) to form a unified recognition framework. The model adaptively classifies complex dynamical behaviors, including integer-order chaos, fractional-order chaos, and steady states, and demonstrates strong generalization and robustness in cross-domain signal classification tasks. To comprehensively evaluate model performance, we design two experimental settings: (1) noise-free training and testing to assess classification accuracy under ideal conditions and (2) addition of Gaussian, salt-and-pepper, and Rayleigh noise of increasing intensity to the input datasets to evaluate the model’s robustness against disturbances.
The remainder of this paper is organized as follows.
Section 2 introduces chaotic systems of integer-order and fractional-order.
Section 3 presents the architecture of the proposed deep learning model.
Section 4 reports experimental results and provides quantitative evaluation using metrics such as accuracy, F1-score, and confusion matrices.
Section 5 concludes the paper and outlines directions for future research.
2. Chaotic Systems
The proposed model is trained, validated, and tested on time series generated by the chaotic systems described in this section. The governing equations and corresponding parameters of these systems are summarized in
Table 1 and
Table 2.
The Lorenz, Rössler, Chen, and Lü systems are foundational models in chaos theory and its applications. The Lorenz system is extremely sensitive to initial conditions, exhibiting the famous “butterfly effect” and a characteristic double-scroll chaotic attractor [
13]. The Rössler system, in contrast, features a simpler structure with only one nonlinear term [
14]. The Chen system is topologically distinct from the Lorenz system yet exhibits a more complex phase-space geometry and is often regarded as its dual [
15]. The Lü system serves as a transitional model between the Lorenz and Chen systems, elucidating their underlying dynamical relationships [
16]. Collectively, these systems span a broad spectrum of chaotic dynamics—from the meteorological origins of the Lorenz model to modern mathematical frameworks—and exhibit diverse yet comparable behaviors in terms of structural complexity, dynamical richness, and practical relevance.
Fractional-order versions of these systems are obtained by replacing their integer-order derivatives with fractional-order ones, yielding the fractional-order Lorenz [
17], Rössler [
18], Lü [
19], and Chen [
20] systems. These fractional-order systems better capture complex dynamical behaviors that exhibit memory effects and history-dependent evolution.
To enable the model to effectively learn and represent the dynamical features of trajectories from chaotic systems of integer-order and fractional-order, we generate datasets by numerically simulating the systems listed in
Table 1 and
Table 2. The integer-order systems in
Table 1 are solved using the Runge–Kutta algorithm. For the fractional-order systems in
Table 2, the Caputo definition of the fractional derivative is adopted, and the Adomian decomposition method is used for the solution, with the first six terms of the series taken as the approximate solution. Chaotic systems are highly sensitive to both initial conditions and system parameters: even minor perturbations can lead to markedly different trajectories over time. This sensitivity serves as the basis for generating diverse and representative datasets.
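As a concrete sketch of the numerical pipeline, a Runge–Kutta integrator for the integer-order Lorenz system might look as follows; the classical fourth-order variant, step size, trajectory length, and the initial condition (1, 1, 1) are illustrative assumptions rather than the exact experimental settings.

```python
import numpy as np

def lorenz(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz equations."""
    x, y, z = state
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

def rk4_trajectory(f, x0, dt=0.01, n_steps=5000):
    """Integrate dx/dt = f(x) with the classical 4th-order Runge-Kutta scheme."""
    traj = np.empty((n_steps + 1, len(x0)))
    x = np.asarray(x0, dtype=float)
    traj[0] = x
    for i in range(n_steps):
        k1 = f(x)
        k2 = f(x + 0.5 * dt * k1)
        k3 = f(x + 0.5 * dt * k2)
        k4 = f(x + dt * k3)
        x = x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
        traj[i + 1] = x
    return traj

# One chaotic sample trajectory from an illustrative initial condition.
traj = rk4_trajectory(lorenz, [1.0, 1.0, 1.0])
```

The same integrator, called once per initial condition or parameter value, produces the raw time series from which the datasets are built.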
We generate datasets in two ways. The first strategy fixes the system parameters and varies the initial conditions. Due to their sensitivity to initial conditions, even minute perturbations in the initial state are exponentially amplified during evolution, resulting in rapid trajectory divergence. Leveraging this property, we introduce random perturbations to a set of benchmark initial values to obtain 1000 distinct initial conditions. We then numerically integrate the system equations to obtain the corresponding time-series trajectories. The second strategy fixes the initial conditions and varies the system parameters. Chaotic systems are also highly sensitive to parameter variations; slight adjustments can induce transitions between distinct dynamical regimes, such as from a steady state to chaos. We sample system parameters within their typical chaotic intervals and numerically integrate the equations to generate the corresponding time-series trajectories.
To illustrate the dataset generation process, we use the integer-order Lorenz system as a representative example:
Fixed parameters, varied initial conditions: The system parameters are fixed as listed in
Table 1, and the reference initial condition is set to
. This choice is widely adopted in Lorenz system studies and reliably produces typical chaotic trajectories; it can be adjusted based on specific system characteristics or research objectives. Starting from this reference point, we generate 1000 initial conditions by adding independent uniform perturbations in the range
to each state variable. This approach leverages the system’s sensitivity to initial conditions: although all trajectories originate from nearby points, they rapidly diverge, yielding a diverse set of chaotic time series for the dataset.
Fixed initial conditions, varied parameters: The initial condition is fixed at , a choice validated in numerous studies for its ability to reveal clear dynamical changes under parameter variation. The bifurcation parameter is fixed at the canonical value of 10, the relative Rayleigh number is varied equidistantly within the interval to obtain 1000 values, and the positive constant is held fixed across all samples to build the dataset.
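For the Lorenz example above, the two sampling strategies can be sketched as follows; the reference point, the ±0.1 uniform perturbation range, and the parameter sweep interval are illustrative stand-ins for the exact values used in the study.

```python
import numpy as np

rng = np.random.default_rng(0)

# Strategy 1: fixed parameters, perturbed initial conditions.
# Reference point and perturbation range are illustrative choices.
x_ref = np.array([1.0, 1.0, 1.0])
initial_conditions = x_ref + rng.uniform(-0.1, 0.1, size=(1000, 3))

# Strategy 2: fixed initial condition, equidistant parameter sweep.
# sigma stays at 10 while the relative Rayleigh number rho is sampled
# equidistantly over an (assumed) chaotic interval.
rho_values = np.linspace(28.0, 90.0, 1000)
```

Each row of `initial_conditions` (or each value in `rho_values`) is then passed to the numerical integrator to generate one time-series sample.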
To achieve effective classification of chaotic and steady-state samples, we adopt the largest Lyapunov exponent (LLE) as the core discriminant criterion: a sample is designated chaotic if its LLE is greater than 0; otherwise, it is classified as steady-state. However, sole reliance on the LLE has limitations. Specifically, some parameter points lying within the generalized “chaotic parameter range” may manifest periodic windows—either transitioning to periodic steady states after a transient chaotic stage or exhibiting sustained periodic oscillations throughout. As illustrated by the Lyapunov exponent spectra and bifurcation diagrams in
Figure 1, such periodic windows are particularly prominent when the parameter
is within the interval of (150, 225). The inclusion of such samples in the training dataset would compromise the model’s capacity to identify the intrinsic characteristics of chaos.
Therefore, when generating the parameter-varying dataset, we comprehensively determine the dynamical regime of each parameter combination by integrating multiple diagnostic criteria, including phase-space attractor morphology and bifurcation diagrams. We eliminate the samples corresponding to periodic windows within the chaotic region and assign them to the steady-state sample set. This processing strategy not only ensures the purity of chaotic samples but also enables the steady-state samples to fully cover typical dynamical behaviors such as convergence, periodicity, quasi-periodicity, and period-doubling. For the Rössler, Chen, Lü systems and their fractional-order versions, the determination of chaotic/steady-state parameter ranges follows the same core protocol, so as to ensure the consistency of datasets across different systems.
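The LLE labeling criterion can be sketched with a two-trajectory (Benettin-type) estimate, shown here for the Lorenz system; the step size, run length, and separation scale are illustrative assumptions, and in practice the result is cross-checked against bifurcation diagrams as described above.

```python
import numpy as np

def lorenz(s, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    x, y, z = s
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

def rk4_step(f, x, dt):
    k1 = f(x); k2 = f(x + 0.5 * dt * k1)
    k3 = f(x + 0.5 * dt * k2); k4 = f(x + dt * k3)
    return x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

def largest_lyapunov(f, x0, dt=0.01, n_steps=20000, d0=1e-8, discard=2000):
    """Benettin two-trajectory estimate of the largest Lyapunov exponent."""
    x = np.asarray(x0, float)
    for _ in range(discard):          # settle onto the attractor first
        x = rk4_step(f, x, dt)
    y = x + np.array([d0, 0.0, 0.0])  # nearby shadow trajectory
    acc = 0.0
    for _ in range(n_steps):
        x = rk4_step(f, x, dt)
        y = rk4_step(f, y, dt)
        d = np.linalg.norm(y - x)
        acc += np.log(d / d0)
        y = x + (d0 / d) * (y - x)    # renormalize the separation
    return acc / (n_steps * dt)

lle = largest_lyapunov(lorenz, [1.0, 1.0, 1.0])
# lle > 0 -> label the sample as chaotic; otherwise steady-state.
```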
To verify the effectiveness of the proposed method in scenarios close to real physical systems, a dataset was constructed based on circuit simulations of a non-autonomous multi-wing chaotic system. First proposed by Yan et al. [
21], this system features an infinite number of equilibrium points and symmetric attractors. Mao et al. [
22] further extended it to a fractional-order form and validated its application potential in weak signal detection.
Unlike pure numerical simulations, we implemented both integer-order and fractional-order analog circuits of the system using the MultiSim simulation platform. The circuit design strictly adheres to the system’s dynamical equations, with core components including inverting operational amplifiers, resistors, capacitors and multipliers, which realize the system’s nonlinear characteristics and the input of driving signals. To ensure the diversity and representativeness of the dataset, a dual-parameter tuning strategy was adopted for sample generation: first, the driving signal amplitude was adjusted across 10 gradient levels to obtain attractor topologies with different wing numbers; for each amplitude gradient, 100 independent adjustments of the resistance ratio were further performed to introduce perturbations in circuit parameters. As the driving signal amplitude varies, the attractors of both integer-order and fractional-order multi-wing chaotic systems transform from four-wing to two-wing configurations, as illustrated in
Figure 2.
The dataset is partitioned into three subsets: 60% for training (to optimize model parameters and learn characteristic patterns of chaotic signals), 20% for validation (to monitor performance, tune hyperparameters, and mitigate overfitting), and the remaining 20% for final testing.
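The 60/20/20 partition can be sketched as a simple shuffled index split (the seed and sample count are illustrative):

```python
import numpy as np

def split_indices(n_samples, ratios=(0.6, 0.2, 0.2), seed=0):
    """Shuffle sample indices and split them into train/val/test subsets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_train = int(ratios[0] * n_samples)
    n_val = int(ratios[1] * n_samples)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

train_idx, val_idx, test_idx = split_indices(1000)
```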
3. Multi-Branch Attention Network Model
The evolution of trajectories in fractional-order chaotic systems depends not only on the current state but also on the entire history of past states. This intrinsic memory property is naturally suited to modeling with recurrent neural architectures.
In contrast, integer-order chaotic systems are Markovian: their trajectory evolution depends solely on the current state. Nevertheless, due to the ergodicity of their attractors, their time series exhibit long-range temporal correlations, which can be effectively captured by self-attention mechanisms. By contrast, non-chaotic steady-state dynamics—such as periodic orbits or convergence to fixed points—exhibit concentrated spectral energy, typically appearing as discrete peaks in the frequency domain. This stands in stark contrast to the broadband, continuous power spectrum characteristic of chaotic systems, and the distinction can be reliably identified by time–frequency analysis.
Given the diversity of dynamical characteristics across integer-order chaos, fractional-order chaos, and non-chaotic steady states, a single-model approach inherently struggles to capture the full spectrum of underlying features. To address this challenge, we propose a novel multi-branch network, named the Multi-branch Attention Network (MANet). The model establishes a unified classification framework by integrating complementary feature extraction mechanisms. As shown in
Figure 3, the network comprises four main branches: the Multi-Scale CNN Branch, the Time–Frequency Analysis Branch, the Transformer Branch, and the Dynamic Memory Branch. By leveraging the complementary strengths of each branch, MANet jointly captures diverse dynamical patterns from raw time series, encompassing both time- and frequency-domain features as well as local and global structures. Building on this, the model employs a cross-modal attention mechanism to enable adaptive interaction and efficient fusion of multi-source features, ultimately yielding accurate pattern recognition via the classifier.
3.1. Multi-Scale CNN
In this study, a multi-scale convolutional neural network (CNN) module [
23] is employed as a branch to extract local temporal dynamics from input signals, thereby enhancing the model’s sensitivity to short-term transient features. As shown in
Figure 4, the CNN branch comprises four parallel convolutional blocks. Each block contains a single one-dimensional convolutional (Conv1d) layer with distinct kernel sizes (3, 5, 7, and 9) and corresponding dilation rates (1, 2, 3, and 4).
The operation of each convolutional layer is defined as
$$y[n] = \sum_{k=0}^{K-1} w_k \, x[n - k d] + b,$$
where $x$ denotes the input signal, $w_k$ represents the weight at position $k$ within the convolution kernel of length $K$, $b$ is the bias term, and $d$ is the dilation rate, which determines the spacing between sampled elements in the input sequence. By introducing dilated convolutions, the receptive field is effectively expanded without increasing the number of parameters, enabling the capture of long-range temporal dependencies. Specifically, smaller convolutional kernels are effective at capturing high-frequency transient patterns, while larger ones better model slow-varying trends.
Each convolutional layer is followed by batch normalization (BatchNorm1d) and a ReLU activation function to facilitate training convergence and enhance representational capacity. Additionally, a residual connection is incorporated: the input is projected to the target channel dimension via a 1 × 1 convolution and then added to the main path, thereby mitigating the vanishing gradient problem in deep networks. The outputs of all convolutional blocks are then concatenated along the channel dimension to form a multi-scale feature representation. Subsequently, these features are further integrated via a 1D convolutional layer with a kernel size of 5. Finally, adaptive average pooling (AdaptiveAvgPool1d) is applied to adaptively map input sequences of arbitrary length to a fixed length of 512, enabling compatibility with downstream classification modules.
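The parallel dilated convolutions of this branch can be sketched in a single-channel NumPy form; the averaging kernels below are illustrative stand-ins for the learned multi-channel weights, and batch normalization, residual paths, and pooling are omitted.

```python
import numpy as np

def dilated_conv1d(x, w, b=0.0, dilation=1):
    """'Same'-padded single-channel 1D convolution with a dilated kernel."""
    K = len(w)
    rf = (K - 1) * dilation + 1          # effective receptive field
    xp = np.pad(x, rf // 2)              # zero-pad so output length == input length
    out = np.zeros(len(x))
    for k in range(K):                   # out[n] = sum_k w[k] * x_padded[n + k*d]
        out += w[k] * xp[k * dilation : k * dilation + len(x)]
    return out + b

# Four parallel blocks with kernel sizes 3/5/7/9 and dilation rates 1/2/3/4.
x = np.sin(np.linspace(0, 8 * np.pi, 512))
branches = [dilated_conv1d(x, np.full(K, 1.0 / K), dilation=d)
            for K, d in [(3, 1), (5, 2), (7, 3), (9, 4)]]
features = np.stack(branches)            # (4, 512): stacked multi-scale features
```

The kernel-3/dilation-1 block spans only 3 samples and preserves fast transients, while the kernel-9/dilation-4 block averages over a 33-sample receptive field and tracks the slow trend.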
3.2. Time–Frequency Analysis
The Short-Time Fourier Transform (STFT) is used to convert the input signal into a time–frequency representation, revealing how its frequency components evolve over time [
24]. Additionally, logarithmic compression is applied to enhance the visibility of weak frequency components. The mathematical formulation is given by
where
represents the complex spectral value at frequency
for the
m-th sliding window, and
is the input time-domain signal.
denotes the window function of length
, which is typically equal to the FFT length
and set to 28 in this study.
indicates the hop size (i.e., frame shift), set to 16. Subsequently, local spectral patterns in the spectrogram are extracted using a one-dimensional convolutional layer, and adaptive pooling is applied to fix the output sequence length to 512.
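A NumPy sketch of this time–frequency front end follows, assuming an FFT/window length of 256 and hop size 16; the Hann window, sampling rate, and test tone are illustrative. For a periodic input, the energy concentrates in a single frequency bin, i.e., one horizontal band in the spectrogram.

```python
import numpy as np

def log_stft(x, n_fft=256, hop=16):
    """Hann-windowed magnitude STFT followed by logarithmic compression."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[m * hop : m * hop + n_fft] * window
                       for m in range(n_frames)])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))   # (n_frames, n_fft//2 + 1)
    return np.log1p(magnitude)                        # boost weak components

fs = 256                                  # illustrative sampling rate
t = np.arange(2048) / fs
tone = np.sin(2 * np.pi * 8.0 * t)        # periodic signal: one narrowband line
S = log_stft(tone)                        # energy peaks at frequency bin 8
```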
This module plays a critical role in chaos classification. Chaotic systems typically exhibit a broadband frequency spectrum lacking discrete peaks, whereas non-chaotic (e.g., periodic) states produce distinct, narrowband frequency components. In the resulting time–frequency representations, these stable components manifest as prominent horizontal bands, thereby enabling clear discrimination between chaotic and non-chaotic dynamics.
3.3. Transformer
In this study, the Transformer branch is used to capture long-range dependencies in the input sequence. The self-attention mechanism enables direct interactions between arbitrary positions in the sequence (up to a maximum length of 5000 time steps), thereby effectively modeling long-range temporal structures inherent in chaotic dynamics. As shown in
Figure 5, its architecture comprises an embedding layer, positional encodings, and two encoder layers. Input features are first projected into a high-dimensional embedding space, and learnable positional encodings are added to preserve sequential order. Global contextual features are then refined through the encoder layers. Finally, adaptive average pooling is applied to reduce the output sequence to a fixed length of 512.
The attention mechanism uses scaled dot-product attention, defined as follows [25]:
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{Q K^{T}}{\sqrt{d_k}}\right) V,$$
where $Q$ is the query matrix, representing the current position seeking attention; $K$ is the key matrix, used to compute similarity with other positions; and $V$ is the value matrix, containing the content vectors associated with the keys. The product $Q K^{T}$ computes the dot product between each query and all keys, yielding an attention score matrix. Here, $d_k$ denotes the dimensionality of the key vectors, and scaling the dot products by $\sqrt{d_k}$ mitigates the risk of vanishing gradients caused by excessively large values. Subsequently, the softmax function normalizes these scores into attention weights, indicating the relative importance of each key position with respect to a given query. Finally, a weighted sum of the value matrix $V$ using these attention weights produces a context-aware representation that captures global dependencies.
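A minimal NumPy implementation of scaled dot-product attention, with illustrative sequence length and dimensionality, is:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # similarity of every query to every key
    weights = softmax(scores)            # each row is a distribution over keys
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.normal(size=(6, 16))             # 6 positions, d_k = 16 (illustrative)
K = rng.normal(size=(6, 16))
V = rng.normal(size=(6, 16))
out, w = scaled_dot_product_attention(Q, K, V)
```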
3.4. Dynamic Memory Network
The dynamic memory branch combines a bidirectional Gated Recurrent Unit (Bi-GRU) [
26] with a temporal attention mechanism to model temporal dependencies in the input sequence. The Bi-GRU captures bidirectional temporal features, while the attention mechanism enables the model to emphasize informative time steps. The module also incorporates residual connections and layer normalization to improve training stability and generalization. Finally, global average pooling across the time dimension produces a 512-dimensional context vector, which serves as the prior in cross-modal attention to reduce the computational burden of the Transformer on ultra-long sequences. The overall architecture is illustrated in
Figure 6.
3.5. Cross-Modal Attention Fusion Mechanism
The model introduces a cross-modal attention module to fuse the features from the four branches. This module takes the outputs of the first three branches (
Section 3.1,
Section 3.2 and
Section 3.3) as the query, key, and value, respectively, calculates the correlation weights among them via a dot-product attention mechanism, and then performs weighted fusion based on these weights to obtain a cross-modal feature representation. In addition, the output of the fourth branch (
Section 3.4) is extended to the same number of time steps and then concatenated to the fused features, forming the final joint feature vector. Finally, the fused features undergo regularization via a dropout layer and are then fed into a fully connected layer to output the final prediction results.
In chaos classification, this mechanism enables the model to adaptively integrate complementary information from multiple perspectives, thereby better capturing the multi-dimensional dynamics of chaotic systems. When the input to a branch suffers from a sharp drop in signal-to-noise ratio—due to sensitivity to initial conditions or external noise—the attention weights are automatically adjusted to downweight unreliable features, ensuring robust fusion.
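Under illustrative shapes and random placeholder features, the fusion wiring described above can be sketched as follows; the query/key/value role assignment matches the text, while the dimensions and the use of a single attention head are assumptions.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(1)
T, D = 512, 64                            # illustrative: 512 time steps, width 64
cnn_feat = rng.normal(size=(T, D))        # query: multi-scale CNN branch output
tf_feat = rng.normal(size=(T, D))         # key:   time-frequency branch output
trans_feat = rng.normal(size=(T, D))      # value: Transformer branch output
memory_vec = rng.normal(size=(D,))        # context vector from the memory branch

# Dot-product attention across modalities, then weighted fusion.
weights = softmax(cnn_feat @ tf_feat.T / np.sqrt(D))
fused = weights @ trans_feat              # (T, D) cross-modal representation

# Expand the memory vector over the time steps and concatenate.
joint = np.concatenate([fused, np.tile(memory_vec, (T, 1))], axis=-1)  # (T, 2D)
```

In the full model, `joint` would then pass through dropout and a fully connected classifier.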
To ensure the reproducibility of our experiments,
Table 3 summarizes the key hyperparameters and configurations adopted in model training, including the optimizer, learning rate, and hardware environment.
4. Results and Discussion
This paper adopts three representative methods commonly used in pattern recognition as baselines to evaluate the classification performance of the proposed MANet model. Support Vector Machine (SVM), grounded in statistical learning theory, separates classes by constructing an optimal separating hyperplane and is known for its strong generalization capability [
27]. Long Short-Term Memory (LSTM) is a recurrent neural network designed to model sequential data by capturing long-range temporal dependencies [
28]. The Transformer processes input sequences through self-attention and is particularly effective at capturing global contextual features [
25]. All baseline models are trained and evaluated under identical experimental conditions to ensure a fair comparison.
4.1. Recognition Performance of Network Models Based on Variable Initial Value Dataset
Figure 7 displays confusion matrices for the four models on the variable-initial-value dataset (Class 0: stable dynamics; Class 1: integer-order chaos; Class 2: fractional-order chaos). SVM achieves perfect classification, while MANet misclassifies only 0.02% of Class 0 samples (as Class 1) and 0.08% of Class 1 samples (as Class 2), with 100% accuracy on Class 2.
Table 4 further confirms MANet’s 99.97% precision/recall/F1-score. To verify stability, 20 repeated experiments (
Table 5) show that MANet’s test-set accuracy averages 99.994% (std = 0.008%), outperforming the other models with minimal variance and indicating stable convergence under different initializations.
The aforementioned performance is evaluated under ideal noise-free conditions. However, real-world time series are often corrupted by observation noise (e.g., sensor errors, electromagnetic interference), which can significantly degrade model performance. To simulate such conditions, we introduce three types of perturbations: Gaussian, salt-and-pepper, and Rayleigh noise, each at three intensity levels: 10%, 20%, and 30%. Specifically, the standard deviation of Gaussian and Rayleigh noise is set to 10%, 20%, or 30% of the original signal’s standard deviation; for salt-and-pepper noise, we randomly replace 1% of data points with extreme values scaled by ±10%, ±20%, or ±30% relative to the signal’s max/min.
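The three perturbations can be sketched as follows; the salt-and-pepper scaling shown is one interpretation of "extreme values scaled relative to the signal’s max/min", and the test signal is illustrative.

```python
import numpy as np

def add_gaussian(x, level, rng):
    """Zero-mean Gaussian noise with std = level * std(x)."""
    return x + rng.normal(0.0, level * x.std(), size=x.shape)

def add_rayleigh(x, level, rng):
    """Rayleigh noise, standardized and rescaled to std = level * std(x)."""
    n = rng.rayleigh(1.0, size=x.shape)
    return x + (n - n.mean()) / n.std() * (level * x.std())

def add_salt_pepper(x, level, rng, frac=0.01):
    """Replace `frac` of the points with extremes beyond max/min by `level`."""
    y = x.copy()
    idx = rng.choice(len(x), size=int(frac * len(x)), replace=False)
    hi, lo = (1 + level) * x.max(), (1 + level) * x.min()  # assumes max>0>min
    y[idx] = np.where(rng.random(len(idx)) < 0.5, hi, lo)
    return y

rng = np.random.default_rng(0)
x = np.sin(np.linspace(0, 20 * np.pi, 10000))   # clean test signal
g = add_gaussian(x, 0.10, rng)                   # 10% Gaussian noise
r = add_rayleigh(x, 0.10, rng)                   # 10% Rayleigh noise
sp = add_salt_pepper(x, 0.10, rng)               # 1% of points, +/-10% extremes
```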
As shown in
Table 6, MANet and SVM maintain exceptional robustness across all noise types and intensities, while LSTM suffers the most significant degradation.
However, when applied to chaotic time series generated under varying system parameters, the model’s performance degrades significantly. This indicates that, while different initial conditions yield distinct trajectories, they preserve the system’s intrinsic dynamics. Consequently, the model learns features tied to the attractor of the training parameters and cannot adapt to parameter-induced dynamical changes (e.g., bifurcations or attractor shifts).
We therefore conjecture that robust cross-parameter generalization requires exposing the model to such dynamical variations during training. To this end, we construct a second dataset comprising time series generated across a wide range of system parameters, aiming to capture a broader spectrum of chaotic behaviors and enhance adaptability to diverse dynamical regimes.
4.2. Recognition Performance of Network Models Based on Parameter-Varying Dataset
The confusion matrices for the four classification models on the parameter-varying dataset are shown in
Figure 8. It shows that for MANet, SVM, and Transformer, the diagonal entries—representing correct classifications—are substantially higher than the off-diagonal entries, which correspond to misclassifications. Notably, MANet achieves 100.00% accuracy across all classes, demonstrating exceptional discriminative capability.
Table 7 summarizes the classification performance of the four models on the variable-parameter test set. MANet achieves the highest performance across all evaluation metrics, with the F1-score reaching 100.00%, while LSTM shows notably inferior results, particularly in recall.
We conducted 20 independent runs for each model and report the mean and standard deviation of accuracy on the training, validation, and test sets in
Table 8. MANet achieves an average test accuracy of 99.887% ± 0.152%, confirming its robust performance. More importantly, this result supports our earlier observation about the limitations of the variable-initial-value dataset: while varying initial conditions yield diverse time-series trajectories, the fixed system parameters limit the model’s ability to learn discriminative features that generalize across dynamical regimes. Consequently, the model fails to generalize when evaluated under parameter variations.
In contrast, the parameter-varying dataset encompasses the full spectrum of state evolution across different system parameters, compelling the model to learn robust dynamical invariants rather than memorizing parameter-specific trajectory patterns. Thus, MANet’s strong performance not only validates the effectiveness of our data construction strategy but also underscores the critical role of training data diversity in chaotic system classification.
To assess robustness to input noise, we applied Gaussian, salt-and-pepper, and Rayleigh noise (as described in
Section 4.1) to the variable-parameter test set. MANet consistently outperforms the other models across all noise levels, achieving the highest precision, recall, and F1-score, while LSTM shows the weakest resistance to perturbations (results in
Table 9).
In this study, the base model was the one pre-trained on the parameter-varying dataset augmented with Rayleigh noise; this noise-embedded training strategy endows the model with a degree of anti-disturbance capability. To verify the generalization performance of the model, a transfer learning strategy was adopted: the base model was fine-tuned using only 30 samples from the Lü system dataset. Key hyperparameters during fine-tuning were configured as follows: the learning rate was set to , the batch size was 8, and only the top 3 convolutional blocks of the model were designated as trainable layers. This setup maximized the preservation of the intrinsic dynamical feature representations of chaotic systems acquired during pre-training. Experimental results demonstrate that, with only 30 fine-tuning samples, the model achieved a classification accuracy of 86% on the Lü system test set within a small number of training epochs; as training continued, the accuracy rose further to 96%. This outcome indicates that the model pre-trained on the parameter-varying dataset has learned universal dynamical features of chaotic systems rather than overfitting to the attractor geometry of specific systems, and it validates the model’s ability to adapt rapidly to entirely new chaotic systems with an extremely limited number of target-domain samples.
Table 10 presents the classification performance of all models on the MultiSim circuit simulation dataset. It can be observed that the proposed MANet achieves the highest classification accuracy (99.029%) and F1-score (99.020%), which benefits from its multi-branch architecture that enables effective extraction of the intrinsic dynamical features of chaotic systems. These results verify the effectiveness and generalization capability of the proposed method in scenarios close to real physical systems.
Relative to the full MANet model (
Table 7), which achieves near-perfect precision (99.967%), recall (99.966%), and F1-score (100%), the removal of any single branch results in a measurable performance degradation, as quantified in
Table 11. Notably, removing either the multi-scale convolution branch or the time–frequency analysis branch causes a substantial drop. Specifically, eliminating the multi-scale convolution branch reduces precision from 99.967% to 98.312% (a 1.66 percentage point decrease), while removing the frequency-domain analysis branch results in a 9.0 percentage point decline in precision, dropping from 99.967% to 90.968%, highlighting its critical role in capturing the distinctive time–frequency characteristics of chaotic dynamics.
5. Conclusions
To address the challenge of automatically classifying integer-order and fractional-order chaotic systems, this study proposes a multi-branch attention network that integrates a Transformer, a dynamic memory network, a multi-scale CNN, and a time–frequency analysis module based on the short-time Fourier transform (STFT). This architecture jointly captures long-range temporal dependencies, local dynamical structures, multi-scale temporal patterns, and time–frequency characteristics of input time series, enabling accurate and robust discrimination among the target dynamical classes.
Experiments on simulated datasets generated by canonical chaotic systems—including Lorenz, Rössler, Lü, and Chen—show that the proposed model achieves over 99% classification accuracy under noise-free conditions and maintains high performance in the presence of Gaussian, salt-and-pepper, and Rayleigh noise, significantly outperforming baseline models such as LSTM, SVM, and standalone Transformer. These results confirm the effectiveness of the multi-branch collaborative design for complex dynamical recognition.
Ablation studies further reveal that removing any branch degrades performance, with the omission of the multi-scale CNN or the time–frequency analysis module causing the largest drops in accuracy. This underscores the complementary nature of the branches, each contributing distinct yet synergistic features to the model’s discriminative power. Moreover, we find that the composition of the training data critically affects the model’s generalization performance: models trained on datasets generated with fixed parameters and varying initial conditions exhibit substantial performance degradation when tested on samples with unseen parameter values, indicating limited robustness to parameter perturbations. In contrast, training on data generated with diverse parameter settings markedly improves generalization to new dynamical configurations.
This work not only presents an efficient and robust data-driven framework for classifying integer- and fractional-order chaotic systems but also highlights the critical roles of both architectural design and training data diversity in dynamical system recognition. Future research will extend this framework to real-world experimental signals—such as those from electronic circuits, fluid systems, and biological processes—and develop lightweight variants to support real-time classification applications.