Symbolic Early Stopping in Neural Sequence Models via Mapper-Induced Symbolic Dynamics
Abstract
1. Introduction
- We introduce SES, a hybrid representation-aware stopping criterion for neural sequence models based on Mapper-induced symbolization of hidden-state trajectories obtained during validation.
- We use Mapper as a lightweight topological scaffold for hidden-state organization, reducing the computational burden of full simplicial-complex constructions while preserving the large-scale structure relevant for stopping decisions.
- We define validation symbolization with respect to the phase-space partition inferred directly from validation hidden states at each epoch, which keeps the method consistent with the implemented algorithmic pipeline.
- We aggregate several symbolic and entropy-geometric descriptors into a single score to improve robustness with respect to short-term fluctuations of individual metrics.
- We study the variability of stopping times across individual symbolic-dynamics (SD) metrics and their ensembles, the transferability of Mapper hyperparameters, and the computational overhead of representation-aware monitoring.
- We empirically benchmark SES against representative loss-based, slope-based, correlation-based, and activation-similarity baselines (Patience, Slope, CDSC, and SVCCA) on datasets with different dynamical regimes and noise levels, and characterize the regimes in which SES yields a favorable quality–efficiency trade-off rather than universal dominance.
2. Related Works
3. Methodology
3.1. Method Overview and Interpretation
3.2. Method Description
- Lempel–Ziv complexity (LZ) as a measure of incremental string compressibility;
- Markov entropy rate over the transition matrix and the stationary distribution;
- permutation entropy (PermEn) over ordinal patterns;
- correlation dimension over the correlation sum;
- optionally, the fractal (box-counting) dimension of the set of visited states.
- min_epoch: the epoch number from which stopping is allowed;
- patience parameter P: the maximum number of consecutive epochs without a significant improvement;
- minimum improvement threshold δsym, which separates meaningful decreases in Se from random fluctuations.
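The three stopping parameters listed above can be read as a patience counter applied to the aggregated symbolic score. The following is a minimal sketch under that reading, not the authors' implementation: the class name `SymbolicPatience` and the helper `first_stop` are hypothetical, and the score sequence is assumed to be one value of Se per epoch, lower being better.

```python
from dataclasses import dataclass


@dataclass
class SymbolicPatience:
    """Patience rule on a symbolic score (hypothetical helper, lower score is better)."""
    min_epoch: int      # earliest epoch at which stopping is allowed
    patience: int       # P: consecutive epochs tolerated without significant improvement
    delta_sym: float    # minimum decrease that counts as an improvement
    best: float = float("inf")
    stale: int = 0

    def update(self, epoch: int, score: float) -> bool:
        """Record the score of one epoch and return True if training should stop."""
        if score < self.best - self.delta_sym:
            self.best = score
            self.stale = 0
        else:
            self.stale += 1
        return epoch >= self.min_epoch and self.stale >= self.patience


def first_stop(scores, min_epoch, patience, delta_sym):
    """Return the 1-based stopping epoch, or the full budget if the rule never fires."""
    rule = SymbolicPatience(min_epoch, patience, delta_sym)
    for epoch, score in enumerate(scores, start=1):
        if rule.update(epoch, score):
            return epoch
    return len(scores)
```

With `delta_sym` set too small, random fluctuations of the score keep resetting the counter; set too large, genuine but slow decreases are ignored and the rule fires as soon as `min_epoch` and `patience` allow.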
3.3. Metrics of Symbolic Dynamics
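As an illustration of three of the descriptors named in Section 3.2, the sketch below computes a Lempel–Ziv phrase count, a Markov entropy rate, and a normalized permutation entropy for a symbol sequence. It rests on stated assumptions: the LZ variant is an LZ78-style incremental parsing (the exact variant used in the paper is not given here), the stationary distribution is taken from the leading left eigenvector of the empirical transition matrix, and all function names are hypothetical.

```python
from collections import Counter
from math import factorial, log2

import numpy as np


def lz_phrase_count(symbols):
    """Number of phrases in an LZ78-style incremental parsing of the sequence."""
    phrases, current, count = set(), (), 0
    for s in symbols:
        current += (s,)
        if current not in phrases:
            phrases.add(current)
            count += 1
            current = ()
    # A trailing, already-seen partial phrase counts as one more phrase.
    return count + (1 if current else 0)


def markov_entropy_rate(symbols):
    """Entropy rate (bits/symbol) of a first-order Markov chain fitted to the sequence."""
    states = sorted(set(symbols))
    idx = {s: i for i, s in enumerate(states)}
    k = len(states)
    counts = np.zeros((k, k))
    for a, b in zip(symbols[:-1], symbols[1:]):
        counts[idx[a], idx[b]] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # States without outgoing transitions get a self-loop so that P stays stochastic.
    P = np.where(row_sums > 0, counts / np.maximum(row_sums, 1), np.eye(k))
    # Stationary distribution: left eigenvector for the eigenvalue closest to 1.
    vals, vecs = np.linalg.eig(P.T)
    pi = np.abs(np.real(vecs[:, np.argmin(np.abs(vals - 1))]))
    pi = pi / pi.sum()
    with np.errstate(divide="ignore", invalid="ignore"):
        log_p = np.where(P > 0, np.log2(P), 0.0)
    return float(-(pi[:, None] * P * log_p).sum())


def permutation_entropy(series, order=3):
    """Shannon entropy of ordinal patterns of length `order`, normalized to [0, 1]."""
    patterns = Counter(
        tuple(sorted(range(order), key=lambda i: series[t + i]))
        for t in range(len(series) - order + 1)
    )
    n = sum(patterns.values())
    h = sum(-(c / n) * log2(c / n) for c in patterns.values())
    return h / log2(factorial(order))
```

A strictly periodic symbol string gives a Markov entropy rate of zero, while an independent fair two-symbol sequence approaches one bit per symbol, which is the kind of contrast these descriptors are meant to expose.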
3.4. Datasets and Metrics
- The epoch number
- “Oracle epoch” e*, defined as the epoch of the global minimum of the validation loss over the entire learning horizon under consideration, for example, in the first E_max epochs,
- Savings in training epochs,
- The proportion of runs in which the compared ES method is “close enough” to the oracle,
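The evaluation quantities listed above can be computed from a per-epoch validation-loss curve and a stopping epoch. The sketch below is illustrative and makes several assumptions that are not stated in this section: the regret ΔBest is taken as the validation loss at the stopping epoch minus the loss at the oracle epoch, the saving is the budget minus the stopping epoch (consistent with the medians in Appendix G, where the two columns sum to 100), and "close enough" is modeled by a hypothetical tolerance `tol` in epochs.

```python
def oracle_epoch(val_losses):
    """1-based epoch of the global minimum of the validation loss over the horizon."""
    return min(range(len(val_losses)), key=val_losses.__getitem__) + 1


def stopping_report(val_losses, e_stop, tol=2):
    """Summarize one run; `tol` is a hypothetical 'close enough' window in epochs."""
    e_star = oracle_epoch(val_losses)
    return {
        "e_stop": e_stop,
        "oracle_epoch": e_star,
        # Assumed definition: epochs of the budget that were not trained.
        "epochs_saved": len(val_losses) - e_stop,
        # Assumed definition: loss at the stop minus loss at the oracle epoch.
        "delta_best": val_losses[e_stop - 1] - val_losses[e_star - 1],
        "near_oracle": abs(e_stop - e_star) <= tol,
    }


def near_oracle_rate(reports):
    """Proportion of runs whose stopping epoch falls within the tolerance of the oracle."""
    return sum(r["near_oracle"] for r in reports) / len(reports)
```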
3.5. Experimental Techniques
4. Results and Discussion
4.1. Variability of Symbolic-Dynamics Indicators (E1 Group of Experiments)
4.2. Comparison with Baseline Early-Stopping Methods (E2 Group of Experiments)
4.3. Robustness Under Additive Gaussian Noise (E3 Group of Experiments)
4.4. Layer-Wise Analysis and Robustness Interpretation (E4 Group of Experiments)
4.5. Mapper Hyperparameter Transfer Across Architectures (E5 Group of Experiments)
4.6. Runtime Profiling and Global Robustness Interpretation (E6 Group of Experiments)
4.7. Reproducibility and Limitations
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A. Algorithmic Specification of SES
Symbolic Early Stopping (SES, Hybrid Rule)
| Algorithm A1: Symbolic Early Stopping (SES) |
|---|
| Input: model, train_loader, val_loader, mapper_params, metrics_params, α, min_epoch, P, W_slope, ε_slope, δ, val_guard_abs, val_guard_rel, guard_win |
| Output: e_stop, best_epoch, best_val |
| 1: Initialize: best_score ← +∞, best_val ← +∞, score_no_improve ← 0, S ← [], e_stop ← E |
| 2: for e = 1 to E do |
| 3: Train one epoch |
| 4: H_e ← collect_hidden(val_loader) |
| 5: nodes, assign ← build_mapper(H_e, mapper_params) |
| 6: S_str ← symbolize(assign) |
| 7: feats ← compute_panel(S_str, metrics_params) |
| 8: feats_sm ← EMA(feats, α) |
| 9: score_e ← rank_aggregate(feats_sm) |
| 10: Append score_e to S; val_e ← evaluate_val_loss() |
| 11: if score_e < best_score − δ then |
| 12: best_score ← score_e; score_no_improve ← 0 |
| 13: else score_no_improve ← score_no_improve + 1 |
| 14: if val_e < best_val then best_val ← val_e; best_epoch ← e |
| 15: stall ← (e ≥ min_epoch) ∧ (score_no_improve ≥ P) |
| 16: plateau ← (e ≥ min_epoch) ∧ flat_slope(S, W_slope, ε_slope) |
| 17: guard ← (e ≥ min_epoch) ∧ ¬guard_improve(val_history, …) |
| 18: if (stall ∨ plateau) ∧ guard then |
| 19: e_stop ← e; break |
| 20: end for |
| 21: Optionally: restore checkpoint(best_epoch) |
| 22: return e_stop, best_epoch, best_val, S |
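
The decision logic in steps 8 and 15 to 18 of Algorithm A1 can be sketched in a few lines. This is an illustration under assumptions rather than the reference implementation: the EMA convention (α weighting the newest value), the least-squares definition of the slope in `flat_slope`, and the function `should_stop` are all assumed. Because the arguments of `guard_improve` are elided in the algorithm, the guard is represented here by a boolean `val_improving` supplied by the caller.

```python
import numpy as np


def ema(values, alpha):
    """Exponential moving average; alpha weights the newest value (assumed convention)."""
    out, prev = [], None
    for v in values:
        prev = v if prev is None else alpha * v + (1 - alpha) * prev
        out.append(prev)
    return out


def flat_slope(scores, window, eps):
    """True if the least-squares slope over the last `window` scores is below eps."""
    if window < 2 or len(scores) < window:
        return False
    y = np.asarray(scores[-window:], dtype=float)
    slope = np.polyfit(np.arange(window), y, 1)[0]
    return bool(abs(slope) < eps)


def should_stop(epoch, min_epoch, score_no_improve, patience,
                scores, window, eps, val_improving):
    """Hybrid rule: (stall or plateau) and guard, all gated by min_epoch."""
    stall = epoch >= min_epoch and score_no_improve >= patience
    plateau = epoch >= min_epoch and flat_slope(scores, window, eps)
    # The guard blocks a stop while the validation loss is still improving.
    guard = epoch >= min_epoch and not val_improving
    return (stall or plateau) and guard
```

The conjunction with the guard is what makes the rule hybrid: a stalled or flat symbolic score alone is not sufficient if the validation loss is still decreasing.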
Appendix B. Detailed Description of the Experimental Techniques
Appendix C. Detailed Description of the Results of the E2 Group Experiments


- The Lorenz trajectory provides the clearest chaotic test case. Here, the trade-off between stopping early and staying close to the oracle becomes especially delicate. SES often identifies a stabilization phase very early, but on the most sophisticated variants, this early stop can be more aggressive than the lowest-regret alternative.


- AirPassengers illustrates a different failure mode. The dataset is small and quasi-periodic, and the usefulness of early stopping depends strongly on the architecture. On recurrent models, several criteria, including SES, often do not stop substantially before the full budget is exhausted, whereas on the Transformer, many rules stop very early but not always at an appropriate point.

- Bitcoin serves as a non-stationary stress test with regime shifts. In this case, ordering of internal representations progresses more unevenly, so the main basis for comparison is not only the final regret but also the consistency with which a method produces a genuinely early stop.



- EEG is high-dimensional, noisy, and only weakly stationary, so symbolic metrics stabilize later than on ETT or AirPassengers. This makes EEG one of the hardest datasets for SES and for the baselines alike.














Appendix D. Detailed Description of the Results of the E3 Group Experiments
Appendix E. Detailed Description of the Results of the E4 Group Experiments









Appendix F. Statistical Comparison of SES with Baseline Stopping Rules
| Dataset | Model | Baseline | SES ΔBest, med [IQR] | Baseline ΔBest, med [IQR] | Δmed (SES−Base) [95% CI] | SES wins? |
|---|---|---|---|---|---|---|
| ETTh1 | RNN | Patience | 2.7623 [3.1851] | 0.1127 [9.1169] | +2.6496 [−2.9115, 8.2106] | ✗ |
| ETTh1 | RNN | Slope | 2.7623 [3.1851] | 2.7804 [6.4139] | −0.0181 [−4.1418, 4.1056] | ✓ |
| ETTh1 | RNN | CDSC | 2.7623 [3.1851] | 2.7768 [2.6655] | −0.0145 [−2.4062, 2.3771] | ✓ |
| ETTh1 | RNN | SVCCA | 2.7623 [3.1851] | 0 [0] | +2.7623 [0.9281, 4.5964] | ✗ |
| ETTh1 | BiRNN | Patience | 0.1134 [0.1199] | 0.1110 [0.1110] | +0.0025 [−0.0916, 0.0966] | ✗ |
| ETTh1 | BiRNN | Slope | 0.1134 [0.1199] | 0.1370 [0.2410] | −0.0236 [−0.1786, 0.1315] | ✓ |
| ETTh1 | BiRNN | CDSC | 0.1134 [0.1199] | 0.1123 [0.0971] | +0.0011 [−0.0877, 0.0900] | ✗ |
| ETTh1 | BiRNN | SVCCA | 0.1134 [0.1199] | 0 [0] | +0.1134 [0.0444, 0.1825] | ✗ |
| ETTh1 | Transformer | Patience | 0.2740 [0.0793] | 0.0614 [0.3956] | +0.2126 [−0.0197, 0.4449] | ✗ |
| ETTh1 | Transformer | Slope | 0.2740 [0.0793] | 0.3424 [0.1036] | −0.0683 [−0.1434, 0.0068] | ✓ |
| ETTh1 | Transformer | CDSC | 0.2740 [0.0793] | 0.2318 [0.0719] | +0.0422 [−0.0194, 0.1038] | ✗ |
| ETTh1 | Transformer | SVCCA | 0.2740 [0.0793] | 0 [0] | +0.2740 [0.2284, 0.3197] | ✗ |
| ETTh2 | RNN | Patience | 0.2104 [0.1514] | 0.1887 [0.1568] | +0.0217 [−0.1038, 0.1472] | ✗ |
| ETTh2 | RNN | Slope | 0.2104 [0.1514] | 0.2351 [0.2935] | −0.0248 [−0.2150, 0.1654] | ✓ |
| ETTh2 | RNN | CDSC | 0.2104 [0.1514] | 0.1990 [0.1450] | +0.0114 [−0.1093, 0.1321] | ✗ |
| ETTh2 | RNN | SVCCA | 0.2104 [0.1514] | 0 [0] | +0.2104 [0.1232, 0.2975] | ✗ |
| ETTh2 | BiRNN | Patience | 0.2078 [0.1601] | 0.1995 [0.1736] | +0.0084 [−0.1276, 0.1444] | ✗ |
| ETTh2 | BiRNN | Slope | 0.2078 [0.1601] | 0.2429 [0.3041] | −0.0351 [−0.2329, 0.1628] | ✓ |
| ETTh2 | BiRNN | CDSC | 0.2078 [0.1601] | 0.2063 [0.1538] | +0.0016 [−0.1263, 0.1294] | ✗ |
| ETTh2 | BiRNN | SVCCA | 0.2078 [0.1601] | 0 [0] | +0.2078 [0.1157, 0.3000] | ✗ |
| ETTh2 | Transformer | Patience | 0.2575 [0.0813] | 0.0737 [0.3992] | +0.1838 [−0.0508, 0.4184] | ✗ |
| ETTh2 | Transformer | Slope | 0.2575 [0.0813] | 0.3107 [0.1131] | −0.0532 [−0.1334, 0.0270] | ✓ |
| ETTh2 | Transformer | CDSC | 0.2575 [0.0813] | 0.2147 [0.0749] | +0.0428 [−0.0209, 0.1064] | ✗ |
| ETTh2 | Transformer | SVCCA | 0.2575 [0.0813] | 0 [0] | +0.2575 [0.2107, 0.3043] | ✗ |
| ETTm1 | RNN | Patience | 0.5307 [0.1422] | 0.5122 [0.1548] | +0.0184 [−0.1026, 0.1395] | ✗ |
| ETTm1 | RNN | Slope | 0.5307 [0.1422] | 0.5672 [0.3073] | −0.0365 [−0.2315, 0.1584] | ✓ |
| ETTm1 | RNN | CDSC | 0.5307 [0.1422] | 0.5127 [0.1391] | +0.0179 [−0.0966, 0.1324] | ✗ |
| ETTm1 | RNN | SVCCA | 0.5307 [0.1422] | 0 [0] | +0.5307 [0.4488, 0.6125] | ✗ |
| ETTm1 | BiRNN | Patience | 0.5338 [0.1465] | 0.5183 [0.1764] | +0.0155 [−0.1166, 0.1476] | ✗ |
| ETTm1 | BiRNN | Slope | 0.5338 [0.1465] | 0.5960 [0.3226] | −0.0622 [−0.2662, 0.1419] | ✓ |
| ETTm1 | BiRNN | CDSC | 0.5338 [0.1465] | 0.5278 [0.1477] | +0.0060 [−0.1138, 0.1258] | ✗ |
| ETTm1 | BiRNN | SVCCA | 0.5338 [0.1465] | 0 [0] | +0.5338 [0.4495, 0.6182] | ✗ |
| ETTm1 | Transformer | Patience | 0.4262 [0.0827] | 0.2519 [0.4016] | +0.1743 [−0.0617, 0.4104] | ✗ |
| ETTm1 | Transformer | Slope | 0.4262 [0.0827] | 0.4914 [0.1127] | −0.0652 [−0.1457, 0.0153] | ✓ |
| ETTm1 | Transformer | CDSC | 0.4262 [0.0827] | 0.3888 [0.0771] | +0.0374 [−0.0277, 0.1025] | ✗ |
| ETTm1 | Transformer | SVCCA | 0.4262 [0.0827] | 0 [0] | +0.4262 [0.3786, 0.4738] | ✗ |
| ETTm2 | RNN | Patience | 0.5793 [0.1475] | 0.5562 [0.1653] | +0.0231 [−0.1045, 0.1507] | ✗ |
| ETTm2 | RNN | Slope | 0.5793 [0.1475] | 0.6247 [0.3276] | −0.0454 [−0.2523, 0.1615] | ✓ |
| ETTm2 | RNN | CDSC | 0.5793 [0.1475] | 0.5674 [0.1504] | +0.0119 [−0.1094, 0.1332] | ✗ |
| ETTm2 | RNN | SVCCA | 0.5793 [0.1475] | 0 [0] | +0.5793 [0.4943, 0.6642] | ✗ |
| ETTm2 | BiRNN | Patience | 0.5811 [0.1548] | 0.5600 [0.1770] | +0.0211 [−0.1143, 0.1565] | ✗ |
| ETTm2 | BiRNN | Slope | 0.5811 [0.1548] | 0.6541 [0.3434] | −0.0730 [−0.2899, 0.1439] | ✓ |
| ETTm2 | BiRNN | CDSC | 0.5811 [0.1548] | 0.5742 [0.1568] | +0.0069 [−0.1200, 0.1338] | ✗ |
| ETTm2 | BiRNN | SVCCA | 0.5811 [0.1548] | 0 [0] | +0.5811 [0.4920, 0.6703] | ✗ |
| ETTm2 | Transformer | Patience | 0.4689 [0.0882] | 0.2933 [0.4144] | +0.1756 [−0.0683, 0.4196] | ✗ |
| ETTm2 | Transformer | Slope | 0.4689 [0.0882] | 0.5357 [0.1184] | −0.0668 [−0.1518, 0.0182] | ✓ |
| ETTm2 | Transformer | CDSC | 0.4689 [0.0882] | 0.4281 [0.0805] | +0.0409 [−0.0279, 0.1096] | ✗ |
| ETTm2 | Transformer | SVCCA | 0.4689 [0.0882] | 0 [0] | +0.4689 [0.4181, 0.5197] | ✗ |
| AirPassengers | RNN | Patience | 0 [0] | 0 [0] | 0 [0, 0] | ≈ |
| AirPassengers | RNN | Slope | 0 [0] | 0 [0] | 0 [0, 0] | ≈ |
| AirPassengers | RNN | CDSC | 0 [0] | 58,891.3260 [36,358.8600] | −58,891.3260 [−79,828.0833, −37,954.5687] | ✓ |
| AirPassengers | RNN | SVCCA | 0 [0] | 97,363.3320 [330.6500] | −97,363.3320 [−97,553.7323, −97,172.9317] | ✓ |
| AirPassengers | BiRNN | Patience | 0 [0] | 0 [0] | 0 [0, 0] | ≈ |
| AirPassengers | BiRNN | Slope | 0 [0] | 0 [0] | 0 [0, 0] | ≈ |
| AirPassengers | BiRNN | CDSC | 0 [0] | 992.3630 [4252.1340] | −992.3630 [−3440.8968, 1456.1708] | ✓ |
| AirPassengers | BiRNN | SVCCA | 0 [0] | 1.539 × 10⁵ [219.3690] | −1.539 × 10⁵ [−1.540 × 10⁵, −1.538 × 10⁵] | ✓ |
| AirPassengers | Transformer | Patience | 8817.4850 [3716.4060] | 6811.7220 [4607.0510] | +2005.7630 [−1402.7118, 5414.2378] | ✗ |
| AirPassengers | Transformer | Slope | 8817.4850 [3716.4060] | 437.5180 [1315.3390] | +8379.9670 [6109.8425, 10,650.0915] | ✗ |
| AirPassengers | Transformer | CDSC | 8817.4850 [3716.4060] | 2036.1630 [2037.0710] | +6781.3220 [4340.8803, 9221.7637] | ✗ |
| AirPassengers | Transformer | SVCCA | 8817.4850 [3716.4060] | 7074.6250 [4660.1190] | +1742.8600 [−1689.4530, 5175.1730] | ✗ |
| Lorenz | RNN | Patience | 0.0190 [0.0437] | 0.0088 [0.0129] | +0.0102 [−0.0160, 0.0365] | ✗ |
| Lorenz | RNN | Slope | 0.0190 [0.0437] | 0.0041 [0.0036] | +0.0149 [−0.0103, 0.0402] | ✗ |
| Lorenz | RNN | CDSC | 0.0190 [0.0437] | 0.0131 [0.0379] | +0.0059 [−0.0274, 0.0392] | ✗ |
| Lorenz | RNN | SVCCA | 0.0190 [0.0437] | 0.0086 [0.0209] | +0.0105 [−0.0174, 0.0384] | ✗ |
| Lorenz | BiRNN | Patience | 0.0074 [0.0068] | 0.0068 [0.0043] | +6.590 × 10⁻⁴ [−0.0040, 0.0053] | ✗ |
| Lorenz | BiRNN | Slope | 0.0074 [0.0068] | 0.0048 [0.0079] | +0.0027 [−0.0033, 0.0087] | ✗ |
| Lorenz | BiRNN | CDSC | 0.0074 [0.0068] | 0.0046 [0.0152] | +0.0029 [−0.0067, 0.0124] | ✗ |
| Lorenz | BiRNN | SVCCA | 0.0074 [0.0068] | 0.0023 [0.0020] | +0.0051 [0.0011, 0.0092] | ✗ |
| Lorenz | Transformer | Patience | 0.0335 [0.0544] | 0.0224 [0.0206] | +0.0111 [−0.0224, 0.0446] | ✗ |
| Lorenz | Transformer | Slope | 0.0335 [0.0544] | 0.0036 [0.0062] | +0.0300 [−0.0015, 0.0614] | ✗ |
| Lorenz | Transformer | CDSC | 0.0335 [0.0544] | 0.0164 [0.0326] | +0.0171 [−0.0194, 0.0536] | ✗ |
| Lorenz | Transformer | SVCCA | 0.0335 [0.0544] | 0.0153 [0.0241] | +0.0182 [−0.0160, 0.0525] | ✗ |
| BTC15m | RNN | Patience | 1.309 × 10⁻⁶ [6.122 × 10⁻⁷] | 6.883 × 10⁻⁷ [9.384 × 10⁻⁷] | +6.207 × 10⁻⁷ [−2.449 × 10⁻⁸, 1.266 × 10⁻⁶] | ✗ |
| BTC15m | RNN | Slope | 1.309 × 10⁻⁶ [6.122 × 10⁻⁷] | 1.126 × 10⁻⁶ [7.583 × 10⁻⁷] | +1.830 × 10⁻⁷ [−3.782 × 10⁻⁷, 7.442 × 10⁻⁷] | ✗ |
| BTC15m | RNN | CDSC | 1.309 × 10⁻⁶ [6.122 × 10⁻⁷] | 1.319 × 10⁻⁶ [7.794 × 10⁻⁷] | −1.000 × 10⁻⁸ [−5.807 × 10⁻⁷, 5.607 × 10⁻⁷] | ✓ |
| BTC15m | RNN | SVCCA | 1.309 × 10⁻⁶ [6.122 × 10⁻⁷] | 9.804 × 10⁻⁸ [1.032 × 10⁻⁷] | +1.211 × 10⁻⁶ [8.535 × 10⁻⁷, 1.568 × 10⁻⁶] | ✗ |
| BTC15m | BiRNN | Patience | 9.950 × 10⁻⁷ [9.029 × 10⁻⁷] | 6.209 × 10⁻⁷ [9.143 × 10⁻⁷] | +3.741 × 10⁻⁷ [−3.658 × 10⁻⁷, 1.114 × 10⁻⁶] | ✗ |
| BTC15m | BiRNN | Slope | 9.950 × 10⁻⁷ [9.029 × 10⁻⁷] | 1.840 × 10⁻⁶ [1.800 × 10⁻⁶] | −8.450 × 10⁻⁷ [−2.005 × 10⁻⁶, 3.146 × 10⁻⁷] | ✓ |
| BTC15m | BiRNN | CDSC | 9.950 × 10⁻⁷ [9.029 × 10⁻⁷] | 1.364 × 10⁻⁶ [1.197 × 10⁻⁶] | −3.690 × 10⁻⁷ [−1.232 × 10⁻⁶, 4.944 × 10⁻⁷] | ✓ |
| BTC15m | BiRNN | SVCCA | 9.950 × 10⁻⁷ [9.029 × 10⁻⁷] | 3.207 × 10⁻⁷ [5.915 × 10⁻⁷] | +6.743 × 10⁻⁷ [5.274 × 10⁻⁸, 1.296 × 10⁻⁶] | ✗ |
| BTC15m | Transformer | Patience | 6.645 × 10⁻⁶ [9.574 × 10⁻⁶] | 4.485 × 10⁻⁶ [7.079 × 10⁻⁶] | +2.160 × 10⁻⁶ [−4.696 × 10⁻⁶, 9.016 × 10⁻⁶] | ✗ |
| BTC15m | Transformer | Slope | 6.645 × 10⁻⁶ [9.574 × 10⁻⁶] | 1.139 × 10⁻⁵ [1.774 × 10⁻⁵] | −4.745 × 10⁻⁶ [−1.635 × 10⁻⁵, 6.863 × 10⁻⁶] | ✓ |
| BTC15m | Transformer | CDSC | 6.645 × 10⁻⁶ [9.574 × 10⁻⁶] | 1.139 × 10⁻⁵ [1.845 × 10⁻⁵] | −4.745 × 10⁻⁶ [−1.671 × 10⁻⁵, 7.224 × 10⁻⁶] | ✓ |
| BTC15m | Transformer | SVCCA | 6.645 × 10⁻⁶ [9.574 × 10⁻⁶] | 7.171 × 10⁻⁶ [3.941 × 10⁻⁶] | −5.260 × 10⁻⁷ [−6.488 × 10⁻⁶, 5.436 × 10⁻⁶] | ✓ |
| EEG | RNN | Patience | 0.0116 [0.0150] | 0.0122 [0.0180] | −5.960 × 10⁻⁴ [−0.0141, 0.0129] | ✓ |
| EEG | RNN | Slope | 0.0116 [0.0150] | 0.0115 [0.0127] | +1.430 × 10⁻⁴ [−0.0112, 0.0115] | ✗ |
| EEG | RNN | CDSC | 0.0116 [0.0150] | 0.0077 [0.0131] | +0.0039 [−0.0076, 0.0154] | ✗ |
| EEG | RNN | SVCCA | 0.0116 [0.0150] | 0.0422 [0.0312] | −0.0306 [−0.0506, −0.0107] | ✓ |
| EEG | BiRNN | Patience | 0.0140 [0.0256] | 0.0145 [0.0416] | −4.550 × 10⁻⁴ [−0.0286, 0.0277] | ✓ |
| EEG | BiRNN | Slope | 0.0140 [0.0256] | 0.0132 [0.0191] | +8.510 × 10⁻⁴ [−0.0175, 0.0192] | ✗ |
| EEG | BiRNN | CDSC | 0.0140 [0.0256] | 0.0127 [0.0114] | +0.0014 [−0.0148, 0.0175] | ✗ |
| EEG | BiRNN | SVCCA | 0.0140 [0.0256] | 0.1173 [0.0589] | −0.1033 [−0.1403, −0.0663] | ✓ |
| EEG | Transformer | Patience | 0.0216 [0.0196] | 0.0162 [0.0275] | +0.0053 [−0.0141, 0.0248] | ✗ |
| EEG | Transformer | Slope | 0.0216 [0.0196] | 0.0273 [0.0247] | −0.0057 [−0.0238, 0.0125] | ✓ |
| EEG | Transformer | CDSC | 0.0216 [0.0196] | 0.0210 [0.0167] | +6.060 × 10⁻⁴ [−0.0142, 0.0154] | ✗ |
| EEG | Transformer | SVCCA | 0.0216 [0.0196] | 0.0206 [0.0175] | +9.930 × 10⁻⁴ [−0.0141, 0.0161] | ✗ |

| Dataset | Model | Baseline | SES e_stop, med [IQR] | Baseline e_stop, med [IQR] | Δmed (SES−Base) [95% CI] | Earlier? |
|---|---|---|---|---|---|---|
| ETTh1 | RNN | Patience | 7.5 [1.0] | 23.5 [9.0] | −16.0 [−21.2, −10.8] | ✓ |
| ETTh1 | RNN | Slope | 7.5 [1.0] | 5.0 [0] | +2.5 [1.9, 3.1] | ✗ |
| ETTh1 | RNN | CDSC | 7.5 [1.0] | 10.0 [0] | −2.5 [−3.1, −1.9] | ✓ |
| ETTh1 | RNN | SVCCA | 7.5 [1.0] | 100.0 [0] | −92.5 [−93.1, −91.9] | ✓ |
| ETTh1 | BiRNN | Patience | 7.5 [1.0] | 10.0 [0] | −2.5 [−3.1, −1.9] | ✓ |
| ETTh1 | BiRNN | Slope | 7.5 [1.0] | 5.0 [0] | +2.5 [1.9, 3.1] | ✗ |
| ETTh1 | BiRNN | CDSC | 7.5 [1.0] | 10.0 [0] | −2.5 [−3.1, −1.9] | ✓ |
| ETTh1 | BiRNN | SVCCA | 7.5 [1.0] | 100.0 [0] | −92.5 [−93.1, −91.9] | ✓ |
| ETTh1 | Transformer | Patience | 7.0 [0] | 19.0 [23.0] | −12.0 [−25.2, 1.2] | ✓ |
| ETTh1 | Transformer | Slope | 7.0 [0] | 5.0 [1.0] | +2.0 [1.4, 2.6] | ✗ |
| ETTh1 | Transformer | CDSC | 7.0 [0] | 10.0 [0] | −3.0 [−3.0, −3.0] | ✓ |
| ETTh1 | Transformer | SVCCA | 7.0 [0] | 100.0 [0] | −93.0 [−93.0, −93.0] | ✓ |
| ETTh2 | RNN | Patience | 7.5 [1.0] | 10.0 [0] | −2.5 [−3.1, −1.9] | ✓ |
| ETTh2 | RNN | Slope | 7.5 [1.0] | 5.0 [0] | +2.5 [1.9, 3.1] | ✗ |
| ETTh2 | RNN | CDSC | 7.5 [1.0] | 10.0 [0] | −2.5 [−3.1, −1.9] | ✓ |
| ETTh2 | RNN | SVCCA | 7.5 [1.0] | 100.0 [0] | −92.5 [−93.1, −91.9] | ✓ |
| ETTh2 | BiRNN | Patience | 7.5 [1.0] | 10.0 [0] | −2.5 [−3.1, −1.9] | ✓ |
| ETTh2 | BiRNN | Slope | 7.5 [1.0] | 5.0 [0] | +2.5 [1.9, 3.1] | ✗ |
| ETTh2 | BiRNN | CDSC | 7.5 [1.0] | 10.0 [0] | −2.5 [−3.1, −1.9] | ✓ |
| ETTh2 | BiRNN | SVCCA | 7.5 [1.0] | 100.0 [0] | −92.5 [−93.1, −91.9] | ✓ |
| ETTh2 | Transformer | Patience | 7.0 [0] | 14.5 [26.5] | −7.5 [−22.8, 7.8] | ✓ |
| ETTh2 | Transformer | Slope | 7.0 [0] | 5.0 [1.0] | +2.0 [1.4, 2.6] | ✗ |
| ETTh2 | Transformer | CDSC | 7.0 [0] | 10.0 [0] | −3.0 [−3.0, −3.0] | ✓ |
| ETTh2 | Transformer | SVCCA | 7.0 [0] | 100.0 [0] | −93.0 [−93.0, −93.0] | ✓ |
| ETTm1 | RNN | Patience | 7.5 [1.0] | 10.0 [0] | −2.5 [−3.1, −1.9] | ✓ |
| ETTm1 | RNN | Slope | 7.5 [1.0] | 5.0 [0] | +2.5 [1.9, 3.1] | ✗ |
| ETTm1 | RNN | CDSC | 7.5 [1.0] | 10.0 [0] | −2.5 [−3.1, −1.9] | ✓ |
| ETTm1 | RNN | SVCCA | 7.5 [1.0] | 100.0 [0] | −92.5 [−93.1, −91.9] | ✓ |
| ETTm1 | BiRNN | Patience | 7.5 [1.0] | 10.0 [0] | −2.5 [−3.1, −1.9] | ✓ |
| ETTm1 | BiRNN | Slope | 7.5 [1.0] | 5.0 [0] | +2.5 [1.9, 3.1] | ✗ |
| ETTm1 | BiRNN | CDSC | 7.5 [1.0] | 10.0 [0] | −2.5 [−3.1, −1.9] | ✓ |
| ETTm1 | BiRNN | SVCCA | 7.5 [1.0] | 100.0 [0] | −92.5 [−93.1, −91.9] | ✓ |
| ETTm1 | Transformer | Patience | 7.0 [0] | 14.5 [26.5] | −7.5 [−22.8, 7.8] | ✓ |
| ETTm1 | Transformer | Slope | 7.0 [0] | 5.0 [1.0] | +2.0 [1.4, 2.6] | ✗ |
| ETTm1 | Transformer | CDSC | 7.0 [0] | 10.0 [0] | −3.0 [−3.0, −3.0] | ✓ |
| ETTm1 | Transformer | SVCCA | 7.0 [0] | 100.0 [0] | −93.0 [−93.0, −93.0] | ✓ |
| ETTm2 | RNN | Patience | 7.5 [1.0] | 10.0 [0] | −2.5 [−3.1, −1.9] | ✓ |
| ETTm2 | RNN | Slope | 7.5 [1.0] | 5.0 [0] | +2.5 [1.9, 3.1] | ✗ |
| ETTm2 | RNN | CDSC | 7.5 [1.0] | 10.0 [0] | −2.5 [−3.1, −1.9] | ✓ |
| ETTm2 | RNN | SVCCA | 7.5 [1.0] | 100.0 [0] | −92.5 [−93.1, −91.9] | ✓ |
| ETTm2 | BiRNN | Patience | 7.5 [1.0] | 10.0 [0] | −2.5 [−3.1, −1.9] | ✓ |
| ETTm2 | BiRNN | Slope | 7.5 [1.0] | 5.0 [0] | +2.5 [1.9, 3.1] | ✗ |
| ETTm2 | BiRNN | CDSC | 7.5 [1.0] | 10.0 [0] | −2.5 [−3.1, −1.9] | ✓ |
| ETTm2 | BiRNN | SVCCA | 7.5 [1.0] | 100.0 [0] | −92.5 [−93.1, −91.9] | ✓ |
| ETTm2 | Transformer | Patience | 7.0 [0] | 14.5 [26.5] | −7.5 [−22.8, 7.8] | ✓ |
| ETTm2 | Transformer | Slope | 7.0 [0] | 5.0 [1.0] | +2.0 [1.4, 2.6] | ✗ |
| ETTm2 | Transformer | CDSC | 7.0 [0] | 10.0 [0] | −3.0 [−3.0, −3.0] | ✓ |
| ETTm2 | Transformer | SVCCA | 7.0 [0] | 100.0 [0] | −93.0 [−93.0, −93.0] | ✓ |
| AirPassengers | RNN | Patience | 100.0 [0] | 100.0 [0] | 0 [0, 0] | ≈ |
| AirPassengers | RNN | Slope | 100.0 [0] | 100.0 [0] | 0 [0, 0] | ≈ |
| AirPassengers | RNN | CDSC | 100.0 [0] | 38.0 [32.2] | +62.0 [43.5, 80.5] | ✗ |
| AirPassengers | RNN | SVCCA | 100.0 [0] | 6.0 [0] | +94.0 [94.0, 94.0] | ✗ |
| AirPassengers | BiRNN | Patience | 100.0 [0] | 100.0 [0] | 0 [0, 0] | ≈ |
| AirPassengers | BiRNN | Slope | 100.0 [0] | 100.0 [0] | 0 [0, 0] | ≈ |
| AirPassengers | BiRNN | CDSC | 100.0 [0] | 97.0 [10.2] | +3.0 [−2.9, 8.9] | ✗ |
| AirPassengers | BiRNN | SVCCA | 100.0 [0] | 6.0 [0] | +94.0 [94.0, 94.0] | ✗ |
| AirPassengers | Transformer | Patience | 7.0 [0] | 8.0 [15.0] | −1.0 [−9.6, 7.6] | ✓ |
| AirPassengers | Transformer | Slope | 7.0 [0] | 100.0 [0] | −93.0 [−93.0, −93.0] | ✓ |
| AirPassengers | Transformer | CDSC | 7.0 [0] | 20.5 [4.8] | −13.5 [−16.3, −10.7] | ✓ |
| AirPassengers | Transformer | SVCCA | 7.0 [0] | 6.0 [0] | +1.0 [1.0, 1.0] | ✗ |
| Lorenz | RNN | Patience | 7.0 [0] | 11.5 [2.8] | −4.5 [−6.1, −2.9] | ✓ |
| Lorenz | RNN | Slope | 7.0 [0] | 39.5 [22.2] | −32.5 [−45.3, −19.7] | ✓ |
| Lorenz | RNN | CDSC | 7.0 [0] | 11.5 [4.8] | −4.5 [−7.3, −1.7] | ✓ |
| Lorenz | RNN | SVCCA | 7.0 [0] | 6.0 [0] | +1.0 [1.0, 1.0] | ✗ |
| Lorenz | BiRNN | Patience | 7.0 [0] | 11.5 [4.0] | −4.5 [−6.8, −2.2] | ✓ |
| Lorenz | BiRNN | Slope | 7.0 [0] | 20.5 [8.5] | −13.5 [−18.4, −8.6] | ✓ |
| Lorenz | BiRNN | CDSC | 7.0 [0] | 13.0 [2.8] | −6.0 [−7.6, −4.4] | ✓ |
| Lorenz | BiRNN | SVCCA | 7.0 [0] | 6.0 [0] | +1.0 [1.0, 1.0] | ✗ |
| Lorenz | Transformer | Patience | 7.0 [0] | 11.5 [4.0] | −4.5 [−6.8, −2.2] | ✓ |
| Lorenz | Transformer | Slope | 7.0 [0] | 31.0 [14.5] | −24.0 [−32.3, −15.7] | ✓ |
| Lorenz | Transformer | CDSC | 7.0 [0] | 12.0 [2.0] | −5.0 [−6.2, −3.8] | ✓ |
| Lorenz | Transformer | SVCCA | 7.0 [0] | 6.0 [0] | +1.0 [1.0, 1.0] | ✗ |
| BTC15m | RNN | Patience | 6.0 [1.0] | 6.0 [0] | 0 [−0.6, 0.6] | ≈ |
| BTC15m | RNN | Slope | 6.0 [1.0] | 5.0 [0] | +1.0 [0.4, 1.6] | ✗ |
| BTC15m | RNN | CDSC | 6.0 [1.0] | 5.0 [1.0] | +1.0 [0.2, 1.8] | ✗ |
| BTC15m | RNN | SVCCA | 6.0 [1.0] | 100.0 [0] | −94.0 [−94.6, −93.4] | ✓ |
| BTC15m | BiRNN | Patience | 7.0 [1.0] | 6.0 [0] | +1.0 [0.4, 1.6] | ✗ |
| BTC15m | BiRNN | Slope | 7.0 [1.0] | 5.0 [0] | +2.0 [1.4, 2.6] | ✗ |
| BTC15m | BiRNN | CDSC | 7.0 [1.0] | 5.0 [0] | +2.0 [1.4, 2.6] | ✗ |
| BTC15m | BiRNN | SVCCA | 7.0 [1.0] | 100.0 [0] | −93.0 [−93.6, −92.4] | ✓ |
| BTC15m | Transformer | Patience | 6.0 [1.0] | 6.0 [0] | 0 [−0.6, 0.6] | ≈ |
| BTC15m | Transformer | Slope | 6.0 [1.0] | 5.0 [0] | +1.0 [0.4, 1.6] | ✗ |
| BTC15m | Transformer | CDSC | 6.0 [1.0] | 5.0 [0.8] | +1.0 [0.3, 1.7] | ✗ |
| BTC15m | Transformer | SVCCA | 6.0 [1.0] | 100.0 [0] | −94.0 [−94.6, −93.4] | ✓ |
| EEG | RNN | Patience | 7.5 [2.8] | 7.5 [4.8] | 0 [−3.2, 3.2] | ≈ |
| EEG | RNN | Slope | 7.5 [2.8] | 5.0 [1.8] | +2.5 [0.6, 4.4] | ✗ |
| EEG | RNN | CDSC | 7.5 [2.8] | 10.0 [0] | −2.5 [−4.1, −0.9] | ✓ |
| EEG | RNN | SVCCA | 7.5 [2.8] | 100.0 [0] | −92.5 [−94.1, −90.9] | ✓ |
| EEG | BiRNN | Patience | 6.5 [2.5] | 7.5 [4.0] | −1.0 [−3.7, 1.7] | ✓ |
| EEG | BiRNN | Slope | 6.5 [2.5] | 5.0 [0.8] | +1.5 [−0.0, 3.0] | ✗ |
| EEG | BiRNN | CDSC | 6.5 [2.5] | 10.0 [0] | −3.5 [−4.9, −2.1] | ✓ |
| EEG | BiRNN | SVCCA | 6.5 [2.5] | 100.0 [0] | −93.5 [−94.9, −92.1] | ✓ |
| EEG | Transformer | Patience | 7.5 [4.8] | 7.5 [6.8] | 0 [−4.8, 4.8] | ≈ |
| EEG | Transformer | Slope | 7.5 [4.8] | 5.5 [1.0] | +2.0 [−0.8, 4.8] | ✗ |
| EEG | Transformer | CDSC | 7.5 [4.8] | 10.0 [0.8] | −2.5 [−5.3, 0.3] | ✓ |
| EEG | Transformer | SVCCA | 7.5 [4.8] | 5.0 [71.2] | +2.5 [−38.6, 43.6] | ✗ |

| Comparison | Metric | SES wins | Ties | Losses | Total | p-Value (One-Sided Sign Test) |
|---|---|---|---|---|---|---|
| SES vs. Patience | ΔBest (↓) | 2 | 2 | 20 | 24 | 1.0000 |
| SES vs. Patience | e_stop (↓) | 17 | 6 | 1 | 24 | <0.0001 |
| SES vs. Patience | epochs_saved (↑) | 17 | 6 | 1 | 24 | <0.0001 |
| SES vs. Slope | ΔBest (↓) | 15 | 2 | 7 | 24 | 0.0669 |
| SES vs. Slope | e_stop (↓) | 4 | 2 | 18 | 24 | 0.9996 |
| SES vs. Slope | epochs_saved (↑) | 4 | 2 | 18 | 24 | 0.9996 |
| SES vs. CDSC | ΔBest (↓) | 6 | 0 | 18 | 24 | 0.9967 |
| SES vs. CDSC | e_stop (↓) | 19 | 0 | 5 | 24 | 0.0033 |
| SES vs. CDSC | epochs_saved (↑) | 19 | 0 | 5 | 24 | 0.0033 |
| SES vs. SVCCA | ΔBest (↓) | 5 | 0 | 19 | 24 | 0.9992 |
| SES vs. SVCCA | e_stop (↓) | 17 | 0 | 7 | 24 | 0.0320 |
| SES vs. SVCCA | epochs_saved (↑) | 17 | 0 | 7 | 24 | 0.0320 |
Appendix G. Aggregation-Rule Ablation on the Development Benchmark
| Aggregator | med e_stop | IQR e_stop | med ΔBest | IQR ΔBest | med epochs_saved | fail_rate |
|---|---|---|---|---|---|---|
| mean-rank | 28.0 | 13.8 | 0.0024 | 0.1441 | 72.0 | 0.33 |
| median-rank | 26.5 | 14.0 | 0.0033 | 0.1821 | 73.5 | 0.33 |
| top-50% | 24.5 | 12.5 | 0.0025 | 0.3436 | 75.5 | 0.33 |
| top-30% | 24.5 | 12.5 | 0.0025 | 0.3436 | 75.5 | 0.33 |
| weighted (1/var) | 22.5 | 17.2 | 0.0022 | 0.2905 | 77.5 | 0.33 |

| Aggregator | med e_stop | IQR e_stop | med ΔBest | IQR ΔBest | med epochs_saved | fail_rate | Model |
|---|---|---|---|---|---|---|---|
| mean-rank | 24.5 | 8.5 | 0.0018 | 0.0006 | 75.5 | 0.00 | RNN |
| median-rank | 20.5 | 7.8 | 0.0026 | 0.0018 | 79.5 | 0.00 | RNN |
| top-50% ‡ | 22.5 | 11.0 | 0.0020 | 0.0006 | 77.5 | 0.00 | RNN |
| top-30% ‡ | 22.5 | 11.0 | 0.0020 | 0.0006 | 77.5 | 0.00 | RNN |
| weighted (1/var) | 22.5 | 19.2 | 0.0020 | 0.0014 | 77.5 | 0.00 | RNN |
| mean-rank | 30.0 | 17.0 | 0.0013 | 0.0020 | 70.0 | 0.00 | BiRNN |
| median-rank | 31.5 | 13.8 | 0.0017 | 0.0011 | 68.5 | 0.00 | BiRNN |
| top-50% | 30.0 | 18.8 | 0.0014 | 0.0017 | 70.0 | 0.00 | BiRNN |
| top-30% | 30.0 | 18.8 | 0.0014 | 0.0017 | 70.0 | 0.00 | BiRNN |
| weighted (1/var) | 20.0 | 14.5 | 0.0015 | 0.0010 | 80.0 | 0.00 | BiRNN |
| mean-rank | 31.0 | 38.0 | 0.2860 | 0.2249 | 69.0 | 1.00 | Transformer |
| median-rank | 30.5 | 17.2 | 0.3275 | 0.2259 | 69.5 | 1.00 | Transformer |
| top-50% | 21.0 | 11.5 | 0.3735 | 0.1233 | 79.0 | 1.00 | Transformer |
| top-30% | 21.0 | 11.5 | 0.3735 | 0.1233 | 79.0 | 1.00 | Transformer |
| weighted (1/var) | 25.5 | 17.5 | 0.3086 | 0.0996 | 74.5 | 1.00 | Transformer |
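
To make the difference between the mean-rank and median-rank aggregators compared above concrete, the following sketch ranks each metric across epochs and then combines the ranks per epoch. It is a hypothetical reading of `rank_aggregate` from Algorithm A1: the ranking direction (lower metric value gets a lower rank), the tie handling, and the function names are assumptions, and the top-k and inverse-variance variants are not shown.

```python
import numpy as np


def rank_columns(panel):
    """Rank each metric (column) across epochs (rows); 0 is the smallest value."""
    panel = np.asarray(panel, dtype=float)
    order = panel.argsort(axis=0, kind="stable")
    return order.argsort(axis=0, kind="stable")


def aggregate(panel, rule="mean"):
    """Combine per-metric ranks into one score per epoch (lower is better)."""
    ranks = rank_columns(panel)
    if rule == "mean":
        return ranks.mean(axis=1)
    if rule == "median":
        return np.median(ranks, axis=1)
    raise ValueError(f"unknown rule: {rule}")


# Three epochs, three metrics; the third metric disagrees with the other two.
panel = [[3, 30, 1],
         [2, 20, 2],
         [1, 10, 9]]
mean_scores = aggregate(panel, "mean")
median_scores = aggregate(panel, "median")
```

In this toy panel the median-rank score of the last epoch is 0 while the mean-rank score is pulled up to 2/3 by the single dissenting metric, which illustrates why the two rules can stop at different epochs.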
References
- Caruana, R.; Lawrence, S.; Giles, C. Overfitting in neural nets: Backpropagation, conjugate gradient, and early stopping. In Advances in Neural Information Processing Systems 13; MIT Press: Cambridge, MA, USA, 2001; pp. 402–408. [Google Scholar]
- Prechelt, L. Early stopping—But when? In Neural Networks: Tricks of the Trade, 2nd ed.; Montavon, G., Orr, G.B., Müller, K.-R., Eds.; Springer: Berlin/Heidelberg, Germany, 2012; pp. 53–67. [Google Scholar]
- Raskutti, G.; Wainwright, M.J.; Yu, B. Early stopping and non-parametric regression: An optimal data-dependent stopping rule. J. Mach. Learn. Res. 2014, 15, 335–366. [Google Scholar]
- Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016. [Google Scholar]
- Pascanu, R.; Mikolov, T.; Bengio, Y. On the difficulty of training recurrent neural networks. In Proceedings of the 30th International Conference on Machine Learning (ICML 2013), Atlanta, GA, USA, 16–21 June 2013; Volume 28, pp. 1310–1318. [Google Scholar]
- Kajitsuka, T.; Sato, I. On the optimal memorization capacity of transformers. In Proceedings of the 13th International Conference on Learning Representations (ICLR 2025), Singapore, 24–28 April 2025. [Google Scholar]
- Dana, L.; Pydi, M.S.; Chevaleyre, Y. Memorization in attention-only transformers. In Proceedings of the 28th International Conference on Artificial Intelligence and Statistics (AISTATS 2025), Phuket, Thailand, 3–5 May 2025; Volume 258, pp. 3133–3141. [Google Scholar]
- Sussillo, D.; Barak, O. Opening the black box: Low-dimensional dynamics in high-dimensional recurrent neural networks. Neural Comput. 2013, 25, 626–649. [Google Scholar] [CrossRef] [PubMed]
- Maheswaranathan, N.; Williams, A.; Golub, M.; Ganguli, S.; Sussillo, D. Universality and individuality in neural dynamics across large populations of recurrent networks. In Proceedings of the Advances in Neural Information Processing Systems 32 (NeurIPS 2019), Vancouver, BC, Canada, 8–14 December 2019. [Google Scholar]
- Dao, T.; Gu, A. Transformers are SSMs: Generalized models and efficient algorithms through structured state space duality. In Proceedings of the 41st International Conference on Machine Learning (ICML 2024), Vienna, Austria, 21–27 July 2024. [Google Scholar]
- Kim, C.M.; Chow, C.C. Learning recurrent dynamics in spiking networks. eLife 2018, 7, e37124. [Google Scholar] [CrossRef]
- Mastrogiuseppe, F.; Carmona, J.; Machens, C.K. Stochastic activity in low-rank recurrent neural networks. PLoS Comput. Biol. 2025, 21, e1013371. [Google Scholar] [CrossRef]
- Rieck, B.; Togninalli, M.; Bock, C.; Moor, M.; Horn, M.; Gumbsch, T.; Borgwardt, K. Neural persistence: A complexity measure for deep neural networks using algebraic topology. In Proceedings of the 7th International Conference on Learning Representations (ICLR 2019), New Orleans, LA, USA, 6–9 May 2019. [Google Scholar]
- Gutiérrez-Fandiño, A.; Pérez-Fernández, D.; Armengol-Estapé, J.; Villegas, M. Persistent homology captures the generalization of neural networks without a validation set. arXiv 2021, arXiv:2106.00012. [Google Scholar] [CrossRef]
- Zhang, B.; Lin, H. Functional loops: Monitoring functional organization of deep neural networks using algebraic topology. Neural Netw. 2024, 174, 106239. [Google Scholar] [CrossRef] [PubMed]
- Zia, A.; Khamis, A.; Nichols, J.; Hayder, Z.; Rolland, V.; Petersson, L. Topological deep learning: A review of an emerging paradigm. Artif. Intell. Rev. 2024, 57, 77. [Google Scholar] [CrossRef]
- Damrich, S.; Berens, P.; Kobak, D. Persistent homology for high-dimensional data based on spectral methods. In Proceedings of the Advances in Neural Information Processing Systems 37 (NeurIPS 2024), Vancouver, BC, Canada, 10–15 December 2024. [Google Scholar]
- Singh, G.; Mémoli, F.; Carlsson, G. Topological methods for the analysis of high dimensional data sets and 3D object recognition. In Eurographics Symposium on Point-Based Graphics; Botsch, M., Pajarola, R., Chen, B., Zwicker, M., Eds.; The Eurographics Association: Geneva, Switzerland, 2007; pp. 91–100. [Google Scholar] [CrossRef]
- Madukpe, V.N.; Ugoala, B.C.; Zulkepli, N.F.S. A comprehensive review of the Mapper algorithm, a topological data analysis technique, and its applications across various fields (2007–2025). arXiv 2025, arXiv:2504.09042. [Google Scholar] [CrossRef]
- Haşegan, D.; Patel, S.; Sahoo, A.; Saggar, M. Deconstructing the Mapper algorithm to extract richer topological and temporal features from functional neuroimaging data. Netw. Neurosci. 2024, 8, 1355–1382. [Google Scholar] [CrossRef] [PubMed]
- Simpson, S.G. Symbolic dynamics: Entropy = dimension = complexity. Theory Comput. Syst. 2015, 56, 527–543. [Google Scholar] [CrossRef]
- Hirata, Y.; Amigó, J.M. A review of symbolic dynamics and symbolic reconstruction of dynamical systems. Chaos 2023, 33, 052101. [Google Scholar] [CrossRef]
- Ren, Q.; Zhang, J.; Xu, Y.; Wang, Y.; Yu, Y.; Zhang, Q. Towards the dynamics of a DNN learning symbolic interactions. In Proceedings of the Advances in Neural Information Processing Systems 38 (NeurIPS 2024), Vancouver, BC, Canada, 10–15 December 2024. [Google Scholar]
- Das, J.; Bhaumik, B.; De, S.; Mitra, A. Physics-informed neural network with symbolic regression for deriving analytical approximate solutions to nonlinear partial differential equations. Neural Comput. Appl. 2025, 37, 20205–20240.
- Raghu, M.; Gilmer, J.; Yosinski, J.; Sohl-Dickstein, J. SVCCA: Singular vector canonical correlation analysis for deep learning dynamics and interpretability. In Proceedings of the Advances in Neural Information Processing Systems 30 (NeurIPS 2017), Long Beach, CA, USA, 4–9 December 2017.
- Ferro, M.V.; Mosquera, Y.D.; Pena, F.J.R.; Bilbao, V.M.D. Early stopping by correlating online indicators in neural networks. Neural Netw. 2023, 159, 109–124.
- Nakkiran, P.; Kaplun, G.; Bansal, Y.; Yang, T.; Barak, B.; Sutskever, I. Deep double descent: Where bigger models and more data hurt. arXiv 2019, arXiv:1912.02292.
- Xia, X.; Liu, T.; Han, B.; Gong, C.; Wang, N.; Ge, Z.; Chang, Y. Robust early-learning: Hindering the memorization of noisy labels. In Proceedings of the 9th International Conference on Learning Representations (ICLR 2021), Virtual Event, 3–7 May 2021.
- Hensel, F.; Moor, M.; Rieck, B. A survey of topological machine learning methods. Front. Artif. Intell. 2021, 4, 681108.
- Naitzat, G.; Zhitnikov, A.; Lim, L.-H. Topology of deep neural networks. J. Mach. Learn. Res. 2020, 21, 1–40.
- Mediano, P.A.M.; Rosas, F.E.; Bor, D.; Seth, A.K.; Barrett, A.B. Spectrally and temporally resolved estimation of neural signal diversity. bioRxiv 2023.
- Hussein, B.M.; Shareef, M.S. An empirical study on the correlation between early stopping patience and epochs in deep learning. ITM Web Conf. 2024, 64, 01003.
- Hu, T.; Lei, Y. Early stopping for iterative regularization with general loss functions. J. Mach. Learn. Res. 2022, 23, 1–36.
- Ziv, J.; Lempel, A. A universal algorithm for sequential data compression. IEEE Trans. Inf. Theory 1977, 23, 337–343.
- Welch, T.A. A technique for high-performance data compression. Computer 1984, 17, 8–19.
- Höhn, C.; Hahn, M.A.; Lendner, J.D.; Hoedlmoser, K. Spectral slope and Lempel–Ziv complexity as robust markers of brain states during sleep and wakefulness. eNeuro 2024, 11.
- Dingle, K.; Hamzi, B.; Hutter, M.; Owhadi, H. Retrodicting chaotic systems: An algorithmic information theory approach. arXiv 2025, arXiv:2507.04780.
- Zhang, C.; Bengio, S.; Hardt, M.; Recht, B.; Vinyals, O. Understanding deep learning requires rethinking generalization. In Proceedings of the 5th International Conference on Learning Representations (ICLR 2017), Toulon, France, 24–26 April 2017.
- Bandt, C.; Pompe, B. Permutation entropy: A natural complexity measure for time series. Phys. Rev. Lett. 2002, 88, 174102.
- Zhou, H.; Zhang, S.; Peng, J.; Zhang, S.; Li, J.; Xiong, H.; Zhang, W. Informer: Beyond efficient transformer for long sequence time-series forecasting. Proc. AAAI Conf. Artif. Intell. 2021, 35, 11106–11115.
- Box, G.E.P.; Jenkins, G.M.; Reinsel, G.C. Time Series Analysis: Forecasting and Control, 3rd ed.; Prentice Hall: Englewood Cliffs, NJ, USA, 1994.
- McNally, S.; Roche, J.; Caton, S. Predicting the price of Bitcoin using machine learning. In Proceedings of the 26th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP 2018); IEEE: Piscataway, NJ, USA, 2018; pp. 339–343.
- Koelstra, S.; Mühl, C.; Soleymani, M.; Lee, J.-S.; Yazdani, A.; Ebrahimi, T.; Pun, T.; Nijholt, A.; Patras, I. DEAP: A database for emotion analysis using physiological signals. IEEE Trans. Affect. Comput. 2012, 3, 18–31.
- Brunton, S.L.; Proctor, J.L.; Kutz, J.N. Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proc. Natl. Acad. Sci. USA 2016, 113, 3932–3937.
- Chicco, D.; Warrens, M.J.; Jurman, G. The coefficient of determination R-squared is more informative than SMAPE, MAE, MAPE, MSE and RMSE in regression analysis evaluation. PeerJ Comput. Sci. 2021, 7, e623.
- Miseta, T.; Fodor, A.; Vathy-Fogarassy, Á. Surpassing early stopping: A novel correlation-based stopping criterion for neural networks. Neurocomputing 2024, 567, 127028.

| Model | Base epoch, s | Repr epoch, s | Overhead, % | Full Training, s | Patience (s/%/ΔBest) | Slope (s/%/ΔBest) | CDSC (s/%/ΔBest) | SES (s/%/ΔBest) | SVCCA (s/%/ΔBest) |
|---|---|---|---|---|---|---|---|---|---|
| RNN | 3.37 [0.18] | 3.51 [0.18] | 3.95 [0.04] | 337.4 [17.6] | 126.3/64.0/0.0006 | 111.9/65.5/0.0011 | 14.2/95.8/0.0115 | 116.4/64.9/0.0006 | 351.4/−4.1/0.0035 |
| BiRNN | 5.64 [0.03] | 5.87 [0.04] | 4.07 [0.02] | 563.8 [3.4] | 219.5/61.0/0.0013 | 158.0/72.0/0.0010 | 23.6/95.8/0.0085 | 177.9/69.9/0.0007 | 588.3/−4.4/0.0062 |
| Transformer | 33.25 [0.46] | 33.57 [0.46] | 0.98 [0.01] | 3324.7 [45.5] | 400.8/88.0/0.3662 | 400.0/88.0/0.3802 | 134.5/96.0/0.0682 | 1195.7/64.5/0.2052 | 1855.4/44.4/0.2120 |

| cfg_id (short) | bins | overlap | local_k | merge_eps | win_rate ↑ | median_rel_regret ↓ | median_saved_frac ↑ | score ↓ |
|---|---|---|---|---|---|---|---|---|
| 9392dc | 8 | 0.30 | 10 | 0.50 | 0.200 | 0.467 | 0.510 | 0.990 |
| dc1b40 | 8 | 0.20 | 10 | 0.00 | 0.133 | 0.524 | 0.425 | 1.084 |
| 2210ff | 8 | 0.40 | 10 | 1.00 | 0.100 | 0.986 | 0.000 | 1.297 |
| 09256c | 6 | 0.40 | 5 | 0.75 | 0.067 | 1.016 | 0.000 | 1.337 |

| Model | N | SES e_stop (Median [IQR]) | SES ΔBest (Median [IQR]) | SES epochs_saved (Median [IQR]) | Best SOTA | SOTA e_stop (Median [IQR]) | SOTA ΔBest (Median [IQR]) | SOTA epochs_saved (Median [IQR]) |
|---|---|---|---|---|---|---|---|---|
| RNN | 10 | 18 [1.00] | 0.002912 [0.003918] | 82 [1.00] | Patience | 36 [6.75] | 0.000576 [0.000771] | 64 [6.75] |
| BiRNN | 10 | 17 [1.50] | 0.002172 [0.001218] | 83 [1.50] | Slope | 28 [11.25] | 0.001017 [0.001065] | 72 [11.25] |
| Transformer | 10 | 15 [0.00] | 0.000207 [0.000331] | 85 [0.00] | Patience | 15 [1.50] | 0.000221 [0.000323] | 85 [1.50] |
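
As a reading aid for the tables above, the following minimal sketch recomputes two derived columns under assumed definitions: the overhead as the relative increase of the representation-aware epoch time over the base epoch time, and epochs_saved as a fixed epoch budget minus e_stop. The 100-epoch budget is an inference (e_stop and epochs_saved sum to 100 in every row of the last table, and full training time is about 100 base epochs), and the function names are hypothetical. Overheads recomputed from the tabulated medians come out close to, but not identical with, the reported column, presumably because the reported values aggregate per-run ratios.

```python
# Assumed definitions, inferred from the tables rather than stated in them.
MAX_EPOCHS = 100  # assumption: fixed training budget per run


def overhead_percent(base_epoch_s: float, repr_epoch_s: float) -> float:
    """Relative extra time per epoch caused by representation monitoring, in %."""
    return 100.0 * (repr_epoch_s - base_epoch_s) / base_epoch_s


def epochs_saved(e_stop: int, max_epochs: int = MAX_EPOCHS) -> int:
    """Epochs not trained because stopping fired at epoch e_stop."""
    return max_epochs - e_stop


# Median epoch times (base, representation-aware) copied from the timing table.
median_epoch_times = {
    "RNN": (3.37, 3.51),
    "BiRNN": (5.64, 5.87),
    "Transformer": (33.25, 33.57),
}

for model, (base_s, repr_s) in median_epoch_times.items():
    print(f"{model}: overhead from medians = {overhead_percent(base_s, repr_s):.2f}%")

# Median SES stopping epochs copied from the last table.
for model, e_stop in {"RNN": 18, "BiRNN": 17, "Transformer": 15}.items():
    print(f"{model}: epochs_saved = {epochs_saved(e_stop)}")
```

Under these assumptions the epochs_saved values reproduce the tabulated medians exactly, while the overhead values agree only approximately.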
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Tomilov, I.; Zamotaev, R.; Gusarova, N.; Vatian, A. Symbolic Early Stopping in Neural Sequence Models via Mapper-Induced Symbolic Dynamics. Technologies 2026, 14, 339. https://doi.org/10.3390/technologies14060339
