Efficient Parameter Search for Chaotic Dynamical Systems Using Lyapunov-Based Reinforcement Learning
Abstract
1. Introduction
- (i) A unified reinforcement learning-based framework for discovering chaotic parameters in multiple nonlinear dynamical systems;
- (ii) The integration of Lyapunov exponents as continuous reward signals for training reinforcement learning agents;
- (iii) An empirical comparison between Q-learning and SARSA in terms of convergence, reward profiles, and attractor richness across various systems.
2. Reinforcement Learning for Chaotic Parameter Exploration
2.1. Formal Problem Definition
2.2. Previous RL Applications in Chaotic Systems
2.3. SARSA Algorithm
2.4. Q-Learning Algorithm
2.5. Reward Function Design
2.6. Advantages over Traditional Numerical Analysis
- Adaptive exploration of parameter regions with higher chaotic potential, guided by accumulated experience.
- Reuse of previously evaluated configurations to inform future decisions and reduce redundant evaluations.
- Scalability to high-dimensional systems.
- Autonomous operation without reliance on heuristic or manual tuning strategies.
3. Methodology
3.1. Chaotic System Models
- Logistic Map (1D) [37]: A discrete-time model that serves as a canonical example of how complex, chaotic behavior can arise from a simple nonlinear iterative equation. It is defined by $x_{n+1} = r x_n (1 - x_n)$, where $x_n \in [0, 1]$ and $r$ is the bifurcation parameter (a minimal LLE computation for this map is sketched after this list).
- Hénon Map (2D) [38]: A classic 2D discrete-time dynamical system that is well-known for exhibiting a strange attractor. Its equations are given by $x_{n+1} = 1 - a x_n^2 + y_n$, $y_{n+1} = b x_n$, with control parameters $a$ and $b$.
- Lorenz System (3D) [39]: A simplified mathematical model for atmospheric convection, notable for its “butterfly” strange attractor, which was one of the first systems to demonstrate deterministic chaos. It is defined by $\dot{x} = \sigma (y - x)$, $\dot{y} = x(\rho - z) - y$, $\dot{z} = x y - \beta z$, where $\sigma$, $\rho$, and $\beta$ are adjustable parameters.
- Chua’s Circuit (3D) [40]: A simple electronic circuit that exhibits a wide range of nonlinear dynamics, including bifurcations and chaos, making it a standard benchmark. Its dynamics are governed by $\dot{x} = \alpha (y - x - f(x))$, $\dot{y} = x - y + z$, $\dot{z} = -\beta y$, with $f(x) = m_1 x + \tfrac{1}{2}(m_0 - m_1)(|x + 1| - |x - 1|)$ a piecewise linear function modeling the nonlinear resistor.
- Lorenz–Haken System (4D) [41]: A model that describes the dynamics of a single-mode laser, often used to study instabilities and chaos in optical systems. Its equations are governed by four adjustable system parameters, including $r$ and $b$.
- Custom 5D Hyperchaotic System: A high-dimensional system designed for this study to test scalability and performance in a hyperchaotic regime. It is constructed by embedding a Lorenz-like core into a 5D structure with nonlinear interactions and auxiliary feedback states. This model yields multiple positive Lyapunov exponents and high-dimensional attractors.
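Since the reward used throughout is the largest Lyapunov exponent (LLE), a minimal Python sketch of how it can be estimated for the logistic map may be helpful. This is illustrative only, not the paper's implementation; the function name and iteration counts are assumptions. For the logistic map the LLE has the analytic form $\lambda = \lim_{N \to \infty} \frac{1}{N} \sum_{n} \ln |r (1 - 2 x_n)|$, which the sketch evaluates along an orbit:

```python
import numpy as np

def logistic_lle(r, x0=0.1, n_transient=1_000, n_iter=10_000):
    """Estimate the largest Lyapunov exponent of x_{n+1} = r x_n (1 - x_n)
    by time-averaging log|f'(x_n)| = log|r (1 - 2 x_n)| along an orbit."""
    x = x0
    for _ in range(n_transient):   # discard the transient so the orbit
        x = r * x * (1.0 - x)      # settles onto the attractor
    total = 0.0
    for _ in range(n_iter):
        total += np.log(abs(r * (1.0 - 2.0 * x)))
        x = r * x * (1.0 - x)
    return total / n_iter

print(logistic_lle(4.0))  # fully chaotic: exact value is ln 2 ~ 0.693
print(logistic_lle(3.2))  # periodic window: estimate is negative
```

For the continuous-time systems, the LLE would instead be estimated along numerically integrated trajectories (Section 3.2); the sign convention is the same, positive for chaos and non-positive otherwise.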
3.2. Numerical Integration
3.3. Search Space and Training Settings
4. Experimental Results and Analysis
4.1. Quantitative Baseline Comparison
4.2. Overview of Evaluated Systems
4.3. Training Performance and Reward Convergence
4.4. Attractor Reconstruction and Chaotic Behavior
5. Discussion
5.1. Summary of Experimental Findings
5.2. Comparison with Related Work and Practical Implications
5.3. Analysis of the Chua’s Circuit Case: Algorithmic Limitations and Landscape Topology
6. Conclusions
Funding
Data Availability Statement
Conflicts of Interest
References
- Kapitaniak, T. Chaos for Engineers: Theory, Applications, and Control; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2000; Volume 3.
- Zhang, B.; Liu, L. Chaos-based image encryption: Review, application, and challenges. Mathematics 2023, 11, 2585.
- Shi, L.; Li, X.; Jin, B.; Li, Y. A chaos-based encryption algorithm to protect the security of digital artwork images. Mathematics 2024, 12, 3162.
- El-Latif, A.A.A.; Ramadoss, J.; Abd-El-Atty, B.; Khalifa, H.S.; Nazarimehr, F. A novel chaos-based cryptography algorithm and its performance analysis. Mathematics 2022, 10, 2434.
- Al-Daraiseh, A.; Sanjalawe, Y.; Al-E’mari, S.; Fraihat, S.; Bany Taha, M.; Al-Muhammed, M. Cryptographic grade chaotic random number generator based on tent-map. J. Sens. Actuator Netw. 2023, 12, 73.
- Chen, T.; Guo, H.; Li, G.; Ji, H.; Xie, L.; Yang, Y. Chaotic Mixing Analyzing in Continuous Mixer with Tracing the Morphology Development of a Polymeric Drop. Processes 2020, 8, 1308.
- Azimi, S.; Ashtari, O.; Schneider, T.M. Constructing periodic orbits of high-dimensional chaotic systems by an adjoint-based variational method. Phys. Rev. E 2022, 105, 014217.
- Bergstra, J.; Bengio, Y. Random search for hyper-parameter optimization. J. Mach. Learn. Res. 2012, 13, 281–305.
- Wu, Q.; Pugh, A. Reinforcement learning control of unknown dynamic systems. In IEE Proceedings D (Control Theory and Applications); IET: Stevenage, UK, 1993; Volume 140, pp. 313–322.
- Sahoo, S.; Roy, B.K. Design of multi-wing chaotic systems with higher largest Lyapunov exponent. Chaos Solitons Fractals 2022, 157, 111926.
- Oh, S.H.; Yoon, Y.T.; Kim, S.W. Online reconfiguration scheme of self-sufficient distribution network based on a reinforcement learning approach. Appl. Energy 2020, 280, 115900.
- El Wafi, M.; Youssefi, M.A.; Dakir, R.; Bakir, M. Intelligent Robot in Unknown Environments: Walk Path Using Q-Learning and Deep Q-Learning. Automation 2025, 6, 12.
- Wang, Z.; Zhou, Q.; Song, Y.; Zhang, J.; Wang, J. Research on the Choice of Strategy for Connecting Online Ride-Hailing to Rail Transit Based on GQL Algorithm. Electronics 2025, 14, 3199.
- Jin, Z.; Ma, M.; Zhang, S.; Hu, Y.; Zhang, Y.; Sun, C. Secure state estimation of cyber-physical system under cyber attacks: Q-learning vs. SARSA. Electronics 2022, 11, 3161.
- Mehdizadeh, S. The largest Lyapunov exponent of gait in young and elderly individuals: A systematic review. Gait Posture 2018, 60, 241–250.
- Meyn, S. The projected Bellman equation in reinforcement learning. IEEE Trans. Autom. Control 2024, 69, 8323–8337.
- Peng, S. Stochastic Hamilton–Jacobi–Bellman equations. SIAM J. Control Optim. 1992, 30, 284–304.
- Li, D.J.; Tang, L.; Liu, Y.J. Adaptive intelligence learning for nonlinear chaotic systems. Nonlinear Dyn. 2013, 73, 2103–2109.
- Han, Y.; Ding, J.; Du, L.; Lei, Y. Control and anti-control of chaos based on the moving largest Lyapunov exponent using reinforcement learning. Phys. D Nonlinear Phenom. 2021, 428, 133068.
- Adeyemi, V.A.; Tlelo-Cuautle, E.; Perez-Pinal, F.J.; Nuñez-Perez, J.C. Optimizing the maximum Lyapunov exponent of fractional order chaotic spherical system by evolutionary algorithms. Fractal Fract. 2022, 6, 448.
- Weissenbacher, M.; Borovykh, A.; Rigas, G. Reinforcement Learning of Chaotic Systems Control in Partially Observable Environments. Flow Turbul. Combust. 2025, 115, 1357–1378.
- Prosperino, D.; Ma, H.; Räth, C. A generalized method for estimating parameters of chaotic systems using synchronization with modern optimizers. J. Phys. Complex. 2025, 6, 015012.
- Ulibarrena, V.S.; Zwart, S.P. Reinforcement learning for adaptive time-stepping in the chaotic gravitational three-body problem. Commun. Nonlinear Sci. Numer. Simul. 2025, 145, 108723.
- Islam, M.A.; Hassan, I.R.; Ahmed, P. Dynamic complexity of fifth-dimensional Henon map with Lyapunov exponent, permutation entropy, bifurcation patterns and chaos. J. Comput. Appl. Math. 2025, 466, 116547.
- Ali, F.; Jhangeer, A.; Muddassar, M. Comprehensive classification of multistability and Lyapunov exponent with multiple dynamics of nonlinear Schrödinger equation. Nonlinear Dyn. 2025, 113, 10335–10364.
- Kocarev, L. Chaos-based cryptography: A brief overview. IEEE Circuits Syst. Mag. 2001, 1, 6–21.
- Ibrahim, M.; Elhafiz, R. Security Assessment of Industrial Control System Applying Reinforcement Learning. Processes 2024, 12, 801.
- Cao, S.; Wang, X.; Cheng, Y. Robust Offline Actor-Critic with On-Policy Regularized Policy Evaluation. IEEE/CAA J. Autom. Sin. 2024, 11, 2497–2511.
- Van Seijen, H.; Van Hasselt, H.; Whiteson, S.; Wiering, M. A theoretical and empirical analysis of Expected Sarsa. In Proceedings of the 2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, Nashville, TN, USA, 31 March–1 April 2009; IEEE: New York, NY, USA, 2009; pp. 177–184.
- Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction; MIT Press: Cambridge, MA, USA, 1998; Volume 1.
- Nguyen, H.; Dang, H.B.; Dao, P.N. On-policy and off-policy Q-learning strategies for spacecraft systems: An approach for time-varying discrete-time without controllability assumption of augmented system. Aerosp. Sci. Technol. 2024, 146, 108972.
- Nguyen, B.Q.A.; Dang, N.T.; Le, T.T.; Dao, P.N. On-policy and Off-policy Q-learning algorithms with policy iteration for two-wheeled inverted pendulum systems. Robot. Auton. Syst. 2025, 193, 105111.
- Lu, C.; Schroecker, Y.; Gu, A.; Parisotto, E.; Foerster, J.; Singh, S.; Behbahani, F. Structured state space models for in-context reinforcement learning. Adv. Neural Inf. Process. Syst. 2023, 36, 47016–47031.
- Watkins, C.J.; Dayan, P. Q-learning. Mach. Learn. 1992, 8, 279–292.
- Sun, S.; Liu, R.; Lyu, J.; Yang, J.W.; Zhang, L.; Li, X. A large language model-driven reward design framework via dynamic feedback for reinforcement learning. Knowl.-Based Syst. 2025, 326, 114065.
- Yu, F.; Gracia, Y.M.; Guo, R.; Ying, Z.; Xu, J.; Yao, W.; Jin, J.; Lin, H. Dynamic analysis and application of 6D multistable memristive chaotic system with wide range of hyperchaotic states. Axioms 2025, 14, 638.
- Phatak, S.C.; Rao, S.S. Logistic map: A possible random-number generator. Phys. Rev. E 1995, 51, 3670.
- Benedicks, M.; Carleson, L. The dynamics of the Hénon map. Ann. Math. 1991, 133, 73–169.
- Yang, S.K.; Chen, C.L.; Yau, H.T. Control of chaos in Lorenz system. Chaos Solitons Fractals 2002, 13, 767–780.
- Chua, L.O. Chua’s circuit: An overview ten years later. J. Circuits Syst. Comput. 1994, 4, 117–159.
- Natiq, H.; Said, M.R.M.; Al-Saidi, N.M.; Kilicman, A. Dynamics and complexity of a new 4D chaotic laser system. Entropy 2019, 21, 34.
- Masri, G.E.; Ali, A.; Abuwatfa, W.H.; Mortula, M.; Husseini, G.A. A Comparative Analysis of Numerical Methods for Solving the Leaky Integrate and Fire Neuron Model. Mathematics 2023, 11, 714.
- Press, W.H. Numerical Recipes 3rd Edition: The Art of Scientific Computing; Cambridge University Press: Cambridge, UK, 2007.
- Li, S.; Yang, Y. Data-driven modeling of bifurcation systems by learning the bifurcation parameter generalization. Nonlinear Dyn. 2025, 113, 1163–1174.
- Shahab, M.L.; Suheri, F.A.; Kusdiantara, R.; Susanto, H. Physics-informed neural networks for high-dimensional solutions and snaking bifurcations in nonlinear lattices. Phys. D Nonlinear Phenom. 2025, 481, 134836.
- Kern, S.; McGuinn, M.E.; Smith, K.M.; Pinardi, N.; Niemeyer, K.E.; Lovenduski, N.S.; Hamlington, P.E. Computationally efficient parameter estimation for high-dimensional ocean biogeochemical models. Geosci. Model Dev. 2024, 17, 621–649.
| Method | Policy Type | Pros for Search | Limitations/Notes |
|---|---|---|---|
| Q-learning | Off-policy | Fast progress in sparse high-LLE landscapes; simple tabular update; good at jumping to promising peaks. | Can over-exploit noisy estimates near the edge of chaos; needs sufficient coverage and tuned step sizes. |
| SARSA | On-policy | Behavior-consistent updates; smoother learning when chaotic windows are narrow. | May converge more slowly in very sparse peaks; performance depends on exploration schedule (GLIE/annealing). |
| Grid/Random | N/A | Simple baseline; embarrassingly parallel; stable reference for fair budgets. | Curse of dimensionality; no adaptation to outcomes; many redundant evaluations in non-chaotic regions. |
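To make the policy-type distinction in the table concrete, here is a minimal Python sketch of the two tabular updates. It is not the paper's code; `Q` is assumed to be a dict-of-dicts value table, and all names are illustrative:

```python
def q_learning_update(Q, s, a, r, s_next, alpha, gamma):
    # Off-policy: bootstrap from the greedy (max) action in s_next,
    # regardless of which action the behavior policy will actually take.
    td_target = r + gamma * max(Q[s_next].values())
    Q[s][a] += alpha * (td_target - Q[s][a])

def sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma):
    # On-policy: bootstrap from the action a_next actually chosen by the
    # epsilon-greedy behavior policy, so exploration noise enters the target.
    td_target = r + gamma * Q[s_next][a_next]
    Q[s][a] += alpha * (td_target - Q[s][a])

# Toy usage: two states, two actions.
Q = {s: {a: 0.0 for a in (0, 1)} for s in ("s0", "s1")}
q_learning_update(Q, "s0", 0, r=1.0, s_next="s1", alpha=0.5, gamma=0.9)
sarsa_update(Q, "s0", 1, r=1.0, s_next="s1", a_next=0, alpha=0.5, gamma=0.9)
```

The only difference is the bootstrap target: Q-learning maximizes over next-state actions, while SARSA uses the action the behavior policy actually selects, which is what makes its learning smoother but potentially slower in sparse high-LLE landscapes.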
| System | Parameters | Discretization |
|---|---|---|
| Logistic map | $r$ | 100 steps |
| Hénon map | $a$, $b$ | $50 \times 50$ grid |
| Lorenz system | $\sigma$, $\rho$, $\beta$ | $20 \times 20 \times 20$ grid |
| Chua’s circuit | three circuit parameters | $15 \times 15 \times 15$ grid |
| Lorenz–Haken | four system parameters | $10 \times 10 \times 10 \times 10$ grid |
| Custom 5D | bounded box over model parameters | 10,000 sampled points |
| Component | Setting |
|---|---|
| Environment | Single-step evaluation; state $s$; action $a$ selects a candidate parameter vector $\theta$; reward $r = \mathrm{LLE}(\theta)$; the simulator integrates the dynamics under $\theta$. |
| Behavior policy | $\epsilon$-greedy over the discrete candidate set. |
| Update rule | SARSA or Q-learning on tabular $Q(s, a)$. |
| Learning rate | $\alpha$ (constant step size). |
| Discount factor | $\gamma$ (single-step return). |
| Exploration rate | $\epsilon$, annealed over training. |
| Episodes | 1000–4000 (system-dependent). |
| Outputs | Reward trends, parameter heatmaps/attractor plots, final $Q$-table. |
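A compact Python sketch of this single-step setup follows, assuming illustrative hyperparameter values (the exact $\alpha$, $\epsilon$, and schedule were not recoverable from the table above, so `alpha=0.1`, `eps=0.2`, and the fixed-$\epsilon$ policy are assumptions):

```python
import numpy as np

def train_single_step(candidates, lle_of, episodes=2000,
                      alpha=0.1, eps=0.2, rng=np.random.default_rng(0)):
    """Single-step episode sketch matching the settings table: each action
    picks a candidate parameter vector, the reward is the LLE returned by
    the simulator (lle_of), and the episode then terminates. With one step
    there is no bootstrap, so the tabular update reduces to a bandit-style
    Q[a] += alpha * (r - Q[a]) for both SARSA and Q-learning."""
    Q = np.zeros(len(candidates))
    best = (-np.inf, None)
    for _ in range(episodes):
        # epsilon-greedy over the discrete candidate set
        a = rng.integers(len(candidates)) if rng.random() < eps else int(np.argmax(Q))
        r = lle_of(candidates[a])      # simulate dynamics, estimate the LLE
        Q[a] += alpha * (r - Q[a])     # terminal step: the target is just r
        if r > best[0]:
            best = (r, candidates[a])
    return Q, best

# Usage with the logistic-map LLE from the earlier sketch:
# rs = np.linspace(2.5, 4.0, 100)   # assumed range; the table specifies 100 steps
# Q, (best_lle, best_r) = train_single_step(list(rs), logistic_lle)
```

This is a bandit-style reading of the settings table; the paper's actual state and action encodings may be richer, in which case the two update rules sketched earlier diverge through their bootstrap targets as well as their exploration schedules.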
| System | Grid Search Baseline (Total Evaluations) | RL Agent (Approx. Episodes to Find High LLE) | Cost Reduction |
|---|---|---|---|
| Logistic Map | 100 | ≈100–200 | ≈0% |
| Hénon Map | 2500 | ≈200–300 | ≈88–92% |
| Lorenz System | 8000 | ≈100–200 | ≈97.5–98.8% |
| Chua’s Circuit | 3375 | ≈150–200 | ≈94.1–95.6% |
| Lorenz–Haken | 10,000 | ≈25–50 | ≈99.5–99.75% |
| Custom 5D System | 10,000 | ≈100–150 | ≈98.5–99.0% |
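The Cost Reduction column is simple arithmetic, reduction = 1 − (episodes / grid evaluations); a quick check reproduces the table's ranges:

```python
# Reproducing the "Cost Reduction" column from the two preceding columns.
# (Logistic map omitted: 100-200 episodes vs. 100 grid points gives no reduction.)
cases = {
    "Henon":        (2500, (200, 300)),
    "Lorenz":       (8000, (100, 200)),
    "Chua":         (3375, (150, 200)),
    "Lorenz-Haken": (10_000, (25, 50)),
    "Custom 5D":    (10_000, (100, 150)),
}
for name, (grid, (lo, hi)) in cases.items():
    print(f"{name}: {1 - hi/grid:.2%} to {1 - lo/grid:.2%}")
# e.g. Lorenz: 97.50% to 98.75%, matching the table's ~97.5-98.8% up to rounding.
```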
| System | Episodes to High LLE | Attractor Quality | Q-Learning vs. SARSA Notes |
|---|---|---|---|
| Logistic Map (1D) | ≈100–200 | Correct | Both converge; Q slightly earlier. |
| Hénon Map (2D) | ≈200–300 | Correct | Similar; SARSA higher variance. |
| Lorenz System (3D) | ≈100–200 | Correct | Q faster; SARSA less late variance. |
| Chua’s Circuit (3D) | ≈150–200 | Correct (Q-Learning), Failed (SARSA) | Q-learning found attractor; SARSA converged to fixed point. |
| Lorenz–Haken (4D) | ≈25–50 | Correct | Both succeed; Q broader, SARSA tighter loops. |
| Custom 5D System | ≈100–150 | Correct | Both succeed; SARSA steadier curve. |
| System | Q-Learning Time (s) | SARSA Time (s) |
|---|---|---|
| Logistic Map (1D) | ≈2.5 | ≈2.5 |
| Hénon Map (2D) | ≈39.0 | ≈39.6 |
| Lorenz System (3D) | ≈226.5 | ≈226.7 |
| Chua’s Circuit (3D) | ≈1418.6 | ≈1438.6 |
| Lorenz–Haken (4D) | ≈145.3 | ≈145.6 |
| Custom 5D System | ≈340 | ≈340 |