Abstract
Recent advancements in lightweight electroencephalogram (EEG) signal classification have enabled real-time human–robot interaction, yet challenges persist in balancing computational efficiency and safety in dynamic path planning. This study proposes an EEG-based inverse reinforcement learning (EIRL) framework to simulate human navigation strategies by decoding neural decision preferences. The method integrates a pruned WNFG-SSGCNet-ADMM classifier for EEG signal mapping, apprenticeship learning for reward function extraction, and Q-learning for policy optimization. Experimental validation in an 8 × 8 FrozenLake-v1 environment demonstrates that EIRL reduces average path risk values by 50% compared with traditional reinforcement learning, achieving expert-level safety (an average risk value of 4) while maintaining optimal path lengths. The framework enhances adaptability in unknown environments by embedding human-like risk aversion into robotic planning, offering a robust solution for applications requiring minimal prior environmental knowledge. Results highlight the synergy between neural feedback and computational models, advancing inclusive human–robot collaboration in safety-critical scenarios.
1. Introduction
In recent years, lightweight electroencephalogram (EEG) signal classification technologies have opened new possibilities for real-time human–robot interaction in wearable devices and robotic systems [1,2,3,4]. By integrating pruning algorithms and graph representation methods, researchers can significantly compress neural network parameters, enabling efficient EEG classification in low-computational environments [5,6,7]. Recent advancements in neural network optimization have witnessed significant progress through innovative integration of mathematical optimization and graph theory. A notable example is the WNFG-SSGCNet-ADMM framework developed by Wang et al. [8], which employs the alternating direction method of multipliers (ADMM) to achieve lightweight model design while maintaining performance integrity. The framework addresses critical challenges in model compression through a constrained non-convex optimization formulation, demonstrating a remarkable tenfold parameter reduction on both the Bonn and SSW datasets without compromising classification accuracy. The theoretical underpinnings of the approach are strengthened by rigorous proofs of local convergence under practical assumptions, particularly when employing full-rank operators, thereby overcoming theoretical limitations inherent in conventional pruning methodologies. The computational architecture is further enhanced through a weighted neighborhood field graph (WNFG) that dramatically reduces graph construction complexity relative to the quadratic cost of conventional constructions, achieving an order-of-magnitude reduction in redundant edges while significantly improving memory efficiency.
Particularly noteworthy is the framework’s frequency-domain sparse graph construction, which demonstrates superior classification accuracy and stability compared with traditional time-domain approaches, especially under aggressive pruning conditions.
Subsequent validation in epileptic EEG recognition tasks has confirmed the framework’s robustness and practical applicability in critical biomedical applications. The Bonn dataset comprises five subsets (A–E), with subset E containing seizure data. Signals are segmented into non-overlapping 256-point windows to form four binary classification tasks (A vs. E, B vs. E, C vs. E, D vs. E), each containing 3200 samples (1600 epileptic vs. 1600 non-epileptic) [9]. Experiments demonstrate that SSGCNet with WNFG in the frequency domain outperforms traditional time-domain methods, while ADMM pruning reduces model parameters by 10-fold without performance degradation. These results highlight ADMM’s superiority in balancing lightweight design and precision, paving the way for portable epilepsy monitoring devices [8].
Motor imagery (MI)-based EEG classification, which decodes neural activity patterns generated during imagined movements (e.g., limb motion), has emerged as a critical research direction in human-machine interaction. The PhysioNet MI dataset [10], for example, includes 64-channel EEG signals (sampled at 160 Hz) from 109 subjects performing 14 tasks (e.g., eye opening/closing, fist clenching/relaxing), with over 1500 samples [11]. Traditional classifiers such as support vector machines (SVM) [12] and linear discriminant analysis (LDA) [13] achieve high accuracy but rely on manually designed features, limiting their generalizability. In contrast, deep learning models (e.g., convolutional neural networks (CNN) [3,14]) automatically extract frequency-domain features but suffer from excessive training time and computational costs [15,16].
To address these challenges, lightweight improvement strategies have been proposed. Dropout techniques mitigate overfitting by randomly deactivating neurons [17,18], while magnitude-based pruning removes redundant weights for model compression. However, existing methods often lack theoretical optimization of network sparsity, leading to a trade-off between pruning rates and classification accuracy. Recent studies have demonstrated that hybrid frameworks combining ADMM with masked retraining effectively solve non-convex sparse optimization problems, showing advantages in convergence and computational efficiency for epileptic EEG classification tasks [7,19].
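As a rough illustration of this pruning style (not the authors' implementation), the following Python sketch shows one ADMM-style sparse projection and dual update; the function name, sparsity value, and surrounding training details are illustrative assumptions.

```python
import torch

def admm_z_u_update(weight, u, sparsity=0.6):
    """One ADMM-style update for weight pruning: project (W + U) onto the set of
    tensors whose smallest-magnitude `sparsity` fraction is zero, then update the dual U."""
    target = weight.detach() + u
    k = max(1, int(target.numel() * sparsity))          # number of entries to zero out
    threshold = target.abs().flatten().kthvalue(k).values
    z = torch.where(target.abs() > threshold, target, torch.zeros_like(target))
    u = u + weight.detach() - z                          # scaled dual (multiplier) update
    return z, u

# During training, the task loss is augmented with (rho / 2) * ||W - Z + U||^2 so that W
# is pulled toward the sparse auxiliary variable Z; after convergence the mask (Z != 0)
# is frozen and the network is retrained under that mask (masked retraining).
```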
Meanwhile, reinforcement learning (RL) and inverse reinforcement learning (IRL) have gained traction in robotic path planning. Sichkar [20] compared Q-Learning and Sarsa algorithms, highlighting parameter optimization for safety enhancement in dynamic environments; Wang et al. [21] proposed a globally guided RL framework (G2RL) achieving near-centralized planning efficiency in distributed multi-robot scenarios; and Gao [22] improved trajectory smoothness by integrating path graph preprocessing with Q-Learning. Nevertheless, traditional methods depend on prior environmental knowledge, struggling to adapt to unknown dynamic risks (e.g., random obstacles or terrain variations).
This study introduces an EEG-based inverse reinforcement learning (EIRL) framework that synthesizes safety-conscious global path strategies through computational modeling of human navigation preferences. The methodology initiates with neurocognitive signal translation, where electroencephalogram (EEG) patterns are transformed into robotic control commands using a lightweight classification architecture, generating expert-level navigation trajectories. Building upon these empirical demonstrations, the framework subsequently employs apprenticeship learning to distill essential feature expectations and quantify reward weight distributions inherent in human decision-making processes. These optimized reward functions are then systematically integrated into Q-Learning iterations, enabling dynamic path planning that intrinsically balances exploration efficiency with collision avoidance. Experimental validation in the FrozenLake-v1 benchmark environment reveals the framework’s superior safety performance, achieving over 50% reduction in path risk metrics compared with conventional reinforcement learning and search algorithms. This neurocognitive-inspired approach establishes a novel paradigm for developing human-aligned autonomous systems that preserve biological decision-making advantages while addressing the safety-critical requirements of real-world robotic operations.
The main contributions of this study include:
- (I) Providing a physical-operation-free robot control interface for individuals with motor impairments, enhancing human–robot collaboration inclusivity;
- (II) Learning reward functions from EEG-implicit decision preferences via IRL, reducing reliance on prior maps;
- (III) Establishing a synergistic “brain-machine-environment” learning framework by integrating neural feedback with path execution results.
The research progression is structured to systematically bridge critical gaps in EEG-driven path planning. Initial investigations revealed two persistent limitations: conventional reinforcement learning frameworks struggle to decode human-like risk sensitivity from sparse neural signals, while existing EEG classification methods prioritize accuracy over real-time deployability. To address these challenges, our methodology integrates lightweight neural decoding with inverse reward modeling, enabling policy optimization that inherently balances safety and efficiency. This phased approach ensures theoretical rigor while maintaining practicality for real-world robotic applications. Conventional navigation systems rely on explicit environmental models or reactive RL policies, both of which struggle in partially observable, safety-critical scenarios (e.g., disaster rescue). EIRL addresses this gap by integrating EEG as a safety prior—a paradigm shift that enables proactive risk mitigation through human neurocognitive patterns. While traditional baselines (RL/A*) are used for benchmarking, their inability to leverage neural data inherently limits comparability, as no prior work combines EEG with IRL for navigation.
2. Problem Modeling and Analysis
This section decomposes the EEG-based inverse reinforcement learning framework for human-like navigation simulation into three interconnected components: neurosignal-driven expert trajectory generation, apprenticeship learning-based reward modeling, and reinforcement learning-optimized global path planning. The methodology initiates with multi-channel EEG signal acquisition, followed by feature classification and pattern recognition to establish deterministic mappings between EEG biomarkers and robotic motion primitives. Target-oriented navigation tasks leverage classified EEG signals to generate movement trajectories while recording complete path histories, from which expert datasets are synthesized by identifying optimal path segments through EEG pattern analysis. Inverse reinforcement learning algorithms then extract spatial–temporal feature expectations and their associated weights from these datasets, enabling the derivation of biomimetic reward functions that are systematically integrated into reinforcement learning architectures for policy optimization. The final policy undergoes rigorous validation in parametrized simulation environments through iterative refinement cycles until achieving deployment readiness criteria.
2.1. EEG Signal Mapping and Expert Data Generation
Beyond binary classification, EEG signal categorization can be extended to multi-class paradigms contingent on specific requirements and dataset characteristics. As illustrated in Figure 1, a robot navigating a grid-based environment permits four directional movements (up, down, left, right), necessitating corresponding EEG signal classification into four distinct categories. This alignment enables precise robotic maneuver control through discrete EEG-driven commands.
Figure 1.
A schematic diagram of the electroencephalogram classification of the robot’s movement direction and different actions in the grid map, the ultimate goal of the robot’s movement is to reach the destination marked by the yellow star.
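To make the EEG-to-command mapping concrete, the following minimal Python sketch maps a four-class EEG prediction onto FrozenLake's discrete action indices (0—left, 1—down, 2—right, 3—up, as listed in Section 5.3); the classifier object and its predict interface are placeholders rather than the actual WNFG-SSGCNet code.

```python
# FrozenLake-v1 action indices (Gym convention): 0-left, 1-down, 2-right, 3-up.
EEG_CLASS_TO_ACTION = {
    "left": 0,
    "down": 1,
    "right": 2,
    "up": 3,
}

def eeg_to_action(eeg_window, classifier):
    """Map one classified EEG window to a discrete robot command.
    `classifier` is a stand-in for the pruned EEG classification model."""
    label = classifier.predict(eeg_window)   # assumed to return "left"/"down"/"right"/"up"
    return EEG_CLASS_TO_ACTION[label]
```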
Following the categorical mapping of EEG signals to specific actions, subjects were tasked with repeated pathfinding attempts in hazardous environments while maintaining cognitive unawareness of test map configurations. Through iterative trials, optimal paths (minimizing traversal time/distance while circumventing hazards) were identified and translated into expert strategies via EEG-controlled robotic navigation.
2.2. Feature Extraction and Computational Framework via Inverse Reinforcement Learning
Post-expert strategy formulation, feature extraction was performed across multiple trajectory datasets to characterize state-action relationships. This process yielded semantically meaningful feature vectors representing latent objective functions underpinning expert behaviors. Feature expectations were subsequently computed alongside optimal weightings through constrained optimization, enabling reward function derivation for subsequent reinforcement learning applications. The procedural workflow is diagrammatically represented in Figure 2.
Figure 2.
Flowchart of the reward function for inverse reinforcement learning.
2.3. Reinforcement Learning-Based Global Path Planning Paradigm
Global path planning involves converting raw environmental data into grid-based representations through SLAM algorithms, thereby transforming the problem into grid-optimized trajectory identification (Figure 3). This entails determining optimal routes between arbitrary start and goal positions within mapped environments. Conventional approaches often employ map-matching or trajectory-following techniques to localize robotic positions before initiating path optimization procedures.
Figure 3.
Example of raster map path optimization.
Reinforcement learning algorithms demonstrate efficacy in simple grid-world scenarios through appropriately tuned reward/penalty mechanisms (e.g., step-count penalties, collision penalties, and goal-reach rewards). However, complex environments necessitate alternative strategies. Inverse reinforcement learning offers a viable solution by inferring latent reward structures from expert demonstrations, thereby enabling navigation policies that encapsulate human-like decision-making tendencies.
4. Experimental Methodology
This experiment validates the EIRL algorithm’s EEG processing efficacy under hardware constraints through epileptic EEG simulations structured in three coherent phases. Initial EEG signal classification leverages the Bonn dataset with weighted neighborhood field graph (WNFG) screening and sparse spectra graph convolutional networks (SSGCNets) to establish discriminative neural pattern recognition. Processed EEG outputs subsequently guide robotic trajectory control within the FrozenLake simulation environment, implementing real-time navigation decision protocols.
The final phase synthesizes optimal navigation paths into expert demonstration datasets, from which apprenticeship learning extracts spatial feature expectations and optimizes reward weights via inverse reinforcement learning. These refined reward parameters are systematically incorporated into Q-learning frameworks for policy training, culminating in deployable Q-table optimizations that maintain the original algorithmic architecture. The framework of the overall test experiment and related comparative tests is shown in Figure 6.
Figure 6.
Experimental setup and baseline comparison framework.
4.1. EEG Signal Classification Test
Three classification tasks were constructed: emotion recognition (binary: subsets A/E), motor imagery (quadratic: A/B/C/D), and extended motor imagery (quintuple: A/B/C/D/E). Raw EEG signals underwent stratified slicing preprocessing, followed by structured pruning of WNFG-SSGCNet models via ADMM optimization across eleven pruning rates spanning 0.1 to 0.99 (100 iterations each). Classification performance was evaluated using five-fold cross-validation, with final accuracy determined by averaging top-performing submodels from frequency-domain analyses to mitigate stochastic bias.
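A minimal sketch of the five-fold evaluation loop for a single pruning rate is given below; `build_pruned_model` and `evaluate` are hypothetical stand-ins for the WNFG-SSGCNet-ADMM training and scoring pipeline.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

def five_fold_accuracy(X, y, pruning_rate, build_pruned_model, evaluate):
    """Average accuracy over five stratified folds for one pruning rate.
    `build_pruned_model` and `evaluate` are placeholders for the actual pipeline."""
    skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    scores = []
    for train_idx, test_idx in skf.split(X, y):
        model = build_pruned_model(X[train_idx], y[train_idx], pruning_rate)
        scores.append(evaluate(model, X[test_idx], y[test_idx]))
    return float(np.mean(scores))
```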
4.2. Expert Dataset Generation
The experiment engaged five healthy participants (22–28 years, 3 male/2 female) in robotic navigation tasks within the FrozenLake-v1 environment, operated via handheld controllers with treasure chest visual targets. Adopting a single-blind protocol, subjects received real-time environmental visual feedback devoid of wind disturbance preknowledge.
Data acquisition followed a two-stage protocol comprising two distinct operational regimes: initial trials enforced mandatory wind interference across six consecutive navigation attempts, succeeded by four probabilistic trials implementing Bernoulli-distributed wind activation. Expert trajectory qualification required simultaneous satisfaction of three optimality criteria: minimal navigational steps, maximized obstacle avoidance probability, and elimination of wind-influenced path deviations.
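The probabilistic regime can be pictured as a simple Bernoulli draw per transition, as in the sketch below; the activation probability `p_wind` is a placeholder, since the exact value used in the trials is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

def wind_active(p_wind):
    """Bernoulli draw deciding whether wind perturbs the next transition.
    `p_wind` is an illustrative placeholder probability."""
    return rng.random() < p_wind
```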
As illustrated in Figure 7, expert trajectories exhibit distinct spatial optimization characteristics. Each trajectory is stored as a sequence of state-transition tuples $(s_t, a_t, s_{t+1})$, where $s_t$ denotes the current state, $s_{t+1}$ the next-state observation, and $a_t$ the executed action.
Figure 7.
Expert trajectories of ice lake Environment V1.
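A minimal sketch of how such trajectories can be recorded from EEG-driven episodes in Gym's FrozenLake-v1 is shown below; the `select_action` callback stands in for the EEG decoding step, the `is_slippery=False` flag is an assumed proxy for disabling stochastic disturbances, and the classic four-tuple `step` API is assumed (newer Gymnasium releases return five values and a `(obs, info)` pair from `reset`).

```python
import gym

def record_episode(env, select_action):
    """Roll out one EEG-driven episode and return it as (s_t, a_t, s_{t+1}) tuples."""
    trajectory = []
    state = env.reset()                         # classic Gym API: reset() -> obs
    done = False
    while not done:
        action = select_action(state)           # e.g., decoded from an EEG window
        next_state, reward, done, info = env.step(action)
        trajectory.append((state, action, next_state))
        state = next_state
    return trajectory

env = gym.make("FrozenLake-v1", map_name="8x8", is_slippery=False)
```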
4.3. Feature Extraction Framework
Within the inverse reinforcement learning framework, this study employs apprenticeship learning to model expert data features. With discount factor $\gamma$, feature mapping functions calculate feature expectations $\mu_E$ for the expert policy and $\mu(\pi)$ for the apprentice policy. Iterative approximation via convex optimization continues until the policy divergence metric $\|\mu_E - \mu(\pi)\|$ falls below a threshold $\epsilon$, yielding the optimal weight vector $w^{*}$.
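A rough sketch of the discounted feature-expectation estimate and a simplified weight update in the spirit of Abbeel and Ng's apprenticeship learning [22] is given below; the feature map `phi`, the default discount value, and the plain averaging used in place of the true projection step are illustrative assumptions.

```python
import numpy as np

def feature_expectations(trajectories, phi, n_features, gamma=0.9):
    """Monte-Carlo estimate of mu = E[ sum_t gamma^t * phi(s_t) ] over demonstration
    trajectories. `phi(s)` maps a state to a feature vector of length n_features."""
    mu = np.zeros(n_features)
    for traj in trajectories:
        for t, (s, a, s_next) in enumerate(traj):
            mu += (gamma ** t) * phi(s)
    return mu / len(trajectories)

def weight_and_margin(mu_expert, mu_apprentice_list):
    """Simplified weight update: the reward weight w is the residual between the
    expert's feature expectation and an average of apprentice feature expectations
    (a crude stand-in for the true projection step); iteration stops once the
    returned margin drops below the threshold epsilon."""
    mu_bar = np.mean(mu_apprentice_list, axis=0)
    w = mu_expert - mu_bar
    return w, np.linalg.norm(w)
```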
4.4. Reward-Driven Reinforcement Learning Training
The derived IRL reward function is integrated into Q-learning with the key parameters (discount factor $\gamma$, learning rate $\alpha$, and $\epsilon$-greedy exploration rate) held fixed. Following 60,000 policy iterations, the converged Q-table is generated for navigation decisions. A dynamic exploration decay strategy ensures a balanced exploitation-exploration tradeoff during training.
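The training loop can be sketched as standard tabular Q-learning with the environment reward replaced by the IRL-derived reward, as below; the hyperparameter values and the decay schedule are placeholders rather than the paper's exact settings.

```python
import numpy as np

def q_learning_with_irl_reward(env, reward_fn, episodes=60_000,
                               gamma=0.9, alpha=0.1, eps=0.1):
    """Tabular Q-learning where the environment reward is replaced by the
    IRL-derived reward_fn(state). Hyperparameter values are placeholders."""
    Q = np.zeros((env.observation_space.n, env.action_space.n))
    for ep in range(episodes):
        s = env.reset()                                # classic Gym API
        done = False
        while not done:
            a = env.action_space.sample() if np.random.rand() < eps else int(np.argmax(Q[s]))
            s_next, _, done, _ = env.step(a)
            r = reward_fn(s_next)                      # IRL reward replaces the native one
            Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
            s = s_next
        eps = max(0.01, eps * 0.9999)                  # simple exploration decay
    return Q
```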
4.5. Comparative Reinforcement Learning Experiments
This experiment addresses limitations in the native FrozenLake 8 × 8 environment reward structure (single terminal +1 reward lacking intermediate incentives) by redesigning the reward function with three interdependent components: a per-step penalty (−1) to optimize path efficiency, a severe ice hole falling penalty (−10) to enforce global obstacle avoidance, and a magnified terminal reward (+100) upon successful navigation to amplify policy update gradients. This synthesized reward scheme systematically balances exploration-exploitation dynamics while accelerating policy convergence through differentiated feedback signals.
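One plausible implementation of this three-component reward, keyed off the FrozenLake map description, is sketched below; the map lookup is an assumption about how hole and goal cells are identified.

```python
def shaped_reward(desc, next_state, ncols=8):
    """Redesigned FrozenLake reward: -1 per step, -10 for falling into a hole ('H'),
    +100 for reaching the goal ('G'). `desc` is the 8x8 map of 'S'/'F'/'H'/'G' cells."""
    row, col = divmod(next_state, ncols)
    cell = desc[row][col]
    if cell in (b"H", "H"):
        return -10
    if cell in (b"G", "G"):
        return 100
    return -1
```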
Comparative experiments were conducted in FrozenLake-v1 with deactivated stochastic wind disturbances. Parameter consistency between inverse reinforcement learning (IRL) and baseline Q-learning was maintained: identical discount factor $\gamma$, learning rate $\alpha$, $\epsilon$-greedy exploration rate, and 60,000 training iterations. This configuration ensures comparable Q-table generation across experimental groups.
4.6. Algorithm Benchmark Experiment
The A* search algorithm was implemented for deterministic global path planning in the FrozenLake-v1 environment with deactivated stochastic wind disturbances. Using Manhattan distance as the heuristic function, the A* algorithm autonomously generates optimal obstacle-avoiding trajectories from the initial state to the terminal goal. These deterministic paths serve as baseline references for comparative performance analysis against stochastic navigation outcomes from reinforcement or inverse reinforcement learning approaches.
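For reference, a compact A* implementation over the 8 × 8 grid with the Manhattan heuristic is sketched below; the `holes`, start, and goal inputs are illustrative rather than the exact experimental configuration.

```python
import heapq

def manhattan(a, b):
    """Manhattan distance between two (row, col) cells."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def a_star(start, goal, holes, size=8):
    """A* over a size x size grid with 4-connected moves, avoiding the cells in `holes`."""
    open_set = [(manhattan(start, goal), 0, start, [start])]   # (f, g, node, path)
    visited = set()
    while open_set:
        f, g, node, path = heapq.heappop(open_set)
        if node == goal:
            return path
        if node in visited:
            continue
        visited.add(node)
        for dr, dc in ((0, 1), (1, 0), (0, -1), (-1, 0)):
            nxt = (node[0] + dr, node[1] + dc)
            if 0 <= nxt[0] < size and 0 <= nxt[1] < size and nxt not in holes:
                heapq.heappush(open_set,
                               (g + 1 + manhattan(nxt, goal), g + 1, nxt, path + [nxt]))
    return None   # no obstacle-free path exists
```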
4.7. Policy Performance Evaluation
The derived policy parameters from inverse reinforcement learning (IRL) and standard reinforcement learning (RL) were deployed into the benchmark testing environment (FrozenLake-v1) for systematic navigation performance validation. A full parameter consistency protocol was implemented: environmental dynamics (including state transition mechanisms and reward computation architecture) strictly inherited configurations from Section 4.5 comparative RL experiments. Real-time trajectory tracking modules recorded agent kinematic characteristics, enabling quantitative comparative analysis of obstacle avoidance efficiency and path optimality between both policy types.
5. Results and Analysis
Experimental validation explicitly correlates with the three-phase framework design. The pruned EEG classifier demonstrated sufficient temporal resolution to support real-time control, with empirical measurements confirming stable operation under hardware constraints. Subsequent inverse reinforcement learning successfully captured risk-aversion patterns, as evidenced by policy trajectories avoiding high-risk zones even in unmapped environments. Most critically, the integrated EIRL framework maintained baseline path efficiency while significantly reducing collision probabilities compared with conventional approaches, validating the synergy between neural decoding and learned reward structures.
5.1. EEG Signal Classification Results
Table 1 presents five-fold cross-validation results of ADMM-pruned models across binary, quadratic, and quintuple classification tasks. The data reveal a strong negative correlation between model accuracy and pruning ratio: when pruning ratios exceed 0.8, quintuple and quadratic classification accuracies degrade to random guessing levels (20% and 25%, respectively), while binary classification maintains 50% baseline accuracy even at a 0.99 pruning ratio (Figure 8). This demonstrates ADMM pruning’s stronger robustness for low-dimensional tasks (binary) versus higher sensitivity in high-dimensional scenarios (quadratic/quintuple).
Table 1.
The five-fold cross-validation results under different classification categories and pruning rates.
Figure 8.
Performance of SSGCNet on frequency-domain EEG datasets with varying class numbers. (a) The five-class classification accuracy of EEG signals in the five-fold cross-validation experiment under different pruning rates; (b) the four-class classification accuracy of EEG signals in the five-fold cross-validation experiment under different pruning rates; (c) the binary classification accuracy of EEG signals in the five-fold cross-validation experiment under different pruning rates; (d) accuracy of different classification methods under various pruning rates.
Following the accuracy–efficiency tradeoff principle, we select a 0.6 pruning ratio as optimal (67.4% quadratic accuracy vs. 70.6% at the 0.1 ratio), achieving 500% model compression with merely 3.2% absolute accuracy loss. This balance enables real-time EEG decoding on resource-constrained devices.
Figure 9 further illustrates the model’s generalization capacity in classical control scenarios: quadratic outputs map to up, down, left, right action spaces for FrozenLake navigation, while binary classification’s high stability suits rapid-decision environments like CartPole. Experiments confirm the method’s extensibility to CliffWalking and Snake game scenarios, establishing a new paradigm for cross-domain brain–computer interface applications.
Figure 9.
Classic control scenarios in the Gym.
5.2. Global Path Planning Results and Analysis
The deterministic A* algorithm achieved globally optimal path planning in the FrozenLake-v1 environment (Figure 10). The generated trajectory from the initial state to the target strictly adheres to the Manhattan-distance heuristic, exhibiting “right-priority, downward-supplement” navigation patterns. This path validates the theoretical completeness of A* in discrete grid environments for shortest obstacle-avoiding path computation, with decision-making fully driven by prior environmental knowledge without stochastic interference.
Figure 10.
The A* algorithm global planning path.
5.3. Q-Learning Global Path Planning Results and Analysis
Under identical configurations, Q-learning generated two optimized paths through 60,000 policy iterations (Figure 11). Action space mapping: 0—left, 1—down, 2—right, 3—up. The action sequences are
Figure 11.
The Q-learning algorithm global planning path.
1. Trajectory 1:
2. Trajectory 2:
Results demonstrate Q-learning’s convergence to Manhattan-optimal paths under exploration-exploitation tradeoffs, with multi-modal trajectories highlighting RL’s adaptability in dynamic environments. Compared with A*’s deterministic paths, Q-learning achieves equivalent optimization through online learning mechanisms.
5.4. EIRL Global Path Planning Results and Analysis
The EEG-based inverse reinforcement learning (EIRL) algorithm was evaluated in FrozenLake-v1 using expert demonstrations (Figure 7), with an expert dependency coefficient modulating the expert-exploration tradeoff, where the upper limit corresponds to full expert reliance.
Two optimized path types emerged (Figure 12):
Figure 12.
EIRL global planning path.
1. Low-dependency mode (lower expert dependency coefficient), with the corresponding action sequence shown in Figure 12.
2. High-dependency mode (higher expert dependency coefficient), with the corresponding action sequence shown in Figure 12.
Experimental results demonstrate that EIRL converged to Manhattan-optimal paths across all expert dependency settings, validating theoretical optimality.
5.5. Safety Analysis of Path Planning
For glacial lake environment V1 with stochastic wind disturbances and other potential environmental perturbations, path safety evaluation requires systematic analysis of spatial adjacency to ice holes. This study defines risk level 1 as path segments adjacent to one ice hole and risk level 2 for those adjacent to two ice holes (Table 2). Corresponding risk values are quantified as 0 for risk level 0 (no adjacent ice holes), 2 for level 1, and 4 for level 2.
Table 2.
Comparison of the effect and risk of path planning.
Expert strategy analysis of the five typical paths yields an average risk value of 4 per path. To evaluate algorithm safety performance, we establish the average risk metric
$$\bar{F} = \frac{1}{n}\sum_{i=1}^{n} F_i, \qquad F_i = \sum_{k} v_k N_i^{(k)},$$
where $n$ denotes the number of test iterations, $F_i$ represents the risk value of the $i$-th path, $N_i^{(k)}$ denotes the number of grid cells with risk level $k$ along the $i$-th path, and $v_k$ is the corresponding risk value (0, 2, or 4). As shown in Table 3, paths 2–5 generated by EIRL demonstrate superior stability with an average risk value of 4, matching expert-level safety standards. Notably, path 1 was excluded from typical EIRL analysis due to its predominant reliance on autonomous exploration (low-dependency mode). In contrast, Q-learning exhibits significant divergence in safety propensity compared with expert strategies. Experimental results confirm that EIRL enhances path safety by over 100% compared with conventional reinforcement learning through expert knowledge integration.
Table 3.
Comparison of average risk values of different path planning algorithms.
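A small sketch of how the per-path risk value and the average risk metric can be computed from these definitions is given below; treating adjacency as the 4-neighborhood of each path cell is an assumption.

```python
RISK_VALUE = {0: 0, 1: 2, 2: 4}   # risk value per risk level (number of adjacent ice holes)

def path_risk(path, holes):
    """Sum of risk values over the cells of one path, where a cell's risk level is the
    number of ice holes in its 4-neighborhood (capped at level 2)."""
    total = 0
    for r, c in path:
        adjacent = sum((r + dr, c + dc) in holes
                       for dr, dc in ((0, 1), (1, 0), (0, -1), (-1, 0)))
        total += RISK_VALUE[min(adjacent, 2)]
    return total

def average_risk(paths, holes):
    """Average risk metric F_bar = (1/n) * sum_i F_i over n tested paths."""
    return sum(path_risk(p, holes) for p in paths) / len(paths)
```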
In navigation scenarios with stochastic risks (e.g., glacial lake model V1), traditional path planning algorithms primarily handle deterministic obstacles but fail to address environmental uncertainties (e.g., wind-induced displacement). Our inverse reinforcement learning framework incorporates human expertise to enable rapid derivation of optimal paths balancing safety and efficiency. This approach proves particularly effective in environments with known obstacle distributions but unmodeled stochastic risks.
5.6. Limitations and Future Directions
While the proposed EIRL framework demonstrates promising results, several limitations warrant discussion. First, the reliance on epilepsy-adjacent EEG data, though methodologically compatible with motor imagery paradigms, necessitates validation on dedicated MI datasets to confirm generalizability across physiological states. Second, the discrete 4-directional action space, while simplifying EEG-robot mapping, restricts applicability to continuous control tasks. Future work will integrate hierarchical RL architectures to bridge this gap. Third, our participant pool’s homogeneity (healthy adults aged 22–28) limits insights into neurodiverse populations. Collaborative trials with clinical cohorts (e.g., stroke survivors) are planned to address this. Lastly, scalability to complex environments (e.g., 3D dynamic terrains) remains an open challenge, motivating research into multi-modal biosignal fusion and adaptive resolution grids. These limitations, while non-trivial, define clear pathways for advancing EEG-driven autonomy in real-world applications.
6. Conclusions
This study proposes an electroencephalogram (EEG)-based inverse reinforcement learning framework (EIRL) that constructs expert strategies for global path planning by decoding human neural activities, which subsequently drives the inverse reinforcement learning process to derive optimal paths. Experimental results demonstrate that EIRL-generated paths not only satisfy the optimality criteria of inverse reinforcement learning but also reveal significant consistency between robotic decisions and human path selection behaviors through expert strategy integration. The proposed method achieves 100% enhancement in path safety performance, particularly validating the reinforcement effect of expert prior knowledge on safe path selection in scenarios containing unmodeled risks (e.g., dynamic environmental disturbances).
The staged methodology provides critical insights into neurocognitive-inspired robotics. The success of Phase 1 highlights that aggressive network pruning need not sacrifice temporal precision when guided by domain-specific constraints. Phase 2’s reward abstraction demonstrates that human risk sensitivity can be encoded as spatial feature expectations, offering a generalizable alternative to handcrafted penalty functions. Most notably, Phase 3’s policy convergence behavior suggests that neural-driven exploration strategies naturally balance safety and efficiency—a property notoriously difficult to achieve through manual reward shaping. These findings collectively advance the design of autonomous systems requiring minimal prior environmental knowledge.
Two main limitations should be acknowledged: First, constrained by experimental apparatus, epilepsy-related datasets were employed as substitutes for standard motor imagery data in classification validation, though the methodological framework remains compatible with typical motor imagery paradigms. Second, to ensure EEG classification accuracy, robot motion was restricted to a four-direction discrete action space under Manhattan distance constraints, with current experiments only validating EIRL’s efficacy in low-resolution grid maps. Future research will focus on advancing path planning methodologies through the development of high-dimensional continuous state space models, coupled with multi-scale algorithmic frameworks capable of adapting to dynamically evolving 2D/3D environments. Concurrent investigations will explore multi-modal biosignal-integrated decision architectures to enhance autonomous system responsiveness.
The current comparisons, while illustrative, underscore a broader challenge in neural-integrated navigation research: the absence of standardized benchmarks to disentangle EEG’s specific contributions. Future studies must bridge this gap by developing quantitative metrics that isolate neural features’ impact on risk perception versus spatial reasoning. Such efforts could involve constructing open-source testbeds with human-in-the-loop baselines, where EEG’s role in encoding safety priors can be systematically compared against other biosignals (e.g., EMG or EOG). Additionally, large-scale validation in unmodeled scenarios—such as post-disaster environments with collapsed structural maps—will be critical to assess generalizability beyond laboratory-controlled settings. Collaborations with neuroscientists will further elucidate the biological underpinnings of observed EEG-safety correlations, advancing toward explainable human–AI symbiosis.
Author Contributions
Conceptualization, H.Z. and J.W.; methodology, H.Z. and J.W.; software, H.Z. and J.W.; validation, J.W.; formal analysis, H.Z. and R.G.; investigation, H.Z.; writing—original draft preparation, H.Z.; writing—review and editing, J.W. and R.G.; supervision, R.G. All authors have read and agreed to the published version of the manuscript.
Funding
This work was supported by the Science and Technology Commission of Shanghai Municipality under Grant 2021SHZDZX, and the National Key Research and Development Program of China (2021ZD0202200, 2021ZD0202202), the National Natural Science Foundation of China (No. 52301402) and the Guangdong Basic and Applied Basic Research Foundation (No. 2022A1515110574). Rui Gao is affiliated with the Key Laboratory of Marine Intelligent Equipment and System of Ministry of Education, Shanghai Jiao Tong University, and Shenzhen Research Institute of Shanghai Jiao Tong University.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
The original contributions presented in the study are included in the article, further inquiries can be directed to the corresponding author.
Conflicts of Interest
The authors declare no conflicts of interest.
References
- Baldi, P.; Sadowski, P.J. Understanding dropout. In Proceedings of the Advances in Neural Information Processing Systems (NIPS), Harrahs and Harveys, Lake Tahoe, NV, USA, 3–8 December 2013; pp. 2814–2822. [Google Scholar]
- Yim, J.; Joo, D.; Bae, J.; Kim, J. A gift from knowledge distillation: Fast optimization, network minimization and transfer learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4133–4141. [Google Scholar]
- Wang, G.; Wang, D.; Du, C.; Li, K.; Zhang, J.; Liu, Z.; Tao, Y.; Wang, M.; Cao, Z.; Yan, X. Seizure prediction using directed transfer function and convolution neural network on intracranial EEG. IEEE Trans. Neural Syst. Rehabil. Eng. 2020, 28, 2711–2720. [Google Scholar] [CrossRef] [PubMed]
- Mao, S.; Sejdić, E. A review of recurrent neural network-based methods in computational physiology. IEEE Trans. Neural Netw. Learn. Syst. 2022, 34, 6983–7003. [Google Scholar] [CrossRef] [PubMed]
- Boyd, S.; Parikh, N.; Chu, E.; Peleato, B.; Eckstein, J. Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 2011, 3, 1–122. [Google Scholar]
- Gao, R.; Tronarp, F.; Särkkä, S. Variable splitting methods for constrained state estimation in partially observed Markov processes. IEEE Signal Process. Lett. 2020, 27, 1305–1309. [Google Scholar] [CrossRef]
- Ye, S.; Zhang, T.; Zhang, K.; Li, J.; Xu, K.; Yang, Y.; Yu, F.; Tang, J.; Fardad, M.; Liu, S.; et al. Progressive weight pruning of deep neural networks using ADMM. arXiv 2018, arXiv:1810.07378. [Google Scholar]
- Wang, J.; Gao, R.; Zheng, H.; Zhu, H.; Shi, C.J.R. Ssgcnet: A sparse spectra graph convolutional network for epileptic eeg signal classification. IEEE Trans. Neural Netw. Learn. Syst. 2023, 35, 12157–12171. [Google Scholar] [CrossRef] [PubMed]
- Andrzejak, R.G.; Lehnertz, K.; Mormann, F.; Rieke, C.; David, P.; Elger, C.E. Indications of nonlinear deterministic and finite-dimensional structures in time series of brain electrical activity: Dependence on recording region and brain state. Phys. Rev. E 2001, 64, 061907. [Google Scholar] [CrossRef] [PubMed]
- Goldberger, A.L.; Amaral, L.A.; Glass, L.; Hausdorff, J.M.; Ivanov, P.C.; Mark, R.G.; Mietus, J.E.; Moody, G.B.; Peng, C.K.; Stanley, H.E. PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation 2000, 101, e215–e220. [Google Scholar] [CrossRef] [PubMed]
- Chambolle, A.; Pock, T. A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vis. 2011, 40, 120–145. [Google Scholar] [CrossRef]
- Yu, H.; Kim, S. SVM Tutorial-Classification, Regression and Ranking. Handb. Nat. Comput. 2012, 1, 479–506. [Google Scholar]
- Yang, J.; Yu, H.; Kunz, W. An efficient LDA algorithm for face recognition. In Proceedings of the International Conference on Automation, Robotics, and Computer Vision (ICARCV 2000), Singapore, 5–8 December 2000; pp. 34–47. [Google Scholar]
- Gao, Z.; Wang, X.; Yang, Y.; Mu, C.; Cai, Q.; Dang, W.; Zuo, S. EEG-based spatio–temporal convolutional neural network for driver fatigue evaluation. IEEE Trans. Neural Netw. Learn. Syst. 2019, 30, 2755–2763. [Google Scholar] [CrossRef] [PubMed]
- Wright, S.J. Numerical Optimization; Springer: Berlin/Heidelberg, Germany, 2006. [Google Scholar]
- Vandorpe, J.; Van Brussel, H.; Xu, H. Exact dynamic map building for a mobile robot using geometrical primitives produced by a 2D range finder. In Proceedings of the IEEE International Conference on Robotics and Automation, Minneapolis, MN, USA, 22–28 April 1996; Volume 1, pp. 901–908. [Google Scholar]
- Liu, Z.; Sun, M.; Zhou, T.; Huang, G.; Darrell, T. Rethinking the value of network pruning. arXiv 2018, arXiv:1810.05270. [Google Scholar]
- Blalock, D.; Gonzalez Ortiz, J.J.; Frankle, J.; Guttag, J. What is the state of neural network pruning? Proc. Mach. Learn. Syst. 2020, 2, 129–146. [Google Scholar]
- Zhang, T.; Ye, S.; Zhang, K.; Tang, J.; Wen, W.; Fardad, M.; Wang, Y. A systematic dnn weight pruning framework using alternating direction method of multipliers. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 184–199. [Google Scholar]
- Sichkar, V.N. Reinforcement learning algorithms in global path planning for mobile robot. In Proceedings of the 2019 International Conference on Industrial Engineering, Applications and Manufacturing (ICIEAM), Sochi, Russia, 25–29 March 2019; pp. 1–5. [Google Scholar]
- Wang, B.; Liu, Z.; Li, Q.; Prorok, A. Mobile robot path planning in dynamic environments through globally guided reinforcement learning. IEEE Robot. Autom. Lett. 2020, 5, 6932–6939. [Google Scholar] [CrossRef]
- Gao, P.; Liu, Z.; Wu, Z.; Wang, D. A global path planning algorithm for robots using reinforcement learning. In Proceedings of the 2019 IEEE International Conference on Robotics and Biomimetics (ROBIO), Dali, China, 6–8 December 2019; pp. 1693–1698. [Google Scholar]
- Abbeel, P.; Ng, A.Y. Apprenticeship learning via inverse reinforcement learning. In Proceedings of the Twenty-First International Conference on Machine Learning, Banff, AB, Canada, 4–8 July 2004; p. 1. [Google Scholar]
- Brockman, G.; Cheung, V.; Pettersson, L.; Schneider, J.; Schulman, J.; Tang, J.; Zaremba, W. Openai gym. arXiv 2016, arXiv:1606.01540. [Google Scholar]
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
