
Search Results (807)

Search Parameters:
Keywords = online-learning control

24 pages, 2171 KB  
Article
Approximated Adaptive Dynamic Programming Control of Axial-Piston Pump
by Jordan Kralev, Alexander Mitov and Tsonyo Slavov
Mathematics 2026, 14(7), 1127; https://doi.org/10.3390/math14071127 - 27 Mar 2026
Abstract
This article presents the synthesis, real-time implementation, and experimental validation of an approximated adaptive dynamic programming (AADP) actor–critic controller for precise flow rate regulation of a variable-displacement axial-piston pump designed for open-circuit hydraulic systems. Replacing the conventional hydro-mechanical regulator with an electrohydraulic proportional spool valve, the model-free controller employs two compact two-layer neural networks: the actor generates valve PWM signals from the flow tracking error, its integral, and measured discharge pressure, while the critic approximates the infinite-horizon quadratic cost-to-go via the online solution of the Bellman equation through gradient descent on Bellman residuals. Lyapunov analysis establishes closed-loop stability under bounded learning rates, with initial weights tuned via nominal plant simulation to ensure convergence from feasible starting policies. After extensive laboratory testing across four fixed loading conditions and dynamic load variations, the adaptive controller demonstrated superior performance compared with a proportional-integral (PI) controller, a Lyapunov model-reference adaptive controller (LMRAC), and an H∞ controller. Real-time metrics confirm bounded critic signals and near-zero Bellman errors, validating optimal policy convergence amid unmodeled hydraulic nonlinearities. Full article
(This article belongs to the Special Issue Advances in Robust Control Theory and Its Applications)
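The critic update described in the abstract above, gradient descent on Bellman residuals for a quadratic cost-to-go, can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the quadratic feature map, the two-dimensional state, and all gains are invented for the example.

```python
# Toy critic update by gradient descent on the Bellman residual, in the
# spirit of the AADP actor-critic abstract. Feature map and gains are
# illustrative assumptions, not the paper's code.

def features(x):
    """Quadratic features for a 2-dim state: [x1^2, x1*x2, x2^2]."""
    return [x[0] * x[0], x[0] * x[1], x[1] * x[1]]

def value(w, x):
    """Linear-in-weights value approximation V(x) = w . phi(x)."""
    return sum(wi * fi for wi, fi in zip(w, features(x)))

def critic_step(w, x, cost, x_next, gamma=0.95, lr=0.01):
    """One gradient step on the squared Bellman residual
    delta = cost + gamma*V(x') - V(x); the gradient w.r.t. w is
    delta * (gamma*phi(x') - phi(x))."""
    delta = cost + gamma * value(w, x_next) - value(w, x)
    phi, phi_n = features(x), features(x_next)
    w_new = [wi - lr * delta * (gamma * pn - p)
             for wi, p, pn in zip(w, phi, phi_n)]
    return w_new, delta
```

Repeated calls along a trajectory drive the residual toward zero when the feature class can represent the true cost-to-go, which is the convergence signal the abstract reports as "near-zero Bellman errors".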
21 pages, 2822 KB  
Article
Policy-Guided Model Predictive Path Integral for Safe Manipulator Trajectory Planning
by Liang Liang, Chengdong Wu and Xiaofeng Wang
Sensors 2026, 26(7), 2074; https://doi.org/10.3390/s26072074 - 26 Mar 2026
Abstract
Aiming at the problems of difficult hard-constraint enforcement and weak environmental generalization ability in the safe trajectory planning of manipulators in complex environments, a Policy-Guided Model Predictive Path Integral (PG-MPPI) planning framework is proposed. This framework integrates the advantages of reinforcement learning and model predictive control to construct a global prior guidance, local real-time optimization and hard-constraint safety assurance: a Constraint-Discounted Soft Actor–Critic (CD-SAC) offline learning policy is designed, which incorporates the configuration-space distance field as a safety guidance term to realize the learning of obstacle avoidance behavior; the offline policy is used to guide the online sampling and optimization of MPPI, improving sampling efficiency and planning quality; and a Control Barrier Function (CBF) safety filter is introduced to revise control commands in real time, ensuring the strict satisfaction of constraints. Taking the SIASUN T12B manipulator as the research object, simulation comparison experiments are carried out in multi-obstacle scenarios. The results show that the PG-MPPI algorithm outperforms the comparison algorithms in the success rate of collision-free target reaching, ensures the smoothness and feasibility of the trajectory, and has a good adaptive capacity to complex environments with unknown obstacle configurations, thus providing an efficient solution for the autonomous and safe operation of manipulators. Full article
(This article belongs to the Section Navigation and Positioning)
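The central mechanism of PG-MPPI, sampling control sequences around a learned policy prior and averaging them with exponentiated-cost weights, can be sketched on a toy 1-D double integrator. Everything here (the plant, cost, gains, and the fixed prior mean standing in for the CD-SAC policy) is an illustrative assumption.

```python
import math
import random

# Minimal policy-guided MPPI step on a 1-D double-integrator toy problem.
# The structure (policy prior -> importance-weighted sampling) follows the
# idea in the abstract; the plant, cost, and prior are invented.

def rollout_cost(x, v, controls, target=1.0, dt=0.1):
    """Quadratic tracking cost plus a small control penalty."""
    cost = 0.0
    for u in controls:
        v += u * dt
        x += v * dt
        cost += (x - target) ** 2 + 0.01 * u * u
    return cost

def pg_mppi_step(x, v, policy_mean, horizon=10, samples=64, sigma=0.5, lam=1.0):
    """Sample sequences around the policy prior, weight each rollout by
    exp(-cost/lam), and return the importance-weighted first control."""
    seqs, costs = [], []
    for _ in range(samples):
        seq = [policy_mean + random.gauss(0.0, sigma) for _ in range(horizon)]
        seqs.append(seq)
        costs.append(rollout_cost(x, v, seq))
    best = min(costs)                      # subtract for numerical stability
    weights = [math.exp(-(c - best) / lam) for c in costs]
    z = sum(weights)
    return sum(w * s[0] for w, s in zip(weights, seqs)) / z
```

In the paper the prior mean would come from the offline CD-SAC policy and the result would still pass through a CBF safety filter; neither is modeled in this sketch.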
19 pages, 2509 KB  
Article
Is Burnout the Hidden Architecture of Academic Life in University Students? A Network Analysis of Psychological Functioning Within a Control–Value and Job Demands–Resources Framework
by Edgar Demeter, Dana Rad, Mușata Bocoș, Alina Roman, Anca Egerău, Sonia Ignat, Tiberiu Dughi, Dana Dughi, Alina Costin, Ovidiu Toderici, Gavril Rad, Radiana Marcu, Daniela Roman, Otilia Clipa and Roxana Chiș
Behav. Sci. 2026, 16(4), 493; https://doi.org/10.3390/bs16040493 - 26 Mar 2026
Abstract
Academic functioning in university students emerges from the interplay of motivational, self-regulatory, emotional, and contextual processes. The present study examined the network structure linking academic motivation, self-regulated learning, academic engagement, academic burnout, generalized anxiety, self-esteem, and students’ ratings of instruction. Participants were 530 university students from Western Romania (Mage = 28.86, SD = 9.75; 87.5% women). Data were collected through an online cross-sectional survey using validated self-report instruments. A Gaussian Graphical Model was estimated using the EBICglasso procedure to examine the unique associations among the study variables and their relative structural importance within the network. The results indicated a moderately dense psychological network, with academic burnout emerging as the most structurally central node. Intrinsic motivation toward achievement, identified regulation, and performance control were positioned within the adaptive core of the network, whereas burnout, anxiety, amotivation, and low self-esteem clustered within the maladaptive region. Academic engagement occupied an intermediary position linking motivational and self-regulatory processes. Overall, the findings support a systems-oriented interpretation of academic functioning, suggesting that burnout represents a key convergence point in students’ psychological functioning, while self-determined motivation and self-regulated learning may serve as protective processes. These results highlight the value of network analysis for identifying psychologically meaningful intervention targets in higher education. Full article
(This article belongs to the Special Issue Academic Anxieties and Coping Strategies)
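The computation underlying a Gaussian Graphical Model, partial correlations read off the precision (inverse covariance) matrix, can be sketched as below. This plain inverse-covariance version omits the EBICglasso regularization actually used in the study, and the toy data are invented.

```python
import numpy as np

# Partial-correlation network and "strength" centrality, the two building
# blocks of the network analysis described in the abstract. This is an
# unregularized sketch, not the EBICglasso procedure.

def partial_correlations(data):
    """data: (n_samples, n_vars). Returns r_ij = -P_ij / sqrt(P_ii * P_jj)
    where P is the precision matrix; the diagonal is zeroed."""
    prec = np.linalg.inv(np.cov(data, rowvar=False))
    d = np.sqrt(np.diag(prec))
    pc = -prec / np.outer(d, d)
    np.fill_diagonal(pc, 0.0)
    return pc

def strength_centrality(pc):
    """Sum of absolute edge weights per node; the study's most central
    node (burnout) would be the argmax of this vector."""
    return np.abs(pc).sum(axis=0)
```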
30 pages, 2519 KB  
Article
Super-Twisting-Based Online Learning in High-Order Neural Networks for Robust Backstepping Control of DC Motors Under Uncertainty
by Ivan R. Urbina Leos, Jesus A. Medrano Hermosillo, Abraham E. Rodriguez Mata, Francisco R. Lopez-Estrada, Oscar J. Suarez and Alma Alejandra Luna-Gómez
Processes 2026, 14(6), 1019; https://doi.org/10.3390/pr14061019 - 22 Mar 2026
Abstract
This paper addresses the speed control problem of a DC motor in the presence of nonlinearities, disturbances, and unmodeled dynamics by proposing a neural backstepping control scheme based on a Recurrent High-Order Neural Network (RHONN). The proposed RHONN serves as an online approximator to compensate for uncertain nonlinear dynamics in a PD-based backstepping controller, enabling the system to handle disturbances, modeling errors, and unmodeled dynamics. Instead of relying on the traditional Extended Kalman Filter (EKF) for RHONN weight adaptation, the neural parameters are updated online using a Super-Twisting Algorithm (STA). As a result, the proposed STA-based learning law provides a simpler and robust covariance-free adaptation mechanism with practical finite-time convergence properties, making it suitable for real-time embedded implementations. The proposed method was evaluated through numerical simulations and implemented on an embedded microcontroller to assess its real-time performance. Simulation results show reductions between 0.04% and 2.04% in steady-state and integral error metrics compared with a tuned PD controller, and improvements up to 25.66% and 23.82% over LQR and MPC in the IMSE index. Experimental results demonstrate good tracking performance, robustness under varying load conditions, and low computational requirements, confirming the practical feasibility. Full article
(This article belongs to the Special Issue Advances in Electrical Drive Control Methodologies)
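The super-twisting algorithm (STA) that the abstract uses as a covariance-free replacement for EKF weight adaptation can be sketched on a scalar error signal. The gains, step size, and toy disturbance below are illustrative choices, not the paper's tuning.

```python
import math

# Scalar super-twisting algorithm driving an error to zero in finite time
# despite a bounded matched disturbance. Gains k1, k2 and the disturbance
# are illustrative; in the paper the STA updates RHONN weights instead.

def sta_step(e, v, k1=1.5, k2=1.1, dt=0.001):
    """One Euler step of u = -k1*sqrt(|e|)*sign(e) + v,  v' = -k2*sign(e)."""
    sign = (e > 0) - (e < 0)
    u = -k1 * math.sqrt(abs(e)) * sign + v
    v_next = v - k2 * sign * dt
    return u, v_next

def simulate(e0=1.0, steps=5000, dt=0.001):
    """Integrate e' = u + d with d = 0.2*sin(2t), |d'| <= 0.4 < k2."""
    e, v, t = e0, 0.0, 0.0
    for _ in range(steps):
        u, v = sta_step(e, v, dt=dt)
        d = 0.2 * math.sin(2.0 * t)
        e += (u + d) * dt
        t += dt
    return e
```

The key property, visible in the simulation, is that the error settles in a small neighborhood of zero without the high-frequency switching of a plain sign-based law, which is what makes the STA attractive for embedded weight adaptation.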
37 pages, 2896 KB  
Article
Energy-Efficient Resilience Scheduling for Elevator Group Control via Queueing-Based Planning and Safe Reinforcement Learning
by Tingjie Zhang, Tiantian Zhang, Hao Zou, Chuanjiang Li and Jun Huang
Machines 2026, 14(3), 352; https://doi.org/10.3390/machines14030352 - 21 Mar 2026
Abstract
High-rise elevator group control systems operate under pronounced nonstationarity during commuting peaks, post-event surges, and capacity degradation, where the waiting time distribution becomes right-tail heavy and stresses service-level agreements (SLAs) defined by coverage and high-quantile targets. At the same time, the time-of-use tariffs and carbon constraints sharpen the tension between peak-power control, energy savings, and service capacity. This paper proposes a two-layer resilience scheduling framework that integrates queueing-based planning with safe reinforcement learning (RL) fine-tuning. In the planning layer, parsimonious queueing approximations and scenario-based evaluation construct a finite set of implementable mode cards and emergency switching cards; Sample Average Approximation (SAA) combined with Conditional Value-at-Risk (CVaR) constraints filter candidates to enforce tail-risk-aware service limits while keeping power demand within a prescribed envelope. In the execution layer, online dispatch is formulated as a constrained Markov decision process; within the planning layer limits, action masking and Lagrangian safe RL learn small adaptive adjustments to suppress tail-waiting risk and improve recovery dynamics without increasing peak-power commitments. The experiments under morning peaks and post-event surges confirm tail risk reduction and accelerated recovery. For partial outages, the framework prioritizes SLA coverage and recovery speed, accepting a bounded increase in tail risk as a manageable trade-off. Throughout all tests, peak power remains within the prescribed limits. Improvements persist across random seeds and demand fluctuations, indicating distributional robustness and cross-scenario generalization. Ablation studies further reveal complementary roles: removing the planning layer CVaR screening worsens tail performance, while removing the execution layer action masking increases constraint violations and destabilizes recovery. 
Full article
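The planning-layer screening described above, scenario waiting times from Sample Average Approximation filtered by a CVaR tail-risk limit, reduces to a short computation. The candidate data and the 60-second limit below are invented for illustration.

```python
# CVaR-based screening of candidate "mode cards": keep a candidate only if
# the average of its worst (1 - alpha) fraction of scenario waiting times
# stays under the tail-risk limit. Candidate data are illustrative.

def cvar(samples, alpha=0.9):
    """Average of the worst (1 - alpha) fraction of samples."""
    s = sorted(samples)
    k = max(1, int(round((1 - alpha) * len(s))))
    tail = s[-k:]
    return sum(tail) / len(tail)

def screen_candidates(candidates, alpha=0.9, limit=60.0):
    """candidates: dict name -> list of scenario waiting times (seconds).
    Returns the names whose CVaR satisfies the service limit."""
    return [name for name, waits in candidates.items()
            if cvar(waits, alpha) <= limit]
```

Screening on CVaR rather than the mean is what makes the limit "tail-risk-aware": a candidate with a good average but a heavy right tail is rejected.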
22 pages, 10289 KB  
Article
Soft Actor-Critic-Based Power Optimization Method for UAV Wireless Charging Systems
by Zhuoyue Dai, Yongmin Yang, Yanting Luo, Zhilong Lin and Guanpeng Yang
Drones 2026, 10(3), 218; https://doi.org/10.3390/drones10030218 - 19 Mar 2026
Abstract
Maintaining high power delivery under uncertain landing positions is a key challenge for wireless charging of unmanned aerial vehicles (UAVs). This paper presents a data-driven power optimization method based on the Soft Actor-Critic algorithm for multi-transmitter single-receiver wireless power transfer (MTSR-WPT) systems. To support effective learning without explicit online parameter identification, a physics-informed dual-current state representation is constructed from measurable current responses, combining a zero-phase current with the current response under the applied phase command. The agent is trained using a reward defined directly from normalized load power, and the transmitter voltage phases serve as the control actions. In simulations of a five-transmitter system, the learned policy achieves about 97% of the theoretical maximum power in the training region and about 96% in the expanded evaluation region. Additional robustness studies show strong performance under moderate measurement noise and substantial recovery under model mismatch after short fine-tuning. Experimental validation on a physical prototype confirms the effectiveness of the method, yielding an average power improvement of 188% from a zero-phase baseline and reaching 87% of the maximum power measured on the hardware platform. These results support the proposed method as a practical data-driven alternative to model-dependent MTSR-WPT power optimization for UAV wireless charging. Full article
(This article belongs to the Section Drone Communications)
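The learning signal described in the abstract, a reward defined directly from normalized load power with transmitter voltage phases as actions, can be illustrated on a crude two-coil model in which transmitter contributions add coherently. Both the power model and all numbers are illustrative assumptions, not the paper's MTSR-WPT physics.

```python
import math

# Toy stand-in for the MTSR-WPT reward: delivered power is modeled as the
# squared magnitude of a coherent sum of transmitter phasors, and the
# reward is that power normalized by the aligned-phase maximum. The
# two-coil gains and p_ref are invented.

def load_power(phases, gains=(1.0, 0.8)):
    """|sum_k g_k * exp(j*phi_k)|^2 as a crude delivered-power proxy."""
    re = sum(g * math.cos(p) for g, p in zip(gains, phases))
    im = sum(g * math.sin(p) for g, p in zip(gains, phases))
    return re * re + im * im

def reward(phases, p_ref=3.24):
    """Normalized load power in [0, 1]; p_ref = (1.0 + 0.8)^2 is the
    aligned-phase maximum of this toy model."""
    return load_power(phases) / p_ref
```

In this toy model the agent's task reduces to aligning the phases; the paper's contribution is learning this from measurable current responses without identifying the coupling parameters.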
24 pages, 2649 KB  
Article
LQR-Tuned Self-Regulating Sliding Mode Control of a Boost Converter for Robust Voltage Regulation in DC Microgrids
by Omer Saleem, Muhammad Rafique and Jamshed Iqbal
Mathematics 2026, 14(6), 1030; https://doi.org/10.3390/math14061030 - 18 Mar 2026
Abstract
This paper presents a hybrid control strategy for robust voltage regulation of a DC–DC boost converter used in a renewable-rich DC microgrid. The DC microgrid may comprise batteries, photovoltaic, and wind energy sources connected to a common DC bus, where voltage fluctuations arise due to variable generation and dynamic load profiles. To ensure optimal and efficient output voltage regulation under these conditions, a novel Linear Quadratic Regulator (LQR) driven self-regulating Sliding Mode Control (SMC) approach is developed. The proposed scheme is realized by combining the optimal performance of an LQR voltage-reference tracking controller with the robustness of a tangent-hyperbolic-based-sliding-mode reaching law defined over an LQR-driven sliding surface. To reduce chattering and improve adaptability to bounded disturbances, the waveform of the hyperbolic switching function in the reaching law is adaptively modulated via an online indirect supervised learning law. The control parameters are tuned offline using numerical optimization. Simulation results under different scenarios, including input voltage disturbances, load variations, and model uncertainties, show that the proposed method achieves superior voltage regulation, reduced chattering, and enhanced dynamic response compared to conventional controllers. The framework ensures reliable EV integration into intelligent DC microgrids. Full article
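The control structure described above, a sliding surface built from LQR-style gains with a tangent-hyperbolic reaching law, has a very compact core. The surface gains, switching gain, and boundary-layer width below are illustrative, and the adaptive modulation of the switching waveform is not reproduced.

```python
import math

# Sliding-mode control with a tanh reaching law: s = c1*e + c2*de is the
# sliding surface (c from an LQR-style design in the paper), and
# u = -k*tanh(s/phi) replaces the discontinuous -k*sign(s) to reduce
# chattering. All gains here are illustrative.

def smc_control(e, de, c=(2.0, 1.0), k=5.0, phi=0.5):
    s = c[0] * e + c[1] * de
    return -k * math.tanh(s / phi)
```

Shrinking `phi` recovers the discontinuous sign switching (more robustness, more chattering); the paper's contribution is adapting this waveform online rather than fixing it.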
12 pages, 1019 KB  
Proceeding Paper
Intelligent Drone Patrolling with Real-Time Object Detection and GPS-Based Path Adaptation
by Gurugubelli V. S. Narayana, Shiba Prasad Swain, Debabrata Pattnayak, Manas Ranjan Pradhan and P. Ankit Krishna
Eng. Proc. 2026, 124(1), 82; https://doi.org/10.3390/engproc2026124082 - 18 Mar 2026
Abstract
Background: The need for autonomous aerial surveillance originates from weaknesses in manual monitoring, such as late response, low scalability and rigid patrol plans. AI and GPS-driven smart aerial monitoring present an attractive solution for continuous adaptive wide-area surveillance. Objective: In this paper, we aim at designing and validating experimentally a low-cost drone-based unmanned autonomous mission patrolling system with waypoint navigation, real-time video backhauling, AI-based human/object detection and GPS path re-planning when an event occurs to ensure the safety of patrol missions under battery constraints. Methods: The proposed architecture combines autonomous navigation and embedded flight-control with online analog video streaming and ground-station-based computer vision processing. Object detection based on deep learning for live aerial video is used, and the proposed system’s performance is tested at different altitudes, lighting states and GPS patrol plans. Results: Experimental results show that the proposed method can obtain stable waypoint tracking with a clear real-time video downlink in patrol missions. The system is able to adaptively modify paths as a reaction to detected events and commence safe return-to-home functionality during low-battery conditions. The proposed detection model obtains a mean average precision of 87.4%, with an F1-score of 0.89 and real-time inference latency (20–25 ms per frame) that enables fast service without any interruption in practice during surveillance deployment. Conclusions: Experimental results show that the proposed method can obtain stable waypoint tracking with a clear real-time video downlink in patrol missions. The system can adaptively modify paths as a reaction to detected events and commence safe return-to-home functionality during low-battery conditions. 
Full article
(This article belongs to the Proceedings of The 6th International Electronic Conference on Applied Sciences)
23 pages, 10022 KB  
Article
Biomimetic Dual-Strategy Adaptive Differential Evolution for Joint Kinematic-Residual Calibration with a Neuro-Physical Hybrid Jacobian
by Xibin Ma, Yugang Zhao and Zhibin Li
Biomimetics 2026, 11(3), 217; https://doi.org/10.3390/biomimetics11030217 - 18 Mar 2026
Abstract
Improving absolute accuracy in industrial manipulators remains difficult because rigid-body kinematic calibration cannot fully represent configuration-dependent non-geometric effects. Drawing inspiration from biological brain–body co-adaptation, this study presents an Evolutionary Neuro-Physical Hybrid (Evo-NPH) framework in which rigid geometric parameters and neural compensator weights are treated as a single co-evolving decision vector. In the offline phase, a Dual-Strategy Adaptive Differential Evolution (DS-ADE) optimizer performs global joint identification using complementary exploration–exploitation behaviors and success-history inheritance, analogous to morphology-control co-evolution in biological systems. In the online phase, a Neuro-Physical Hybrid Jacobian (NPHJ) solver augments the analytical Jacobian with gradients from a Graph Kolmogorov–Arnold Network (GKAN), enabling sensorimotor-like real-time compensation on the learned physical manifold. Experiments on an ABB IRB 120 manipulator with 600 configurations (500 training, 100 testing) report a testing distance-residual RMSE of 0.62 mm, STD of 0.59 mm, and MAX of 0.83 mm. Relative to the uncalibrated baseline, RMSE is reduced by 86.75%; compared with the strongest published baseline, RMSE improves by 23.46%. Ablation results show that joint DS-ADE optimization outperforms a sequential pipeline by 32.6%, and the graph-structured KAN outperforms a parameter-matched MLP by 26.2%. Wilcoxon signed-rank tests (p<0.001) confirm statistical significance. Full article
(This article belongs to the Section Biological Optimisation and Management)
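The global identification step in the abstract is built on differential evolution. A bare-bones DE/rand/1/bin loop on a toy objective is sketched below; the dual-strategy adaptation and success-history inheritance that distinguish DS-ADE are deliberately not reproduced, and all hyperparameters are illustrative.

```python
import random

# Bare-bones differential evolution (DE/rand/1/bin) minimizing the sphere
# function, to show the kind of population-based global search the DS-ADE
# optimizer performs over the joint geometric/neural decision vector.

def de_minimize(f, dim=3, pop_size=20, gens=200, F=0.7, CR=0.9, seed=0):
    rng = random.Random(seed)
    pop = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(pop_size)]
    fit = [f(x) for x in pop]
    for _ in range(gens):
        for i in range(pop_size):
            # Mutation: donor from three distinct members other than i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            jrand = rng.randrange(dim)  # guarantee one mutated coordinate
            trial = [pop[a][d] + F * (pop[b][d] - pop[c][d])
                     if (rng.random() < CR or d == jrand) else pop[i][d]
                     for d in range(dim)]
            ft = f(trial)
            if ft < fit[i]:             # greedy one-to-one selection
                pop[i], fit[i] = trial, ft
    best = min(range(pop_size), key=fit.__getitem__)
    return pop[best], fit[best]
```

In the paper the objective would be the calibration residual over measured configurations rather than the sphere function used here.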
21 pages, 9615 KB  
Article
Neuro-Adaptive Control for a Balance Board: Comparative Study with PID and LQR
by Gazi Akgun
Appl. Sci. 2026, 16(6), 2890; https://doi.org/10.3390/app16062890 - 17 Mar 2026
Abstract
Balance is an essential component in both everyday movement and sports performance. Balance boards are commonly used for training and physical therapy to improve balance. Conventional balance boards primarily rely on the user’s voluntary actions, whereas active/actuated balance boards can provide dynamic motion for both balance and rehabilitation. While this enables more effective training, it also introduces strong user-dependent and time-varying dynamics that are difficult to regulate with conventional controllers. This study addresses this limitation by developing a neuro-adaptive sliding mode controller to handle the strong inter-user variability and nonlinear pressure–force dynamics of pneumatic artificial muscles. The controller combines a learning neural network that updates online with a robust control structure to ensure stable motion in the presence of disturbances. The proposed approach was evaluated against commonly used PID and LQR controllers under sudden changes in operating conditions. Simulation results show that the proposed controller improves stability, reduces control effort, and adapts more effectively to different users and external disturbances. These findings suggest that neuro-adaptive control strategies can improve the reliability and responsiveness of balance training and rehabilitation devices, supporting safer and more personalized therapy. Full article
Show Figures

Figure 1

24 pages, 2066 KB  
Article
Reinforcement Learning Based Warm Initialization for Constrained Open-System Quantum Optimal Control: A Controlled Budget-Matched RL-GRAPE Benchmark
by Daniele Gabriele and Lorenzo Ricciardi Celsi
Electronics 2026, 15(6), 1251; https://doi.org/10.3390/electronics15061251 - 17 Mar 2026
Abstract
Superconducting-qubit control is fundamentally constrained by decoherence, finite bandwidth, and hardware-limited drive amplitudes, making high-fidelity state preparation sensitive to optimizer initialization under non-convex open-system dynamics. We propose a hybrid reinforcement learning (RL)–quantum optimal control (QOC) pipeline in which a lightweight, tabular, model-free RL agent is trained offline in simulation to generate feasible, bounded seed pulses, which are subsequently refined via GRAPE under Lindblad dynamics. Hard amplitude constraints are enforced consistently across both stages, ensuring strict feasibility throughout optimization. Performance is evaluated using a budget-matched protocol based on fidelity evaluations (F-evals), enabling controlled comparison with random-start multi-start GRAPE. On a transmon-like qubit benchmark with relaxation and dephasing, RL warm-starting reduces the median online refinement effort in the adopted finite-difference GRAPE implementation from 7568 to 3543 F-evals (2.14× reduction) while achieving terminal state fidelity ≥0.995 under identical constraints and evaluation budgets. We provide a theoretical interpretation of the improvement in terms of basin-of-attraction probability shaping in constrained control landscapes and an amortized cost analysis showing that the offline RL cost is recovered after a small number of reuse cycles. The results support the view that learning-based initialization can improve warm-start quality relative to uninformed feasible multi-start in constrained open-system quantum-control benchmarks, while broader practical comparison against stronger physics-guided seeds remains for future work. Full article
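The amortized cost argument in the abstract is easy to make concrete: warm-starting saves 7568 − 3543 = 4025 F-evals per task (figures from the abstract), so an offline RL training cost is recovered after a small number of reuse cycles. The offline cost value used below is a placeholder, not a number from the paper.

```python
import math

# Break-even calculation for RL warm-starting: using the median F-eval
# counts reported in the abstract (7568 cold-start vs. 3543 warm-start),
# compute how many reuse cycles recover an offline training cost
# c_offline expressed in F-eval-equivalents (c_offline is hypothetical).

def reuse_cycles_to_break_even(c_offline, cold=7568, warm=3543):
    savings = cold - warm   # 4025 F-evals saved per warm-started task
    return math.ceil(c_offline / savings)
```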
28 pages, 1600 KB  
Article
A Data-Driven Deep Reinforcement Learning Framework for Real-Time Economic Dispatch of Microgrids Under Renewable Uncertainty
by Biao Dong, Shijie Cui and Xiaohui Wang
Energies 2026, 19(6), 1481; https://doi.org/10.3390/en19061481 - 16 Mar 2026
Abstract
The real-time economic dispatch of microgrids (MGs) is challenged by the high penetration of renewable energy and the resulting source–load uncertainties. Conventional optimization-based scheduling methods rely heavily on accurate probabilistic models and often suffer from high computational burdens, which limits their real-time applicability. To address these challenges, a data-driven deep reinforcement learning (DRL) framework is proposed for real-time microgrid energy management. The MG dispatch problem is formulated as a Markov decision process (MDP), and a Deep Deterministic Policy Gradient (DDPG) algorithm is adopted to efficiently handle the high-dimensional continuous action space of distributed generators and energy storage systems (ESS). The system state incorporates renewable generation, load demand, electricity price, and ESS operational conditions, while the reward function is designed as the negative of the operational cost with penalty terms for constraint violations. A continuous-action policy network is developed to directly generate control commands without action discretization, enabling smooth and flexible scheduling. Simulation studies are conducted on an extended European low-voltage microgrid test system under both deterministic and stochastic operating scenarios. The proposed approach is compared with model-based methods (MPC and MINLP) and representative DRL algorithms (SAC and PPO). The results show that the proposed DDPG-based strategy achieves competitive economic performance, fast convergence, and good adaptability to different initial ESS conditions. In stochastic environments, the proposed method maintains operating costs close to the optimal MINLP reference while significantly reducing the online computational time. These findings demonstrate that the proposed framework provides an efficient and practical solution for the real-time economic dispatch of microgrids with high renewable penetration. Full article
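The reward design described above, the negative of operational cost with penalty terms for constraint violations, can be sketched for a single dispatch step. The cost components, the state-of-charge constraint, and all numbers are invented illustrative choices, not the paper's exact formulation.

```python
# Toy per-step dispatch reward: negative operating cost (generation cost
# plus priced grid import) minus a penalty proportional to battery
# state-of-charge limit violations. All parameters are illustrative.

def dispatch_reward(gen_cost, grid_import_kw, price, soc,
                    soc_min=0.1, soc_max=0.9, penalty=100.0):
    cost = gen_cost + price * max(grid_import_kw, 0.0)
    violation = max(soc_min - soc, 0.0) + max(soc - soc_max, 0.0)
    return -(cost + penalty * violation)
```

With a continuous-action policy network as in the paper, the DDPG agent maximizes the discounted sum of such rewards directly over generator set-points and ESS power, with no action discretization.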
24 pages, 4975 KB  
Article
Disturbance Observer-Based Actor–Critic Reinforcement Learning with Adaptive Reward for Energy-Efficient Control of Robotic Manipulators
by Le Thi Minh Tam, Nguyen Viet Ngu, Duc Hung Pham and V. T. Mai
Actuators 2026, 15(3), 167; https://doi.org/10.3390/act15030167 - 16 Mar 2026
Abstract
Reinforcement learning controllers for robot manipulators depend strongly on reward tuning, and fixed weights may yield poor trade-offs under uncertainty and disturbances. This paper proposes a disturbance observer-based actor–critic RL (DOB–ACRL) with adaptive multi-objective reward shaping for a torque-saturated 2-DOF manipulator, where the reward weights are updated online using normalized indicators of tracking error, control energy, and effort. A Lyapunov analysis guarantees the uniform ultimate boundedness of closed-loop signals. The simulations show improved learning and performance over a static reward actor–critic baseline, reducing the RMS tracking error by up to 22.8%, the control energy by ~4.6%, the control effort by 1.9%, and the settling time by up to 29.2%. Full article
(This article belongs to the Section Actuators for Robotics)
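The adaptive multi-objective reward shaping described above, online weight updates from normalized indicators of tracking error, control energy, and effort, can be sketched with a simple renormalized update. The specific rule below (move weights toward the indicator profile, then renormalize) is an illustrative assumption, not the authors' exact law.

```python
# Toy adaptive reward shaping: the objective currently performing worst
# (largest normalized indicator) receives more weight, and the weights
# are kept on the simplex by renormalization. The update rule is an
# illustrative stand-in for the paper's adaptation law.

def adapt_weights(weights, indicators, lr=0.1):
    """indicators: normalized costs in [0, 1], one per objective."""
    raw = [w + lr * ind for w, ind in zip(weights, indicators)]
    total = sum(raw)
    return [r / total for r in raw]

def shaped_reward(weights, indicators):
    """Weighted negative cost used as the scalar RL reward."""
    return -sum(w * ind for w, ind in zip(weights, indicators))
```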
27 pages, 7476 KB  
Article
Real-Time Embedded Smart-Particle Monitoring for Index-Based Evaluation of Asphalt Mixture Compaction Quality
by Min Xiao, Xilan Yu, Wei Min, Fengteng Liu, Yongwei Li, Haojie Duan, Feng Liu, Hairui Wu and Xunhao Ding
Sensors 2026, 26(6), 1822; https://doi.org/10.3390/s26061822 - 13 Mar 2026
Abstract
Compaction quality governs asphalt pavement durability, but conventional density checks are intermittent. Reliable compaction control of asphalt mixtures requires real-time information on internal responses rather than relying solely on endpoint density measurements. In this study, an embedded smart-particle framework is developed for in situ monitoring and index-based evaluation of vibratory compaction quality, integrating multi-source sensing, feature extraction, and compaction degree mapping. The smart particle integrates inertial/orientation sensing together with thermal–mechanical measurements, and its high-temperature survivability and calibratability are verified through thermal exposure and calibration tests. During laboratory vibratory compaction of representative asphalt mixtures, raw signals are converted into stable attitude responses via attitude estimation and filtering; posture-dominant descriptors are then extracted and used to establish a data-driven mapping from internal responses to compaction degree using regression models. Results show that the device remains stable under typical hot-mix asphalt conditions, with calibration exhibiting high linearity (temperature channel R² > 0.990; force channel R² > 0.980 in the relevant range). Filtering markedly enhances inertial-signal usability under strong vibration and improves the interpretability of attitude-response evolution during compaction. The evolution of attitude features is consistent with the "rapid-to-slow densification" process, yielding correlations of |r| ≈ 0.35–0.47 with compaction degree evolution. Nonlinear regressors outperform linear baselines, and the better-performing nonlinear models achieve strong predictive performance across all six specimens, with R² values reaching 0.740–0.960 and RMSE reaching 0.016–0.043. Moreover, machine-learning-based feature-importance analysis reveals distinct mixture-type-dependent characteristics, indicating that AC and SMA transmit compaction-state information through partly different dominant response features. These findings demonstrate the feasibility of embedded smart particles for online compaction-quality evaluation and provide a basis for real-time feedback in intelligent compaction. Full article
(This article belongs to the Section Vehicular Sensing)
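The index-based evaluation above rests on two standard statistics: the Pearson correlation between an attitude feature and compaction degree, and the R²/RMSE of a fitted mapping. A minimal stdlib-only sketch with synthetic "rapid-to-slow densification" data follows; the decay rates, noise level, and cycle counts are illustrative assumptions, not values from the study:

```python
import math

def pearson_r(x, y):
    """Pearson correlation between a feature trajectory and compaction degree."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def r2_rmse(y_true, y_pred):
    """R^2 and RMSE, the two fit metrics reported in the study."""
    n = len(y_true)
    my = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - my) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot, math.sqrt(ss_res / n)

# Synthetic "rapid-to-slow" densification: degree saturates with roller passes.
cycles = list(range(1, 11))
degree = [1 - math.exp(-0.4 * c) for c in cycles]                    # ground truth
feature = [1 - math.exp(-0.35 * c) + 0.01 * (-1) ** c for c in cycles]  # noisy attitude feature
r = pearson_r(feature, degree)
```

A nonlinear regressor (gradient boosting, kernel regression, etc.) would replace the raw feature with a fitted prediction before computing R² and RMSE, which is presumably where the reported 0.740–0.960 range comes from.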

30 pages, 5358 KB  
Article
Peak Shaving and Solar Utilization for Sustainable Campus EV Charging Using Reinforcement Learning Approach
by Heba M. Abdullah, Adel Gastli, Lazhar Ben-Brahim and Shirazul Islam
Sustainability 2026, 18(6), 2737; https://doi.org/10.3390/su18062737 - 11 Mar 2026
Abstract
To reduce the carbon footprint, electric vehicles (EVs) are considered an alternative transportation choice. However, increased use of EVs could overload the existing power network once all installed chargers are accounted for. With the growing deployment of EV chargers, university campuses are likely sites for such network overloading. This paper applies reinforcement learning (RL) to optimize EV charging infrastructure at the university scale using real-world data, directly contributing to sustainable energy management by reducing grid burden and increasing renewable energy utilization. This study investigated practical relevance in real-world systems, considering three demand scenarios: random demand, stochastic historical demand from Qatar University, and actual online data from Caltech. Three RL algorithms—Deep Q-Network (DQN), Advantage Actor–Critic (A2C), and Proximal Policy Optimization (PPO)—are applied. During training, the historical stochastic data required more tuning of the RL framework than the random demand, emphasizing the importance of realistic demand profiles; the performance of the RL approach depends on the type of demand. The results show that the proposed RL approach can efficiently mitigate peak charging currents. For the Qatar University historical demand scenario, the PPO algorithm reduced the peak charging current by 50% relative to uncontrolled charging (160 A to 80 A), while Model Predictive Control (MPC) maintained the energy transfer capability at 99.710%. For the random demand type, the peak charging current is reduced by 38.3% compared with uncontrolled charging (128 A to 79 A), with a slight reduction in energy transfer capability to 95.89%. Scalability is tested by integrating the model into the IEEE-33 bus network. Without solar integration, the proposed RL-based EV charging management model reduces the voltage drop by 0.05 p.u., lowering line losses by 17% compared with the MPC benchmark method and by 32% compared with the uncontrolled charging scheme. Furthermore, the proposed RL approach yields a 9% reduction in line current during peak hours in the IEEE-33 bus system. With solar integration into the IEEE-33 bus system, the proposed RL framework improved the sustainability of the charging infrastructure by enhancing solar energy utilization by 42.5%. These findings validate the applicability of the proposed model for optimizing sustainable EV charging infrastructure while managing the charging coordination problem. Full article
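The peak-shaving objective can be illustrated with a toy tabular Q-learning agent; the paper itself uses deep RL (DQN, A2C, PPO), and the currents, feeder limit, and reward weighting below are illustrative assumptions only. At each hour the agent picks a charging current so that base load plus charging stays under the feeder limit:

```python
import random

def train_peak_shaving(base_load, limit, episodes=2000, seed=0):
    """Tabular Q-learning sketch: at each hour choose a charging
    current (0, 40, or 80 A) to serve EV demand without pushing the
    total current past `limit`.  State = hour index; all numbers are
    illustrative, not from the paper.
    """
    rng = random.Random(seed)
    actions = [0, 40, 80]
    q = [[0.0] * len(actions) for _ in base_load]
    alpha, gamma, eps = 0.2, 0.95, 0.2
    for _ in range(episodes):
        for h in range(len(base_load)):
            # epsilon-greedy action selection
            a = rng.randrange(len(actions)) if rng.random() < eps else max(
                range(len(actions)), key=lambda i: q[h][i])
            total = base_load[h] + actions[a]
            # reward delivered energy, penalize exceeding the feeder limit
            reward = actions[a] - 10 * max(0, total - limit)
            nxt = max(q[h + 1]) if h + 1 < len(base_load) else 0.0
            q[h][a] += alpha * (reward + gamma * nxt - q[h][a])
    # greedy policy: best action index per hour
    return [max(range(len(actions)), key=lambda i: q[h][i])
            for h in range(len(base_load))]

# Three-hour toy profile with a 160 A feeder limit: the learned policy
# backs off to 40 A during the 120 A peak hour and charges at 80 A otherwise.
policy = train_peak_shaving(base_load=[40, 120, 60], limit=160)
```

A deep-RL version replaces the Q table with a network over a richer state (time, queue of plugged-in EVs, solar output), but the shaping of the reward, delivered energy minus a peak-violation penalty, carries over directly.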
