Search Results (207)

Search Parameters:
Keywords = double deep Q-networks

18 pages, 1430 KB  
Article
Microgrid Operation Optimization Strategy Based on CMDP-D3QN-MSRM Algorithm
by Jiayu Kang, Yushun Zeng and Qian Wei
Electronics 2025, 14(18), 3654; https://doi.org/10.3390/electronics14183654 - 15 Sep 2025
Viewed by 235
Abstract
This paper addresses the microgrid operation optimization challenges arising from the variability and uncertainty of distributed power sources and from complex power flow constraints. A novel method is proposed, based on an improved Dueling Double Deep Q-Network (D3QN) algorithm, which is enhanced by a multi-stage reward mechanism (MSRM) and formulated within a Constrained Markov Decision Process (CMDP) framework. First, the reward mechanism of the D3QN algorithm is optimized by introducing a redesigned MSRM, improving both training efficiency and the optimality of trained agents. Second, the microgrid operation optimization problem is modeled as a CMDP, thereby enhancing the algorithm's capacity for handling complex constraints. Finally, numerical experiments demonstrate that our method reduces operating costs by 16.5%, achieves better convergence performance, and curtails bus voltage fluctuations by over 40%, significantly improving the economic efficiency and operational stability of microgrids. Full article
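To give a flavor of staged reward shaping, here is a minimal sketch of what a multi-stage reward with a CMDP-style constraint penalty might look like; the stage boundaries, weights, and the 5% voltage band are illustrative assumptions, not values from the paper.

```python
# Hypothetical multi-stage reward with a CMDP-style constraint penalty.
# Stage thresholds and weights are illustrative assumptions.

def staged_reward(op_cost: float, voltage_dev: float, episode: int) -> float:
    """Reward = negative operating cost, with a constraint penalty whose
    weight grows in stages as training progresses."""
    if episode < 200:        # stage 1: learn basic economics first
        penalty_weight = 1.0
    elif episode < 500:      # stage 2: tighten constraint handling
        penalty_weight = 5.0
    else:                    # stage 3: near-feasible operation only
        penalty_weight = 20.0
    # Voltage deviation beyond an assumed 5% band is treated as constraint cost.
    constraint_cost = max(0.0, voltage_dev - 0.05)
    return -op_cost - penalty_weight * constraint_cost
```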

27 pages, 4238 KB  
Article
A Scalable Reinforcement Learning Framework for Ultra-Reliable Low-Latency Spectrum Management in Healthcare Internet of Things
by Adeel Iqbal, Ali Nauman, Tahir Khurshaid and Sang-Bong Rhee
Mathematics 2025, 13(18), 2941; https://doi.org/10.3390/math13182941 - 11 Sep 2025
Viewed by 302
Abstract
Healthcare Internet of Things (H-IoT) systems demand ultra-reliable and low-latency communication (URLLC) to support critical functions such as remote monitoring, emergency response, and real-time diagnostics. However, spectrum scarcity and heterogeneous traffic patterns pose major challenges for centralized scheduling in dense H-IoT deployments. This paper proposes a multi-agent reinforcement learning (MARL) framework for dynamic, priority-aware spectrum management (PASM), where cooperative MARL agents jointly optimize throughput, latency, energy efficiency, fairness, and blocking probability under varying traffic and channel conditions. Six learning strategies are developed and compared, including Q-Learning, Double Q-Learning, Deep Q-Network (DQN), Actor–Critic, Dueling DQN, and Proximal Policy Optimization (PPO), within a simulated H-IoT environment that captures heterogeneous traffic, device priorities, and realistic URLLC constraints. A comprehensive simulation study across scalable scenarios ranging from 3 to 50 devices demonstrates that PPO consistently outperforms all baselines, improving mean throughput by 6.2%, reducing 95th-percentile delay by 11.5%, increasing energy efficiency by 11.9%, lowering blocking probability by 33.3%, and accelerating convergence by 75.8% compared to the strongest non-PPO baseline. These findings establish PPO as a robust and scalable solution for QoS-compliant spectrum management in dense H-IoT environments, while Dueling DQN emerges as a competitive deep RL alternative. Full article
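PPO's advantage over the value-based baselines comes largely from its clipped surrogate objective. A generic PyTorch sketch of that loss (standard PPO as in Schulman et al., 2017, not the paper's implementation) follows:

```python
import torch

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Clipped surrogate objective of PPO. logp_new/logp_old are the
    log-probabilities of the taken actions under the current and the
    data-collecting policy; advantages are estimates of A(s, a)."""
    ratio = torch.exp(logp_new - logp_old)                      # pi_new / pi_old
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    return -torch.mean(torch.min(unclipped, clipped))           # minimize negative surrogate
```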

26 pages, 4054 KB  
Article
Multi-Time-Scale Demand Response Optimization in Active Distribution Networks Using Double Deep Q-Networks
by Wei Niu, Jifeng Li, Zongle Ma, Wenliang Yin and Liang Feng
Energies 2025, 18(18), 4795; https://doi.org/10.3390/en18184795 - 9 Sep 2025
Viewed by 430
Abstract
This paper presents a deep reinforcement learning-based demand response (DR) optimization framework for active distribution networks under uncertainty and user heterogeneity. The proposed model utilizes a Double Deep Q-Network (Double DQN) to learn adaptive, multi-period DR strategies across residential, commercial, and electric vehicle (EV) participants in a 24 h rolling horizon. By incorporating a structured state representation—including forecasted load, photovoltaic (PV) output, dynamic pricing, historical DR actions, and voltage states—the agent autonomously learns control policies that minimize total operational costs while maintaining grid feasibility and voltage stability. The physical system is modeled via detailed constraints, including power flow balance, voltage magnitude bounds, PV curtailment caps, deferrable load recovery windows, and user-specific availability envelopes. A case study based on a modified IEEE 33-bus distribution network with embedded PV and DR nodes demonstrates the framework’s effectiveness. Simulation results show that the proposed method achieves significant cost savings (up to 35% over baseline), enhances PV absorption, reduces load variance by 42%, and maintains voltage profiles within safe operational thresholds. Training curves confirm smooth Q-value convergence and stable policy performance, while spatiotemporal visualizations reveal interpretable DR behavior aligned with both economic and physical system constraints. This work contributes a scalable, model-free approach for intelligent DR coordination in smart grids, integrating learning-based control with physical grid realism. The modular design allows for future extension to multi-agent systems, storage coordination, and market-integrated DR scheduling. The results position Double DQN as a promising architecture for operational decision-making in AI-enabled distribution networks. Full article
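The Double DQN at the core of this framework decouples action selection from action evaluation to reduce Q-value overestimation. A minimal PyTorch sketch of the standard Double DQN target computation (van Hasselt et al., 2016; generic, not the authors' code):

```python
import torch

@torch.no_grad()
def double_dqn_target(online_net, target_net, rewards, next_states, dones, gamma=0.99):
    """Double DQN target: the online network selects the next action,
    the target network evaluates it. Batch shapes and gamma are assumptions."""
    next_actions = online_net(next_states).argmax(dim=1, keepdim=True)   # action selection
    next_q = target_net(next_states).gather(1, next_actions).squeeze(1)  # action evaluation
    return rewards + gamma * (1.0 - dones) * next_q
```

The resulting target is then typically regressed against the online network's Q(s, a) with a Huber loss.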

22 pages, 763 KB  
Article
Optimizing TSCH Scheduling for IIoT Networks Using Reinforcement Learning
by Sahar Ben Yaala, Sirine Ben Yaala and Ridha Bouallegue
Technologies 2025, 13(9), 400; https://doi.org/10.3390/technologies13090400 - 3 Sep 2025
Viewed by 473
Abstract
In industrial applications, ensuring efficient medium access control is a fundamental challenge. Industrial IoT devices are resource-constrained and must guarantee reliable communication while reducing energy consumption. The IEEE 802.15.4e standard introduced time-slotted channel hopping (TSCH) to meet the requirements of the industrial Internet of Things. TSCH relies on time synchronization and channel hopping to improve performance and reduce energy consumption. Despite these characteristics, configuring an efficient schedule under varying traffic conditions and interference scenarios remains a challenging problem. Reinforcement learning (RL) offers a promising approach to this challenge: it enables TSCH to dynamically adapt its scheduling to real-time network conditions, making decisions that optimize key performance criteria such as energy efficiency, reliability, and latency. By learning from the environment, an RL agent can reconfigure schedules to mitigate interference and meet traffic demands. In this work, we compare several RL algorithms in the TSCH environment: the deep Q-network (DQN), the double deep Q-network (DDQN), and the prioritized DQN (PER-DQN). We focus on the convergence speed of these algorithms and their capacity to adapt the schedule. Our results show that PER-DQN improves the packet delivery ratio and converges faster than DQN and DDQN, demonstrating its effectiveness for dynamic TSCH scheduling in Industrial IoT environments. These quantifiable improvements highlight the potential of prioritized experience replay to enhance reliability and efficiency under varying network conditions. Full article
(This article belongs to the Section Information and Communication Technologies)
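PER-DQN's edge over plain DQN/DDQN comes from replaying transitions with large temporal-difference error more often. A minimal proportional prioritized replay buffer in the standard formulation of Schaul et al. (2016); the paper's alpha value, beta schedule, and importance-sampling correction are not reproduced here:

```python
import numpy as np

class PrioritizedReplay:
    """Minimal proportional PER buffer; importance-sampling weights omitted."""
    def __init__(self, capacity, alpha=0.6):
        self.capacity, self.alpha = capacity, alpha
        self.data, self.priorities = [], []

    def add(self, transition, td_error):
        if len(self.data) >= self.capacity:          # drop oldest when full
            self.data.pop(0); self.priorities.pop(0)
        self.data.append(transition)
        self.priorities.append((abs(td_error) + 1e-6) ** self.alpha)

    def sample(self, batch_size):
        p = np.asarray(self.priorities); p = p / p.sum()
        idx = np.random.choice(len(self.data), batch_size, p=p)
        return [self.data[i] for i in idx], idx      # idx lets callers update priorities
```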

24 pages, 6077 KB  
Article
Trajectory Tracking Control of Intelligent Vehicles with Adaptive Model Predictive Control and Reinforcement Learning Under Variable Curvature Roads
by Yuying Fang, Pengwei Wang, Song Gao, Binbin Sun, Qing Zhang and Yuhua Zhang
Technologies 2025, 13(9), 394; https://doi.org/10.3390/technologies13090394 - 1 Sep 2025
Viewed by 498
Abstract
To improve the tracking accuracy and the adaptability of intelligent vehicles in various road conditions, an adaptive model predictive controller combining reinforcement learning is proposed in this paper. Firstly, to solve the problem of control accuracy decline caused by a fixed prediction horizon, a low-computational-cost adaptive prediction horizon strategy based on a two-dimensional Gaussian function is designed to adjust the prediction horizon in real time with vehicle speed and road curvature. Secondly, to address the problem of reduced tracking stability under complex road conditions, the Deep Q-Network (DQN) algorithm is used to adjust the weight matrix of the Model Predictive Control (MPC) algorithm, improving the convergence speed and control effectiveness of the tracking controller. Finally, hardware-in-the-loop tests and real vehicle tests are conducted. The results show that the proposed adaptive prediction horizon controller (DQN-AP-MPC) solves the problem of poor control performance caused by a fixed prediction horizon and fixed weight matrix values, significantly improving the tracking accuracy of intelligent vehicles under different road conditions. Especially under variable curvature and high-speed conditions, the proposed controller reduces the maximum lateral error by 76.81% compared to the unimproved MPC controller, and reduces the average absolute error by 64.44%. The proposed controller has a faster convergence speed and better trajectory tracking performance when tested on variable-curvature roads and double-lane roads. Full article
(This article belongs to the Section Manufacturing Technology)
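The adaptive-horizon idea can be illustrated with a small sketch: a 2D Gaussian over speed and curvature maps the current operating point to a horizon length. The centers, widths, and horizon bounds below are illustrative assumptions, not the paper's calibrated values.

```python
import numpy as np

def adaptive_horizon(speed, curvature, n_min=10, n_max=30,
                     mu=(15.0, 0.0), sigma=(8.0, 0.02)):
    """Prediction horizon from a 2D Gaussian over (speed [m/s], curvature [1/m]).
    All numeric parameters here are hypothetical placeholders."""
    g = np.exp(-((speed - mu[0]) ** 2 / (2 * sigma[0] ** 2)
                 + (curvature - mu[1]) ** 2 / (2 * sigma[1] ** 2)))
    return int(round(n_min + (n_max - n_min) * g))   # horizon shrinks far from mu
```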

17 pages, 2179 KB  
Article
Federated Multi-Agent DRL for Task Offloading in Vehicular Edge Computing
by Hongwei Zhao, Yu Li, Zhixi Pang and Zihan Ma
Electronics 2025, 14(17), 3501; https://doi.org/10.3390/electronics14173501 - 1 Sep 2025
Viewed by 770
Abstract
With the expansion of vehicle-to-everything (V2X) networks and the rising demand for intelligent services, vehicular edge computing faces heightened requirements for more efficient task offloading. This study proposes a task offloading technique that combines federated collaboration with multi-agent deep reinforcement learning to reduce system latency and energy consumption. The task offloading problem is formulated as a Markov decision process (MDP), and a framework based on the Multi-Agent Dueling Double Deep Q-Network (MAD3QN) is developed to enable agents to make optimal offloading decisions in complex environments. In addition, Federated Learning (FL) is applied during the training phase, leveraging local training outcomes from multiple vehicles to enhance the global model and thus improve the learning efficiency of the agents. Experimental results indicate that, compared to conventional baseline algorithms, the proposed method decreases latency and energy consumption by at least 10% and 9%, respectively, while improving the average reward by at least 21%. Full article
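The federated training step can be pictured as FedAvg-style aggregation of per-vehicle Q-network parameters. A generic PyTorch sketch; the paper's exact aggregation protocol and weighting scheme are assumptions here:

```python
import copy

def fedavg(global_net, local_nets, weights=None):
    """FedAvg-style aggregation: the global model becomes a weighted average
    of the vehicles' locally trained parameters (floating-point tensors assumed)."""
    weights = weights or [1.0 / len(local_nets)] * len(local_nets)
    new_state = copy.deepcopy(global_net.state_dict())
    for key in new_state:
        new_state[key] = sum(w * net.state_dict()[key]
                             for w, net in zip(weights, local_nets))
    global_net.load_state_dict(new_state)       # broadcast back to agents afterwards
    return global_net
```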

26 pages, 9891 KB  
Article
Real-Time Energy Management of a Microgrid Using MPC-DDQN-Controlled V2H and H2V Operations with Renewable Energy Integration
by Mohammed Alsolami, Ahmad Alferidi and Badr Lami
Energies 2025, 18(17), 4622; https://doi.org/10.3390/en18174622 - 30 Aug 2025
Viewed by 552
Abstract
This paper presents the design and implementation of an Intelligent Home Energy Management System in a smart home. The system is based on an economically decentralized hybrid concept that includes photovoltaic technology, a proton exchange membrane fuel cell, and a hydrogen refueling station, which together provide a reliable, secure, and clean power supply for smart homes. The proposed design enables power transfer between Vehicle-to-Home (V2H) and Home-to-Vehicle (H2V) systems, allowing electric vehicles to function as mobile energy storage devices at the grid level, facilitating a more adaptable and autonomous network. Our approach employs Double Deep Q-networks for adaptive control and forecasting. A Multi-Agent System coordinates actions between home appliances, energy storage systems, electric vehicles, and hydrogen power devices to ensure effective and cost-saving energy distribution for users of the smart grid. The design validation is carried out through MATLAB/Simulink-based simulations using meteorological data from Tunis. Ultimately, the V2H/H2V system enhances the utilization, reliability, and cost-effectiveness of residential energy systems compared with other management systems and conventional networks. Full article
(This article belongs to the Section A1: Smart Grids and Microgrids)

14 pages, 1932 KB  
Article
Stealth UAV Path Planning Based on DDQN Against Multi-Radar Detection
by Lei Bao, Zhengtao Guo, Xianzhong Gao and Chaolong Li
Aerospace 2025, 12(9), 774; https://doi.org/10.3390/aerospace12090774 - 28 Aug 2025
Viewed by 421
Abstract
Considering the dynamic RCS characteristics of stealth UAVs, we propose a stealth UAV path planning algorithm based on the Double Deep Q-Network (DDQN). By introducing a reinforcement learning model that interacts with the environment, the stealth UAV adjusts its path planning strategy according to the rewards obtained from the environment to design the optimal path in real time. Specifically, by considering the effect of the angle-dependent RCS on the detection probability of air defense radars, the stealth UAV iteratively optimizes the path planning scheme to improve the reliability of the penetration path. Under the guidance of the proposed goal-directed composite reward function, the convergence speed of the path planning algorithm is improved. The simulation results show that the stealth UAV can reach the target position along the optimal path while avoiding the threat zone. Full article
(This article belongs to the Section Aeronautics)
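A goal-directed composite reward of this kind typically combines progress toward the target with a stealth penalty. The sketch below is a hypothetical reading of such a design; the weights, terms, and terminal bonus are assumptions, not the paper's reward.

```python
def composite_reward(dist_to_goal, prev_dist, detection_prob, reached,
                     w_goal=1.0, w_stealth=2.0):
    """Hypothetical goal-directed composite reward: progress toward the
    target minus a penalty proportional to radar detection probability."""
    progress = prev_dist - dist_to_goal          # positive when moving closer
    reward = w_goal * progress - w_stealth * detection_prob
    if reached:
        reward += 100.0                          # assumed terminal bonus
    return reward
```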

30 pages, 3950 KB  
Article
A Modular Hybrid SOC-Estimation Framework with a Supervisor for Battery Management Systems Supporting Renewable Energy Integration in Smart Buildings
by Mehmet Kurucan, Panagiotis Michailidis, Iakovos Michailidis and Federico Minelli
Energies 2025, 18(17), 4537; https://doi.org/10.3390/en18174537 - 27 Aug 2025
Cited by 1 | Viewed by 579
Abstract
Accurate state-of-charge (SOC) estimation is crucial in smart-building energy management systems, where rooftop photovoltaics and lithium-ion energy storage systems must be coordinated to align renewable generation with real-time demand. This paper introduces a novel, modular hybrid framework for SOC estimation, which synergistically combines the predictive power of artificial neural networks (ANNs), the logical consistency of finite state automata (FSA), and an adaptive dynamic supervisor layer. Three distinct ANN architectures—feedforward neural network (FFNN), long short-term memory (LSTM), and 1D convolutional neural network (1D-CNN)—are employed to extract comprehensive temporal and spatial features from raw data. The inherent challenge of ANNs producing physically irrational SOC values is handled by processing their raw predictions through an FSA module, which enforces physical validity by applying feasible transitions and domain constraints based on battery operational states. To further enhance the adaptability and robustness of the framework, two advanced supervisor mechanisms are developed for model selection during estimation. A lightweight rule-based supervisor picks a model transparently using recent performance scores and quick signal heuristics, whereas a more advanced double deep Q-network (DDQN) reinforcement-learning supervisor continuously learns from reward feedback to adaptively choose the model that minimizes SOC error under changing conditions. This RL agent dynamically selects the most suitable ANN+FSA model, significantly improving performance under varying and unpredictable operational conditions. Comprehensive experimental validation demonstrates that the hybrid approach consistently outperforms raw ANN predictions and conventional extended Kalman filter (EKF)-based methods. Notably, the RL-based supervisor exhibits good adaptability and achieves lower errors in challenging high-variance scenarios. Full article
(This article belongs to the Section G: Energy and Buildings)
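The FSA module's role, clamping raw ANN outputs to physically reachable SOC transitions, can be sketched as follows; the rate limit derived from a maximum current and the state names are illustrative assumptions.

```python
def fsa_filter(soc_pred, soc_prev, dt_h, cap_ah, i_max_a, state):
    """Clamp a raw ANN SOC estimate to a physically reachable transition.
    state is the battery's operational state; i_max_a bounds the current,
    so SOC cannot move more than (i_max_a * dt_h) / cap_ah per step."""
    max_delta = (i_max_a * dt_h) / cap_ah
    if state == "charging":                      # SOC may only rise
        lo, hi = soc_prev, soc_prev + max_delta
    elif state == "discharging":                 # SOC may only fall
        lo, hi = soc_prev - max_delta, soc_prev
    else:                                        # idle: small self-discharge drift only
        lo, hi = soc_prev - 0.001, soc_prev
    clamped = min(max(soc_pred, lo), hi)
    return min(max(clamped, 0.0), 1.0)           # keep within the [0, 1] domain
```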

24 pages, 11770 KB  
Article
Secure Communication and Resource Allocation in Double-RIS Cooperative-Aided UAV-MEC Networks
by Xi Hu, Hongchao Zhao, Dongyang He and Wujie Zhang
Drones 2025, 9(8), 587; https://doi.org/10.3390/drones9080587 - 19 Aug 2025
Viewed by 519
Abstract
In complex urban wireless environments, unmanned aerial vehicle–mobile edge computing (UAV-MEC) systems face challenges like link blockage and single-antenna eavesdropping threats. The traditional single reconfigurable intelligent surface (RIS), limited in collaboration, struggles to address these issues. This paper proposes a double-RIS cooperative UAV-MEC optimization scheme, leveraging their joint reflection to build multi-dimensional signal paths, boosting legitimate link gains while suppressing eavesdropping channels. It considers double-RIS phase shifts, ground user (GU) transmission power, UAV trajectories, resource allocation, and receiving beamforming, aiming to maximize secure energy efficiency (EE) while ensuring long-term stability of GU and UAV task queues. Given random task arrivals and high-dimensional variable coupling, a dynamic model integrating queue stability and secure transmission constraints is built using Lyapunov optimization, transforming long-term stochastic optimization into slot-by-slot deterministic decisions via the drift-plus-penalty method. To handle high-dimensional continuous spaces, an end-to-end proximal policy optimization (PPO) framework is designed for online learning of multi-dimensional resource allocation and direct acquisition of joint optimization strategies. Simulation results show that compared with benchmark schemes (e.g., single RIS, non-cooperative double RIS) and reinforcement learning algorithms (e.g., advantage actor–critic (A2C), deep deterministic policy gradient (DDPG), deep Q-network (DQN)), the proposed scheme achieves significant improvements in secure EE and queue stability, with faster convergence and better optimization effects, fully verifying its superiority and robustness in complex scenarios. Full article
(This article belongs to the Section Drone Communications)
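The Lyapunov drift-plus-penalty transformation is what reduces the long-term stochastic problem to per-slot deterministic decisions. A minimal sketch of the standard queue recursion and per-slot objective; the control parameter V below is an assumed value, not the paper's:

```python
def queue_update(q, arrivals, service):
    """Standard Lyapunov queue recursion: Q(t+1) = max(Q(t) - service, 0) + arrivals."""
    return max(q - service, 0.0) + arrivals

def drift_plus_penalty(penalty, queues, net_inflows, V=50.0):
    """Per-slot objective minimized each time slot:
    V * penalty + sum_i Q_i * (arrivals_i - service_i).
    V trades optimality against queue backlog; its value is an assumption."""
    return V * penalty + sum(q * x for q, x in zip(queues, net_inflows))
```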

20 pages, 5378 KB  
Article
An Improved Deep Reinforcement Learning-Based UAV Area Coverage Algorithm for an Unknown Dynamic Environment
by Jiaoru Huang, Huxin Li, Chaobo Chen, Yushuang Liu and Xiaoyan Zhang
Appl. Sci. 2025, 15(16), 8942; https://doi.org/10.3390/app15168942 - 13 Aug 2025
Viewed by 561
Abstract
With the widespread application of unmanned aerial vehicle technology in search and detection, express delivery, and other fields, the requirements for UAV dynamic area coverage algorithms have become more demanding. For an unknown dynamic environment, an improved Dual-Attention Mechanism Double Deep Q-Network area coverage algorithm is proposed in this paper. Firstly, a dual-channel attention mechanism is designed to process flight environment information; it extracts and fuses features of the local obstacle information and the full-area coverage information. Then, building on the traditional Double Deep Q-Network algorithm, an adaptive exploration decay strategy and a coverage reward function are designed based on the real-time area coverage rate to keep the repeated coverage rate low. The proposed algorithm can avoid dynamic obstacles and achieve global coverage with a low repeated coverage rate. Finally, with Python 3.12 and PyTorch 2.2.1 as the training environment, the simulation results show that, compared with the Soft Actor–Critic algorithm, the Double Deep Q-Network algorithm, and the Attention Mechanism Double Deep Q-Network algorithm, the proposed algorithm completes the area coverage task in a dynamic and complex environment with a lower repeated coverage rate and higher coverage efficiency. Full article
(This article belongs to the Special Issue Advances in Unmanned Aerial Vehicle (UAV) System)
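An adaptive exploration decay of this kind can be illustrated by tying epsilon directly to the real-time coverage rate, so the agent explores broadly early on and exploits as coverage nears completion. The linear schedule below is an illustrative assumption, not the paper's exact strategy.

```python
def adaptive_epsilon(coverage_rate, eps_min=0.05, eps_max=1.0):
    """Exploration probability shrinks as area coverage grows;
    bounds and the linear form are hypothetical placeholders."""
    c = min(max(coverage_rate, 0.0), 1.0)        # clamp to [0, 1]
    return eps_max - (eps_max - eps_min) * c
```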

28 pages, 4548 KB  
Article
A Deep Reinforcement Learning Framework for Strategic Indian NIFTY 50 Index Trading
by Raj Gaurav Mishra, Dharmendra Sharma, Mahipal Gadhavi, Sangeeta Pant and Anuj Kumar
AI 2025, 6(8), 183; https://doi.org/10.3390/ai6080183 - 11 Aug 2025
Viewed by 1351
Abstract
This paper presents a comprehensive deep reinforcement learning (DRL) framework for developing strategic trading models tailored to the Indian NIFTY 50 index, leveraging the temporal and nonlinear nature of financial markets. Three advanced DRL architectures, namely the deep Q-network (DQN), the double deep Q-network (DDQN), and the dueling double deep Q-network (Dueling DDQN), were implemented and empirically evaluated. Using a decade-long dataset of 15-min interval OHLC data enriched with technical indicators such as the exponential moving average (EMA), pivot points, and multiple supertrend configurations, the models were trained using prioritized experience replay, epsilon-greedy exploration strategies, and softmax sampling mechanisms. A test set comprising one year of unseen data (May 2024–April 2025) was used to assess generalization performance across key financial metrics, including Sharpe ratio, profit factor, win rate, and trade frequency. Each architecture was analyzed in three progressively sophisticated variants incorporating enhancements in reward shaping, exploration–exploitation balancing, and penalty-based trade constraints. DDQN V3 achieved a Sharpe ratio of 0.7394, a 73.33% win rate, and a profit factor of 16.58 across 15 trades, indicating strong volatility-adjusted suitability for real-world deployment. In contrast, the Dueling DDQN V3 achieved a high Sharpe ratio of 1.2278 and a 100% win rate but with only three trades, indicating excessive conservatism. The DQN V1 model served as a strong baseline, outperforming passive strategies but exhibiting limitations due to Q-value overestimation. The novelty of this work lies in its systematic exploration of DRL variants integrated with enhanced exploration mechanisms and reward–penalty structures, rigorously applied to high-frequency trading on the NIFTY 50 index within an emerging market context. Our findings underscore the critical importance of architectural refinements, dynamic exploration strategies, and trade regularization in stabilizing learning and enhancing profitability in DRL-based intelligent trading systems. Full article
(This article belongs to the Special Issue AI in Finance: Leveraging AI to Transform Financial Services)
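One of the exploration mechanisms named above, softmax (Boltzmann) sampling over Q-values, can be sketched in a few lines; the temperature value is an assumption.

```python
import numpy as np

def softmax_action(q_values, temperature=1.0):
    """Sample an action with probability proportional to exp(Q / T);
    higher temperature means more exploration."""
    z = (q_values - np.max(q_values)) / temperature   # shift for numerical stability
    probs = np.exp(z) / np.sum(np.exp(z))
    return np.random.choice(len(q_values), p=probs)
```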

44 pages, 6212 KB  
Article
A Hybrid Deep Reinforcement Learning Architecture for Optimizing Concrete Mix Design Through Precision Strength Prediction
by Ali Mirzaei and Amir Aghsami
Math. Comput. Appl. 2025, 30(4), 83; https://doi.org/10.3390/mca30040083 - 3 Aug 2025
Viewed by 1215
Abstract
Concrete mix design plays a pivotal role in ensuring the mechanical performance, durability, and sustainability of construction projects. However, the nonlinear interactions among the mix components challenge traditional approaches in predicting compressive strength and optimizing proportions. This study presents a two-stage hybrid framework that integrates deep learning with reinforcement learning to overcome these limitations. First, a Convolutional Neural Network–Long Short-Term Memory (CNN–LSTM) model was developed to capture spatial–temporal patterns from a dataset of 1030 historical concrete samples. The extracted features were enhanced using an eXtreme Gradient Boosting (XGBoost) meta-model to improve generalizability and noise resistance. Then, a Dueling Double Deep Q-Network (Dueling DDQN) agent was used to iteratively identify optimal mix ratios that maximize the predicted compressive strength. The proposed framework outperformed ten benchmark models, achieving an MAE of 2.97, RMSE of 4.08, and R² of 0.94. Feature attribution methods—including SHapley Additive exPlanations (SHAP), Elasticity-Based Feature Importance (EFI), and Permutation Feature Importance (PFI)—highlighted the dominant influence of cement content and curing age, as well as revealing non-intuitive effects such as the compensatory role of superplasticizers in low-water mixtures. These findings demonstrate the potential of the proposed approach to support intelligent concrete mix design and real-time optimization in smart construction environments. Full article
(This article belongs to the Section Engineering)
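The Dueling DDQN agent relies on the standard dueling decomposition Q(s, a) = V(s) + A(s, a) - mean_a A(s, a) (Wang et al., 2016). A generic PyTorch sketch of that head; layer sizes are illustrative assumptions:

```python
import torch.nn as nn

class DuelingHead(nn.Module):
    """Dueling aggregation of a state-value stream and an advantage stream."""
    def __init__(self, in_dim, n_actions, hidden=128):
        super().__init__()
        self.value = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))
        self.advantage = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, n_actions))

    def forward(self, x):
        v, a = self.value(x), self.advantage(x)
        return v + a - a.mean(dim=1, keepdim=True)   # Q(s,a) = V + A - mean(A)
```

Subtracting the mean advantage keeps the V/A decomposition identifiable, which is the usual motivation for this aggregation.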

31 pages, 3480 KB  
Article
The First Step of AI in LEO SOPs: DRL-Driven Epoch Credibility Evaluation to Enhance Opportunistic Positioning Accuracy
by Jiaqi Yin, Feilong Li, Ruidan Luo, Xiao Chen, Linhui Zhao, Hong Yuan and Guang Yang
Remote Sens. 2025, 17(15), 2692; https://doi.org/10.3390/rs17152692 - 3 Aug 2025
Cited by 1 | Viewed by 444
Abstract
Low Earth orbit (LEO) signal of opportunity (SOP) positioning relies on the accumulation of epochs obtained through prolonged observation periods. The contribution of an LEO satellite single epoch to positioning accuracy is influenced by multi-level characteristics that are challenging for traditional models to capture. To address this limitation, we propose an Agent-Weighted Recursive Least Squares (RLS) Positioning Framework (AWR-PF). This framework employs an agent to comprehensively analyze individual epoch characteristics, assess their credibility, and convert them into adaptive weights for RLS iterations. We developed a novel Markov Decision Process (MDP) model to assist the agent in addressing the epoch weighting problem and trained the agent utilizing the Double Deep Q-Network (DDQN) algorithm on 107 h of Iridium signal data. Experimental validation on a separate 28 h Iridium signal test set through 97 positioning trials demonstrated that AWR-PF achieves superior average positioning accuracy compared to both standard RLS and randomly weighted RLS throughout nearly the entire iterative process. In a single positioning trial, AWR-PF improves positioning accuracy by up to 45.15% over standard RLS. To the best of our knowledge, this work represents the first instance where an AI algorithm is used as the core decision-maker in LEO SOP positioning, establishing a groundbreaking paradigm for future research. Full article
(This article belongs to the Special Issue LEO-Augmented PNT Service)
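The agent's credibility scores enter the positioning solution as per-epoch weights in the RLS iteration. A generic weighted-RLS step, where a low weight acts like inflated measurement noise; the paper's measurement model is not reproduced here:

```python
import numpy as np

def weighted_rls_step(x, H, z, P, w):
    """One weighted RLS iteration. x: state estimate (n,), H: measurement
    row (n,), z: scalar measurement, P: covariance (n, n), w in (0, 1]:
    the agent's credibility weight for this epoch."""
    H = H.reshape(1, -1)
    S = H @ P @ H.T + 1.0 / w                    # innovation variance; low w inflates noise
    K = (P @ H.T) / S                            # gain vector
    x = x + (K * (z - H @ x)).ravel()            # state update
    P = P - K @ H @ P                            # covariance update
    return x, P
```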

17 pages, 3062 KB  
Article
Spatiotemporal Risk-Aware Patrol Planning Using Value-Based Policy Optimization and Sensor-Integrated Graph Navigation in Urban Environments
by Swarnamouli Majumdar, Anjali Awasthi and Lorant Andras Szolga
Appl. Sci. 2025, 15(15), 8565; https://doi.org/10.3390/app15158565 - 1 Aug 2025
Viewed by 566
Abstract
This study proposes an intelligent patrol planning framework that leverages reinforcement learning, spatiotemporal crime forecasting, and simulated sensor telemetry to optimize autonomous vehicle (AV) navigation in urban environments. Crime incidents from Washington DC (2024–2025) and Seattle (2008–2024) are modeled as a dynamic spatiotemporal graph, capturing the evolving intensity and distribution of criminal activity across neighborhoods and time windows. The agent’s state space incorporates synthetic AV sensor inputs—including fuel level, visual anomaly detection, and threat signals—to reflect real-world operational constraints. We evaluate and compare three learning strategies: Deep Q-Network (DQN), Double Deep Q-Network (DDQN), and Proximal Policy Optimization (PPO). Experimental results show that DDQN outperforms DQN in convergence speed and reward accumulation, while PPO demonstrates greater adaptability in sensor-rich, high-noise conditions. Real-map simulations and hourly risk heatmaps validate the effectiveness of our approach, highlighting its potential to inform scalable, data-driven patrol strategies in next-generation smart cities. Full article
(This article belongs to the Special Issue AI-Aided Intelligent Vehicle Positioning in Urban Areas)
