Search Results (49)

Search Parameters:
Keywords = stochastic competitive learning

21 pages, 2353 KB  
Article
An Adaptive Bidding Strategy for Virtual Power Plants in Day-Ahead Markets Under Multiple Uncertainties
by Wei Yang and Wenjun Wang
Energies 2026, 19(8), 1878; https://doi.org/10.3390/en19081878 - 12 Apr 2026
Viewed by 372
Abstract
To address the challenges posed by multiple uncertainties in modern power systems to the market bidding of Virtual Power Plants (VPPs), this paper proposes an adaptive bidding strategy based on Deep Reinforcement Learning (DRL). First, a heterogeneous VPP aggregation model integrating dedicated energy storage, Vehicle-to-Grid (V2G), and flexible loads is constructed, incorporating complex physical and operational constraints. Second, to overcome the “myopic” local optimality problem of traditional DRL in temporal arbitrage tasks, a potential-based reward shaping mechanism linked to future price trends is designed to guide the agent toward long-term optimal strategies. Finally, multi-dimensional comparative experiments and mechanism analyses are conducted in a simulated day-ahead electricity market. Simulation results demonstrate the following: (1) The proposed algorithm exhibits robust convergence stability and effectively handles stochastic noise in market prices and renewable generation. (2) Economically, the strategy significantly outperforms the rule-based strategy and remains highly competitive with the deterministic-optimization benchmark under perfect-information assumptions. (3) Mechanism analysis further reveals that the DRL agent breaks through the rigid logic of fixed thresholds, learning a non-linear dynamic game mechanism based on “Price-SOC” states, thereby achieving full-depth utilization of energy storage resources. This work provides an interpretable data-driven paradigm for intelligent VPP decision-making in uncertain environments. Full article
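The potential-based reward shaping the abstract describes has a standard form (Ng et al.'s F(s, s') = γΦ(s') − Φ(s)), which densifies the learning signal without changing the optimal policy. A minimal sketch, assuming a hypothetical potential tied to the upcoming price slope; the paper's exact potential function is not specified here:

```python
def shaped_reward(base_reward, phi_s, phi_s_next, gamma=0.99):
    """Potential-based shaping: adding F(s, s') = gamma * Phi(s') - Phi(s)
    to the reward leaves the optimal policy unchanged while giving the
    agent a denser signal about long-term value."""
    return base_reward + gamma * phi_s_next - phi_s

def price_trend_potential(prices, t, horizon=4, scale=1.0):
    """Hypothetical potential linked to the future price trend: a positive
    slope over the next `horizon` steps marks states from which charging
    now and discharging later is attractive (illustrative, not the
    paper's design)."""
    window = prices[t:t + horizon]
    if len(window) < 2:
        return 0.0
    slope = (window[-1] - window[0]) / (len(window) - 1)
    return scale * slope
```

With gamma = 1 the shaping terms telescope over a trajectory, which is why the shaped return differs from the original only by a constant depending on the start and end states.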
(This article belongs to the Special Issue Transforming Power Systems and Smart Grids with Deep Learning)

22 pages, 1170 KB  
Article
Adverse Drug Reaction Detection on Social Media Based on Large Language Models
by Hao Li and Hongfei Lin
Information 2026, 17(4), 352; https://doi.org/10.3390/info17040352 - 7 Apr 2026
Viewed by 345
Abstract
Adverse drug reaction (ADR) detection is essential for ensuring drug safety and effective pharmacovigilance. The rapid growth of users’ medication reviews posted on social media has introduced a valuable new data source for ADR detection. However, the large scale and high noise inherent in social media text pose substantial challenges to existing detection methods. Although large language models (LLMs) exhibit strong robustness to noisy and interfering information, they are often limited by issues such as stochastic outputs and hallucinations. To address these challenges, this paper proposes two generative detection frameworks based on Chain of Thought (CoT), namely LLaMA-DetectionADR for Supervised Fine-Tuning (SFT) and DetectionADRGPT for low-resource in-context learning. LLaMA-DetectionADR automatically generates CoT reasoning sequences to construct an instruction tuning dataset, which is then used to fine-tune the LLaMA3-8B model via Quantized Low-Rank Adaptation (QLoRA). In contrast, DetectionADRGPT leverages clustering algorithms to select representative unlabeled samples and enhances in-context learning by incorporating CoT reasoning paths together with their corresponding labels. Experimental results on the Twitter and CADEC social media datasets show that LLaMA-DetectionADR achieves excellent performance, with F1 scores of 92.67% and 86.13%, respectively. Meanwhile, DetectionADRGPT obtains competitive F1 scores of 87.29% and 82.80% with only a few labeled examples, approaching the performance of fully supervised advanced models. The overall results demonstrate the effectiveness and practical value of the proposed CoT-based generative frameworks for ADR detection from social media. Full article
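The exemplar-selection step (clustering to pick representative unlabeled samples for in-context learning) can be approximated with a simple diversity heuristic. A sketch using greedy farthest-point sampling over embedded samples; this is an assumption for illustration, not necessarily the clustering algorithm the authors used:

```python
def farthest_point_exemplars(points, k, dist):
    """Greedy farthest-point sampling: repeatedly pick the point whose
    minimum distance to the already-chosen exemplars is largest, so the
    selected in-context examples cover diverse regions of the data."""
    chosen = [0]                      # seed with the first point
    while len(chosen) < k:
        best, best_d = None, -1.0
        for i in range(len(points)):
            if i in chosen:
                continue
            d = min(dist(points[i], points[j]) for j in chosen)
            if d > best_d:
                best, best_d = i, d
        chosen.append(best)
    return chosen
```

In practice `points` would be sentence embeddings of medication reviews and `dist` a cosine or Euclidean metric.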
(This article belongs to the Topic Generative AI and Interdisciplinary Applications)

22 pages, 5539 KB  
Article
Artificial Neural Network-Based PID Parameter Estimation Using Black Kite Algorithm Hyperparameter Optimization for DC Motor Speed Control
by Yılmaz Seryar Arıkuşu
Biomimetics 2026, 11(4), 242; https://doi.org/10.3390/biomimetics11040242 - 3 Apr 2026
Viewed by 354
Abstract
This paper proposes a Black Kite Algorithm (BKA)-based hyperparameter optimization method for Artificial Neural Network (ANN) training, mitigating local minimum issues associated with conventional training techniques. The resulting BKA-ANN model is then employed to estimate PID controller parameters for DC motor speed regulation. A large-scale dataset of 100,000 samples was generated via MATLAB simulation, with reference speed and load torque stochastically varied, and optimal PID parameters determined by minimizing the ITAE criterion for each operating condition. The optimized controller was evaluated under various operating conditions including transient response, frequency domain analysis (phase margin and bandwidth), parametric robustness, and load disturbance suppression, along with control effort and energy consumption assessments. The proposed BKA-ANN approach was benchmarked against nine algorithms: hybrid atom search optimization-simulated annealing (hASO-SA), Harris hawks optimization (HHO), Henry gas solubility optimization with opposition-based learning (OBL/HGSO), atom search optimization (ASO), Henry gas solubility optimization (HGSO), stochastic fractal search (SFS), grey wolf optimization (GWO), sine–cosine algorithm (SCA), and a standard ANN. Simulation results indicate that BKA-ANN achieves stable performance across all tested scenarios, with minimal oscillation and competitive settling time compared to the evaluated algorithms. Full article
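The dataset-generation loop described above hinges on evaluating the ITAE criterion (integral of time-weighted absolute error) for candidate PID gains on a motor model. A sketch using a generic first-order speed model with illustrative parameters, not the paper's MATLAB plant:

```python
def simulate_pid_itae(kp, ki, kd, setpoint=1.0, tau=0.5, gain=2.0,
                      dt=0.01, t_end=5.0):
    """Simulate a first-order DC-motor speed model
    (tau * dw/dt = -w + gain * u) under PID control with forward-Euler
    integration, and return the ITAE cost (integral of t * |error| dt)
    plus the final speed. Plant constants are stand-ins."""
    w, integ, prev_err, itae, t = 0.0, 0.0, setpoint, 0.0, 0.0
    while t < t_end:
        err = setpoint - w
        integ += err * dt
        deriv = (err - prev_err) / dt
        u = kp * err + ki * integ + kd * deriv      # PID control law
        w += dt * (-w + gain * u) / tau             # plant update
        itae += t * abs(err) * dt                   # time-weighted error
        prev_err, t = err, t + dt
    return itae, w
```

Sweeping operating conditions and minimizing this ITAE per condition is how the training targets for the ANN would be produced.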
(This article belongs to the Section Biological Optimisation and Management)

28 pages, 1600 KB  
Article
A Data-Driven Deep Reinforcement Learning Framework for Real-Time Economic Dispatch of Microgrids Under Renewable Uncertainty
by Biao Dong, Shijie Cui and Xiaohui Wang
Energies 2026, 19(6), 1481; https://doi.org/10.3390/en19061481 - 16 Mar 2026
Viewed by 322
Abstract
The real-time economic dispatch of microgrids (MGs) is challenged by the high penetration of renewable energy and the resulting source–load uncertainties. Conventional optimization-based scheduling methods rely heavily on accurate probabilistic models and often suffer from high computational burdens, which limits their real-time applicability. To address these challenges, a data-driven deep reinforcement learning (DRL) framework is proposed for real-time microgrid energy management. The MG dispatch problem is formulated as a Markov decision process (MDP), and a Deep Deterministic Policy Gradient (DDPG) algorithm is adopted to efficiently handle the high-dimensional continuous action space of distributed generators and energy storage systems (ESS). The system state incorporates renewable generation, load demand, electricity price, and ESS operational conditions, while the reward function is designed as the negative of the operational cost with penalty terms for constraint violations. A continuous-action policy network is developed to directly generate control commands without action discretization, enabling smooth and flexible scheduling. Simulation studies are conducted on an extended European low-voltage microgrid test system under both deterministic and stochastic operating scenarios. The proposed approach is compared with model-based methods (MPC and MINLP) and representative DRL algorithms (SAC and PPO). The results show that the proposed DDPG-based strategy achieves competitive economic performance, fast convergence, and good adaptability to different initial ESS conditions. In stochastic environments, the proposed method maintains operating costs close to the optimal MINLP reference while significantly reducing the online computational time. These findings demonstrate that the proposed framework provides an efficient and practical solution for the real-time economic dispatch of microgrids with high renewable penetration. Full article
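The reward design described (negative operational cost plus penalty terms for constraint violations) can be sketched directly; the cost terms, SOC bounds, and penalty weight below are illustrative placeholders, not the paper's parameterization:

```python
def dispatch_reward(gen_cost, grid_price, grid_power, soc, soc_min=0.1,
                    soc_max=0.9, penalty=100.0):
    """Reward = -(operating cost) - penalty * (constraint violation).
    Cost counts generator fuel cost plus energy bought from the grid;
    the penalty term activates when the ESS state of charge leaves
    its feasible band. All numbers are illustrative."""
    cost = gen_cost + grid_price * max(grid_power, 0.0)   # buying only
    violation = max(soc_min - soc, 0.0) + max(soc - soc_max, 0.0)
    return -cost - penalty * violation
```

A DDPG critic would learn Q-values over this reward, while the continuous-action policy network outputs generator and ESS setpoints directly.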

26 pages, 5076 KB  
Article
Multimodal Wildfire Classification Using Synthetic Night-Vision-like and Thermal-Inspired Image Representations
by Beyda Taşar, Ahmet Burak Tatar, Alper Kadir Tanyildizi and Oğuz Yakut
Fire 2026, 9(3), 109; https://doi.org/10.3390/fire9030109 - 2 Mar 2026
Viewed by 610
Abstract
In this study, a deep learning-based multimodal framework is presented for forest fire detection using RGB images, which synthetically generates night-vision-like, white-hot, and green-hot pseudo-thermal representations. The synthetic modalities are derived directly from RGB data and integrated into a hardware-independent multimodal learning pipeline to increase visual diversity without relying on additional sensing hardware. Each modality is processed using an ImageNet-pretrained convolutional backbone, and modality-specific feature vectors are combined through feature-level concatenation before classification. The proposed framework was evaluated using multiple backbone architectures, including ResNet18, EfficientNet-B0, and DenseNet121, which were assessed independently under a unified experimental protocol. Experiments were conducted on two datasets with substantially different scales and characteristics: the FLAME dataset (39,375 images, binary classification) and the FireStage dataset (791 images, three-class classification). For both datasets, stratified 80–20% training–validation splits were employed, and online stochastic data augmentation was applied exclusively to the training sets. On the FLAME dataset, the proposed framework achieved consistently high performance across different backbone and modality configurations. The best-performing models reached an accuracy of 99.66%, precision of 99.80%, recall of 99.66%, F1-score of 99.73%, and ROC AUC value of 0.9998. On the more challenging FireStage dataset, the framework demonstrated stable performance despite limited data availability, achieving an accuracy of 93.71% for RGB-only configurations and up to 93.08% for selected multimodal combinations, while macro-averaged F1-scores exceeded 0.92, and ROC AUC values reached up to 0.9919. Per-class analysis further indicates that early-stage fire (Start Fire) patterns can be discriminated, achieving ROC AUC values above 0.96, depending on the backbone and modality combination. Overall, the results suggest that synthetic-modality-based multimodal learning can provide competitive performance for both large-scale and data-limited fire detection scenarios, offering a flexible and hardware-independent alternative for forest fire monitoring applications. Full article
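The synthetic modalities are, at heart, deterministic recolorings of RGB luma. A toy sketch of what white-hot, green-hot, and night-vision-like conversions might look like; the exact transforms are an assumption, since the abstract does not publish them:

```python
def to_gray(rgb):
    """ITU-R BT.601 luma; `rgb` is a list of (r, g, b) pixels in [0, 255]."""
    return [0.299 * r + 0.587 * g + 0.114 * b for r, g, b in rgb]

def white_hot(gray):
    """Thermal-inspired white-hot palette: bright = hot."""
    return [(v, v, v) for v in gray]

def green_hot(gray):
    """Thermal-inspired green-hot palette: intensity on the green channel."""
    return [(0.0, v, 0.0) for v in gray]

def night_vision_like(gray, gain=1.4):
    """Crude night-vision look: gain-boosted, clipped luma in a green tint
    (gain and tint are illustrative guesses)."""
    return [(0.0, min(255.0, gain * v), 0.3 * min(255.0, gain * v))
            for v in gray]
```

Each recolored image would then be fed to its own pretrained backbone before feature-level concatenation.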

18 pages, 554 KB  
Article
Analyzing Strategic Parental Leave Decisions Using Two-Player Multi-Agent Reinforcement Learning
by Lixue Zhao and Hyun-Rok Lee
Systems 2026, 14(2), 217; https://doi.org/10.3390/systems14020217 - 19 Feb 2026
Viewed by 343
Abstract
Despite the well-documented benefits of paid parental leave, many employees hesitate to take it. This study employs a two-player stochastic game (SG) model to analyze how various factors affect parental leave decisions. The proposed SG model incorporates (1) an employee’s perceived utility from taking leave, (2) the effect of a colleague’s parental leave, (3) career penalties after taking leave, and (4) a paid parental leave policy. To accurately obtain equilibrium strategies, we extend Nash-Q learning by incorporating backward iteration and optimistic initialization. These two methods exploit the structural properties of the model to accelerate convergence and improve solution quality. Numerical experiments reveal that a stronger willingness to take parental leave and lower career penalties increase parental leave uptake. Furthermore, the competitive career penalty, which captures interpersonal factors, is particularly influential when a colleague is less likely to take parental leave. Our results suggest that reducing career penalties can substantially increase leave uptake in typical parameter ranges, highlighting the importance of workplace policies that mitigate career penalties associated with parental leave. Full article
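Backward iteration exploits the finite horizon: once the colleague's strategy is held fixed, the game collapses to a single-agent dynamic program solvable exactly from the last period backwards. A toy sketch with made-up utilities (u_leave, wage, and penalty are illustrative, not the paper's calibration; the colleague's behavior is absorbed into the penalty):

```python
def leave_values(T, u_leave, wage, penalty, gamma=0.95):
    """Backward iteration over T periods: at each period the employee
    either takes leave now (terminal utility u_leave - penalty) or
    works one more period for `wage` plus discounted continuation value.
    Returns the value function and the greedy policy per period."""
    V = [0.0] * (T + 1)          # V[T] = 0: horizon end
    policy = [None] * T
    for t in reversed(range(T)):
        take = u_leave - penalty
        wait = wage + gamma * V[t + 1]
        V[t], policy[t] = max((take, "take"), (wait, "wait"))
    return V, policy
```

Even this toy reproduces the abstract's qualitative finding: a higher career penalty pushes the policy from "take" toward "wait" at every period.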

28 pages, 2028 KB  
Article
Dynamic Resource Games in the Wood Flooring Industry: A Bayesian Learning and Lyapunov Control Framework
by Yuli Wang and Athanasios V. Vasilakos
Algorithms 2026, 19(1), 78; https://doi.org/10.3390/a19010078 - 16 Jan 2026
Viewed by 327
Abstract
Wood flooring manufacturers face complex challenges in dynamically allocating resources across multi-channel markets, characterized by channel conflicts, demand uncertainty, and long-term cumulative effects of decisions. Traditional static optimization or myopic approaches struggle to address these intertwined factors, particularly when critical market states like brand reputation and customer base cannot be precisely observed. This paper establishes a systematic and theoretically grounded online decision framework to tackle this problem. We first model the problem as a Partially Observable Stochastic Dynamic Game. The core innovation lies in introducing an unobservable market position vector as the central system state, whose evolution is jointly influenced by firm investments, inter-channel competition, and macroeconomic randomness. The model further captures production lead times, physical inventory dynamics, and saturation/cross-channel effects of marketing investments, constructing a high-fidelity dynamic system. To solve this complex model, we propose a hierarchical online learning and control algorithm named L-BAP (Lyapunov-based Bayesian Approximate Planning), which innovatively integrates three core modules. It employs particle filters for Bayesian inference to nonparametrically estimate latent market states online. Simultaneously, the algorithm constructs a Lyapunov optimization framework that transforms long-term discounted reward objectives into tractable single-period optimization problems through virtual debt queues, while ensuring stability of physical systems like inventory. Finally, the algorithm embeds a game-theoretic module to predict and respond to rational strategic reactions from each channel. We provide theoretical performance analysis, rigorously proving the mean-square boundedness of system queues and deriving the performance gap between long-term rewards and optimal policies under complete information. This bound clearly quantifies the trade-off between estimation accuracy (determined by particle count) and optimization parameters. Extensive simulations demonstrate that our L-BAP algorithm significantly outperforms several strong baselines—including myopic learning and decentralized reinforcement learning methods—across multiple dimensions: long-term profitability, inventory risk control, and customer service levels. Full article
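The virtual-queue device at the heart of Lyapunov optimization fits in a few lines: a long-term constraint becomes a queue that must stay stable, and each period the controller greedily minimizes V·cost + Q·drift. A minimal sketch with hypothetical action names, not the L-BAP algorithm itself:

```python
def queue_update(q, arrival, service):
    """Virtual-queue dynamics Q(t+1) = max(Q(t) + a(t) - b(t), 0):
    keeping Q stable enforces the long-term constraint E[a] <= E[b]."""
    return max(q + arrival - service, 0.0)

def drift_plus_penalty_action(q, actions, cost, arrival, service, V=5.0):
    """Drift-plus-penalty: pick the action minimizing
    V * cost(a) + Q * (arrival(a) - service(a)), a tractable
    single-period proxy for the long-term objective. V trades off
    cost against constraint backlog."""
    return min(actions,
               key=lambda a: V * cost(a) + q * (arrival(a) - service(a)))
```

When the queue is empty the controller simply minimizes cost; as backlog grows, it increasingly favors actions that drain the queue, which is exactly the stability-versus-reward trade-off the performance bound quantifies.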
(This article belongs to the Section Analysis of Algorithms and Complexity Theory)

20 pages, 857 KB  
Article
Hybrid Spike-Encoded Spiking Neural Networks for Real-Time EEG Seizure Detection: A Comparative Benchmark
by Ali Mehrabi, Neethu Sreenivasan, Upul Gunawardana and Gaetano Gargiulo
Biomimetics 2026, 11(1), 75; https://doi.org/10.3390/biomimetics11010075 - 16 Jan 2026
Viewed by 962
Abstract
Reliable and low-latency seizure detection from electroencephalography (EEG) is critical for continuous clinical monitoring and emerging wearable health technologies. Spiking neural networks (SNNs) provide an event-driven computational paradigm that is well suited to real-time signal processing, yet achieving competitive seizure detection performance with constrained model complexity remains challenging. This work introduces a hybrid spike encoding scheme that combines Delta–Sigma (change-based) and stochastic rate representations, together with two spiking architectures designed for real-time EEG analysis: a compact feed-forward HybridSNN and a convolution-enhanced ConvSNN incorporating depthwise-separable convolutions and temporal self-attention. The architectures are intentionally designed to operate on short EEG segments and to balance detection performance with computational practicality for continuous inference. Experiments on the CHB–MIT dataset show that the HybridSNN attains 91.8% accuracy with an F1-score of 0.834 for seizure detection, while the ConvSNN further improves detection performance to 94.7% accuracy and an F1-score of 0.893. Event-level evaluation on continuous EEG recordings yields false-alarm rates of 0.82 and 0.62 per day for the HybridSNN and ConvSNN, respectively. Both models exhibit inference latencies of approximately 1.2 ms per 0.5 s window on standard CPU hardware, supporting continuous real-time operation. These results demonstrate that hybrid spike encoding enables spiking architectures with controlled complexity to achieve seizure detection performance comparable to larger deep learning models reported in the literature, while maintaining low latency and suitability for real-time clinical and wearable EEG monitoring. Full article
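The change-based (Delta) half of the hybrid encoding can be sketched as a threshold-crossing detector on the raw EEG trace; the stochastic rate code would run alongside it. Threshold and signal below are illustrative:

```python
def delta_encode(signal, threshold):
    """Change-based (delta) spike encoding: emit +1 / -1 whenever the
    signal moves more than `threshold` away from the last spiking
    level, else 0. The running level tracks the signal in threshold-
    sized steps, so spikes carry signal *changes*, not absolute values."""
    spikes, level = [], signal[0]
    for x in signal:
        if x - level >= threshold:
            spikes.append(1)
            level += threshold
        elif level - x >= threshold:
            spikes.append(-1)
            level -= threshold
        else:
            spikes.append(0)
    return spikes
```

Event-driven codes like this are what make short-window SNN inference cheap: flat segments produce no spikes at all.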
(This article belongs to the Special Issue Bioinspired Engineered Systems)

31 pages, 10290 KB  
Article
Enhanced Social Group Optimization Algorithm for the Economic Dispatch Problem Including Wind Power
by Dinu Călin Secui, Cristina Hora, Florin Ciprian Dan, Monica Liana Secui and Horea Nicolae Hora
Processes 2026, 14(2), 254; https://doi.org/10.3390/pr14020254 - 11 Jan 2026
Viewed by 307
Abstract
The economic dispatch (ED) problem is a major challenge in power system optimization. In this article, an Enhanced Social Group Optimization (ESGO) algorithm is presented for solving the economic dispatch problem with or without wind units, considering various characteristics related to valve-point effects, ramp-rate constraints, prohibited operating zones, and transmission power losses. The Social Group Optimization (SGO) algorithm models the social dynamics of individuals within a group—through mechanisms of collective learning, behavioral adaptation, and information exchange—and leverages these interactions to guide the population efficiently towards optimal solutions. ESGO extends SGO along three complementary directions: redefining the update relations of the original SGO, introducing stochastic operators into the heuristic mechanisms, and dynamically updating the generated solutions. These modifications aim to achieve a more robust balance between exploration and exploitation, enable flexible adaptation of search steps, and rapidly integrate improved-fitness solutions into the evolutionary process. ESGO is evaluated in six distinct cases, covering systems with 6, 40, 110, and 220 units, to demonstrate its ability to produce competitive solutions as well as its performance in terms of stability, convergence, and computational efficiency. The numerical results show that, in the vast majority of the analyzed cases, ESGO outperforms SGO and other known or improved metaheuristic algorithms in terms of cost and stability. Incorporating wind generation results in an operating cost reduction of approximately 10% compared to the thermal-only system, under the adopted linear wind power model. Moreover, relative to the size of the analyzed systems, ESGO exhibits a reduced average execution time and requires a small number of function evaluations to obtain competitive solutions. Full article
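The valve-point effect mentioned above adds a rectified sinusoid to the usual quadratic fuel cost, F(P) = a + bP + cP² + |e·sin(f·(P_min − P))|, which makes the ED objective non-smooth and multimodal and is precisely why metaheuristics like ESGO are used. A sketch with illustrative coefficients, not one of the benchmark test systems:

```python
import math

def fuel_cost(p, a, b, c, e, f, p_min):
    """Classic ED unit cost with valve-point effects:
    F(P) = a + b*P + c*P^2 + |e * sin(f * (P_min - P))|.
    The absolute sine term creates ripples (many local minima)."""
    return a + b * p + c * p * p + abs(e * math.sin(f * (p_min - p)))

def total_cost(powers, units):
    """Sum unit costs for a candidate dispatch (losses omitted here)."""
    return sum(fuel_cost(p, **u) for p, u in zip(powers, units))
```

A full ED solver would minimize `total_cost` subject to the power-balance, ramp-rate, and prohibited-zone constraints the abstract lists.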
(This article belongs to the Section Energy Systems)

26 pages, 1730 KB  
Article
Two-Stage Game-Based Charging Optimization for a Competitive EV Charging Station Considering Uncertain Distributed Generation and Charging Behavior
by Shaohua Han, Hongji Zhu, Jinian Pang, Xuan Ge, Fuju Zhou and Min Wang
Batteries 2026, 12(1), 16; https://doi.org/10.3390/batteries12010016 - 1 Jan 2026
Viewed by 781
Abstract
The widespread adoption of electric vehicles (EVs) has turned charging demand into a substantial load on the power grid. To satisfy the rapidly growing demand of EVs, the construction of charging infrastructure has received sustained attention in recent years. As charging stations become more widespread, attracting EV users in a competitive charging market while optimizing the internal charging process is key to determining a charging station’s operational efficiency. This paper tackles this issue by presenting the following contributions. Firstly, a simulation method based on prospect theory is proposed to simulate EV users’ preferences in selecting charging stations. The selection behavior of EV users is simulated by establishing the coupling relationships among the transportation network, power grid, and charging network, together with a model of users’ preferences. Secondly, a two-stage joint stochastic optimization model for a charging station is developed, which considers both charging pricing and energy control. At the first stage, a Stackelberg game is employed to determine the day-ahead optimal charging price in a competitive market. At the second stage, real-time stochastic charging control is applied to maximize the operational profit of the charging station considering renewable energy integration. Finally, a scenario-based Alternating Direction Method of Multipliers (ADMM) approach is introduced in the first stage for optimal pricing learning, while a simulation-based Rollout method is applied in the second stage to update the real-time energy control strategy based on the latest pricing. Numerical results demonstrate that the proposed method achieves up to a 33% profit improvement compared with competing charging stations in a scenario integrating 1000 EVs. Full article
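The prospect-theoretic preference model treats a charging option's attributes (price, detour, waiting time) as gains or losses around a reference point, scored with the Kahneman–Tversky value function. A sketch using the classic 1992 parameter estimates, which are an assumption here rather than this paper's fitted values:

```python
def prospect_value(x, alpha=0.88, beta=0.88, lam=2.25):
    """Kahneman-Tversky value function: concave for gains (x >= 0),
    convex and loss-averse (steeper, scaled by lam) for losses.
    x is a deviation from the user's reference point."""
    return x ** alpha if x >= 0 else -lam * (-x) ** beta
```

An EV user's station choice would then weigh each station's prospect values across attributes, so a small extra cost (a loss) hurts a station's attractiveness more than an equal discount helps it.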

30 pages, 1488 KB  
Article
Beyond Quaternions: Adaptive Fixed-Time Synchronization of High-Dimensional Fractional-Order Neural Networks Under Lévy Noise Disturbances
by Essia Ben Alaia, Slim Dhahri and Omar Naifar
Fractal Fract. 2025, 9(12), 823; https://doi.org/10.3390/fractalfract9120823 - 16 Dec 2025
Viewed by 608
Abstract
This paper develops a unified synchronization framework for octonion-valued fractional-order neural networks (FOOVNNs) subject to mixed delays, Lévy disturbances, and topology switching. A fractional sliding surface is constructed by combining I^{1-μ} e_g with integral terms in powers of |e_g|. The controller includes a nonsingular term ρ_{2g} |s_g|^{c_2} sign(s_g), a disturbance-compensation term θ̂_g sign(s_g), and a delay-feedback term λ_g e_g(t − τ), while dimension-aware adaptive laws ^C D_t^μ ρ_g = k_{1g} N |s_g|^{c_2} and ^C D_t^μ θ̂_g = k_{2g} N |s_g| ensure scalability with network size. Fixed-time convergence is established via a fractional stochastic Lyapunov method, and predefined-time convergence follows by a time-scaling of the control channel. Markovian switching is treated through a mode-dependent Lyapunov construction and linear matrix inequality (LMI) conditions; non-Gaussian perturbations are handled using fractional Itô tools. The architecture admits observer-based variants and is implementation-friendly. Numerical results corroborate the theory: (i) Two-Node Baseline: The fixed-time design drives ||e(t)||_1 to O(10^{-4}) by t ≈ 0.94 s, while the predefined-time variant meets a user-set T_p = 0.5 s with convergence at t ≈ 0.42 s. (ii) Eight-Node Scalability: Sliding surfaces settle in an O(1) band, and adaptive parameter means saturate well below their ceilings. (iii) Hyperspectral (Synthetic): Reconstruction under Lévy contamination achieves a competitive PSNR consistent with hypercomplex modeling and fractional learning. (iv) Switching Robustness: under four modes and twelve random switches, the error satisfies max_t ||e(t)||_1 ≤ 0.15. The results support octonion-valued, fractionally damped controllers as practical, scalable mechanisms for robust synchronization under non-Gaussian noise, delays, and time-varying topologies. Full article
(This article belongs to the Special Issue Advances in Fractional-Order Control for Nonlinear Systems)

19 pages, 48003 KB  
Article
Risk-Aware Distributional Reinforcement Learning for Safe Path Planning of Surface Sensing Agents
by Jihua Dou, Zhongqi Li, Yuanhao Wang, Kunpeng Ouyang, Weihao Xia, Jianxin Lin and Huachuan Wang
Electronics 2025, 14(24), 4828; https://doi.org/10.3390/electronics14244828 - 8 Dec 2025
Cited by 1 | Viewed by 894
Abstract
In spatially constrained water domains, surface sensing agents (SSAs) must achieve safe path planning despite uncertain currents and sensor noise. We present a decentralized motion planning and collision-avoidance framework based on distributional reinforcement learning (DRL) that models the full return distribution to enable risk-aware decision making. Each surface sensing agent autonomously proceeds to its designated coordinates without rigid spatial constraints, coordinating implicitly through learned policies and a lightweight safety shield that enforces separation and kinematic limits. The method integrates (i) distributional value estimation for controllable risk sensitivity near hazards, (ii) domain randomization of sea states and disturbances for robustness, and (iii) a shielded action layer compatible with standard reactive rules (e.g., velocity obstacle-style constraints) to guarantee feasible maneuvers. In simulations across cluttered maps and stochastic current fields, the proposed approach improves success rates and reduces near-miss events compared to non-distributional RL and classical planners, while maintaining competitive path length and computation time. The results indicate that DRL-based surface sensing agent navigation is a practical path toward safe, efficient environmental monitoring and surveying. Full article
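Modeling the full return distribution enables risk-sensitive selection rules such as CVaR over the learned quantiles: instead of acting on the mean return, the agent acts on the average of the worst outcomes. A sketch of one such rule; the paper's exact risk functional is not specified here:

```python
def cvar(quantiles, alpha=0.25):
    """Conditional value-at-risk of a return distribution represented
    by equally weighted quantile samples (as in distributional RL):
    the mean of the worst alpha-fraction of outcomes."""
    k = max(1, int(alpha * len(quantiles)))
    return sum(sorted(quantiles)[:k]) / k

def risk_averse_action(q_dists, alpha=0.25):
    """Choose the action whose worst-case tail (CVaR) is best,
    rather than the action with the best mean return."""
    return max(range(len(q_dists)), key=lambda a: cvar(q_dists[a], alpha))
```

Near hazards a risk-averse agent thus refuses actions whose distribution has a heavy low tail (rare collisions), even when the mean looks attractive.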

14 pages, 1607 KB  
Article
Blind Image Quality Assessment Using Convolutional Neural Networks
by Mariusz Frackiewicz, Henryk Palus and Wojciech Trojanowski
Sensors 2025, 25(22), 7078; https://doi.org/10.3390/s25227078 - 20 Nov 2025
Viewed by 1090
Abstract
In the domain of image and multimedia processing, image quality is a critical factor, as it directly influences the performance of subsequent tasks such as compression, transmission, and content analysis. Reliable assessment of image quality is therefore essential not only for benchmarking algorithms but also for ensuring user satisfaction in real-world multimedia applications. The most advanced Blind image quality assessment (BIQA) methods are typically built upon deep learning models and rely on complex architectures that, while effective, require substantial computational resources and large-scale training datasets. This complexity can limit their scalability and practical deployment, particularly in resource-constrained environments. In this paper, we revisit a model inspired by one of the early applications of convolutional neural networks (CNNs) in BIQA and demonstrate that by leveraging recent advancements in machine learning—such as Bayesian hyperparameter optimization and widely used stochastic optimization methods (e.g., Adam)—it is possible to achieve competitive performance using a simpler, more scalable, and lightweight architecture. To evaluate the proposed approach, we conducted extensive experiments on widely used benchmark datasets, including TID2013 and KADID-10k. The results show that the proposed model achieves competitive performance while maintaining a substantially more efficient design. These findings suggest that lightweight CNN-based models, when combined with modern optimization strategies, can serve as a viable alternative to more elaborate frameworks, offering an improved balance between accuracy, efficiency, and scalability. Full article
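The "widely used stochastic optimization method (e.g., Adam)" has a compact update rule: exponentially decayed first- and second-moment estimates of the gradient with bias correction. A scalar sketch of one Adam step (Kingma and Ba's default decay rates; the learning rate below is illustrative, not this paper's training configuration):

```python
def adam_step(theta, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter: update biased moment
    estimates m, v; correct their initialization bias; then take a
    step scaled by the inverse root of the second moment."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad * grad
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (v_hat ** 0.5 + eps)
    return theta, m, v
```

Iterating this step on a simple convex loss (e.g., (x − 3)²) drives the parameter to the minimizer, which is all a CNN training loop does per weight at much larger scale.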
(This article belongs to the Section Sensing and Imaging)
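The Adam optimizer mentioned in the abstract can be summarized in a few lines. The following is a minimal illustrative sketch of the standard Adam update rule (exponential moving averages of the gradient and its square, with bias correction), applied to a toy 1-D objective; it is not the paper's code, and the function names are the author's own.

```python
import math

def adam_minimize(grad, x, lr=0.1, beta1=0.9, beta2=0.999,
                  eps=1e-8, steps=500):
    """Minimize a 1-D objective with the Adam update rule."""
    m = v = 0.0  # first and second moment estimates
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g       # moving average of gradient
        v = beta2 * v + (1 - beta2) * g * g   # moving average of squared gradient
        m_hat = m / (1 - beta1 ** t)          # bias-corrected moments
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

# Toy objective f(x) = (x - 3)^2, whose gradient is 2(x - 3).
x_star = adam_minimize(lambda x: 2 * (x - 3), x=0.0)
```

In a real BIQA pipeline the same update would be applied per-parameter across the CNN weights, typically via a library implementation such as `torch.optim.Adam`.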
23 pages, 1540 KB  
Article
Learning in Probabilistic Boolean Networks via Structural Policy Gradients
by Pedro Juan Rivera Torres
Entropy 2025, 27(11), 1150; https://doi.org/10.3390/e27111150 - 13 Nov 2025
Viewed by 757
Abstract
We revisit Probabilistic Boolean Networks (PBNs) as trainable function approximators. The key obstacle, non-differentiable structural choices (which predictors to read and which Boolean operators to apply), is addressed by casting the PBN’s structure as a stochastic policy whose parameters are optimized with score-function (REINFORCE) gradients. Continuous output heads (logistic/linear/softmax or policy logits) are trained with ordinary gradients. We call the resulting model a Learning PBN. We formalize the Learning Probabilistic Boolean Network, derive unbiased structural gradients with variance reduction, and prove a universal approximation property over discretized inputs. Empirically, Learning Probabilistic Boolean Networks approach ANN performance across classification (accuracy ↑), regression (RMSE ↓), representation quality via clustering (ARI ↑), and reinforcement learning (return ↑) while yielding interpretable, rule-like internal units. We analyze the effect of binning resolution, operator sets, and unit counts, and show how the learned logic stabilizes as training progresses. Our results indicate that PBNs can serve as general-purpose learners, competitive with ANNs in tabular/noisy regimes, without sacrificing interpretability. Full article
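The core idea of optimizing a discrete structural choice with score-function (REINFORCE) gradients can be sketched compactly. The toy below, written by the editor as an illustration rather than taken from the paper, learns a softmax policy over three candidate Boolean operators and uses a running-average baseline for variance reduction; the operator set, learning rate, and reward definition are all illustrative assumptions.

```python
import math
import random

random.seed(0)

# Candidate Boolean operators: the discrete "structural choice".
OPS = {"AND": lambda a, b: a & b,
       "OR":  lambda a, b: a | b,
       "XOR": lambda a, b: a ^ b}
NAMES = list(OPS)
TARGET = OPS["XOR"]  # hidden function the unit should recover
INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

logits = [0.0, 0.0, 0.0]
baseline, lr = 0.0, 0.5
for _ in range(2000):
    p = softmax(logits)
    k = random.choices(range(3), weights=p)[0]  # sample an operator
    op = OPS[NAMES[k]]
    # Reward: fraction of inputs on which the sampled operator matches.
    reward = sum(op(a, b) == TARGET(a, b) for a, b in INPUTS) / 4
    adv = reward - baseline                     # baseline for variance reduction
    baseline += 0.05 * (reward - baseline)
    # Score-function gradient of log softmax: one-hot(k) - p.
    for j in range(3):
        logits[j] += lr * adv * ((1.0 if j == k else 0.0) - p[j])

probs = softmax(logits)
```

After training, the policy concentrates its probability mass on the operator that reproduces the target truth table, mirroring how a Learning PBN's structural policy settles on rule-like units.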

25 pages, 989 KB  
Article
A Deep Reinforcement Learning Model to Solve the Stochastic Capacitated Vehicle Routing Problem with Service Times and Deadlines
by Sergio Flavio Marroquín-Cano, Elías Neftalí Escobar-Gómez, Eduardo F. Morales, Elizeth Ramírez-Álvarez, Pedro Gasga-García, Eduardo Chandomí-Castellanos, J. Renán Velázquez-González, Julio Alberto Guzmán-Rabasa, José Roberto Bermúdez and Francisco Rodríguez-Sánchez
Mathematics 2025, 13(18), 3050; https://doi.org/10.3390/math13183050 - 22 Sep 2025
Viewed by 2432
Abstract
Vehicle Routing Problems are central to logistics and operational research, arising in diverse contexts such as transportation planning, manufacturing systems, and military operations. While Deep Reinforcement Learning has been successfully applied to both deterministic and stochastic variants of Vehicle Routing Problems, existing approaches often neglect critical time-sensitive conditions. This work addresses the Stochastic Capacitated Vehicle Routing Problem with Service Times and Deadlines, a challenging formulation well suited to modeling time-sensitive routing conditions. The proposal, POMO-DC, integrates a novel dynamic context mechanism. At each decision step, this mechanism incorporates the vehicle’s cumulative travel time and delays—features absent in prior models—enabling the policy to adapt to changing conditions and avoid deadline violations. The model is evaluated on stochastic instances with 20, 30, and 50 customers and benchmarked against Google OR-Tools using multiple metaheuristics. Results show that POMO-DC reduces average delays by up to 88% (from 169.63 to 20.35 min for instances of 30 customers) and 75% (from 4352.43 to 1098.97 min for instances of 50 customers), while maintaining competitive travel times. These outcomes highlight the potential of Deep Reinforcement Learning-based frameworks to learn patterns from stochastic data and effectively manage time uncertainty in Vehicle Routing Problems. Full article
(This article belongs to the Special Issue Stochastic System Analysis and Control)
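To make the role of service times and deadlines concrete, here is a small editor-written sketch (not POMO-DC itself) that evaluates the total tardiness of a visiting order on a toy 1-D geometry, and compares a naive order against an earliest-deadline-first heuristic. All names, positions, and deadlines are invented for illustration.

```python
def route_delay(order, customers, depot=0.0):
    """Total tardiness of serving customers in the given order.

    customers: {name: (position, service_time, deadline)}; travel time
    between two points is |a - b| (a toy 1-D geometry).
    """
    t, pos, delay = 0.0, depot, 0.0
    for name in order:
        p, service, deadline = customers[name]
        t += abs(p - pos)                  # travel to the customer
        delay += max(0.0, t - deadline)    # tardiness at arrival
        t += service                       # time spent serving
        pos = p
    return delay

customers = {"c1": (5, 2, 10), "c2": (2, 1, 4), "c3": (8, 1, 20)}
edd = sorted(customers, key=lambda n: customers[n][2])  # earliest deadline first
naive = ["c1", "c2", "c3"]
```

A learned policy like the one described in the abstract would replace the fixed sorting rule with a decoder that conditions on exactly this kind of running state (cumulative time and accumulated delay) when picking the next customer.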
