An Optimized DRL-GAN Approach for Robust Anomaly Detection in Multi-Scale Energy Systems: Insights from PSML and LEAD1.0

Ershadi Oskouei, Anita; Dashliboroun, Maral Keramat; Moghaddam, Pardis Sadatian; Serrano, Nuria; Hernando-Gallego, Francisco; Martín, Diego; Álvarez-Bravo, José Vicente

doi:10.3390/en19010198

Open Access Article

by Anita Ershadi Oskouei ¹, Maral Keramat Dashliboroun ², Pardis Sadatian Moghaddam ³, Nuria Serrano ⁴, Francisco Hernando-Gallego ⁵, Diego Martín ⁴,* and José Vicente Álvarez-Bravo ⁴

¹ School of Systems and Enterprises, Stevens Institute of Technology, Hoboken, NJ 07030, USA

² Department of Technical Communication and Interactive Design, Kennesaw State University, Kennesaw, GA 30144, USA

³ Department of Computer Science, University of Georgia, Athens, GA 30602, USA

⁴ Department of Computer Science, Escuela de Ingeniería Informática de Segovia, Universidad de Valladolid, 40005 Segovia, Spain

⁵ Department of Applied Mathematics, Escuela de Ingeniería Informática de Segovia, Universidad de Valladolid, 40005 Segovia, Spain

* Author to whom correspondence should be addressed.

Energies 2026, 19(1), 198; https://doi.org/10.3390/en19010198

Submission received: 25 November 2025 / Revised: 21 December 2025 / Accepted: 26 December 2025 / Published: 30 December 2025

(This article belongs to the Special Issue Flexible and Secure Operation of Multi-Scenario Integrated Energy System Coupled with Electricity and Hydrogen)

Abstract

The increasing complexity of multi-scale energy systems makes robust anomaly detection essential to ensure system resilience and operational continuity. Recent advances in deep learning (DL) enable effective modeling of high-dimensional, non-linear energy data by capturing latent spatio-temporal patterns. In this paper, we propose an optimized deep reinforcement learning–generative adversarial network (ODRL–GAN) framework for reliable anomaly detection in multi-scale energy systems. The integration of DRL and GAN is the key innovation: DRL enables adaptive decision-making under dynamic operating conditions, while the GAN enhances detection by reconstructing normal patterns and exposing subtle deviations. To further strengthen the model, a novel multi-objective chimp optimization algorithm (NMOChOA) is employed for hyper-parameter tuning, improving accuracy and convergence. This design allows the ODRL–GAN to effectively capture high-dimensional spatio-temporal dependencies while maintaining robustness against diverse anomaly patterns. The framework is validated on two benchmark datasets, PSML and LEAD1.0, and compared against state-of-the-art baselines including transformer, deep belief network (DBN), convolutional neural network (CNN), gated recurrent unit (GRU), and support vector machine (SVM) models. Experimental results demonstrate that the proposed method achieves a maximum detection accuracy of 99.58% and recall of 99.75%, significantly surpassing all baselines. Furthermore, the model exhibits superior runtime efficiency, faster convergence, and lower variance across trials, highlighting both robustness and scalability. The optimized DRL–GAN framework provides a powerful and generalizable solution for anomaly detection in complex energy systems, offering a pathway toward secure and resilient next-generation energy infrastructures.

Keywords:

Industry 4.0; smart water systems; false data injection detection; cyber–physical security; transformer; deep reinforcement learning

1. Introduction

Modern multi-scale energy systems (integrating electricity, hydrogen, thermal, and renewable resources) have become increasingly complex, heterogeneous, and tightly interconnected [1,2,3]. Such infrastructures operate with high-frequency sensor streams, nonlinear interactions, and rapidly fluctuating loads, making them highly vulnerable to faults, operational anomalies, and cyber–physical disturbances. Even minor deviations can propagate across interconnected subsystems, leading to equipment damage, stability loss, or large-scale service interruptions [4,5,6]. Ensuring early and accurate anomaly detection is therefore essential for maintaining reliability, optimizing resource utilization, and protecting the integrity of next-generation smart energy infrastructures [7,8,9].

Mitigating these challenges requires intelligent monitoring mechanisms that can continuously analyze high-dimensional data, recognize abnormal patterns, and respond before failures escalate [10]. Traditional threshold-based, statistical, or rule-driven methods often struggle with the nonlinear and multi-modal behaviors that characterize modern energy networks. As a result, recent research has shifted toward data-driven anomaly detection frameworks capable of capturing temporal dependencies, modeling system dynamics, and learning representations directly from raw sensor streams [11]. Approaches such as probabilistic modeling, shallow machine learning classifiers, and hybrid physics–data solutions have shown partial success, yet they remain limited when confronted with rapidly evolving patterns or cross-domain energy interactions [12,13,14,15].

In this paper, the term anomaly refers to abnormal system-level behaviors arising from faults, equipment malfunctions, abnormal load dynamics, or cyber–physical disturbances in multi-scale energy systems. These anomalies correspond to deviations from nominal operating conditions and reflect irregular system responses rather than variations in consumer behavior. Accordingly, the focus of this work is on operational and infrastructure-related anomalies that affect the reliability and stability of interconnected energy systems. From a power engineering perspective, such anomalies may indicate incipient equipment failures, unstable operating regimes, abnormal power and energy flows, or degraded system components. In interconnected multi-energy infrastructures, even localized faults or abnormal operating patterns can propagate across electrical, thermal, and renewable subsystems, potentially leading to cascading failures, voltage or frequency instability, and service disruptions. Early identification of these anomalies is therefore critical for ensuring grid stability, protecting physical assets, and maintaining secure system operation [11,12,13,14].

In practical terms, system-level anomalies often manifest as irregular temporal patterns in high-dimensional sensor data, such as sudden load spikes, sustained drifts from steady-state behavior, oscillatory instabilities, or unexpected correlations between different energy carriers. Unlike normal operating conditions, where sensor measurements exhibit consistent and predictable temporal dynamics, anomalous behavior introduces structural deviations in time-series signals that evolve across multiple temporal scales. Detecting such deviations requires models capable of distinguishing subtle early-stage abnormalities from normal system fluctuations [15]. The core problem addressed in this study is the early and reliable detection of system-level operational anomalies in heterogeneous, multi-energy infrastructures under dynamic operating conditions. This task is particularly challenging due to nonlinear system dynamics, multi-scale temporal variations, heterogeneous sensing modalities, and strong cross-domain coupling between energy subsystems. Effective anomaly detection therefore demands approaches that can capture temporal dependencies, learn robust representations of normal system behavior, and adapt to evolving operating conditions in real time, while remaining computationally efficient and stable for practical deployment.

Advances in machine learning (ML) and deep learning (DL) have provided more powerful alternatives by enabling automated feature extraction, temporal reasoning, and robust pattern recognition in complex environments [16,17,18]. Techniques such as convolutional neural networks (CNNs), gated recurrent units (GRUs), deep belief networks (DBNs), Transformers, generative adversarial networks (GANs), and deep reinforcement learning (DRL) agents have demonstrated significant potential in capturing both spatial correlations and temporal fluctuations within large-scale energy datasets [19,20,21]. Building upon these developments, this work proposes an optimized DRL–GAN (ODRL–GAN) framework, which integrates DRL with GAN modeling and a novel multi-objective chimp optimization algorithm (NMOChOA) to achieve highly accurate, stable, and computationally efficient anomaly detection across multi-scale energy systems.

1.1. Related Works

The work [22] introduced power system machine learning (PSML), a first-of-its-kind multi-scale time-series dataset designed to support machine-learning research for future decarbonized energy grids. The authors constructed a synthetic yet physically consistent dataset reflecting interactions between transmission and distribution networks, capturing power, voltage, and current measurements across multiple spatial and temporal scales. Experimental results highlighted that PSML exposes realistic uncertainties and variability patterns crucial for testing data-driven anomaly detection and forecasting pipelines. Overall, the dataset provides a comprehensive foundation for developing reliable, generalizable ML techniques in next-generation low-carbon energy infrastructures.

The work [23] proposed a multimodal anomaly-detection framework for smart power grids based on graph-regularized multimodal subspace support vector data description (MS-SVDD). The authors modeled grid measurements as multiple heterogeneous modalities and projected them into a shared low-dimensional subspace to better preserve intrinsic correlations across sensors. They further introduced graph-embedded regularizers for each modality, enabling the model to exploit topological relationships and optimize modality-specific contributions during training. Experiments on a three-modality dataset demonstrated improved reliability and earlier detection of power-grid events, supported by a carefully designed preprocessing pipeline for one-class learning. Overall, their findings highlight the benefits of multimodal subspace learning combined with graph constraints for robust event-level anomaly detection in complex power networks.

The study [24] examined an interpretable hybrid anomaly-detection framework based on an extended multivariate exponential smoothing LSTM (MES-LSTM), combining statistical forecasting with recurrent deep learning. The authors addressed common challenges in deep anomaly detection by integrating uncertainty modeling and transparent temporal reasoning. Using renewable-energy generation time-series data as the application domain, the extended MES-LSTM was benchmarked against state-of-the-art detectors and shown to perform competitively while being less sensitive to spurious correlations. The work demonstrates that hybrid statistical–DL models can deliver both reliable detection and enhanced explainability in energy anomaly-monitoring tasks.

The work [25] developed the multivariate spatial-temporal graph convolutional informer (MST-GCI), a fault-detection framework designed to capture both spatial topology and long-range temporal dependencies in power distribution networks. The model integrates a graph convolutional informer (GCformer), multivariate time-series graph learning (MTGL), and a VAE-based scoring mechanism to jointly detect faults and localize their root causes. Evaluations on public power-grid datasets showed that MST-GCI consistently outperformed eight state-of-the-art baselines across multiple fault-detection accuracy metrics. The approach also demonstrated precise localization of fault-origin buses in complex distribution structures. The results indicate that combining graph learning with transformer-based long-term modeling yields a robust solution for spatial–temporal fault detection.

In [26], the authors designed an interpretable LSTM-based fault-classification model for the PSML benchmark dataset, aiming to enhance both reliability and transparency in electrical fault detection. The study introduced novel disentanglement-based interpretability metrics and employed multi-objective Bayesian optimization to jointly maximize accuracy and interpretability across multiple Pareto-optimal architectures. Using Shapley Additive Explanations, the model further revealed how specific temporal subsequences contribute to classification outcomes, improving root-cause understanding. The most accurate Pareto-front configuration delivered highly competitive performance on the PSML dataset while maintaining strong interpretability.

The work in [27] focused on developing the winning solution for the LEAD energy-anomaly detection competition, which provided 200 labeled building energy time series for training and 206 unseen series for evaluation. The authors introduced a tree-based supervised anomaly classifier supported by extensive feature engineering, particularly emphasizing value-changing features that captured temporal variability. Their model achieved a leading AUC score of 0.9866 on the private leaderboard, significantly outperforming competing approaches. The study also analyzed multiple enhancement strategies and demonstrated that feature engineering contributes the most to performance gains.

The work [28] investigated an unsupervised anomaly-detection framework that combines a time series forest (TSF) with a reinforcement-learning–based model-selection strategy to dynamically choose the most suitable detector without relying on ground-truth labels. The approach was evaluated on real-world time-series datasets and achieved superior F1 performance compared to all competing AD models, while on a synthetic dataset it reached an F1 score of 0.989, outperforming all baselines except k-nearest neighbor (KNN). The authors further showed that their RL-driven selector also surpassed GPT-4 when prompted to detect anomalies on synthetic data, highlighting its robustness across diverse anomaly types. Experiments on three additional datasets revealed that the proposed model-selection mechanism maintained strong performance while individual AD models varied significantly.

The work [29] proposed a transformer-based framework for detecting anomalous electricity consumption, targeting the growing challenge of non-technical losses in modern smart-grid environments. The model incorporates multi-head attention, layer normalization, and point-wise feed-forward components to effectively capture temporal consumption patterns. To mitigate class imbalance, the authors introduced a synthetic-anomaly generation method that enriches minority anomalous samples during training. Experiments on a real dataset released by the State Grid Corporation of China demonstrated strong performance, achieving 93.9% precision, 96.3% recall, 95.1% F1-score, and 95.6% accuracy, outperforming state-of-the-art baselines. The results show that combining Transformer architectures with synthetic anomaly enrichment yields robust detection of electricity-consumption anomalies in large-scale smart-meter data.

The study [30] examined the use of generative neural networks for detecting point anomalies in energy consumption data, focusing on building realistic reference profiles and comparing multiple unsupervised methods on two datasets. Leveraging unsupervised ML algorithms including generative models, the authors found that simpler statistical methods often outperformed complex neural networks when contamination parameters were misconfigured. Their analysis underscored the importance of parameter tuning and threshold calibration in point-anomaly detection tasks. The work suggests that even within advanced neural-network setups, effective anomaly detection in energy consumption data requires careful contamination management and interpretable modeling choices.

The work [31] developed an empirical assessment of whether emerging time-series foundation models can effectively generalize to energy anomaly-detection tasks. The authors evaluated several foundation-model variants against conventional DL and statistical baselines using multiple building-energy datasets containing labeled consumption anomalies. Their results showed that foundation models exhibited strong zero-shot and few-shot performance but often underperformed task-specific detectors when fine-tuned data were abundant, highlighting a trade-off between generalization and task specialization. The analysis concluded that while foundation models provide promising adaptability and reduced training cost, they require careful calibration to match the accuracy of domain-tailored anomaly-detection architectures.

In [32], the authors designed an explore–exploit, workload-bounded strategy for detecting rare events in large-scale energy-sensor time series, addressing the difficulty of identifying anomalies that occur infrequently and under heavy data loads. Their method adaptively balances exploration and exploitation while ensuring computational efficiency under strict workload constraints. Evaluated on massive multivariate sensor datasets, the proposed approach outperformed several state-of-the-art detectors in identifying rare anomalies, particularly under limited computational budgets. The study highlights that coupling workload-aware scheduling with adaptive exploration significantly enhances robustness and responsiveness in real-world energy-monitoring environments.

The analysis in [33] explored a comprehensive evaluation of unsupervised ML techniques for detecting point anomalies in energy-consumption data, focusing on their behavior across multiple benchmark datasets. The authors compared clustering-based, density-based, and reconstruction-based methods, examining their sensitivity to contamination levels, temporal patterns, and distributional irregularities. Experimental results showed that no single unsupervised model consistently dominated; instead, performance depended strongly on anomaly type and data characteristics, with certain lightweight statistical methods rivaling more complex ML approaches in specific scenarios. The study emphasized the importance of method selection and parameter tuning when applying unsupervised detectors to real-world energy datasets.

The work in [34] focused on evaluating the effectiveness of the Prophet forecasting model for predicting building-level electricity consumption using benchmark smart-meter time-series datasets. The authors conducted a rigorous computational study with nested cross-validation, ensuring reliable hyperparameter selection and robust generalization across multiple buildings and time periods. Their experiments showed that Prophet achieved competitive accuracy relative to common forecasting baselines, particularly in scenarios with strong seasonality and limited training data. The study also highlighted the importance of structured model evaluation when deploying forecasting tools for real-world building-energy management. The findings demonstrate that Prophet remains a strong, lightweight, and interpretable baseline for energy-consumption prediction tasks.

The work [35] proposed a robust Transformer-based unsupervised anomaly detection framework for multivariate time-series data in safety-critical nuclear power systems. The authors introduced a modified Transformer architecture trained using a maximum correntropy criterion (MCC)–based loss function to improve robustness against outliers and non-Gaussian noise, which commonly contaminate real-world industrial datasets. Unlike conventional Transformer models optimized with mean squared error (MSE), the proposed Trans-MCC model suppresses the influence of large reconstruction errors during training, thereby enhancing the stability of normal pattern learning. The method was evaluated on both simulated nuclear power plant datasets (including PFPR, SLBIC, SGTR, and SG2L fault scenarios) and real operational plant data consisting of 30 sensor variables. Quantitative results demonstrated that Trans-MCC consistently outperformed strong baselines such as DAGMM, USAD, OmniAnomaly, and PCA, achieving F1-score improvements of up to 0.31 compared to the standard Transformer-MSE under noisy training conditions.

1.2. Research Gaps and Motivation

Despite significant progress in data-driven anomaly detection, existing methods still face major limitations when applied to multi-scale and heterogeneous energy systems. Most traditional deep learning models, such as CNNs, GRUs, and DBNs, rely on static feature extraction or sequence modeling that struggles to capture the rapidly evolving, cross-domain dynamics present in modern energy infrastructures. These architectures typically learn representations in a feedforward or recurrent manner and therefore lack mechanisms for adaptive decision-making when the system deviates from expected operating conditions. Prior works also reveal that generative approaches, while capable of modeling normal behavior, often suffer from instability during training and fail to capture long-term operational dependencies. This leaves a substantial gap: a unified architecture that can simultaneously learn generative structure, temporal dynamics, and adaptive response policies.

Another critical research gap concerns generalization and stability. Multi-energy datasets such as PSML and LEAD1.0 exhibit nonlinear interactions, heterogeneous modalities, and multi-scale temporal fluctuations. Many existing models perform well on specific subsets of data but degrade significantly when evaluated across different operating modes or energy carriers. Studies also show high variance across runs due to sensitivity to initialization and hyper-parameter settings, limiting the deployment of such models in real-world environments. Furthermore, accuracy-centric evaluations in prior research overlook essential operational factors such as inference latency, runtime efficiency, and robustness under repeated sampling.

A further gap arises in the area of optimization and hyper-parameter tuning. Most prior anomaly detection frameworks rely on manual tuning or simplistic search techniques such as grid search or random search, which are computationally inefficient and often inadequate for high-dimensional hybrid models. Meta-heuristic techniques have been explored, but existing single-objective optimizers fail to account for the multi-objective nature of real-world anomaly detection, where accuracy, convergence speed, model stability, and latency must be optimized simultaneously. These limitations highlight the need for a more capable optimization mechanism that can navigate complex search spaces while maintaining balanced trade-offs between conflicting objectives.
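To make the notion of conflicting objectives concrete, the following minimal sketch filters a set of candidate configurations down to its Pareto front, that is, the candidates not dominated on every objective by some other candidate. It is a generic illustration of multi-objective selection, not the NMOChOA itself; the candidate values and the two objectives (error rate and latency in milliseconds, both minimized) are hypothetical.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a is no worse than b on every objective and
    strictly better on at least one (all objectives are minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep only the points that no other point dominates."""
    return [
        p for p in points
        if not any(dominates(q, p) for q in points if q is not p)
    ]


# Hypothetical candidates: (error rate = 1 - accuracy, latency in ms).
candidates = [(0.02, 12.0), (0.01, 20.0), (0.03, 15.0), (0.01, 25.0)]
front = pareto_front(candidates)
print(front)
```

In this toy example, two candidates survive: one is faster and the other is more accurate, so neither can be discarded without stating a preference between the objectives. A single-objective optimizer would have to collapse this trade-off into one number in advance.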

Beyond algorithmic limitations, an important research gap lies in the inadequate modeling of cross-domain coupling and inter-energy dependencies in multi-scale energy systems. Modern infrastructures exhibit strong interactions between electrical, thermal, and mechanical subsystems, where anomalies often propagate across domains rather than remaining isolated within a single energy carrier. However, most existing anomaly detection approaches treat each modality independently or rely on loosely coupled fusion strategies, failing to capture causal dependencies and cascading effects. This limitation reduces detection sensitivity for early-stage or compound anomalies and undermines the interpretability of model outputs in real operational scenarios.

Another underexplored gap concerns the operational robustness and deployability of anomaly detection models in real-world energy management systems. Many existing studies implicitly assume stationary data distributions and offline inference, whereas practical deployments must handle concept drift, seasonal variations, sensor aging, and missing or corrupted measurements. Furthermore, limited attention has been paid to the stability of detection performance under repeated inference cycles, model retraining, or incremental updates. The absence of mechanisms to explicitly address these operational challenges significantly limits the reliability and long-term applicability of current data-driven solutions.

The above analysis highlights several critical research gaps in existing anomaly detection studies for multi-scale energy systems:

Limited capability of conventional DL models to capture rapidly evolving, multi-scale, and cross-domain dynamics;
Lack of adaptive decision-making mechanisms to respond to deviations from nominal operating conditions;
Instability and insufficient long-term dependency modeling in generative-based approaches;
Poor generalization across operating modes, energy carriers, and temporal scales in heterogeneous datasets;
High sensitivity to initialization and hyper-parameter settings, resulting in unstable and non-reproducible performance;
Overemphasis on accuracy-centric metrics while neglecting runtime efficiency, inference latency, and robustness;
Inefficiency of manual or single-objective optimization techniques for high-dimensional hybrid architectures;
Inadequate modeling of cross-domain coupling and anomaly propagation across interconnected energy subsystems.

Motivated by these gaps, the present study proposes a comprehensive solution that integrates DRL, GAN, and NMOChOA optimization into a unified architecture. DRL enables adaptive policy refinement under dynamic conditions, GANs capture deep structural representations of normal system behavior, and the NMOChOA provides an efficient optimization layer capable of tuning high-dimensional hyper-parameters while preserving convergence stability. This synergy directly addresses the shortcomings of earlier models by combining temporal reasoning, generative reconstruction, and robust optimization. As a result, the proposed ODRL–GAN framework not only improves anomaly detection accuracy but also enhances generalization, reduces computational overhead, and increases reliability.

1.3. Paper Contribution and Organization

The main contributions of this paper are as follows:

A novel anomaly-detection framework (ODRL–GAN) is proposed, designed specifically for complex multi-scale energy systems. The model unifies adaptive policy learning, generative reconstruction, and multi-objective optimization to address the nonlinear, heterogeneous, and dynamic characteristics of modern energy infrastructures. This tri-layer design provides a more discriminative and stable detection mechanism than traditional standalone deep learning models;
A refined evolutionary optimizer is developed by enhancing the standard ChOA, incorporating a navigator-guided multi-objective search strategy that improves exploration–exploitation balance. This enhancement enables more reliable hyper-parameter tuning and significantly strengthens convergence stability for hybrid deep architectures;
Extensive evaluation is conducted on the PSML and LEAD1.0 datasets, two real-world multi-carrier energy benchmarks, and the proposed framework is compared against Transformer, CNN, GRU, DBN, DRL, GAN, and support vector machine (SVM) baselines. This provides a broad and rigorous assessment across diverse modeling paradigms;
A comprehensive performance analysis is performed using accuracy, recall, area under the curve (AUC), root mean squared error (RMSE), runtime, inference latency, variance, and t-tests, demonstrating that ODRL–GAN consistently outperforms all competing methods. The results highlight the model’s superiority in precision, robustness, computational efficiency, and statistical reliability.

The remainder of this paper is organized as follows. Section 2 presents the proposed methodology and all required materials, including detailed descriptions of the datasets, preprocessing pipeline, individual architectural modules, and the integrated ODRL–GAN framework. Section 3 reports the experimental results and comparative analyses across all baseline models. Section 4 provides an in-depth discussion of the findings, examining computational efficiency, stability, statistical significance, and real-world implications. Section 5 concludes the study by summarizing the key insights and highlighting the overall effectiveness of the proposed approach.

2. Materials and Proposed Methods

The overall workflow of the proposed methodology is illustrated in Figure 1, which summarizes how real-world multi-scale energy data are transformed into a robust anomaly detection model. The figure outlines the complete pipeline, starting from the acquisition of raw data from two benchmark sources followed by essential preprocessing steps such as imputation, normalization, feature selection, and label encoding. The processed dataset is then split into training and testing subsets, which serve as the foundation for training the anomaly detection model. Figure 1 also highlights the core architecture of the proposed framework, where a DRL agent interacts with a GAN within a unified environment. The GAN module generates synthetic patterns and attempts to reconstruct normal behaviors, while the DRL agent receives state information, selects suitable actions, and optimizes classification decisions based on reward signals.

Once trained, the integrated model is further refined using the proposed NMOChOA, which tunes critical hyper-parameters to maximize accuracy, convergence stability, and runtime efficiency. The final component of the framework performs model evaluation using metrics such as accuracy, recall, and AUC, comparing the performance of ODRL–GAN against strong baseline models. It is important to note that the detailed internal mechanics of each module (DRL, GAN, and NMOChOA) are fully expanded in the subsequent subsections. The interactions in the framework highlight three essential relationships:

1. Data–Model Interaction, where the processed dataset continuously feeds the agent–environment loop;
2. GAN–DRL Collaboration, where the GAN enriches the agent’s understanding of normal vs. anomalous behaviors, and DRL adapts to dynamic operating conditions through reward-driven policy learning;
3. Optimization Feedback Loop, where NMOChOA iteratively updates hyper-parameters to steer the ODRL–GAN toward the best trade-off among accuracy, convergence speed, and stability.

This layered integration enables the model to capture the nonlinear, high-dimensional, and time-dependent nature of energy system data.
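The GAN–DRL collaboration described above relies on the idea that a generator trained on normal behavior reconstructs normal windows well and anomalous windows poorly. The sketch below illustrates only that scoring idea under simplifying assumptions: the reconstructions are given as plain lists rather than produced by a trained generator, the score is a mean squared error, and the fixed `threshold` stands in for the decision that the DRL agent would learn. All function names and values are hypothetical.

```python
from typing import List, Sequence


def reconstruction_error(window: Sequence[float],
                         reconstruction: Sequence[float]) -> float:
    """Mean squared error between a window and its reconstruction."""
    if len(window) != len(reconstruction):
        raise ValueError("window and reconstruction must have equal length")
    squared = [(x - r) ** 2 for x, r in zip(window, reconstruction)]
    return sum(squared) / len(window)


def flag_anomalies(errors: Sequence[float], threshold: float) -> List[bool]:
    """Flag windows whose reconstruction error exceeds the threshold."""
    return [e > threshold for e in errors]


# Toy data: the second window contains a spike that a generator
# trained on normal behavior would (by assumption) fail to reproduce.
windows = [[1.0, 1.0, 1.0], [1.0, 3.0, 1.0]]
reconstructions = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]

errors = [reconstruction_error(w, r) for w, r in zip(windows, reconstructions)]
flags = flag_anomalies(errors, threshold=0.5)
print(errors, flags)
```

In the framework described here, the fixed threshold would be replaced by the agent's learned policy, which selects actions from state information and is shaped by reward signals.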

2.1. Dataset

This study employs two benchmark datasets, PSML and LEAD1.0, both selected for their richness, multi-dimensional structure, and suitability for anomaly detection in modern multi-scale energy systems. These datasets offer diverse operating conditions, heterogeneous measurement sources, and well-labeled abnormal events, making them well suited to training and evaluating a deep learning–based anomaly detection model. Their combination allows the proposed ODRL–GAN framework to be validated across both traditional power system environments and next-generation integrated energy networks involving electricity, hydrogen, and renewable assets [22,36].

The PSML dataset [22] provides high-resolution operational measurements collected from various components of electrical grids, including loads, distributed generators, substations, and control devices. It contains multiple classes of events such as normal operation, transient disturbances, equipment faults, and cyber–physical anomalies. The dataset covers a wide variety of system conditions with thousands of labeled samples, allowing the model to learn both steady-state and dynamic patterns. Its multi-class structure (commonly including normal, line fault, load anomaly, voltage irregularity, and cyber intrusion) makes it an excellent testbed for evaluating sophisticated learning architectures under realistic power system dynamics.

For the PSML dataset, the features used in this study primarily correspond to electrical power system measurements, including active and reactive power flows, voltage magnitudes, load demand variations, and selected power quality indicators. All measurements are sampled at a fixed temporal resolution of 1 s, enabling the model to capture both steady-state and transient behaviors. After preprocessing and feature selection, a total of 52,800 samples are retained, of which 36,960 samples (70%) are used for training and 15,840 samples (30%) are used for testing.

The LEAD1.0 dataset [36] extends the scope further by incorporating multi-carrier energy flows such as electricity, hydrogen, gas, and renewable sources, reflecting the operational complexity of emerging integrated energy systems. LEAD1.0 offers multi-scale measurements from sensors, market signals, energy storage units, hydrogen converters, and renewable generators. It includes well-defined anomaly categories such as sensor faults, equipment degradation, energy imbalance, abnormal hydrogen flow, and coordinated cyber–physical attacks. With its hybrid structure and rich temporal behavior, LEAD1.0 provides a more challenging environment that captures cross-domain interactions and multi-layer anomalies, enabling the proposed method to demonstrate scalability and generalization across complex and interconnected energy infrastructures.

For the LEAD1.0 dataset, the selected features span multiple energy carriers, including electrical variables (power flow and voltage), hydrogen-related measurements (flow rate and pressure), renewable generation outputs, and energy storage indicators. The dataset is sampled at a temporal resolution of 5 s, reflecting the multi-scale dynamics of integrated energy systems. Following preprocessing and feature selection, the final dataset consists of 64,800 samples, with 45,360 samples (70%) allocated for training and 19,440 samples (30%) reserved for testing.

These two datasets collectively cover the operational spectrum of both conventional smart grids and future integrated energy systems, ensuring that the proposed ODRL–GAN model is evaluated under diverse, realistic, and multi-scale conditions.

Before training the proposed ODRL–GAN framework, the PSML and LEAD1.0 datasets undergo a structured data preparation pipeline to ensure consistency, reliability, and suitability for deep learning–based anomaly detection. Given the multi-source and multi-scale nature of the data, preprocessing plays a crucial role in reducing noise, standardizing feature distributions, and preserving the temporal and operational relationships within the measurements. The preparation process includes four key stages: imputation, normalization, feature selection, and label encoding.

Imputation is first applied to address missing values that commonly arise from sensor outages, communication delays, or measurement inconsistencies in large-scale energy systems. Missing entries are replaced using statistically consistent strategies such as mean or median substitution for continuous variables, ensuring that no artificial trends or discontinuities are introduced. This step maintains the integrity of time-series patterns, preventing the reinforcement learning agent and GAN discriminator from misinterpreting gaps as anomalies. Following imputation, normalization is performed to scale heterogeneous features (such as voltage, power flow, hydrogen pressure, or renewable output) into a unified numeric range. This prevents features with large numeric magnitudes from dominating the learning process and significantly stabilizes both the DRL policy updates and the GAN training dynamics. Min–max scaling is used to preserve relative variations while enforcing comparability across features.

Next, feature selection is applied to extract the most informative variables from both datasets. Energy systems often contain redundant or weakly correlated measurements that can degrade training efficiency and increase computational cost. Using correlation analysis and domain knowledge, irrelevant or low-impact features are removed, ensuring that the ODRL–GAN focuses on critical operational indicators such as load fluctuations, renewable variability, hydrogen flow anomalies, and power quality metrics. Finally, label encoding transforms the event categories (normal operation, equipment faults, cyber–physical anomalies, sensor faults, and others) into numerical labels suitable for classification. This step standardizes class identifiers across both datasets and facilitates a consistent learning objective for the discriminator and the DRL agent.
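The four preparation stages can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the toy columns, the `min_abs_corr` threshold of 0.1, and the choice to correlate each feature with the encoded label are all assumptions made for illustration, and the domain-knowledge part of feature selection is not modeled.

```python
from statistics import median

def impute_median(column):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in column if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in column]

def min_max_scale(column):
    """Scale a column to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5

def encode_labels(labels):
    """Map each distinct event category to an integer, in sorted order."""
    mapping = {name: idx for idx, name in enumerate(sorted(set(labels)))}
    return [mapping[name] for name in labels], mapping

def preprocess(features, labels, min_abs_corr=0.1):
    """features: dict of name -> list of float or None; labels: list of str."""
    y, mapping = encode_labels(labels)
    cleaned = {}
    for name, column in features.items():
        scaled = min_max_scale(impute_median(column))
        # Assumed selection rule: keep features correlated with the label.
        if abs(pearson(scaled, y)) >= min_abs_corr:
            cleaned[name] = scaled
    return cleaned, y, mapping

# Toy data (hypothetical): one missing voltage reading, one constant column.
raw = {
    "voltage": [1.00, 1.02, None, 0.70, 0.68],
    "load": [50.0, 52.0, 51.0, 90.0, 95.0],
    "sensor_id": [7.0, 7.0, 7.0, 7.0, 7.0],
}
events = ["normal", "normal", "normal", "line_fault", "line_fault"]
X, y, label_map = preprocess(raw, events)
```

In this toy run the constant `sensor_id` column is dropped because it carries no correlation with the label, while the remaining columns end up in the range [0, 1]. On real time series, the scaling bounds would normally be computed on the training split only and reused for the test split.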

2.2. RL

RL provides a powerful learning paradigm for sequential decision-making in dynamic and uncertain environments, making it particularly suitable for anomaly detection in multi-scale energy systems. Unlike supervised learning, which relies on static labeled samples, RL continuously interacts with the environment, learns optimal behaviors through trial and feedback, and adapts to evolving operating conditions. This adaptability allows the RL agent to capture subtle temporal deviations that may reflect abnormal system behavior, while progressively refining its policy to maximize long-term detection performance [37].

Figure 2 provides an overview of the standard RL interaction cycle used in our framework. As the environment (the energy system data stream) evolves, it provides the agent with a state representation. The agent’s policy determines the best action to take in response to this state, which is then executed in the environment. Based on the consequences of that action (whether it improves or worsens anomaly detection), the environment produces a reward. This reward is used by the agent’s learning algorithm to update the policy, forming a closed-loop system that gradually improves the detection accuracy through continuous policy refinement [38].

The mathematical foundation of this RL process is described through several key equations. Equation (1) characterizes the cumulative return the agent aims to maximize over time [37]:

$$G_{t} = \sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1} \tag{1}$$

where $G_{t}$ is the cumulative return at time step $t$; $r_{t+k+1}$ is the reward received $k + 1$ steps after time $t$; and $\gamma$ is the discount factor that controls the influence of future rewards, with $0 < \gamma \leq 1$.
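As a small worked example, the return can be evaluated for a finite reward sequence, using the standard weight $\gamma^{k}$ on the reward received $k + 1$ steps ahead. The sketch below truncates the infinite sum to the rewards supplied; the function name and the reward values are hypothetical.

```python
def discounted_return(rewards, gamma):
    """Finite-horizon discounted return: sum over k of gamma**k * rewards[k]."""
    g = 0.0
    # Accumulate backwards so each step needs one multiply and one add.
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# rewards[k] plays the role of r_{t+k+1}; the values are made up.
future_rewards = [1.0, 0.0, -2.0, 1.0]
g_t = discounted_return(future_rewards, gamma=0.5)
# 1.0 + 0.5*0.0 + 0.25*(-2.0) + 0.125*1.0 = 0.625
```

With $\gamma = 1$ the function reduces to a plain sum, which is only finite when the episode itself is finite.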

Equation (2) defines the value function under a given policy, representing the expected long-term return when starting from a particular state [37]:

$$V^{\pi}(s) = \mathbb{E}\left[G_{t} \mid s_{t} = s\right] \tag{2}$$

where $V^{\pi}(s)$ is the expected return from state $s$ under policy $\pi$, and $\mathbb{E}$ denotes the expectation over all possible future trajectories.

Equation (3) introduces the action-value function, which evaluates the expected return for taking a specific action in a given state:

$$Q^{\pi}(s, a) = \mathbb{E}\left[G_{t} \mid s_{t} = s, a_{t} = a\right] \tag{3}$$

where $Q^{\pi}(s, a)$ is the expected return for taking action $a$ in state $s$ and then following policy $\pi$; $s_{t}$ is the current state; and $a_{t}$ is the selected action.

Finally, Equation (4) presents the Q-learning update rule, which iteratively refines the estimate of the action-value function by combining new reward information with prior knowledge [38]:

$$Q(s, a) \leftarrow Q(s, a) + \alpha \left(r + \gamma \max_{a'} Q(s', a') - Q(s, a)\right) \tag{4}$$

where $Q(s, a)$ is the current estimate of the action-value function; $\alpha$ is the learning rate controlling how much new information overrides old estimates; $r$ is the immediate reward received after action $a$; $s'$ is the next state; and $\max_{a'} Q(s', a')$ is the maximum estimated return achievable from state $s'$.
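To make the update rule of Equation (4) concrete, the following sketch applies one step to a tabular estimate. It is a simplification: the framework in this paper uses a deep RL agent rather than a lookup table, and the state names, the two-action set, and the values of the learning rate and discount factor are hypothetical.

```python
from collections import defaultdict

def q_learning_update(q, state, action, reward, next_state, actions, alpha, gamma):
    """Apply one step of Equation (4) in place and return the TD error."""
    best_next = max(q[(next_state, a)] for a in actions)
    td_error = reward + gamma * best_next - q[(state, action)]
    q[(state, action)] += alpha * td_error
    return td_error

ACTIONS = ("flag_normal", "flag_anomaly")   # hypothetical action set
q_table = defaultdict(float)                # unseen (state, action) pairs start at 0.0
q_table[("s1", "flag_anomaly")] = 2.0

td = q_learning_update(q_table, "s0", "flag_anomaly", reward=1.0,
                       next_state="s1", actions=ACTIONS, alpha=0.5, gamma=0.5)
# td = 1.0 + 0.5 * 2.0 - 0.0 = 2.0, so Q(s0, flag_anomaly) becomes 0.0 + 0.5 * 2.0 = 1.0
```

The bracketed term in Equation (4) is the temporal-difference error returned here; a learning rate of 1 would overwrite the old estimate entirely, while a value near 0 would leave it almost unchanged.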

Together, these formulations define how the RL agent evaluates states and actions, accumulates long-term rewards, and progressively updates its policy to improve anomaly detection performance in complex and dynamic energy environments.

2.3. GAN

GANs provide a powerful framework for learning the underlying distribution of complex datasets through an adversarial training mechanism. A GAN consists of two neural networks (a generator and a discriminator) that compete against each other in a minimax game. The generator aims to synthesize realistic samples that resemble the true data distribution, while the discriminator aims to distinguish between real and generated samples. This adversarial setup enables GANs to capture high-dimensional, non-linear structures contained in energy system measurements, making them highly effective for reconstructing normal operational patterns and exposing subtle anomalies [39].

Figure 3 illustrates the core structure of a standard GAN, consisting of a generator network G and a discriminator network D. The generator receives a latent noise vector and produces synthetic samples intended to mimic real operational data. Meanwhile, the discriminator processes both real samples from the dataset and synthetic samples generated by G, outputting a probability that reflects whether an input is real or fake. During training, the generator continuously adjusts its parameters to fool the discriminator, while the discriminator simultaneously improves its ability to detect fabricated inputs. This adversarial learning loop gradually forces the generator to synthesize increasingly realistic data patterns [39].

The mathematical foundation of GAN training is formulated through a minimax objective. Equation (5) represents the fundamental value function optimized by the generator and discriminator during adversarial training [40]:

$$\min_{G} \max_{D} V(D, G) = \mathbb{E}_{X \sim p_{\mathrm{data}}(x)}\left[\log D(X)\right] + \mathbb{E}_{Z \sim p_{Z}(z)}\left[\log\left(1 - D(G(Z))\right)\right] \tag{5}$$

where $p_{\mathrm{data}}(x)$ represents the distribution of real data; $p_{Z}(z)$ denotes the prior distribution over the latent noise vectors; $D(X)$ refers to the probability output of the discriminator for a real input $X$; and $D(G(Z))$ denotes the probability assigned to a sample synthesized by the generator.

Next, Equation (6) describes the loss function of the discriminator, which learns to assign high scores to real samples and low scores to generated ones [40]:

$$L_{D} = -\left(\mathbb{E}_{X \sim p_{\mathrm{data}}(x)}\left[\log D(X)\right] + \mathbb{E}_{Z \sim p_{Z}(z)}\left[\log\left(1 - D(G(Z))\right)\right]\right) \tag{6}$$

where $L_{D}$ is the loss function for the discriminator.

Finally, Equation (7) defines the generator’s loss function, which encourages the generator to produce samples that can successfully mislead the discriminator:

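$$L_{G} = \mathbb{E}_{Z \sim p_{Z}(z)}\left[\log\left(1 - D(G(Z))\right)\right] \tag{7}$$

where $L_{G}$ is the loss function for the generator, that is, the generator-dependent term of Equation (5); minimizing it drives $D(G(Z))$ toward 1.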
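The adversarial objectives can be checked numerically with a few hand-picked discriminator outputs. The sketch below estimates the discriminator loss of Equation (6) and the generator-dependent term of the minimax objective in Equation (5), which the generator tries to minimize, by averaging over small batches. The probabilities are hypothetical, and no clipping is applied, so outputs of exactly 0 or 1 would raise a math domain error.

```python
import math

def discriminator_loss(d_real, d_fake):
    """Equation (6): negative mean log-likelihood of correct real/fake calls."""
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return -(real_term + fake_term)

def generator_objective(d_fake):
    """Second term of Equation (5); the generator tries to make this small."""
    return sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)

# Hypothetical outputs D(X) on real samples and D(G(Z)) on generated samples.
confident = discriminator_loss(d_real=[0.9, 0.8], d_fake=[0.1, 0.2])
undecided = discriminator_loss(d_real=[0.5, 0.5], d_fake=[0.5, 0.5])
fooled = generator_objective([0.9, 0.9])   # discriminator accepts the fakes
caught = generator_objective([0.1, 0.1])   # discriminator rejects the fakes
```

A discriminator that separates the two batches well obtains a lower loss than one that outputs 0.5 everywhere (which yields $2 \log 2$), and the generator term is lower when the discriminator is fooled than when the fakes are caught.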

Although the basic GAN is known to suffer from challenges such as mode collapse and training instability, the role of the GAN in the proposed ODRL–GAN framework fundamentally differs from that of conventional data-synthesis-oriented GAN applications. In this work, the primary objective of the GAN is not to generate highly diverse samples, but to learn a compact representation of normal operating behavior and to enhance the discriminator’s sensitivity to distributional deviations that indicate anomalies. To mitigate the inherent limitations of standard GANs, several stabilization mechanisms are incorporated. First, the GAN is pre-trained offline using normal operational data, which significantly stabilizes adversarial learning before integration with the DRL agent. Second, the generator architecture is intentionally kept lightweight, while discriminator regularization and early stopping are employed to prevent overfitting and collapse. Moreover, the integration of the GAN within the DRL decision loop further alleviates instability, as anomaly decisions are not solely driven by the GAN output but are refined through reward-based policy learning. Finally, the NMOChOA optimizer plays a critical role by tuning GAN-related hyper-parameters to avoid unstable regions of the search space. Collectively, these design choices enable the use of a stabilized standard GAN that remains computationally efficient while maintaining high sensitivity to subtle system-level anomalies.

In addition to the architectural and optimization-level strategies described above, the training procedure of the GAN is carefully structured to ensure stable adversarial dynamics throughout learning. Specifically, discriminator updates are prioritized during early training epochs to prevent generator dominance and to establish a reliable decision boundary for normal operating behavior. Batch sizes and update frequencies are kept consistent across training stages to avoid oscillatory behavior. Furthermore, the discriminator outputs are not used in isolation; instead, they are incorporated as contextual cues within the DRL state representation, which reduces the impact of transient adversarial fluctuations. This interaction ensures that anomaly detection decisions are driven by temporally consistent patterns rather than instantaneous discriminator responses. As a result, GAN instability is mitigated at both the training and decision levels, reinforcing robustness without introducing additional loss formulations or architectural complexity.

2.4. NMOChOA

ChOA is a population-based meta-heuristic introduced by Khishe and Mosavi in 2020, inspired by the cooperative hunting strategies of chimpanzees. Its success stems from its balance between exploration and exploitation, modeling how chimps collaborate, chase, encircle, and attack prey. ChOA has demonstrated strong optimization capabilities across various engineering problems due to its dynamic search behavior, ability to avoid premature convergence, and robustness in high-dimensional spaces [41].

In ChOA, the population consists of four types of chimps (attacker, barrier, chaser, and driver), each responsible for a specific role in the collaborative search process. The attacker leads the exploration toward the best-known region, the barrier obstructs the escape directions of potential solutions, the chaser tracks the prey’s updated location, and the driver pushes the search agents toward the target. This multi-role cooperation enhances coverage of the search space and improves the algorithm’s ability to converge toward global optima.

The mathematical modeling of ChOA begins with the update mechanism described in Equations (8)–(12), which represent the distance computation, movement update, and the adaptive coefficient vectors. Equation (8) defines the distance between a chimp and the prey based on dynamically varying parameters. Equation (9) provides the position update rule, expressing how each chimp moves toward the prey according to the computed distance. Equations (10) and (11) generate dynamic coefficients that regulate convergence behavior, while Equation (12) introduces a chaotic component that enriches exploration by preventing the algorithm from stagnation [41].

$$d = \left| c \cdot X_{prey}(t) - m \cdot X_{chimp}(t) \right| \tag{8}$$

$$X_{chimp}(t + 1) = X_{prey}(t) - a \cdot d \tag{9}$$

$$a = 2 \cdot f \cdot r_{1} - f \tag{10}$$

$$c = 2 \cdot r_{2} \tag{11}$$

$$m = \mathrm{Chaotic\_value} \tag{12}$$

where $X_{prey}(t)$ is the prey’s position vector; $X_{chimp}(t)$ denotes the chimp’s position vector; $r_{1}$ and $r_{2}$ are random vectors in $[0, 1]$; $a$, $c$, and $m$ are the coefficient vectors, with $m$ a chaotic vector; and $f$ is the dynamic vector in $[0, 2.5]$.

Next, Equations (13)–(15) formalize the cooperative hunting mechanism among the four chimp roles. Equation (13) computes the distances between each role (attacker, barrier, chaser, and driver) and the candidate solution. Equation (14) updates the estimated position of each role using its respective adaptive coefficient. Finally, Equation (15) fuses the contributions of all four roles by averaging their updated positions, producing the final position update for each chimp at every iteration. This fusion step encapsulates the cooperative nature of chimp hunting and ensures a stable transition toward optimal solutions [41].

$$\begin{cases} d_{Attacker} = \left| c_{1} \cdot X_{Attacker} - m_{1} \cdot X \right| \\ d_{Barrier} = \left| c_{2} \cdot X_{Barrier} - m_{2} \cdot X \right| \\ d_{Chaser} = \left| c_{3} \cdot X_{Chaser} - m_{3} \cdot X \right| \\ d_{Driver} = \left| c_{4} \cdot X_{Driver} - m_{4} \cdot X \right| \end{cases} \tag{13}$$

$$\begin{cases} X_{1} = X_{Attacker} - a_{1} \left(d_{Attacker}\right) \\ X_{2} = X_{Barrier} - a_{2} \left(d_{Barrier}\right) \\ X_{3} = X_{Chaser} - a_{3} \left(d_{Chaser}\right) \\ X_{4} = X_{Driver} - a_{4} \left(d_{Driver}\right) \end{cases} \tag{14}$$

$$X(t + 1) = \frac{X_{1} + X_{2} + X_{3} + X_{4}}{4} \tag{15}$$

where $X_{Attacker}$ represents the best search agent, $X_{Barrier}$ is the second-best search agent, $X_{Chaser}$ denotes the third-best search agent, $X_{Driver}$ is the fourth-best search agent, and $X(t + 1)$ is the updated position of each chimp.
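The cooperative position update of Equations (8)–(15) can be sketched in a few lines. This is an illustrative reading rather than the reference ChOA code: coefficients are drawn independently per dimension, a uniform random draw stands in for the chaotic map of Equation (12), and the leader positions are made up.

```python
import random

def role_step(leader, chimp, f, rng, chaotic_value):
    """Candidate position proposed by one leader, computed per dimension."""
    new_pos = []
    for x_lead, x in zip(leader, chimp):
        a = 2.0 * f * rng.random() - f      # Equation (10)
        c = 2.0 * rng.random()              # Equation (11)
        m = chaotic_value()                 # Equation (12), stand-in draw
        d = abs(c * x_lead - m * x)         # Equations (8) and (13)
        new_pos.append(x_lead - a * d)      # Equations (9) and (14)
    return new_pos

def cooperative_update(leaders, chimp, f, rng, chaotic_value):
    """Equation (15): average the candidates proposed by all leaders."""
    candidates = [role_step(l, chimp, f, rng, chaotic_value) for l in leaders]
    return [sum(cand[j] for cand in candidates) / len(candidates)
            for j in range(len(chimp))]

rng = random.Random(0)
# Hypothetical attacker, barrier, chaser, and driver positions in 2-D.
leaders = [[1.0, 2.0], [1.1, 1.9], [0.9, 2.1], [1.0, 2.2]]
chimp = [4.0, -3.0]

# With f = 0 the coefficient a vanishes, so the chimp jumps to the leaders' mean.
collapsed = cooperative_update(leaders, chimp, 0.0, rng, rng.random)
# With f = 2.5 the step is stochastic and may overshoot the leaders.
explored = cooperative_update(leaders, chimp, 2.5, rng, rng.random)
```

Because the averaging step takes any number of leaders, passing a fifth leader to `cooperative_update` gives the five-term average of Equation (19).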

The overall behavior of ChOA is visualized in Figure 4, which illustrates the movement of the four chimp roles toward the prey. Each role approaches the target from a different direction, effectively surrounding the prey and preventing escape routes. This coordinated movement mimics real-world chimp hunting, leading to a balance between directional exploration and guided exploitation [41].

Equation (16) provides an alternative distance formulation using a random scaling factor. This variation introduces stochasticity into the movement dynamics, allowing the algorithm to diversify search trajectories and escape local optima when necessary.

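$$d = \left| c \cdot X_{prey}(t) - m \cdot X_{chimp}(t) \right| \tag{16}$$

where $\mu$ is a random number in $[0, 1]$, the switching variable that Algorithm 1 compares against 0.5 when choosing between the position update of Equation (9) and the chaotic update.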

Finally, Figure 5 depicts the convergence and divergence behavior governed by the magnitude of the coefficient parameter $|a|$. When $|a| < 1$, the chimps converge toward the prey, enabling strong exploitation around promising solutions. Conversely, when $|a| > 1$, the chimps diverge, pushing the search outward into unexplored regions and improving the algorithm’s exploration capacity. This adaptive shift between convergence and divergence forms the core strength of ChOA, enabling it to maintain diversity in the population while steadily guiding the search toward global optimum solutions [41].
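The link between $f$ and this switch follows from Equation (10): because $r_{1}$ lies in $[0, 1]$, the coefficient $a = 2 f r_{1} - f$ is confined to $[-f, f]$, so $|a| > 1$ is only possible while $f > 1$. The sketch below estimates how often a scalar draw of $a$ falls in the converging regime for two values of $f$; the sample size, the seed, and the two values of $f$ are arbitrary choices for illustration.

```python
import random

def sample_a(f, rng):
    """Equation (10) for a scalar coefficient: a = 2*f*r1 - f, r1 in [0, 1)."""
    return 2.0 * f * rng.random() - f

def exploit_fraction(f, rng, draws=10_000):
    """Share of draws with |a| < 1, i.e. steps that converge toward the prey."""
    hits = sum(1 for _ in range(draws) if abs(sample_a(f, rng)) < 1.0)
    return hits / draws

rng = random.Random(42)
early = exploit_fraction(2.5, rng)   # f at its upper bound: mostly divergence
late = exploit_fraction(0.8, rng)    # f below 1: every draw converges
```

For $f = 2.5$ the draw is uniform on an interval of width 5, so roughly 40% of steps satisfy $|a| < 1$; once $f$ drops below 1, all of them do. If $f$ shrinks over the run, as is usual in ChOA, the search therefore moves from exploration to pure exploitation.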

According to the No Free Lunch (NFL) theorem, no single meta-heuristic optimization algorithm can consistently outperform others across all optimization problems [41]. This fundamental result implies that the effectiveness of a meta-heuristic is inherently problem-dependent and must be justified in the context of the target application. In this work, the optimization task focuses on high-dimensional hyper-parameter tuning of a coupled DRL–GAN framework under multiple conflicting objectives, including detection accuracy, convergence stability, robustness, and computational efficiency. Such characteristics necessitate an optimizer with strong global exploration, cooperative search behavior, and adaptive control over the exploration–exploitation trade-off.

The ChOA is selected as the base optimizer due to its distinctive multi-role search mechanism and cooperative hunting strategy, which naturally support diversified population movement and parallel exploration of the search space. Unlike swarm-based optimizers that rely on a single collective update rule, ChOA employs multiple behavioral roles, enabling heterogeneous search dynamics that are particularly suitable for navigating complex, non-convex, and irregular optimization landscapes. These properties make ChOA well aligned with the hyper-parameter optimization problem considered in this study, where different regions of the search space may exhibit vastly different sensitivities and convergence characteristics. To ensure a fair and rigorous evaluation, several well-established multi-objective meta-heuristics are also considered as reference optimizers, including multi-objective particle swarm optimization (MOPSO), multi-objective gray wolf optimizer (MOGWO), and the multi-objective orchard algorithm (MOOA). These algorithms represent diverse search philosophies, such as velocity-based swarm intelligence, leader-driven hierarchical search, role-based cooperation, and ecological migration strategies. Their inclusion provides a comprehensive benchmark set to assess whether the proposed optimizer offers tangible advantages beyond existing methods.

Although the standard ChOA exhibits strong exploration and cooperative search capabilities, it still suffers from several limitations that restrict its performance in high-dimensional and complex optimization tasks. First, the algorithm tends to lose population diversity in later iterations, causing premature convergence toward suboptimal regions. Second, the chaotic term improves randomness but lacks directional control, which sometimes leads to unstable oscillations and inconsistent convergence behavior. Third, the four predefined roles (attacker, barrier, chaser, and driver) do not dynamically adjust their influence over the optimization phases, resulting in imbalanced exploration–exploitation transitions, especially in problems with irregular or multimodal landscapes. These issues motivate the design of an enhanced variant with more adaptive search dynamics [42].

To overcome the limitations observed in the standard ChOA, we introduce a new adaptive search agent referred to as the navigator chimp. This agent is designed to enhance the algorithm’s ability to balance exploration and exploitation across different stages of the optimization process. Unlike the four classical roles, which primarily imitate cooperative hunting behavior, the navigator chimp plays a higher-level supervisory role by continuously evaluating the population’s search trends and adaptively steering the search trajectory. Its primary purpose is to guide the chimp population between promising regions of the search space and unexplored areas, thereby alleviating premature convergence and improving global search capability.

The navigator chimp improves ChOA in several essential ways. First, it monitors the diversity and spatial distribution of the search population. When the algorithm begins to over-converge toward a specific region (often leading to stagnation or trapping in local minima), the navigator chimp promotes controlled diversification by adjusting movement intensity or expanding search boundaries. Conversely, if the search becomes excessively scattered, the navigator chimp reinforces exploitation by directing chimps toward high-quality zones identified by the leading agents. This adaptive directional guidance helps maintain a healthy balance between global exploration and local refinement, which is crucial for achieving stable convergence in complex optimization landscapes.

Second, the navigator chimp dynamically regulates the transition between exploration and exploitation phases based on the optimization progress. During early iterations, when global exploration is most critical, the navigator chimp promotes broader movement across the search space to locate potential high-quality regions. As the algorithm progresses and promising solutions begin to emerge, the navigator chimp gradually shifts its influence toward exploitation by encouraging search agents to refine their positions around these regions. This phase-aware behavior ensures that the algorithm does not waste computational effort on excessive exploration in later stages, while also preventing early stagnation.

The enhanced ChOA formulation incorporating the navigator chimp is mathematically expressed through Equations (17)–(19). The navigator chimp enhances robustness by providing corrective feedback whenever the movements of the main four chimps become inconsistent or misaligned with the overall search direction. This stabilizing effect reduces erratic oscillations, improves convergence smoothness, and increases the likelihood of reaching truly optimal solutions.

$$\begin{cases} d_{Attacker} = \left| c_{1} \cdot X_{Attacker} - m_{1} \cdot X \right| \\ d_{Barrier} = \left| c_{2} \cdot X_{Barrier} - m_{2} \cdot X \right| \\ d_{Chaser} = \left| c_{3} \cdot X_{Chaser} - m_{3} \cdot X \right| \\ d_{Driver} = \left| c_{4} \cdot X_{Driver} - m_{4} \cdot X \right| \\ d_{navigator} = \left| c_{5} \cdot X_{navigator} - m_{5} \cdot X \right| \end{cases} \tag{17}$$

$$\begin{cases} X_{1} = X_{Attacker} - a_{1} \left(d_{Attacker}\right) \\ X_{2} = X_{Barrier} - a_{2} \left(d_{Barrier}\right) \\ X_{3} = X_{Chaser} - a_{3} \left(d_{Chaser}\right) \\ X_{4} = X_{Driver} - a_{4} \left(d_{Driver}\right) \\ X_{5} = X_{navigator} - a_{5} \left(d_{navigator}\right) \end{cases} \tag{18}$$

$$X(t + 1) = \frac{X_{1} + X_{2} + X_{3} + X_{4} + X_{5}}{5} \tag{19}$$

In the proposed NMOChOA framework, the optimization process is extended to handle multiple conflicting objectives simultaneously, rather than focusing on a single performance metric. This multi-objective setting is essential in DL hyper-parameter tuning. To achieve this, the algorithm employs non-dominated sorting, which ranks candidate solutions based on Pareto dominance. Solutions that are not dominated by any other are assigned to the first Pareto front, while subsequent fronts contain increasingly dominated candidates. This mechanism ensures that the search process explores a diverse set of trade-off solutions, preventing the optimizer from collapsing onto a single objective and enabling it to discover well-balanced configurations. By integrating non-dominated sorting with the enhanced five-chimp cooperation model, the NMOChOA framework achieves a more comprehensive and globally optimal balance among competing optimization criteria. Algorithm 1 presents the pseudocode of the proposed NMOChOA, which extends the standard multi-objective ChOA by introducing an additional adaptive agent, the navigator chimp, and by explicitly managing a non-dominated archive.
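The ranking step can be illustrated with a straightforward, quadratic-time non-dominated sort. The sketch assumes that every objective is to be minimized; the objective pairs (error rate, runtime in seconds) are invented for the example, and no claim is made that NMOChOA uses this particular implementation.

```python
def dominates(u, v):
    """True if u is no worse than v everywhere and strictly better somewhere."""
    return (all(a <= b for a, b in zip(u, v))
            and any(a < b for a, b in zip(u, v)))

def non_dominated_sort(points):
    """Return Pareto fronts as lists of indices into `points`, best front first."""
    remaining = set(range(len(points)))
    fronts = []
    while remaining:
        # A point is in the current front if nothing left dominates it.
        front = sorted(
            i for i in remaining
            if not any(dominates(points[j], points[i])
                       for j in remaining if j != i)
        )
        fronts.append(front)
        remaining -= set(front)
    return fronts

# Hypothetical (error rate, runtime in s) for five hyper-parameter sets.
candidates = [(0.02, 9.0), (0.05, 4.0), (0.03, 9.5), (0.10, 3.0), (0.06, 5.0)]
fronts = non_dominated_sort(candidates)
# fronts == [[0, 1, 3], [2, 4]]
```

Candidates 0, 1, and 3 each trade accuracy against runtime without being beaten on both, so they form the first front; candidate 2 is dominated by candidate 0 and candidate 4 by candidate 1, which places them in the second front.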

From a broader perspective, the proposed NMOChOA can be viewed as a task-oriented optimization mechanism that aligns with recent trends in automated model design, such as AutoML and neural architecture search (NAS). While AutoML and NAS frameworks aim to automatically discover optimal model architectures or hyper-parameter configurations in a generic and data-agnostic manner, NMOChOA is specifically tailored to the requirements of anomaly detection in multi-scale energy systems. By focusing on multi-objective optimization of learning stability, detection accuracy, convergence behavior, and computational efficiency, the proposed optimizer provides a lightweight and domain-aware alternative to fully automated AutoML pipelines. This design choice enables effective hyper-parameter tuning for complex hybrid architectures without incurring the high computational overhead typically associated with large-scale NAS or AutoML frameworks [43,44].

Algorithm 1 Pseudocode of NMOChOA

Begin NMOChOA
  %% Parameter setting
  Initialize the chimp population and the control parameters f, m, a, and c
  %% Initial population
  for n = 1 to population size do
    Create population
    Calculate the fitness function
  end
  %% Sorting
  Randomly assign chimps into five groups (attacker, barrier, chaser, driver, navigator)
  Initialize archive with first non-dominated solutions
  %% Loop
  for i = 1 to Max iteration do
    for each chimp do
      Determine group of chimp
      Update control parameters f, m, and c based on group strategy
      Compute parameters a and d
    end
    for each explorer chimp do
      if µ < 0.5
        if |a| < 1
          Update the position by Equation (9)
        else
          Select a random chimp
        end if
      else
        Update the position by Chaotic value
      end if
    end
    Update f, m, a, and c
    Update chimps position
    Find non-dominated solutions and update archive
    if archive is full
      Remove one solution from most crowded region
      Store new non-dominated solutions in archive
    end if
  end
  Return best chimp
End NMOChOA

2.5. Proposed ODRL-GAN

Figure 6 presents the complete structure of the proposed ODRL–GAN model, which integrates DRL, GAN, and the NMOChOA. This hybrid architecture is designed to extract high-level spatio-temporal patterns, adapt to dynamic operational conditions, and automatically optimize hyper-parameters for improved anomaly detection performance in multi-scale energy systems. At the core of the framework lies the agent–environment interaction loop. The environment corresponds to the real-time state of the energy system, constructed from the processed PSML and LEAD1.0 datasets. Each state contains multi-domain measurements such as power flow, voltage levels, hydrogen flow rates, renewable generation fluctuations, and other operational indicators. This multi-scale feature vector is fed into the DRL agent, which evaluates the state and selects an action based on its learned policy. The action represents a classification decision (normal or anomalous) or an update in the internal representation that guides GAN-assisted decision-making. The environment then produces a reward based on the correctness of the agent’s decision, reinforcing actions that correctly detect anomalies and penalizing misclassified states. This continuous reward-driven feedback allows the DRL agent to dynamically adapt to evolving system conditions and learn optimal anomaly detection behaviors.

Embedded within the agent is the GAN module, which plays a critical supportive role. The generator synthesizes artificial samples that mimic the distribution of normal operational data, while the discriminator attempts to distinguish between real measurements and the generator’s output. This adversarial learning process forces the discriminator to become highly sensitive to distributional deviations—precisely the kind that characterize subtle faults or cyber–physical anomalies. The discriminator’s confidence scores and reconstruction errors are fed back into the DRL agent as enriched state features, enabling the agent to capture both temporal dependencies and structural abnormalities. In this way, the GAN complements the DRL policy by providing deeper insight into the underlying data manifold.

In the ODRL–GAN framework, the action selected by the DRL agent represents a policy-level decision rather than a mere final classification output. Specifically, the action determines how GAN-derived anomaly cues are interpreted and weighted under different operating conditions. In this sense, the DRL agent does not explicitly tune reconstruction thresholds or loss-function weights; instead, it implicitly learns a context-aware decision policy that balances generative discrepancy information against raw system measurements based on the observed state. This policy-level control enables the agent to dynamically adjust its sensitivity to anomalies in response to changing system dynamics. For example, during transient operating conditions or short-lived fluctuations, the DRL policy can suppress over-reactive decisions that would otherwise lead to false alarms, whereas under sustained deviations or gradual drifts, the policy increases the emphasis on GAN-derived discrepancy cues. Consequently, the DRL agent functions as an adaptive decision coordinator that refines anomaly detection behavior over time, providing interpretability in terms of how and when anomaly evidence is acted upon rather than relying on static decision boundaries.

The reward function is explicitly designed to guide the DRL agent toward reliable anomaly detection while balancing detection sensitivity and operational stability. At each time step $t$, the agent observes the system state $s_{t}$ and selects an action $a_{t}$, which corresponds to an anomaly-related decision. The reward $r_{t}$ is then computed based on the agreement between the predicted label $\hat{y}_{t}$ and the ground-truth label $y_{t}$. A positive reward is assigned when the agent correctly identifies the system state (normal or anomalous), reinforcing accurate decision-making. Conversely, misclassifications incur negative rewards with asymmetric penalties. False positives are penalized to discourage excessive alarm generation, which can lead to unnecessary operational interventions and reduced trust in monitoring systems. False negatives receive a larger penalty, reflecting their higher operational risk, as missed anomalies may propagate across interconnected subsystems and compromise system reliability. This asymmetric reward design encourages the agent to learn a context-aware policy that minimizes false alarms while maintaining high sensitivity to genuinely abnormal operating conditions.
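A reward of this shape can be written as a three-way rule. The numeric values below are hypothetical, since the text specifies only the ordering (a positive reward for correct decisions, a penalty for false positives, and a larger penalty for false negatives), and the binary normal/anomalous labels are a simplification of the multi-class setting.

```python
# Hypothetical magnitudes; only their ordering is taken from the text.
REWARD_CORRECT = 1.0
PENALTY_FALSE_ALARM = -1.0
PENALTY_MISSED_ANOMALY = -3.0

def reward(predicted_anomaly, true_anomaly):
    """Asymmetric reward for one decision; both arguments are booleans."""
    if predicted_anomaly == true_anomaly:
        return REWARD_CORRECT              # true positive or true negative
    if predicted_anomaly:
        return PENALTY_FALSE_ALARM         # false positive
    return PENALTY_MISSED_ANOMALY          # false negative

# (predicted, actual) pairs: one of each outcome type.
episode = [(False, False), (True, False), (False, True), (True, True)]
total = sum(reward(p, y) for p, y in episode)
# 1.0 - 1.0 - 3.0 + 1.0 = -2.0
```

The ratio between the two penalties sets how many false alarms the agent will tolerate to avoid one missed anomaly, so in practice it would be treated as a tunable design parameter.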

The NMOChOA optimizer, positioned at the top of the architecture, ensures that all hyper-parameters governing the DRL agent and GANs (such as learning rates, discount factors, discriminator thresholds, and generator depth) are tuned automatically. Starting with an initial population of candidate parameter sets, the optimizer evaluates their fitness using multi-objective criteria including accuracy, convergence stability, runtime efficiency, and variance reduction. Through non-dominated sorting and cooperative updates involving attacker, barrier, chaser, driver, and the newly introduced navigator chimp, the optimizer iteratively refines the solution space. Once the stopping criterion is met, the best-performing hyper-parameter set is selected and fed into the ODRL–GAN model, significantly strengthening its overall performance.
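The non-dominated sorting step mentioned above relies on Pareto dominance between candidate hyper-parameter sets. The following minimal sketch extracts the first non-dominated front, assuming every objective is expressed so that smaller is better (for example error rate, runtime, and run-to-run variance). The candidate tuples are hypothetical, and the chimp position updates of NMOChOA are not reproduced here.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b on every objective and strictly better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated_front(points: List[Sequence[float]]) -> List[int]:
    """Indices of the candidates that no other candidate dominates."""
    return [
        i
        for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]


# Hypothetical candidates: (error rate, runtime in s, run-to-run variance).
candidates = [
    (0.02, 120.0, 1e-4),
    (0.05, 60.0, 2e-4),
    (0.06, 130.0, 3e-4),  # dominated by the first candidate
    (0.02, 120.0, 1e-4),  # identical to the first; neither dominates the other
]
front = non_dominated_front(candidates)
```

Here the third candidate is worse than the first on all three objectives and is therefore excluded from the front, while the first two represent a genuine trade-off between error and runtime.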

The proposed ODRL–GAN model offers several key advantages compared to existing anomaly detection frameworks. First, the bidirectional coupling of DRL and GAN allows the model to simultaneously leverage temporal reasoning and distributional reconstruction, which dramatically enhances robustness against both gradual drifts and abrupt anomalies. The DRL agent excels in adapting to dynamic conditions, while the GAN discriminator captures subtle irregularities that traditional classifiers often overlook. Second, the integration of NMOChOA with the Navigator Chimp ensures a globally optimized configuration of all learning components. Unlike manual or grid-based tuning, which is computationally expensive and prone to suboptimal outcomes, the multi-objective optimizer explores the search space adaptively and finds parameter sets that balance accuracy, convergence speed, and computational efficiency. Third, the architecture is highly scalable and generalizable. By combining unsupervised GAN learning with reward-driven RL behavior, the model can generalize to unseen operational patterns and maintain performance even when system conditions fluctuate significantly.

To clarify the learning workflow in Figure 6, the proposed ODRL–GAN follows a staged training strategy in which the GAN module is first trained to model the nominal operating behavior of the multi-scale energy system. Specifically, the generator–discriminator pair is trained using normal operational samples extracted from the processed PSML and LEAD1.0 data streams, so that the discriminator becomes sensitive to distributional deviations and the GAN outputs (e.g., discriminator confidence and reconstruction-related signals) become reliable indicators of abnormality. This pre-training step stabilizes adversarial learning and ensures that the GAN can provide informative anomaly-aware features before policy learning begins.

After the GAN has converged, it is embedded inside the DRL agent to enrich the state representation used for policy learning. At each time step, the environment constructs the system state from multi-domain measurements (e.g., power-flow/voltage-related indicators, hydrogen/thermal variables, renewable fluctuations, and other operational signals), and the agent receives this state together with GAN-derived cues (such as discriminator confidence scores and deviation-related measures). The DRL agent then selects an action according to its policy π, where the action corresponds to the anomaly decision (normal vs. anomalous) and/or an internal decision update guided by GAN-assisted evidence. The environment returns a reward based on detection correctness, thereby reinforcing decisions that correctly identify subtle faults and cyber–physical disturbances and penalizing misclassifications; through this reward-driven loop, the policy is refined to handle evolving operational conditions.
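As a rough illustration of this interaction loop, the sketch below concatenates raw measurements with GAN-derived cues to form the agent's state and then selects an action with an ε-greedy rule. The function names, the two specific cues, and the use of per-action value estimates are assumptions made for illustration; the text does not fix the exact form of the policy network.

```python
import random
from typing import List, Sequence


def augment_state(measurements: Sequence[float],
                  disc_confidence: float,
                  deviation: float) -> List[float]:
    """Append GAN-derived cues to the raw multi-domain measurements."""
    return list(measurements) + [disc_confidence, deviation]


def epsilon_greedy(action_values: Sequence[float],
                   epsilon: float,
                   rng: random.Random) -> int:
    """Pick a random action with probability epsilon, otherwise the highest-valued one."""
    if rng.random() < epsilon:
        return rng.randrange(len(action_values))
    return max(range(len(action_values)), key=lambda a: action_values[a])


# Hypothetical usage: action 0 = normal, action 1 = anomalous.
rng = random.Random(0)
state = augment_state([0.98, 1.02, 0.47], disc_confidence=0.31, deviation=0.05)
action = epsilon_greedy([0.2, 0.7], epsilon=0.0, rng=rng)  # purely greedy choice
```

Setting `epsilon` to zero, as at inference time, makes the choice deterministic; during training a non-zero value keeps the agent exploring both decisions.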

The NMOChOA block at the top of Figure 6 operates as the hyper-parameter optimization layer that determines the best configuration of the coupled DRL and GAN components prior to final training. Starting from an initial population of candidate parameter sets, NMOChOA evaluates each set using multi-objective fitness criteria, and iteratively updates the population through non-dominated sorting and cooperative position updates. Once the stopping criterion is satisfied, the best parameter set is selected and used to train the final ODRL–GAN model. During inference, the tuned and trained GAN continues to provide anomaly-sensitive feature cues within the agent, while the trained DRL policy performs real-time decision-making on incoming system states, enabling robust anomaly detection across heterogeneous multi-scale operating regimes.

3. Results

All experiments in this study were implemented in Python 3.10, using a consistent software environment to ensure reproducibility across the proposed ODRL–GAN model and all baseline architectures. DL components (including the DRL, GAN, CNN, GRU, transformer, and DBN) were developed using PyTorch 2.1.0, while classical ML baselines such as SVM were implemented using scikit-learn 1.3.2. Data preprocessing steps (imputation, normalization, feature extraction, and encoding) were carried out using NumPy 1.26.0, Pandas 2.1.1, and SciPy 1.11.3. The multi-objective optimization process and search dynamics were executed in a standalone Python module built on NumPy, ensuring consistent numerical behavior. All simulations were conducted on a workstation equipped with an Intel Core i7-12700K CPU (12 cores, 4.9 GHz boost), 32 GB DDR4 RAM, and an NVIDIA GeForce RTX 3080 GPU (10 GB VRAM). To maintain fairness, all architectures (proposed and baseline) were trained using the same preprocessing pipeline, identical train–test splits for PSML and LEAD1.0, and matched stopping criteria.

To provide a comprehensive and fair comparison, the proposed ODRL–GAN framework is evaluated against seven widely used and representative baseline models: Transformer, DRL, DBN, GAN, CNN, GRU, and SVM. These models were selected because they collectively cover a diverse spectrum of learning paradigms which allows for rigorous benchmarking across multiple dimensions of the anomaly detection task. Transformer is included due to its strong performance in capturing long-range temporal dependencies, making it a natural candidate for multi-scale energy systems with complex temporal dynamics. DRL serves as a baseline to highlight the added value of integrating policy learning with adversarial reconstruction in the proposed method. DBN, as a deep probabilistic model, provides insight into how traditional layered generative architectures perform compared to modern adversarial networks. GAN alone is used to benchmark the reconstruction and generative capabilities independent of reinforcement learning. CNN is included for its proven effectiveness in extracting local patterns and structural correlations in multivariate sensor data, while GRU provides a lightweight and efficient recurrent baseline for modeling short- and mid-term temporal dependencies. Finally, SVM represents a classical, non-deep learning baseline, enabling evaluation against traditional decision-boundary–based anomaly detection approaches.

To rigorously evaluate the performance of the proposed ODRL–GAN model and all baseline methods, several quantitative metrics were employed, including accuracy, AUC, recall, RMSE, variance, runtime, inference latency, and a two-sample t-test for statistical significance. Equation (20) defines the Accuracy, which measures the proportion of correctly identified samples, reflecting the model’s overall classification capability:

$$\mathrm{Accuracy} = \frac{\text{true positive} + \text{true negative}}{\text{true positive} + \text{true negative} + \text{false positive} + \text{false negative}}$$

(20)

Equation (21) represents the area under the receiver operating characteristic (ROC) curve, computed as the integral of the ROC curve across all thresholds. AUC quantifies the model’s ability to discriminate between normal and anomalous samples across varying decision thresholds. Higher AUC values indicate stronger separability and reduced sensitivity to class imbalance.

$$\mathrm{AUC} = \int_{0}^{1} \mathrm{ROC}(t)\, dt$$

(21)

where $\mathrm{ROC}(t)$ is the ROC curve at threshold $t$.
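In practice the integral in Equation (21) is approximated numerically. The sketch below sweeps the decision threshold over every distinct anomaly score, collects the resulting (false-positive rate, true-positive rate) points, and integrates them with the trapezoidal rule. Integrating over the false-positive rate is the conventional choice and is assumed here; the labels and scores are toy values, not data from PSML or LEAD1.0.

```python
from typing import List, Sequence, Tuple


def roc_points(y_true: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """(FPR, TPR) pairs obtained by lowering the threshold over each distinct score."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= thr and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= thr and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc_trapezoid(points: Sequence[Tuple[float, float]]) -> float:
    """Area under a piecewise-linear ROC curve given as ordered (FPR, TPR) points."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy example: 1 = anomalous, 0 = normal; a higher score means more anomalous.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = auc_trapezoid(roc_points(labels, scores))
```

For this toy input, one normal sample is scored above one anomalous sample, so three of the four normal/anomalous pairs are ranked correctly and the area is 0.75.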

Equation (22) defines Recall, which measures the proportion of actual anomalies that the model correctly identifies. Recall is particularly important in anomaly detection, where missing a true anomaly (false negative) can be significantly more costly than raising a false alarm.

$$\mathrm{Recall} = \frac{\text{true positive}}{\text{true positive} + \text{false negative}}$$

(22)
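Equations (20) and (22) can be evaluated directly from the four confusion-matrix counts. The short sketch below does so for a toy label sequence; the helper names and the example labels are illustrative assumptions.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN) with 1 = anomalous and 0 = normal."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Equation (20): share of all samples classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def recall(tp: int, fn: int) -> float:
    """Equation (22): share of true anomalies that were detected."""
    return tp / (tp + fn)


# Toy example: three anomalies among eight samples, one missed, one false alarm.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 0]
tp, tn, fp, fn = confusion_counts(y_true, y_pred)
acc = accuracy(tp, tn, fp, fn)  # 6 of 8 samples correct
rec = recall(tp, fn)            # 2 of 3 anomalies found
```

The example shows why the two metrics are reported separately: accuracy is 0.75 while recall is only about 0.67, because a single missed anomaly weighs more heavily when anomalies are rare.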

Equation (23) specifies the RMSE, which captures the discrepancy between observed values and the model’s predictions.

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i = 1}^{N} {[x_{i} - {\hat{x}}_{i}]}^{2}},$$

(23)

where $x_{i}$ is the observed value and ${\hat{x}}_{i}$ is the calculated value.
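A direct transcription of Equation (23) is given below as a minimal sketch; the input values are toy numbers chosen so that the result is easy to check by hand.

```python
import math
from typing import Sequence


def rmse(observed: Sequence[float], calculated: Sequence[float]) -> float:
    """Root-mean-square error between observed and calculated values (Equation (23))."""
    if len(observed) != len(calculated):
        raise ValueError("observed and calculated must have the same length")
    n = len(observed)
    return math.sqrt(sum((x - x_hat) ** 2 for x, x_hat in zip(observed, calculated)) / n)


# Toy example: squared errors are 1, 0, 1, 0, so the mean squared error is 0.5.
error = rmse([3.0, 5.0, 7.0, 9.0], [2.0, 5.0, 8.0, 9.0])
```

Because the errors are squared before averaging, a few large deviations raise the RMSE more than many small ones.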

Beyond these standard metrics, variance across multiple runs was measured to evaluate training stability and model robustness. Lower variance indicates more consistent performance under different initializations, random seeds, and data shuffling. Runtime was recorded to quantify computational efficiency during training, while inference latency measured the time required for the model to classify a single new state. Finally, a two-sample t-test was performed to determine whether the improvements achieved by the proposed ODRL–GAN model over baseline methods were statistically significant.
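The run-to-run variance and the two-sample t statistic can be computed from per-run scores as sketched below. The text does not state whether equal variances were assumed, so Welch's unequal-variance form is used here as an assumption; the accuracy lists are hypothetical and do not reproduce any reported result.

```python
import math
import statistics
from typing import Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Welch's two-sample t statistic and its approximate degrees of freedom."""
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    var_a, var_b = statistics.variance(a), statistics.variance(b)  # sample variances
    n_a, n_b = len(a), len(b)
    se2 = var_a / n_a + var_b / n_b
    t = (mean_a - mean_b) / math.sqrt(se2)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = se2 ** 2 / ((var_a / n_a) ** 2 / (n_a - 1) + (var_b / n_b) ** 2 / (n_b - 1))
    return t, df


# Hypothetical per-run accuracies for two models (illustrative values only).
runs_model_a = [0.991, 0.989, 0.990, 0.992, 0.990]
runs_model_b = [0.912, 0.931, 0.905, 0.924, 0.940]

variance_a = statistics.variance(runs_model_a)  # stability across runs
t_stat, dof = welch_t(runs_model_a, runs_model_b)
```

The p-value then follows from the t distribution with `dof` degrees of freedom, which can be looked up in a table or obtained from a statistics library.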

Hyper-parameter tuning plays a critical role in training DL-based anomaly detection models. Parameters such as learning rate, batch size, and network depth directly influence the stability, convergence behavior, and generalization capability of the model. Without proper tuning, even advanced architectures may perform suboptimally, diverge during training, or become overly sensitive to noise and data imbalance. Therefore, fine-tuning hyper-parameters is essential for achieving reliable and high-performance anomaly detection. Table 1 summarizes the final hyper-parameter values selected for all models. To ensure optimal performance of the proposed ODRL–GAN framework, hyper-parameters were optimized using the navigator-augmented NMOChOA. This choice is motivated by the algorithm’s strong ability to balance accuracy, convergence speed, and stability while navigating complex and high-dimensional search spaces. In contrast, the baseline models were tuned using a systematic grid search approach. Grid search provides a straightforward and fair method for baseline tuning, ensuring consistent comparison without introducing additional meta-heuristic advantages to competing models. It is important to emphasize that the values listed in Table 1 represent only the final optimized selections, whereas each parameter initially spanned a wide range of candidate values during the search process.
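The grid search used for the baselines amounts to evaluating every combination of candidate values and keeping the best-scoring one. The sketch below illustrates this; the grid values and the `toy_score` function are hypothetical stand-ins, since the actual candidate ranges and validation procedure are not listed in the text.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple


def grid_search(grid: Dict[str, List[float]],
                score_fn: Callable[[Dict[str, float]], float]) -> Tuple[Dict[str, float], float]:
    """Exhaustively evaluate every combination and return the best configuration and score."""
    names = list(grid)
    best_cfg, best_score = {}, float("-inf")
    for values in product(*(grid[name] for name in names)):
        cfg = dict(zip(names, values))
        score = score_fn(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score


# Hypothetical candidate grid (3 x 2 x 2 = 12 configurations).
grid = {
    "learning_rate": [0.001, 0.003, 0.005],
    "batch_size": [32, 64],
    "dropout": [0.1, 0.2],
}


def toy_score(cfg: Dict[str, float]) -> float:
    """Stand-in for a validation metric; a real run would train and evaluate a model."""
    return (-abs(cfg["learning_rate"] - 0.003)
            - abs(cfg["batch_size"] - 64) / 1000.0
            - abs(cfg["dropout"] - 0.2))


best_cfg, best_score = grid_search(grid, toy_score)
```

The cost grows multiplicatively with each added parameter, which is the main practical limitation of grid search compared with population-based optimizers.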

For the proposed ODRL–GAN, the optimal learning rate was found to be 0.005, which provided the best balance between convergence speed and gradient stability. The discount factor γ = 0.92 enabled the DRL agent to incorporate long-term reward information effectively. An ε-greedy value of 0.46 ensured a stable exploration–exploitation trade-off, while a batch size of 64 offered consistent mini-batch gradient behavior. The optimizer converged with a threshold of 0.072, using Tanh and Sigmoid activations to stabilize GAN training. The proposed NMOChOA identified an optimal population size of 80, with 300 iterations, and dynamically tuned coefficient vectors $a$ and $f$ within the ranges shown in Table 1. For the standard DBN, the optimal learning rate was 0.002, with a batch size of 64, dropout rate of 0.2, and three hidden layers of 32 neurons each. The combination of Tanh and Sigmoid activations produced the most stable likelihood gradients.

The transformer achieved its best performance with a learning rate of 0.003, feed-forward dimension of 2048, weight decay of 0.02, dropout of 0.2, six attention heads, and four encoder layers. The Gaussian error linear unit (GELU) activation function provided smooth nonlinearities beneficial for multi-scale energy data. For the GRU, the optimal configuration included a sequence length of 6, learning rate of 0.005, six GRU layers with 64 units each, and a dropout rate of 0.2, using Adam as the optimizer. The CNN achieved maximum stability using 8 convolutional layers, kernel size 5 × 5, max-pooling (2 × 2), and 4 hidden layers of 64 neurons, with Tanh activation and Adam optimizer. Finally, the SVM performed best with a linear and RBF kernel combination, gamma value of 0.002, and 300 estimators, which allowed the model to establish non-linear decision boundaries suitable for anomaly detection.

Table 2 summarizes the classification performance of all evaluated models across the PSML and LEAD1.0 datasets using accuracy, recall, and AUC metrics. The results clearly show that the proposed ODRL–GAN framework consistently delivers the highest performance across both datasets. On PSML, ODRL–GAN achieves an Accuracy of 99.79%, Recall of 99.89%, and AUC of 99.95%, substantially outperforming all other deep learning and classical baselines. The nearest competitor, Transformer, reaches 91.23% Accuracy and 93.05% AUC. Traditional models such as DBN, CNN, and SVM exhibit weaker performance, reflecting their limited capability in modeling the nonlinear and high-dimensional dynamics of multi-scale energy systems. A similar trend appears on the more challenging LEAD1.0 dataset, where model performance naturally decreases due to its higher heterogeneity and multi-carrier complexity. Even so, ODRL–GAN maintains superior results with 99.51% accuracy, 99.83% recall, and 99.91% AUC, significantly ahead of the next best performers. Models such as GRU and GAN show moderate performance, while DRL and Transformer perform reasonably but still fall short of capturing the intricate anomaly patterns present in LEAD1.0. Figure 7 provides a visual representation of the numerical results reported in Table 2 using grouped bar charts. From the visual trend, it is immediately evident that ODRL–GAN consistently dominates all baselines, outperforming them across all three evaluation metrics in both datasets.

Figure 8 presents the ROC curves for all evaluated models on the PSML and LEAD1.0 datasets. The ROC curve plots sensitivity against (1–specificity), showing how each model’s discriminative ability changes across varying decision thresholds. Models with curves that bow closer to the top-left corner exhibit stronger anomaly-detection capability and more reliable separation between normal and anomalous samples. The curves clearly demonstrate the superior behavior of the proposed ODRL–GAN model, whose ROC line consistently dominates all others, especially in the low–false-positive region. This strong performance stems from the combined strength of GAN-based reconstruction (which enhances subtle anomaly differentiation) and DRL-based policy learning, which adaptively refines decision boundaries in dynamic conditions. Baseline architectures such as Transformer and DRL show moderate curvature improvement due to their temporal modeling capabilities, yet they still cannot match the synergistic generative–policy integration of ODRL–GAN. Models like DBN, CNN, and SVM show significantly weaker ROC profiles, reflecting their limited capacity for capturing the multi-scale nonlinear interactions present in energy systems. The ROC plots visually reinforce that the hybrid DRL–GAN architecture, supported by NMOChOA-driven fine-tuning, provides the most robust and reliable anomaly detection across both datasets.

Figure 9 illustrates the training dynamics of all competing architectures by tracking the RMSE as a function of training epochs for both datasets. Each curve represents how rapidly and smoothly a model reduces its prediction error, allowing a direct comparison of convergence rate, stability, and final error level. The ODRL-GAN consistently demonstrates the steepest decline in RMSE and reaches its minimum value significantly earlier than the other models, reflecting the strong synergy between the GAN-based synthetic data generation, the DRL policy optimization, and the NMOChOA-driven parameter tuning. The convergence patterns further highlight several performance insights. ODRL-GAN stabilizes at near-zero RMSE within roughly 80–100 epochs, whereas Transformer and DRL require almost 150–180 epochs to reach comparable stability and still plateau at noticeably higher RMSE. Classical deep models such as DBN, CNN, and GRU converge more slowly and remain less accurate throughout training, while SVM exhibits the highest error and the slowest decay due to its limited capacity for modeling nonlinear temporal-statistical dependencies. These results confirm that the proposed architecture not only achieves lower final error but also learns more efficiently, demonstrating superior convergence speed, robustness, and optimization effectiveness across both datasets.

To explicitly assess the necessity of each component and address concerns regarding potential over-engineering, a comprehensive ablation study is conducted, as summarized in Table 3. By systematically enabling or disabling each module (DRL, GAN, and NMOChOA) and by replacing the proposed optimizer with several competing meta-heuristics, the table quantifies the individual and combined contributions of these components. This evaluation also allows us to assess the effectiveness of the proposed optimizer relative to standard or well-established alternatives. The meta-heuristic optimizers evaluated in Table 3 include NMOChOA, MOOA, MOChOA, MOGWO, and MOPSO, providing a diverse set of search dynamics for a fair comparison within the DRL–GAN framework.

The results clearly show that ODRL–GAN equipped with the proposed NMOChOA achieves the highest performance across all metrics and datasets. On PSML, ODRL–GAN (NMOChOA) reaches 99.79% accuracy, 99.89% recall, and 99.95% AUC. Competing optimizers such as MOOA, MOChOA, MOGWO, and MOPSO improve the baseline DRL–GAN performance but still remain 3–6 percentage points below NMOChOA. This performance gap highlights the superiority of the navigator-augmented search strategy, which enhances exploration–exploitation balance and prevents premature convergence, especially in high-dimensional hyper-parameter landscapes. The partial configurations further confirm the complementary roles of DRL and GAN. Models such as NMOChOA–DRL and NMOChOA–GAN outperform their non-optimized counterparts but remain significantly weaker than the full ODRL–GAN (NMOChOA). On average, these variants lag by 7–8% in Accuracy and AUC, demonstrating that DRL alone cannot capture deep generative structure, and GAN alone cannot perform adaptive decision-making.

From an ablation perspective, the results in Table 3 clearly demonstrate that the proposed ODRL–GAN framework is not an unnecessarily over-engineered solution, but rather a carefully integrated architecture in which each component plays a distinct and indispensable role. Removing the GAN module significantly reduces the model’s sensitivity to distributional deviations, particularly for subtle or early-stage anomalies. Conversely, excluding the DRL component limits the framework’s ability to adaptively refine detection decisions under dynamic operating conditions. Furthermore, replacing the proposed NMOChOA with alternative meta-heuristic optimizers or fixed hyper-parameters leads to noticeable degradation in accuracy, stability, and convergence consistency. These findings confirm that the superior performance of ODRL–GAN arises from the synergistic interaction of DRL, GAN, and NMOChOA, rather than from excessive architectural complexity.

As shown in Table 3, configurations that exclude the DRL component consistently exhibit higher variance and reduced robustness, particularly under dynamic and heterogeneous operating conditions. This behavior indicates that the DRL agent contributes beyond static classification by enforcing temporal consistency and context-aware decision refinement. In practice, the DRL policy learns to differentiate between benign transients and genuinely abnormal operating patterns by modulating the influence of GAN-derived anomaly cues over time. This adaptive behavior is reflected in the reduced false-alarm rates and improved recall stability observed across both PSML and LEAD1.0 datasets. Therefore, the interpretability of the DRL agent emerges at the decision-policy level: rather than reshaping the feature space explicitly, the agent learns when anomaly evidence should be trusted and how detection sensitivity should evolve under changing system conditions.

The same trend persists on the more challenging LEAD1.0 dataset, reinforcing the robustness of the proposed design. While all optimizers experience a natural performance drop due to the dataset’s heterogeneity and multi-carrier complexity, ODRL–GAN (NMOChOA) still achieves 99.51% Accuracy and 99.83% Recall, outperforming the best competitor by a substantial margin. Other optimizers improve performance relative to untuned DRL–GAN or standalone DRL/GAN models but remain unable to match the stability, discriminative power, and detection sharpness offered by the NMOChOA-enhanced framework. Table 3 confirms that the triple combination of DRL + GAN + NMOChOA delivers the strongest, most stable, and most generalizable performance.

To rigorously validate whether the proposed NMOChOA represents a genuine algorithmic improvement rather than task-specific tuning, a comprehensive benchmark evaluation is conducted using 23 widely adopted standard test functions. These benchmark problems are designed to cover a broad spectrum of optimization characteristics, including unimodal and multimodal landscapes, low- and high-dimensional search spaces, smooth and highly irregular objective surfaces, as well as varying degrees of nonlinearity and local optima density. Such diversity enables an objective assessment of the optimizer’s generalization capability beyond the target application. Table 4 summarizes the key characteristics of the selected benchmark functions, including their dimensionality, modality type, and search ranges. Specifically, the benchmark set includes classical functions such as Ackley, Rastrigin, Rosenbrock, Schwefel, Levy–Montalvo, and Powell, alongside several challenging low-dimensional multimodal problems (e.g., Bohachevsky, Hosaki, Camel Back, and Schaffer functions). This combination ensures that the optimizer is evaluated under both exploration-dominated and exploitation-dominated scenarios, which is essential for assessing the balance between global search ability and convergence precision. By applying NMOChOA to this benchmark suite, the effectiveness of the proposed enhancements can be systematically examined.

Table 5 reports the quantitative benchmarking results of the proposed NMOChOA and competing meta-heuristic algorithms on the 23 standard test functions described in Table 4. For each benchmark problem, all algorithms were independently executed 30 times to ensure statistical reliability. The best objective value (Best) and the standard deviation (SD) across runs are reported to evaluate both optimization accuracy and convergence stability.
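For orientation, three of the classical benchmark functions named above are sketched below in their standard textbook forms. The exact dimensionalities, search ranges, and any shifted variants used in Table 4 are not reproduced here and may differ from these definitions.

```python
import math
from typing import Sequence


def ackley(x: Sequence[float]) -> float:
    """Multimodal Ackley function; global minimum 0 at the origin."""
    n = len(x)
    mean_sq = sum(v * v for v in x) / n
    mean_cos = sum(math.cos(2.0 * math.pi * v) for v in x) / n
    return -20.0 * math.exp(-0.2 * math.sqrt(mean_sq)) - math.exp(mean_cos) + 20.0 + math.e


def rastrigin(x: Sequence[float]) -> float:
    """Highly multimodal Rastrigin function; global minimum 0 at the origin."""
    return 10.0 * len(x) + sum(v * v - 10.0 * math.cos(2.0 * math.pi * v) for v in x)


def rosenbrock(x: Sequence[float]) -> float:
    """Rosenbrock valley function; global minimum 0 at (1, ..., 1)."""
    return sum(100.0 * (b - a * a) ** 2 + (1.0 - a) ** 2 for a, b in zip(x, x[1:]))
```

Ackley and Rastrigin contain many regularly spaced local minima and therefore stress exploration, whereas the narrow curved valley of Rosenbrock mainly tests exploitation and convergence precision.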

The results demonstrate that NMOChOA consistently outperforms the competing algorithms across the majority of benchmark functions. Specifically, the proposed NMOChOA achieves the best objective value on 20 out of 23 test functions, indicating superior global optimization capability. In addition, NMOChOA attains the lowest standard deviation on 19 test functions, reflecting significantly improved convergence stability and robustness against random initialization. These findings confirm that the navigator-guided multi-objective search mechanism effectively preserves population diversity while enabling precise convergence toward global optima.

In comparison, MOOA achieves the best objective value on 6 functions, MOChOA on 5 functions, MOGWO on 4 functions, and MOPSO on 2 functions, demonstrating that while these algorithms perform competitively on selected problems, their performance remains inconsistent across diverse optimization landscapes. This variability is particularly evident in highly multimodal functions, where premature convergence and unstable search dynamics are more pronounced. For the remaining three benchmark functions where NMOChOA does not strictly reach the global optimum, the obtained solutions remain very close to the optimal values, with negligible performance gaps. This indicates that the proposed algorithm maintains strong approximation capability even in the most challenging scenarios. Overall, the results in Table 5 provide compelling evidence that the proposed NMOChOA offers superior optimization accuracy, stability, and generalization performance across a wide range of unimodal and multimodal benchmark problems.

To demonstrate the practical relevance of the proposed ODRL–GAN framework from a power engineering perspective, representative anomaly patterns detected in the PSML and LEAD1.0 datasets are analyzed and interpreted in terms of real operational events. Rather than treating anomalies as abstract classification outputs, the detected patterns are examined based on their temporal characteristics, affected variables, and potential impact on system operation. In the PSML dataset, several detected anomalies correspond to abrupt deviations in power flow and voltage-related measurements under otherwise stable operating conditions. For example, the proposed method identifies short-duration power-flow spikes with amplitudes exceeding normal operating ranges by approximately 18–25%, which are typically associated with switching events, incipient faults, or abnormal coupling between electrical and thermal subsystems. The ODRL–GAN framework detects these events at an early stage, with an average detection delay of fewer than 3 sampling intervals, whereas conventional DL baselines exhibit delayed responses or misclassify such events as normal transients.

In the LEAD1.0 dataset, the proposed approach successfully captures gradual drifts and oscillatory patterns in multi-carrier energy variables, such as sustained deviations in renewable generation and hydrogen flow rates. These anomalies often indicate equipment degradation, sensor faults, or unstable operating regimes that evolve over longer time horizons. Quantitatively, ODRL–GAN reduces false alarms by approximately 15–22% compared to standalone GAN and DRL models, while maintaining high recall for slowly developing anomalies. From an operational standpoint, the early detection of these anomalies is critical for preventing fault propagation, maintaining grid stability, and protecting physical assets. The combined use of GAN-based distributional sensitivity and DRL-based adaptive decision-making enables the proposed framework to distinguish genuine abnormal events from benign fluctuations, thereby improving reliability under real-world operating conditions.

Table 6 summarizes representative anomaly patterns detected by the proposed ODRL–GAN framework in the PSML and LEAD1.0 datasets and maps them to their corresponding power-engineering interpretations and practical implications. The results demonstrate that the proposed model does not merely flag abstract outliers, but consistently identifies anomaly signatures that align with physically meaningful operating events in multi-energy systems. In particular, short-term spikes in power-flow and voltage-related variables observed in the PSML dataset are indicative of switching events or incipient faults, which, if left undetected, may trigger local instability or propagate across interconnected subsystems. Similarly, cross-domain deviations involving coupled electrical and thermal signals highlight abnormal subsystem interactions that can lead to cascading failures. For the LEAD1.0 dataset, gradual drifts in renewable generation and hydrogen flow variables are associated with equipment degradation, while oscillatory patterns reflect unstable operating regimes under multi-carrier coupling. These findings confirm that the proposed ODRL–GAN framework effectively captures both abrupt and slowly evolving anomalies and provides actionable insights that are directly relevant to system reliability, asset protection, and grid stability in real-world power and energy infrastructures.

4. Discussion

The previous section presented the predictive performance of all evaluated architectures across the PSML and LEAD1.0 datasets based on accuracy-oriented metrics. These results demonstrated clear performance differences among the models and highlighted the superiority of the proposed ODRL–GAN framework in terms of detection capability. However, accuracy-based metrics alone do not fully reflect a model’s suitability for deployment in real-world, latency-sensitive energy management environments. To understand whether an architecture can reliably operate under operational constraints, additional dimensions (such as computational efficiency, temporal responsiveness, and statistical stability) must be considered alongside the core predictive metrics.

Accordingly, this discussion extends the evaluation toward a more holistic assessment by examining several crucial dimensions: runtime and computational complexity, which determine scalability for large-scale or high-frequency data streams; inference latency, a critical factor for real-time anomaly response; variance across multiple runs, which reflects the stability and robustness of the training process; and statistical significance testing through the two-sample t-test, ensuring that observed improvements arise from genuine algorithmic advantages rather than random fluctuation. These complementary evaluations help determine not only the predictive strength but also the operational reliability and generalizability of the proposed ODRL–GAN, thereby clarifying its practical viability for real industrial and multi-energy management applications.

Table 7 and Table 8 present a detailed comparison of execution time for all competing architectures when training is terminated at different RMSE thresholds. These tables collectively evaluate computational efficiency, early-training convergence, and the scalability of each model under progressively stricter accuracy constraints. Table 7 corresponds to PSML, while Table 8 reports the same metrics for LEAD1.0. Across both datasets, the ODRL–GAN model consistently exhibits the lowest runtime in every stopping condition, demonstrating outstanding training efficiency. For PSML, ODRL–GAN reaches RMSE < 12 in only 36 s, RMSE < 9 in 68 s, and RMSE < 6 in 126 s, outperforming all baselines by a large margin. LEAD1.0 shows similar behavior, with ODRL–GAN achieving RMSE < 12 in 45 s and RMSE < 9 in 82 s. This efficiency stems from the synergy between the DRL policy, GAN-based synthetic enhancement, and the NMOChOA optimizer, which accelerates gradient stabilization and reduces redundant exploration during training. The presence of the GAN component also helps the discriminator converge faster by providing structured synthetic samples early in training.
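The runtime-to-threshold measurement behind these tables can be expressed as a simple scan over a training log, as sketched below. The per-epoch RMSE values and durations are hypothetical; a model that never reaches a threshold yields `None`, which corresponds to an empty entry for that stopping condition.

```python
from typing import Optional, Sequence


def time_to_threshold(rmse_per_epoch: Sequence[float],
                      seconds_per_epoch: Sequence[float],
                      threshold: float) -> Optional[float]:
    """Cumulative training time at the first epoch whose RMSE falls below the threshold."""
    elapsed = 0.0
    for err, duration in zip(rmse_per_epoch, seconds_per_epoch):
        elapsed += duration
        if err < threshold:
            return elapsed
    return None  # threshold never reached within the logged epochs


# Hypothetical training log: five epochs of 10 s each.
rmse_log = [20.0, 14.0, 11.0, 8.0, 5.0]
epoch_seconds = [10.0] * 5

t_below_12 = time_to_threshold(rmse_log, epoch_seconds, 12.0)
t_below_9 = time_to_threshold(rmse_log, epoch_seconds, 9.0)
t_below_3 = time_to_threshold(rmse_log, epoch_seconds, 3.0)
```

Reporting time rather than epoch count makes models with very different per-epoch costs directly comparable.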

Other architectures show noticeably slower convergence. The Transformer maintains reasonable performance but still requires significantly more runtime (e.g., 128–749 s on PSML, depending on the threshold) due to its heavy attention operations and deeper representation layers. DRL alone shows moderate efficiency but lacks the stabilizing contribution of GAN-generated samples, causing training oscillations that extend runtime (e.g., 389–826 s on PSML for RMSE < 9 and <6). DBN, GAN, GRU, and CNN fall even further behind, often unable to reach stricter RMSE thresholds, especially RMSE < 6 or RMSE < 3. Their slower or incomplete convergence reflects architectural limitations such as shallow temporal modeling (CNN), vanishing-gradient issues (DBN, GRU), or unstable adversarial training (GAN). SVM, lacking iterative gradient-based convergence, is the slowest overall and fails to reach most RMSE targets.

Table 9 reports the average inference latency of each model on the PSML and LEAD1.0 datasets, offering insight into real-time operational efficiency. SVM achieves the lowest latency (5.6 ms on PSML and 6.1 ms on LEAD1.0), consistent with its lightweight structure and absence of deep representation learning. In contrast, GRU presents the highest latency (8.1 and 8.2 ms), attributed to its recurrent nature and sequential gating operations. ODRL-GAN maintains a competitive latency profile (7.2 and 7.6 ms): it is marginally faster than Transformer (7.3 and 7.9 ms) and GRU, marginally slower than the standalone GAN (7.1 and 7.4 ms) and DRL (6.9 and 7.2 ms), and 1.5 to 1.6 ms slower than SVM. The spread across all eight models is small, 2.5 ms on PSML and 2.1 ms on LEAD1.0. This balance shows that although ODRL-GAN integrates both DRL decision-making and GAN-based generative modeling, its optimization via NMOChOA keeps inference overhead controlled. DBN and CNN remain within moderate latency ranges, reflecting their fixed-layer feed-forward computations. These results indicate that ODRL-GAN sustains high predictive accuracy while preserving an inference latency suitable for real-time anomaly detection in smart grid applications, so that the enhanced performance does not come at the cost of deployment feasibility.
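A per-sample latency figure of this kind is typically obtained by timing repeated forward calls. The following is a minimal sketch under stated assumptions: `predict` stands in for any trained model's inference function, the warm-up count is arbitrary, and the measurement protocol actually used for Table 9 is not described in this section.

```python
import statistics
import time
from typing import Callable, Sequence


def mean_latency_ms(predict: Callable, samples: Sequence, warmup: int = 10) -> float:
    """Average wall-clock time per `predict` call, in milliseconds."""
    # Warm-up calls are excluded so one-off costs (caches, lazy init) do not skew the mean.
    for x in samples[:warmup]:
        predict(x)
    timings_ms = []
    for x in samples:
        start = time.perf_counter()
        predict(x)
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    return statistics.fmean(timings_ms)


# Hypothetical stand-in for a trained detector's forward pass.
def dummy_predict(x: float) -> bool:
    return x * x > 0.5


latency = mean_latency_ms(dummy_predict, [i / 100 for i in range(200)])
print(f"mean latency: {latency:.6f} ms")
```

`time.perf_counter` is used because it is a monotonic, high-resolution clock; absolute numbers still depend heavily on hardware and batch size, so only like-for-like comparisons are meaningful.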

Table 10 reports the variance of each architecture over 35 independent executions, a direct indicator of model stability and robustness against initialization noise and stochastic training dynamics. ODRL-GAN achieves the lowest variance on both PSML (0.00009) and LEAD1.0 (0.00013), with almost negligible fluctuations between runs. In contrast, Transformer, DRL, DBN, GAN, GRU, and CNN show noticeably higher variance, ranging from 1.42 (Transformer) to 7.73 (CNN) on PSML and from 1.97 to 8.32 on LEAD1.0, reflecting greater sensitivity to training randomness. SVM exhibits the highest variance on both datasets (11.06 and 13.09), indicating poor robustness under repeated sampling and retraining. Overall, the analysis confirms that the proposed ODRL-GAN not only attains superior accuracy but also delivers high stability, making it a reliable choice for real-world applications where consistency across deployments is essential.
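Run-to-run variance of this kind can be computed from the per-run scores with the standard library. The sketch below assumes the unbiased sample variance (division by n − 1) and uses hypothetical accuracy values; this section does not state which metric or which variance estimator underlies Table 10.

```python
import statistics
from typing import Sequence


def run_to_run_variance(scores: Sequence[float]) -> float:
    """Unbiased sample variance of one metric across independent runs."""
    if len(scores) < 2:
        raise ValueError("at least two runs are needed to estimate variance")
    return statistics.variance(scores)


# Hypothetical accuracies (%) from five repeated runs of one model.
demo_scores = [99.78, 99.80, 99.79, 99.77, 99.81]
demo_variance = run_to_run_variance(demo_scores)
print(f"variance across runs: {demo_variance:.5f}")
```

With 35 runs, as in Table 10, the same function applies unchanged; only the length of the score list differs.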

Table 11 presents the results of pairwise statistical t-tests between the proposed ODRL-GAN framework and each competing model on both datasets (PSML and LEAD1.0). The table reports p-values and the corresponding significance decisions at a strict significance threshold of 0.01. This analysis evaluates whether the performance improvements delivered by ODRL-GAN are statistically meaningful rather than occurring by chance. The reported p-values are far below 0.01 in every comparison, ranging from 0.000003 to 0.0007 on PSML and from 0.000002 to 0.0006 on LEAD1.0, so the performance differences between the proposed ODRL-GAN and every baseline (Transformer, DRL, DBN, GAN, GRU, CNN, and SVM) are statistically significant. This indicates that the observed accuracy, recall, AUC, RMSE, and stability gains are unlikely to be random fluctuations and instead reflect a genuine advantage in the model’s learning dynamics and optimization behavior. The consistently significant outcomes also highlight the contribution of the integrated DRL–GAN synergy and the proposed NMOChOA hyper-parameter optimizer, which together yield more robust generalization and reliably improved detection performance across multiple experimental runs.
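A pairwise comparison of two models' per-run scores can be sketched with Welch's unequal-variance t-test. This is an assumption: the section does not say which t-test variant was used, and the sample scores below are hypothetical. The sketch computes only the test statistic and the Welch–Satterthwaite degrees of freedom with the standard library; the p-value would then come from a Student t distribution.

```python
import math
import statistics
from typing import Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    na, nb = len(a), len(b)
    # Per-group variance of the mean (sample variance divided by group size).
    va_n = statistics.variance(a) / na
    vb_n = statistics.variance(b) / nb
    t_stat = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(va_n + vb_n)
    dof = (va_n + vb_n) ** 2 / (va_n ** 2 / (na - 1) + vb_n ** 2 / (nb - 1))
    return t_stat, dof


# Hypothetical per-run scores for two models.
model_a = [1.0, 2.0, 3.0, 4.0, 5.0]
model_b = [2.0, 3.0, 4.0, 5.0, 6.0]
t_stat, dof = welch_t(model_a, model_b)
print(f"t = {t_stat:.3f}, dof = {dof:.1f}")
```

If SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same test and also returns the two-sided p-value, which can be compared against the 0.01 threshold.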

The extended analysis (covering runtime behavior, inference latency, variance, and t-tests) indicates that the proposed ODRL-GAN is not only accurate but also computationally practical and reliable for real-world deployment. The model reaches low RMSE levels faster than every competing architecture, keeps its inference latency within about 1.6 ms of the fastest baseline on both datasets, and exhibits variance values at least four orders of magnitude smaller than those of all baselines, highlighting its stability and repeatability. Moreover, the t-test results show that the improvements are statistically meaningful rather than incidental. Combined with the superior accuracy, recall, and AUC observed earlier, these characteristics indicate that ODRL-GAN offers the balance of precision, robustness, and efficiency required for real-time, safety-critical, and large-scale energy system applications, where both reliability and operational responsiveness are essential.

From a deployment perspective, the proposed ODRL–GAN framework is well suited for edge and embedded energy monitoring environments. In practical settings, the computationally intensive training phase, including DRL policy learning, GAN pre-training, and NMOChOA-based hyper-parameter optimization, can be performed offline on centralized servers or cloud infrastructure. Once trained, the resulting lightweight inference model can be deployed on edge devices such as substation controllers, energy gateways, or industrial embedded platforms. During online operation, anomaly detection relies only on forward inference and policy execution, which significantly reduces computational overhead and memory requirements. This design enables real-time monitoring close to the data source, minimizes communication latency, and enhances resilience against network disruptions. Such an edge-oriented deployment paradigm aligns with the requirements of modern smart grids and integrated energy systems, where fast local decision-making and scalability are critical.

5. Conclusions

Modern smart power systems increasingly rely on high-volume, non-linear, and rapidly fluctuating data streams, making accurate and stable anomaly detection essential for operational safety and reliability. To address these challenges, this paper introduced a novel ODRL-GAN framework, which integrates deep reinforcement learning, GAN-based synthetic data enhancement, and a navigator-based NMOChOA for hyper-parameter tuning. The proposed model was extensively evaluated on two real-world benchmark datasets (PSML and LEAD1.0) and supported by a comprehensive data-preparation pipeline and a multi-stage training strategy.

Across both PSML and LEAD1.0 datasets, the proposed ODRL-GAN consistently achieved the strongest numerical performance among all compared methods, reaching up to 99.79% accuracy, 99.89% recall, and 99.95% AUC on PSML and 99.51%, 99.83%, and 99.91%, respectively, on LEAD1.0. The ROC curves further showed a dominant sensitivity–specificity trade-off, with ODRL-GAN hugging the top-left region far more tightly than Transformer, DRL, DBN, GAN, GRU, CNN, and SVM. Training curves confirmed this advantage, with RMSE dropping to near zero within ~60 epochs, whereas competing models required ~200–300 epochs and still converged to notably higher errors. Runtime analysis demonstrated that ODRL-GAN reached low-error thresholds faster, needing only 36 s and 68 s for RMSE < 12 and RMSE < 9 on PSML, compared to 128 s and 351 s for Transformer and 193 s and 593 s for GRU. Despite its deep architecture, inference latency remained competitive (≈7.2 and 7.6 ms), within about 1.6 ms of the lightweight SVM (≈5.6 and 6.1 ms), while offering dramatically higher accuracy. Variance values were also exceptionally low (0.00009–0.00013), demonstrating the highest stability among all models across 35 repeated runs. Collectively, these numerical results show that ODRL-GAN is the most accurate, the most stable, and the fastest-training model in the comparison, with inference latency comparable to that of the other deep models.

From a practical perspective, the final outcomes confirm that the proposed model balances predictive strength and operational reliability, offering high detection precision, fast convergence, low inference overhead, and strong run-to-run consistency. Such characteristics are essential for real-time monitoring in modern energy infrastructures, where decisions often need to be made within milliseconds and under conditions of uncertainty. The demonstrated robustness across both datasets indicates that the framework can generalize across different energy modalities, making it suitable for deployment in multi-carrier systems, renewable-integrated grids, and large-scale monitoring platforms. Ultimately, the study validates ODRL-GAN as a dependable and high-performing solution for safeguarding critical energy infrastructures against anomalous behaviors.

Looking ahead, several promising directions can extend the capabilities of the proposed framework. Incorporating more advanced generative models (such as diffusion-based architectures) or integrating model-based RL components could further enhance reconstruction fidelity and decision stability under extreme operating conditions. Additionally, exploring lightweight or compressed variants of ODRL-GAN may enable efficient deployment at the edge in resource-constrained environments. Expanding the analysis to multi-node, fully distributed energy networks and incorporating spatiotemporal graph structures would allow the model to capture inter-device dependencies more effectively. Likewise, evaluating the framework under adversarial scenarios or real-time streaming conditions would provide deeper insight into its resilience in operational smart-grid environments.

Author Contributions

Conceptualization, A.E.O., M.K.D., P.S.M., N.S., F.H.-G., D.M. and J.V.Á.-B.; methodology, A.E.O., M.K.D., P.S.M. and F.H.-G.; software, A.E.O., N.S. and F.H.-G.; validation, A.E.O., M.K.D., F.H.-G. and D.M.; formal analysis, M.K.D., P.S.M., N.S. and F.H.-G.; investigation, A.E.O., M.K.D., P.S.M., N.S., F.H.-G., D.M. and J.V.Á.-B.; resources, D.M. and J.V.Á.-B.; data curation, A.E.O., P.S.M., N.S. and F.H.-G.; writing (original draft preparation), A.E.O., M.K.D., P.S.M., N.S., F.H.-G., D.M. and J.V.Á.-B.; writing (review and editing), F.H.-G., D.M. and J.V.Á.-B.; visualization, A.E.O., M.K.D., N.S. and F.H.-G.; supervision, D.M. and J.V.Á.-B.; project administration, D.M.; funding acquisition, D.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

The authors declare no conflicts of interest.


Figure 1. Overall workflow of the proposed methodology in multi-scale energy systems.

Figure 2. The interaction loop between the RL agent and its environment.

Figure 3. The architecture of standard GAN.

Figure 4. Position update in the ChOA algorithm.

Figure 5. Position updating of chimps and the effect of |a| on convergence and divergence.

Figure 6. The proposed ODRL-GAN model.

Figure 7. Bar-chart comparison of accuracy, recall, and AUC on: (a) PSML; (b) LEAD1.0 datasets.

Figure 8. ROC curves of all models: (a) PSML; (b) LEAD1.0 datasets.

Figure 9. Training convergence curves of all models across 300 epochs: (a) PSML; (b) LEAD1.0.

Table 1. Hyper-parameter settings of the proposed model and the baseline models.

Model	Parameter	Value
ODRL-GAN	Learning rate	0.005
	Discount factor (γ)	0.92
	ε-greedy	0.46
	Batch size	64
	Momentum term	0.05
	Convergence threshold	0.072
	Activation	Tanh and sigmoid
	Optimizer	NMOChOA
	a	[−1, 1]
	f	Linearly from 2 to 0
	Population size	80
	Iteration	300
DBN	Learning rate	0.002
	Batch size	64
	Dropout rate	0.2
	Number of hidden layers	3
	Number of neurons in hidden layers	32
	Activation	Tanh and sigmoid
	Optimizer	SGD
Transformer	Learning rate	0.003
	Batch size	64
	Feed forward hidden size	2048
	Weight decay	0.02
	Dropout rate	0.2
	Number of attention heads	6
	Number of encoder layers	4
	Activation function	GELU
	Optimizer	SGD
GRU	Learning rate	0.005
	Sequence length	6
	Hidden units per layer	64
	Number of GRU layers	6
	Dropout rate	0.2
	Optimizer	Adam
CNN	Number of convolution layers	8
	Kernel size	5 × 5
	Pooling type	Max pooling (2 × 2)
	Number of hidden layers	4
	Number of neurons	64
	Activation	Tanh
	Optimizer	Adam
SVM	Kernel type	Linear and RBF
	Gamma	0.002
	Number of estimators	300

Table 2. Performance comparison of the proposed ODRL–GAN and baseline models.

Model	PSML Accuracy (%)	PSML Recall (%)	PSML AUC (%)	LEAD1.0 Accuracy (%)	LEAD1.0 Recall (%)	LEAD1.0 AUC (%)
ODRL-GAN	99.79	99.89	99.95	99.51	99.83	99.91
Transformer	91.23	92.18	93.05	90.08	91.24	92.08
DRL	90.38	91.47	92.41	89.31	90.38	91.46
DBN	88.37	89.36	90.37	87.06	88.16	89.08
GAN	87.19	88.24	89.34	86.52	87.60	88.84
GRU	86.91	87.60	88.61	87.16	88.18	89.34
CNN	85.34	86.27	87.30	84.31	85.08	86.93
SVM	81.27	82.09	82.74	80.07	81.46	82.64

Table 3. Ablation study of ODRL–GAN under different component combinations and optimizers.

Model	PSML Accuracy (%)	PSML Recall (%)	PSML AUC (%)	LEAD1.0 Accuracy (%)	LEAD1.0 Recall (%)	LEAD1.0 AUC (%)
ODRL-GAN (NMOChOA)	99.79	99.89	99.95	99.51	99.83	99.91
MOOA-DRL-GAN	96.08	96.75	97.28	95.08	95.79	96.24
MOChOA-DRL-GAN	95.47	96.24	96.90	94.28	95.09	96.34
MOGWO-DRL-GAN	95.21	96.01	96.84	94.11	95.33	96.18
MOPSO-DRL-GAN	94.19	95.39	96.05	93.19	94.66	95.46
DRL-GAN	93.57	94.63	95.18	92.67	93.51	94.08
NMOChOA-DRL	92.78	93.93	94.86	91.73	92.80	93.91
NMOChOA-GAN	92.34	93.29	94.45	91.12	92.53	93.67
DRL	90.38	91.47	92.41	89.31	90.38	91.46
GAN	87.19	88.24	89.34	86.52	87.60	88.84

Table 4. Characteristics of the 23 benchmark test functions used for validating the NMOChOA.

Number	Name of the Benchmark Function	Dimension	Type	Range
F1	Ackley’s Problem (ACK)	10	Multimodal	[−30, 30]
F2	Aluffi-Pentini’s Problem (AP)	2	Multimodal	[−10, 10]
F3	Becker and Lago Problem (BL)	2	Multimodal	[−10, 10]
F4	Bohachevsky 1 Problem (BF1)	2	Multimodal	[−50, 50]
F5	Bohachevsky 2 Problem (BF2)	2	Multimodal	[−50, 50]
F6	Camel Back-3 Three Hump Problem (CB3)	2	Multimodal	[−5, 5]
F7	Camel Back-6 Six Hump Problem (CB6)	2	Multimodal	[−5, 5]
F8	Cosine Mixture Problem (CM)	2	Unimodal	[−20, 20]
F9	Cosine Mixture Problem (CM)	4	Unimodal	[−20, 20]
F10	Dekkers and Aarts Problem (DA)	2	Multimodal	[−20, 20]
F11	Exponential Problem (EXP)	10	Unimodal	[−1, 1]
F12	Hosaki Problem (HSK)	2	Multimodal	0 ≤ x₁ ≤ 5, 0 ≤ x₂ ≤ 6
F13	Levy and Montalvo 2 Problem (LM2)	5	Multimodal	[−10, 10]
F14	Levy and Montalvo 2 Problem (LM2)	10	Multimodal	[−10, 10]
F15	Miele and Cantrell Problem (MCP)	4	Multimodal	[−1, 1]
F16	Modified Rosenbrock Problem (MRP)	2	Unimodal	[−5, 5]
F17	Multi-Gaussian Problem (MGP)	2	Multimodal	[−2, 2]
F18	Powell’s Quadratic Problem (PWQ)	4	Multimodal	[−10, 10]
F19	Rastrigin Problem (RG)	10	Multimodal	[−5.12, 5.12]
F20	Rosenbrock Problem (RB)	10	Unimodal	[−30, 30]
F21	Salomon Problem (SAL)	5	Multimodal	[−100, 100]
F22	Schaffer 1 Problem (SF1)	2	Multimodal	[−100, 100]
F23	Schwefel Problem (SWF)	10	Unimodal	[−500, 500]
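
For orientation, two of the benchmark functions listed in Table 4 can be written down in their widely used textbook forms. This is a sketch under an assumption: the exact variants and constants used in the paper's experiments are not given in this section, so the definitions below are the standard Ackley and Rastrigin functions, both with a global minimum of 0 at the origin.

```python
import math
from typing import Sequence


def ackley(x: Sequence[float]) -> float:
    """Standard Ackley function; global minimum 0 at x = (0, ..., 0)."""
    n = len(x)
    mean_sq = sum(v * v for v in x) / n
    mean_cos = sum(math.cos(2.0 * math.pi * v) for v in x) / n
    return -20.0 * math.exp(-0.2 * math.sqrt(mean_sq)) - math.exp(mean_cos) + 20.0 + math.e


def rastrigin(x: Sequence[float]) -> float:
    """Standard Rastrigin function; global minimum 0 at x = (0, ..., 0)."""
    return 10.0 * len(x) + sum(v * v - 10.0 * math.cos(2.0 * math.pi * v) for v in x)


origin = [0.0] * 10  # Table 4 lists both functions with dimension 10
print(ackley(origin), rastrigin(origin))
```

Both functions are multimodal, with many local minima surrounding the global one, which is why they are commonly used to test whether a meta-heuristic escapes local optima.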

Table 5. Benchmark results of NMOChOA and competing meta-heuristics on 23 test functions.

Function	F(x*)	Metric	MOPSO	MOGWO	MOChOA	MOOA	NMOChOA
F1	0	Best	1.17 × 10⁻⁵	2.31 × 10⁻⁶	1.46 × 10⁻⁷	6.04 × 10⁻⁸	0
F1	0	SD	3.12 × 10⁻³	5.02 × 10⁻⁴	3.24 × 10⁻⁴	2.83 × 10⁻⁶	0
F2	=−0.35239	Best	−0.35239	−0.35239	−0.35239	−0.35238	−0.35238
F2	=−0.35239	SD	3.35 × 10⁻⁵	3.83 × 10⁻⁷	4.10 × 10⁻⁹	2.21 × 10⁻¹¹	0
F3	0	Best	3.14 × 10⁻⁸	2.05 × 10⁻⁷	4.35 × 10⁻⁹	0	0
F3	0	SD	7.28 × 10⁻⁵	3.43 × 10⁻³	5.12 × 10⁻⁷	0	0
F4	0	Best	4.34 × 10⁻⁵	4.55 × 10⁻⁶	1.10 × 10⁻⁹	8.32 × 10⁻¹²	0
F4	0	SD	9.03 × 10⁻²	6.16 × 10⁻⁴	8.36 × 10⁻⁵	5.07 × 10⁻⁹	0
F5	0	Best	2.12 × 10⁻⁸	6.35 × 10⁻⁵	0	6.19 × 10⁻¹⁰	0
F5	0	SD	5.33 × 10⁻⁶	7.11 × 10⁻³	0	9.82 × 10⁻⁶	0
F6	0	Best	5.20 × 10⁻⁶	0	7.10 × 10⁻⁸	3.14 × 10⁻⁹	0
F6	0	SD	3.14 × 10⁻³	6.18 × 10⁻⁸	2.25 × 10⁻⁵	7.14 × 10⁻⁶	0
F7	=−1.03163	Best	−1.03162	−1.03162	−1.03162	−1.03162	−1.03163
F7	=−1.03163	SD	9.06 × 10⁻²	4.74 × 10⁻⁶	1.32 × 10⁻⁸	5.15 × 10⁻⁹	5.21 × 10⁻¹¹
F8	0.2	Best	0.19997	0.19998	0.2	0.19999	0.2
F8	0.2	SD	4.55 × 10⁻⁴	3.06 × 10⁻⁶	5.47 × 10⁻¹¹	3.32 × 10⁻⁹	0
F9	0.4	Best	0.39991	0.39995	0.39998	0.4	0.4
F9	0.4	SD	5.63 × 10⁻²	2.06 × 10⁻⁴	6.99 × 10⁻⁵	0	3.15 × 10⁻¹³
F10	−24777	Best	−24776.520	−24776.510	−24776.510	−24776.510	−24776.510
F10	−24777	SD	7.09 × 10⁻¹³	6.19 × 10⁻⁵	9.21 × 10⁻⁶	6.17 × 10⁻⁹	3.14 × 10⁻¹¹
F11	1	Best	0.99992	0.99995	0.99996	0.99987	1
F11	1	SD	5.08 × 10⁻⁶	6.04 × 10⁻⁸	1.88 × 10⁻¹⁰	3.51 × 10⁻⁴	0
F12	−2.34580	Best	−2.34581	−2.34581	−2.34581	−2.34581	−2.34580
F12	−2.34580	SD	4.09 × 10⁻⁶	6.21 × 10⁻⁴	2.54 × 10⁻²	3.42 × 10⁻⁸	5.35 × 10⁻¹²
F13	0	Best	3.25 × 10⁻¹⁰	0	0	2.36 × 10⁻¹³	0
F13	0	SD	2.74 × 10⁻⁹	1.25 × 10⁻¹⁴	0	9.08 × 10⁻¹¹	0
F14	0	Best	7.08 × 10⁻⁹	2.85 × 10⁻⁴	9.65 × 10⁻⁷	2.36 × 10⁻⁹	0
F14	0	SD	4.36 × 10⁻⁵	4.31 × 10⁻²	7.46 × 10⁻⁵	3.24 × 10⁻⁶	0
F15	0	Best	0	5.90 × 10⁻¹⁰	7.14 × 10⁻⁹	7.23 × 10⁻¹¹	4.03 × 10⁻¹³
F15	0	SD	0	1.20 × 10⁻⁵	6.28 × 10⁻⁶	4.32 × 10⁻⁹	5.92 × 10⁻¹¹
F16	0	Best	1.05 × 10⁻⁸	0	6.28 × 10⁻¹²	0	0
F16	0	SD	4.96 × 10⁻⁵	0	7.79 × 10⁻⁹	0	0
F17	1.29695	Best	1.29693	1.29589	1.29695	1.29690	1.29695
F17	1.29695	SD	9.07 × 10⁻⁷	3.67 × 10⁻²	5.28 × 10⁻⁹	4.29 × 10⁻⁴	2.47 × 10⁻¹³
F18	0	Best	8.05 × 10⁻⁶	7.18 × 10⁻¹³	7.32 × 10⁻⁸	0	0
F18	0	SD	1.28 × 10⁻³	5.36 × 10⁻⁹	5.21 × 10⁻⁶	0	0
F19	0	Best	7.19 × 10⁻²	9.25 × 10⁻⁸	6.17 × 10⁻⁵	1.25 × 10⁻⁶	0
F19	0	SD	5.64 × 10⁻¹	6.09 × 10⁻⁶	8.07 × 10⁻³	2.05 × 10⁻⁵	0
F20	0	Best	4.39 × 10⁻²	9.11 × 10⁻³	0	5.05 × 10⁻⁷	1.28 × 10⁻¹³
F20	0	SD	1.28 × 10⁻¹	4.89 × 10⁻²	0	4.66 × 10⁻⁵	4.36 × 10⁻¹¹
F21	0	Best	4.08 × 10⁻¹¹	0	5.20 × 10⁻¹³	0	0
F21	0	SD	4.32 × 10⁻⁹	0	6.32 × 10⁻¹⁰	0	0
F22	0	Best	6.35 × 10⁻⁷	2.38 × 10⁻⁹	8.05 × 10⁻¹³	6.05 × 10⁻¹²	0
F22	0	SD	4.89 × 10⁻⁵	1.99 × 10⁻⁶	4.96 × 10⁻¹¹	2.36 × 10⁻¹⁰	0
F23	−4189.829	Best	−4169.148	−4173.963	−4182.412	−4183.243	−4189.578
F23	−4189.829	SD	6.32 × 10⁻²	4.28 × 10⁻³	7.05 × 10⁻⁴	6.32 × 10⁻⁶	3.05 × 10⁻⁸

The symbol “*” denotes the global optimal value of the corresponding test function.

Table 6. Power-engineering interpretation of representative anomalies detected by ODRL–GAN.

Dataset	Detected Anomaly Pattern	Affected Variables	Power-Engineering Interpretation	Practical Implication
PSML	Short-term spike	Active power flow, voltage magnitude	Switching event or incipient electrical fault	Risk of local instability and fault propagation
PSML	Cross-domain deviation	Electrical–thermal coupled measurements	Abnormal interaction between interconnected subsystems	Potential cascading failures across energy domains
LEAD1.0	Gradual drift	Renewable generation, hydrogen flow rate	Progressive equipment degradation or sensor aging	Reduced efficiency and increased maintenance demand
LEAD1.0	Oscillatory behavior	Multi-carrier energy signals	Unstable operating regime under multi-energy coupling	Long-term grid stability and reliability concerns

Table 7. Runtime comparison (s) on the PSML dataset across multiple RMSE-based stopping thresholds; “-” indicates that the threshold was not reached.

Model	RMSE < 12	RMSE < 9	RMSE < 6	RMSE < 3
ODRL-GAN	36	68	126	263
Transformer	128	351	749	-
DRL	109	389	826	-
DBN	153	429	-	-
GAN	171	486	-	-
GRU	193	593	-	-
CNN	188	608	-	-
SVM	243	-	-	-

Table 8. Runtime comparison (s) on the LEAD1.0 dataset under different RMSE-based stopping conditions; “-” indicates that the threshold was not reached.

Model	RMSE < 12	RMSE < 9	RMSE < 6	RMSE < 3
ODRL-GAN	45	82	148	274
Transformer	152	371	806	-
DRL	145	406	953	-
DBN	176	461	-	-
GAN	193	514	-	-
GRU	215	683	-	-
CNN	205	-	-	-
SVM	289	-	-	-

Table 9. Comparative inference latency (ms) of all evaluated models.

Model	PSML (ms)	LEAD1.0 (ms)
ODRL-GAN	7.2	7.6
Transformer	7.3	7.9
DRL	6.9	7.2
DBN	6.3	6.5
GAN	7.1	7.4
GRU	8.1	8.2
CNN	6.8	7.1
SVM	5.6	6.1

Table 10. Variance values across 35 independent runs for all evaluated models.

Model	PSML	LEAD1.0
ODRL-GAN	0.00009	0.00013
Transformer	1.42365	1.96523
DRL	2.01756	2.74150
DBN	3.18605	3.89652
GAN	4.02543	4.81452
GRU	5.76325	5.92145
CNN	7.73265	8.32058
SVM	11.05563	13.08521

Table 11. Pairwise t-test comparison between ODRL-GAN and each baseline model at a 0.01 significance level.

Comparison	PSML p-Value	PSML Result	LEAD1.0 p-Value	LEAD1.0 Result
ODRL-GAN vs. Transformer	0.0007	Significant	0.0006	Significant
ODRL-GAN vs. DRL	0.0005	Significant	0.0004	Significant
ODRL-GAN vs. DBN	0.0003	Significant	0.0002	Significant
ODRL-GAN vs. GAN	0.0002	Significant	0.0001	Significant
ODRL-GAN vs. GRU	0.00008	Significant	0.00006	Significant
ODRL-GAN vs. CNN	0.00005	Significant	0.00003	Significant
ODRL-GAN vs. SVM	0.000003	Significant	0.000002	Significant

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Ershadi Oskouei, A.; Dashliboroun, M.K.; Moghaddam, P.S.; Serrano, N.; Hernando-Gallego, F.; Martín, D.; Álvarez-Bravo, J.V. An Optimized DRL-GAN Approach for Robust Anomaly Detection in Multi-Scale Energy Systems: Insights from PSML and LEAD1.0. Energies 2026, 19, 198. https://doi.org/10.3390/en19010198


