Green Computing for Critical Infrastructure: A Sustainability-First AI Framework for Energy-Efficient Anomaly Detection in Industrial Control Systems

Aslam, Muhammad Muzamil; Tufail, Ali; Ding, Yepeng; De Silva, Liyanage Chandratilak; Awg Haji Mohd Apong, Rosyzie Anna; Zuhairi, Megat F.

doi:10.3390/technologies14050267


by Muhammad Muzamil Aslam ¹, Ali Tufail ¹,²,*, Yepeng Ding ³, Liyanage Chandratilak De Silva ¹, Rosyzie Anna Awg Haji Mohd Apong ¹ and Megat F. Zuhairi ²

¹ School of Digital Science, Universiti Brunei Darussalam, Gadong BE1410, Brunei

² Malaysian Institute of Information Technology (UniKL MIIT), Universiti Kuala Lumpur, 1016, Jalan Sultan Ismail, Kuala Lumpur 50250, Malaysia

³ Graduate School of Advanced Science and Engineering, Hiroshima University, 1-4-2 Kagamiyama, Higashihiroshima 739-8511, Japan

* Author to whom correspondence should be addressed.

Technologies 2026, 14(5), 267; https://doi.org/10.3390/technologies14050267

Submission received: 6 April 2026 / Revised: 22 April 2026 / Accepted: 24 April 2026 / Published: 29 April 2026

(This article belongs to the Special Issue Emerging Technologies and Intelligent Systems for Sustainable Development)


Abstract

Industrial Control Systems (ICSs) face dual imperatives: protecting critical infrastructure from escalating cybersecurity threats while reducing the environmental impact of AI-powered defense mechanisms. Current deep learning anomaly detection approaches achieve robust security performance but consume substantial computational resources, creating an environmental paradox in which AI solutions designed to protect infrastructure contribute to carbon emissions at scale. This tension between cybersecurity effectiveness and sustainability objectives intensifies as regulatory frameworks increasingly mandate both security resilience and environmental accountability. This research presents Green-USAD, a sustainability-first AI framework that inverts traditional design paradigms by integrating energy efficiency as a primary architectural constraint from inception rather than applying compression retrospectively. The proposed approach advances green computing for critical infrastructure through four key contributions: (1) a compressed architecture with validation-guided convergence protocols achieving competitive detection performance with minimal computational overhead; (2) a multi-objective optimization framework using the Analytic Hierarchy Process to systematically balance security and sustainability requirements; (3) a hardware-validated energy measurement methodology addressing reproducibility challenges in the green AI literature; and (4) a comprehensive evaluation demonstrating cross-dataset generalization and edge-deployment viability. Validation on ICS benchmarks demonstrates that sustainability-first design achieves substantial energy reduction while maintaining operational detection accuracy, with measured training consumption below 1% of conventional approaches and proportional carbon emission reductions. Comparative analysis against post hoc compression baselines establishes fundamental advantages of design-from-inception over train-then-compress paradigms. Edge device deployment on resource-constrained hardware confirms real-world applicability for distributed industrial environments. Results establish that robust cybersecurity and environmental sustainability represent unified rather than competing objectives when intelligent systems are designed with sustainability as a foundational principle.

Keywords:

green computing; energy-efficient; industrial control systems (ICSs); anomaly detection; deep learning

1. Introduction

Industrial Control Systems (ICSs) constitute critical infrastructure governing physical processes across water treatment, energy generation, chemical manufacturing, transportation networks, and advanced industrial facilities. These cyber-physical systems integrate Programmable Logic Controllers (PLCs), Remote Terminal Units (RTUs), Human–Machine Interfaces (HMIs), and Supervisory Control and Data Acquisition (SCADA) systems to monitor and regulate processes affecting public safety, economic stability, and national security. The convergence of Industrial Internet of Things (IIoT) and Industry 4.0 technologies has enhanced operational efficiency through predictive maintenance and data-driven decision-making, yet simultaneously expanded attack surfaces by bridging previously isolated Operational Technology (OT) networks with enterprise IT infrastructure and internet connectivity [1,2].

The cybersecurity threat landscape targeting ICSs has evolved from a theoretical concern to demonstrable reality. Landmark incidents include Stuxnet (2010), which manipulated programmable logic controllers to physically damage Iranian nuclear centrifuges; the Ukrainian power grid attacks (2015–2016) causing widespread blackouts; Triton/TRISIS (2017) compromising safety instrumented systems in petrochemical facilities; and the Oldsmar water treatment intrusion (2021) attempting to alter chemical dosing levels [3,4]. These incidents underscore the imperative for robust anomaly detection capabilities monitoring multivariate time-series data from sensors, actuators, network traffic, and system logs.

Deep learning and machine learning approaches have emerged as effective solutions for ICS anomaly detection, capturing complex high-dimensional operational patterns without requiring explicit attack signatures. Autoencoders learn compressed representations of normal behavior [5], while recurrent architectures such as LSTM and GRU model temporal dependencies [6]. Convolutional Neural Networks extract spatial features identifying outliers, and generative models including GANs and VAEs characterize normal operation distributions [7]. The USAD architecture employs dual-decoder adversarial training for unsupervised anomaly detection, achieving strong performance without labeled attack data [8].

However, deep learning efficacy creates an environmental paradox: while achieving robust security performance, these models demand substantial computational resources and energy consumption. Training large neural networks requires extensive electricity and generates significant carbon emissions [9], creating tension between cybersecurity effectiveness and sustainability objectives. Regulatory frameworks and corporate commitments increasingly mandate reduced carbon footprints, yet ICS security requires frequent model retraining across distributed facilities as operational patterns evolve and threat landscapes shift. This scaling challenge amplifies environmental impact, necessitating fundamentally different approaches beyond incremental efficiency improvements.

To illustrate this challenge concretely, consider a typical ICS deployment spanning multiple geographically distributed facilities, each requiring continuous anomaly detection and periodic model retraining as new attack patterns emerge. Under conventional deep learning approaches, this scenario translates directly into escalating energy bills, mounting carbon emissions, and increased operational costs, consequences that are difficult to justify when the underlying security objective is to protect critical infrastructure that society depends upon. The absence of lightweight, sustainability-aware anomaly detection frameworks specifically designed for ICS environments therefore represents a critical and largely unaddressed gap in the existing literature.

Green cybersecurity emerges as a critical research direction, seeking methodologies that maintain security effectiveness while minimizing the environmental footprint. Rather than treating sustainability as a post-deployment constraint, this paradigm integrates energy efficiency as a foundational design principle, demonstrating that security and environmental protection constitute unified rather than competing objectives. Despite growing interest in green computing, existing anomaly detection methods for ICSs have not systematically addressed the joint optimization of detection accuracy and energy efficiency. Most prior works optimize exclusively for security metrics such as F1-score or AUC-ROC, overlooking the environmental cost incurred during training and inference, a critical omission given the scale at which ICS security systems must operate. This paper directly addresses this gap by proposing a framework in which energy efficiency is not a secondary consideration but a primary design objective evaluated alongside security performance.

This paper presents Green-USAD, a sustainability-first framework for energy-efficient anomaly detection in ICSs, advancing both cybersecurity resilience and environmental sustainability through intelligent architectural design.

Contributions

The key contributions of this work are as follows:

A compressed USAD-based framework with validation-guided convergence protocols, achieving substantial energy reduction while maintaining competitive detection performance through design-from-inception optimization rather than post hoc compression.
Rigorous sustainability metrics, including the Green Efficiency Score (GES) and the Sustainability Index (SI) with Analytic Hierarchy Process-derived weights, enabling systematic quantification of security-sustainability trade-offs.
An empirically validated energy profiling methodology addressing reproducibility challenges in the green AI literature, with comprehensive evaluation on SWaT and WADI benchmark datasets demonstrating cross-domain generalization.
A systematic framework for designing and evaluating sustainable security solutions applicable across diverse ICS operational contexts, establishing that robust protection and environmental responsibility advance synergistically.

The remainder of this paper proceeds as follows: Section 2 reviews related work in ICS anomaly detection, green computing, and sustainable artificial intelligence, identifying research gaps that our approach addresses. Section 3 details our methodology, encompassing architectural design, energy measurement protocols, and sustainability metrics. Section 4 describes experimental configuration including datasets, baselines, and evaluation frameworks. Section 5 presents comprehensive results analyzing detection performance, energy efficiency, and green computing metrics. Section 6 concludes with a synthesis of contributions and the broader impact for sustainable critical infrastructure protection.

2. Related Work

Anomaly detection in ICSs has evolved through multiple paradigm shifts, with each generation addressing limitations of its predecessors while introducing new challenges. Early approaches employed signature-based detection and rule-based expert systems, achieving reliable performance against known attack patterns but failing to generalize to novel threats and requiring extensive maintenance across heterogeneous ICS deployments. Statistical methods [10] improved adaptability by modeling normal operational distributions and flagging statistical outliers, yet their assumptions regarding data distributions proved inadequate for capturing high-dimensional, nonlinear relationships inherent in complex ICS processes.

Machine learning methodologies [11,12] advanced detection capabilities by learning intricate patterns without explicit attack signatures. Supervised techniques including Support Vector Machines [13], Random Forests, and Gradient Boosting [14] demonstrated effectiveness when labeled training data was available, while clustering-based approaches [15] enabled unsupervised pattern discovery. One-Class SVM and Isolation Forest [16,17] proved particularly valuable for ICS environments by learning exclusively from normal operational data, addressing the scarcity of labeled attack samples. However, these classical machine learning methods require extensive manual feature engineering, making their performance heavily dependent on domain expertise and system-specific knowledge.

Deep learning architectures overcome feature engineering limitations through hierarchical representation learning directly from raw sensor data. Autoencoders [17,18,19] compress normal operational patterns into latent representations, detecting anomalies through reconstruction error analysis. Convolutional Neural Networks [20] extract spatial features from multivariate time series, while LSTM and GRU architectures [17] model temporal dependencies across extended sequences. Generative models including GANs [21] and Transformer networks [22] learn complex distributional characteristics of normal operation. Beyond purely software-based approaches, deep learning has demonstrated strong performance in real-world industrial monitoring and inspection tasks. For instance, reliable key-point detection methods have been applied to mechanical components in industrial magnetic separator systems [23], demonstrating the applicability of deep learning to structural anomaly identification under operational conditions. Similarly, spatial transformation-based multi-scale attention networks such as STMA-Net [24] have achieved compelling results in complex defect detection using X-ray imaging, highlighting the importance of multi-scale feature extraction when anomalies manifest at varying scales and intensities. Furthermore, multi-expert inspection frameworks integrating hardware and deep learning pipelines have been proposed for automatic welding defect assessment in intelligent pipeline systems [25], illustrating how industrial anomaly detection increasingly demands tight coupling between sensing infrastructure and learned representations, a challenge directly relevant to ICS security deployments. The USAD architecture [8] represents a significant advancement, combining autoencoders with adversarial training through shared encoder and dual-decoder structures. 
This adversarial mechanism enhances discrimination between normal and anomalous patterns, achieving state-of-the-art performance on benchmark datasets including SWaT, WADI, and MSL [8]. However, standard USAD implementations require substantial network capacity and extensive training, creating computational demands incompatible with sustainable deployment at scale.

Green Computing and Sustainable Artificial Intelligence

Green computing addresses the environmental impact of computational systems, while sustainable AI specifically targets energy consumption and carbon emissions in machine learning workflows [26]. Seminal work [27] demonstrated that training large deep learning models generates substantial carbon emissions, catalyzing research into energy-efficient AI development. This environmental imperative becomes critical for ICS cybersecurity, where continuous monitoring demands persistent model deployment and periodic retraining across distributed facilities.

Quantitative studies reveal the environmental cost of contemporary machine learning. Training large-scale image classification models on ImageNet consumes hundreds of GPU-hours and emits tens to hundreds of kilograms of CO₂. Large language model training requires megawatt-hours of electricity with corresponding carbon emissions scaling to metric tons. Even inference operations accumulate significant energy consumption when aggregated across millions of queries and users. Carbon footprint magnitude depends on multiple factors: model architecture complexity and parameter count, dataset scale and training duration, hardware efficiency characteristics, and grid electricity carbon intensity.

Researchers have proposed diverse strategies to mitigate machine learning’s environmental impact. Model compression techniques including pruning [28], quantization [29], and knowledge distillation [30] reduce computational requirements while preserving accuracy. Neural Architecture Search (NAS) identifies configurations optimizing accuracy per floating-point operation, while mobile-optimized architectures achieve strong performance with reduced computational overhead. Training efficiency improvements through learning rate scheduling [31], early stopping [9], gradient checkpointing [32], mixed-precision training [33], and transfer learning [34] minimize resource consumption. Hardware optimization, specialized accelerators, and efficient memory management provide additional efficiency gains [35,36]. Carbon-aware scheduling strategies, including alignment with renewable energy availability and geographic load balancing, reduce emissions without modifying computational workloads [1,35,37].

However, existing green AI approaches predominantly employ post hoc optimization: models are designed for maximum accuracy, then compressed or optimized for efficiency after training completion. This paradigm fundamentally limits sustainability gains, as training energy consumption, often the dominant contributor to total environmental impact, remains unchanged. Furthermore, most green computing research focuses on general machine learning tasks rather than the domain-specific requirements of ICSs, where real-time constraints, continuous operation, and safety criticality impose unique design constraints.

Table 1 compares existing ICS anomaly detection approaches across critical dimensions including deep learning capability, energy measurement, carbon accounting, architectural efficiency, real-time performance, and unsupervised learning support. This analysis reveals a fundamental gap: while existing methods excel in either detection performance or energy awareness, none simultaneously addresses security effectiveness, environmental sustainability, and ICS-specific operational requirements through integrated design-from-inception optimization. Our work addresses this gap by proposing a sustainability-first framework that treats energy efficiency as a foundational design constraint rather than a post-deployment optimization target.

3. Methodology

This section presents the Green-USAD framework, encompassing system architecture, sustainability-first model design, hardware-validated energy monitoring, multi-objective green metrics, and anomaly detection procedures. Figure 1 illustrates the integrated architecture.

3.1. System Architecture and Design Philosophy

Green-USAD implements a sustainability-first design paradigm where energy efficiency constitutes a primary architectural constraint rather than a post-deployment optimization target. The framework comprises five interconnected modules: Data Preprocessing, Energy-Efficient USAD Model, Green Energy Monitor, Green Metrics Engine, and Anomaly Detection and Alerting.

The Data Preprocessing module transforms raw ICS sensor and actuator measurements through missing value imputation, feature normalization, min-max scaling to a [0, 1] range, and temporal window construction for sequence modeling. The Energy-Efficient USAD Model constitutes the core detection component, employing a compressed architecture with reduced parameters to learn compact representations of normal system behavior while identifying deviations as anomalies. The Green Energy Monitor continuously tracks computational resource utilization during training and inference, measuring CPU usage, memory consumption, power draw, and carbon emissions through hardware-validated profiling. The Green Metrics Engine synthesizes monitored environmental data with security performance results to compute the proposed sustainability metrics. The Anomaly Detection and Alerting module applies threshold-based scoring to incoming data streams, generating security alerts accompanied by environmental impact reports.
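As a rough illustration of the min-max scaling step described above, the sketch below fits per-feature minima and maxima and maps values to [0, 1]. The function names `fit_min_max` and `apply_min_max` are hypothetical, and two behaviors are assumptions not stated in the text: the statistics are taken from the normal-only training rows, and values outside the training range are clipped so that they stay within [0, 1].

```python
from typing import List, Tuple


def fit_min_max(rows: List[List[float]]) -> Tuple[List[float], List[float]]:
    """Per-feature minimum and maximum over the training rows."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]


def apply_min_max(rows: List[List[float]],
                  lo: List[float],
                  hi: List[float]) -> List[List[float]]:
    """Scale each feature to [0, 1] using previously fitted statistics."""
    scaled = []
    for row in rows:
        out = []
        for x, a, b in zip(row, lo, hi):
            span = b - a
            # Assumption: a constant feature carries no information, map it to 0.
            v = 0.0 if span == 0 else (x - a) / span
            # Assumption: clip values that fall outside the training range.
            out.append(min(1.0, max(0.0, v)))
        scaled.append(out)
    return scaled


train = [[10.0, 1.0], [20.0, 1.0], [30.0, 1.0]]
lo, hi = fit_min_max(train)
scaled_train = apply_min_max(train, lo, hi)
scaled_unseen = apply_min_max([[40.0, 2.0]], lo, hi)
print(scaled_train)   # [[0.0, 0.0], [0.5, 0.0], [1.0, 0.0]]
print(scaled_unseen)  # [[1.0, 0.0]]
```

Keeping values in [0, 1] matters here because the decoders end in sigmoid activations, so a reconstruction can never leave that range.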

Algorithm 1 formalizes the complete Green-USAD framework integrating security and sustainability objectives.

The framework operates across two phases. During training, preprocessed normal operational data trains the USAD model with validation-guided early stopping to minimize energy consumption while maintaining representational capacity. Energy and carbon impacts are continuously monitored and summarized in comprehensive green training reports. During inference, the trained model processes incoming data streams for real-time anomaly detection while energy monitoring continues, generating alerts with integrated security and environmental metrics.

Algorithm 1 Green-USAD Framework

Require: Training data $D_{\mathrm{train}}$ (normal only), test data $D_{\mathrm{test}}$
Require: Hyperparameters: window size $w$, latent dimension $z_{\mathrm{dim}}$, epochs $n_{\mathrm{max}}$
Ensure: Trained model $M$, performance metrics, green efficiency scores

1: Phase 1: Data Preprocessing
2: $D_{\mathrm{train}} \leftarrow$ Normalize($D_{\mathrm{train}}$)
3: $W_{\mathrm{train}} \leftarrow$ CreateWindows($D_{\mathrm{train}}$, $w$)
4: Phase 2: Energy-Efficient Training
5: Initialize model $M$ with compressed architecture ($z_{\mathrm{dim}} = 16$)
6: Start energy monitoring (CPU, memory, power)
7: for $n = 1$ to $n_{\mathrm{max}} = 10$ do
8:     Compute adversarial loss: $L(n) = \frac{1}{n} L_{AE1} + \left(1 - \frac{1}{n}\right) L_{AE2}$
9:     Update parameters via backpropagation
10:     Record: epoch time, loss, energy consumption
11: end for
12: Stop monitoring; generate green training report
13: Phase 3: Inference and Detection
14: $W_{\mathrm{test}} \leftarrow$ CreateWindows($D_{\mathrm{test}}$, $w$)
15: Start energy monitoring
16: for each batch $B \in W_{\mathrm{test}}$ do
17:     Compute anomaly scores: $S(w) = \alpha \, \| w - D_{1}(E(w)) \|_{2}^{2} + (1 - \alpha) \, \| w - D_{2}(E(D_{1}(E(w)))) \|_{2}^{2}$
18:     Apply point-adjustment and smoothing
19:     Generate alerts where $S(w) > \theta$
20: end for
21: Stop monitoring; generate green inference report
22: Phase 4: Multi-Objective Evaluation
23: Compute security metrics: $F_{1}$, Precision, Recall
24: Compute environmental metrics: $E_{\mathrm{total}}$, $C_{\mathrm{total}}$
25: Calculate GES and SI using AHP-derived weights
26: return $M$, security metrics, environmental metrics, GES
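The scoring and alerting steps of Phase 3 can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the encoder and decoders are passed in as plain callables, the toy stand-ins below are not trained networks, and the values `alpha = 0.5` and `theta = 0.1` are assumptions chosen only to make the example concrete. Point-adjustment and smoothing are omitted.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]
Mapping = Callable[[Vector], List[float]]


def squared_l2(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equally long vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def anomaly_score(w: Vector, encode: Mapping, decode1: Mapping,
                  decode2: Mapping, alpha: float = 0.5) -> float:
    """S(w) = alpha*||w - D1(E(w))||^2 + (1-alpha)*||w - D2(E(D1(E(w))))||^2."""
    first = decode1(encode(w))        # D1(E(w))
    second = decode2(encode(first))   # D2(E(D1(E(w))))
    return alpha * squared_l2(w, first) + (1.0 - alpha) * squared_l2(w, second)


def raise_alerts(scores: Sequence[float], theta: float) -> List[int]:
    """Indices of windows whose score exceeds the threshold theta."""
    return [i for i, s in enumerate(scores) if s > theta]


# Hypothetical stand-ins: a "model" that can only reproduce values up to 0.6.
encode: Mapping = lambda w: list(w)
decode1: Mapping = lambda z: [min(v, 0.6) for v in z]
decode2: Mapping = lambda z: [min(v, 0.6) for v in z]

windows = [[0.2, 0.4], [0.2, 1.0]]
scores = [anomaly_score(w, encode, decode1, decode2) for w in windows]
alert_idx = raise_alerts(scores, theta=0.1)
print(scores, alert_idx)  # only the second window is flagged
```

The first window is reproduced exactly and scores zero, while the second contains a value the stand-in model cannot reproduce, so both reconstruction terms contribute to its score.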

3.2. Energy-Efficient USAD Model Architecture

The model architecture builds upon USAD [8] while introducing systematic optimizations that achieve substantial energy reduction through compressed design rather than post hoc compression. The design goal is to reduce model complexity while maintaining sufficient representational capacity for accurate anomaly detection.

Temporal Window Construction. ICS sensor data is represented as sliding windows capturing the temporal context. Given a multivariate time series $X = [x_{1}, x_{2}, \dots, x_{T}]$, where $x_{t} \in \mathbb{R}^{d}$ represents sensor readings at time $t$ across $d$ features, windows of length $w$ are constructed as:

$$W_{t} = [x_{t-w+1}, x_{t-w+2}, \dots, x_{t}] \in \mathbb{R}^{w \times d} \tag{1}$$

Windows are flattened to vectors $w_{t} \in \mathbb{R}^{w \cdot d}$ for model input. With a window size of $w = 100$ and $d = 51$ feature dimensions (SWaT dataset), input vectors have a dimensionality of 5100.
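The window construction in Equation (1) and the subsequent flattening can be sketched in a few lines. The helper name `create_windows` mirrors the CreateWindows step of Algorithm 1 but its body is an assumption, as is the stride of one time step between consecutive windows; the synthetic series stands in for real sensor data.

```python
from typing import List


def create_windows(series: List[List[float]], w: int) -> List[List[float]]:
    """Return one flattened vector of length w*d for every t with a full history."""
    windows = []
    for end in range(w, len(series) + 1):
        window = series[end - w:end]  # rows x_{t-w+1} .. x_t
        windows.append([v for row in window for v in row])  # flatten to w*d
    return windows


# Synthetic stand-in: T = 105 time steps, d = 51 features, window size w = 100.
T, d, w = 105, 51, 100
series = [[float(t)] * d for t in range(T)]
windows = create_windows(series, w)
print(len(windows), len(windows[0]))  # 6 5100
```

With a stride of one, a series of length T yields T - w + 1 windows, so consecutive windows overlap in all but one time step.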

Compressed Encoder Architecture. The encoder compresses high-dimensional input windows into low-dimensional latent representations capturing essential patterns of normal behavior. It employs three fully connected layers with progressive dimensionality reduction:

$$h_{1} = \mathrm{ReLU}(W_{1} w + b_{1}) \in \mathbb{R}^{256} \tag{2}$$

$$h_{2} = \mathrm{ReLU}(W_{2} h_{1} + b_{2}) \in \mathbb{R}^{128} \tag{3}$$

$$z = \mathrm{ReLU}(W_{3} h_{2} + b_{3}) \in \mathbb{R}^{16} \tag{4}$$

The latent dimension of 16 represents extreme compression of the 5100-dimensional input, giving a compression ratio of roughly 319:1 (5100/16 = 318.75) that substantially exceeds typical autoencoder designs. This aggressive compression reduces both parameter count and computational requirements while forcing the encoder to learn maximally compact representations of normal operational patterns. ReLU activations provide non-linearity while maintaining computational efficiency through simple thresholding operations that avoid expensive exponential calculations.

The theoretical justification for this aggressive compression rests on the manifold hypothesis [39], which posits that high-dimensional real-world data, such as ICS sensor readings, lie on or near a low-dimensional manifold embedded within the input space. Under normal operating conditions, ICS processes follow tightly constrained physical and chemical laws governing permissible sensor states, meaning that the intrinsic dimensionality of normal behaviour is substantially lower than the 5100-dimensional input space. A latent dimension of 16 is therefore theoretically sufficient to capture the principal modes of normal variation, as the encoder is forced to learn only the most compact and discriminative representation of the normal operational manifold. Anomalous observations, by definition lying off this manifold, cannot be faithfully reconstructed from such compressed representations, producing elevated reconstruction errors that form the basis of detection. This compression-detection duality provides the theoretical foundation for why extreme compression at a latent dimension of 16 preserves detection performance rather than degrading it.

Dual Decoder Architecture. Two decoder networks reconstruct inputs from latent representations, with adversarial interaction driving discriminative representation learning. Decoder 1 ($D_{1}$) focuses on accurate reconstruction:

$$h_{4}^{(1)} = \mathrm{ReLU}(W_{4}^{(1)} z + b_{4}^{(1)}) \in \mathbb{R}^{128} \tag{5}$$

$$h_{5}^{(1)} = \mathrm{ReLU}(W_{5}^{(1)} h_{4}^{(1)} + b_{5}^{(1)}) \in \mathbb{R}^{256} \tag{6}$$

$$\hat{w}^{(1)} = \sigma(W_{6}^{(1)} h_{5}^{(1)} + b_{6}^{(1)}) \in \mathbb{R}^{w \cdot d} \tag{7}$$

Decoder 2 ($D_{2}$) mirrors $D_{1}$'s architecture with independent parameters:

$$h_{4}^{(2)} = \mathrm{ReLU}(W_{4}^{(2)} z + b_{4}^{(2)}) \in \mathbb{R}^{128} \tag{8}$$

$$h_{5}^{(2)} = \mathrm{ReLU}(W_{5}^{(2)} h_{4}^{(2)} + b_{5}^{(2)}) \in \mathbb{R}^{256} \tag{9}$$

$$\hat{w}^{(2)} = \sigma(W_{6}^{(2)} h_{5}^{(2)} + b_{6}^{(2)}) \in \mathbb{R}^{w \cdot d} \tag{10}$$

Sigmoid activations ($\sigma$) in the output layers bound reconstructions to $[0, 1]$, matching the normalized input range.

Adversarial Training with Energy-Aware Convergence. The training procedure jointly optimizes both decoders through a time-varying objective balancing reconstruction accuracy and discrimination:

$$L(n) = \frac{1}{n} L_{AE1} + \left(1 - \frac{1}{n}\right) L_{AE2} \tag{11}$$

where $n$ denotes the current epoch number and the individual loss components are:

$$L_{AE1} = \| w - D_{1}(E(w)) \|_{2}^{2} \tag{12}$$

$$L_{AE2} = \| w - D_{2}(E(D_{1}(E(w)))) \|_{2}^{2} \tag{13}$$

Early training prioritizes the first decoder ($L_{AE1}$), encouraging accurate reconstructions. As training progresses, emphasis shifts toward the second decoder ($L_{AE2}$), which reconstructs inputs from the first decoder's output. This adversarial interplay drives the encoder toward highly discriminative representations: normal data achieves accurate reconstruction by both decoders, while anomalies yield high reconstruction errors.

The time-varying loss formulation in Equation (11) is theoretically grounded in curriculum learning principles [40], where a model is first trained on simpler objectives before progressively introducing more complex ones. In the early epochs, where $n$ is small, the weight $\frac{1}{n}$ is large, directing the majority of the optimization pressure toward $L_{AE1}$ and ensuring that the encoder first learns a stable and accurate reconstruction mapping for normal data. As $n$ increases, the weight $\left(1 - \frac{1}{n}\right)$ grows toward unity, shifting emphasis to $L_{AE2}$, which introduces adversarial pressure by requiring the second decoder to reconstruct inputs that have already passed through the first decoder. This progressive difficulty schedule is theoretically motivated: attempting adversarial discrimination before the encoder has stabilized yields a poor gradient signal, whereas introducing it after basic reconstruction is established allows the adversarial objective to refine the latent space toward greater discriminability.
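The epoch-dependent weighting in Equation (11) is easy to tabulate. The sketch below assumes epochs are numbered from 1, as the equation requires, and uses hypothetical helper names `loss_weights` and `combined_loss`; the loss values passed in at the end are placeholders rather than measured results.

```python
from typing import Tuple


def loss_weights(epoch: int) -> Tuple[float, float]:
    """Weights (1/n, 1 - 1/n) applied to L_AE1 and L_AE2 at epoch n."""
    if epoch < 1:
        raise ValueError("epoch numbering starts at 1")
    w1 = 1.0 / epoch
    return w1, 1.0 - w1


def combined_loss(epoch: int, loss_ae1: float, loss_ae2: float) -> float:
    """L(n) = (1/n) * L_AE1 + (1 - 1/n) * L_AE2."""
    w1, w2 = loss_weights(epoch)
    return w1 * loss_ae1 + w2 * loss_ae2


for n in range(1, 11):
    w1, w2 = loss_weights(n)
    print(f"epoch {n:2d}: weight on L_AE1 = {w1:.3f}, weight on L_AE2 = {w2:.3f}")

# Placeholder loss values, used only to show the call.
example = combined_loss(10, loss_ae1=1.0, loss_ae2=2.0)
```

At epoch 1 the entire weight falls on $L_{AE1}$, the two terms are weighted equally at epoch 2, and by epoch 10 the second term carries 0.9 of the weight.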

Sustainability-First Optimizations. Several design choices systematically reduce energy consumption:

Compressed Architecture: Approximately 2.7 million trainable parameters versus typical USAD implementations exceeding 10 million, achieved through reduced hidden dimensions (256/128 versus 512/256) and compressed latent space (16 versus 32–64 dimensions).
Validation-Guided Early Stopping: Training is limited to 10 epochs based on both empirical observation and theoretical convergence analysis. From a theoretical perspective, unsupervised autoencoders trained on unimodal distributions, as is the case here, since training data contains only normal samples, exhibit rapid initial representation learning followed by diminishing marginal improvements, a behaviour consistent with the well-established empirical risk minimization framework [41]. Beyond a threshold number of epochs, continued training risks overfitting the encoder to spurious noise patterns in the normal training data rather than the underlying operational manifold, degrading generalization to unseen normal conditions and reducing anomaly discrimination. Limiting training to 10 epochs therefore serves a dual purpose: it prevents overfitting by terminating optimization before noise memorization occurs, and it directly reduces energy consumption proportionally, since training energy scales linearly with the number of gradient update steps.
Optimized Batch Processing: A batch size of 256 optimizes hardware utilization, stabilizes gradients, and accelerates convergence through efficient vectorization.
Learning Rate Scheduling: An initial rate of 0.001 with a decay factor of 0.5 applied every five epochs enables rapid early learning followed by finer refinement.
Efficient Inference: Batch processing of 512 samples reduces overhead and maximizes vectorization efficiency during deployment.
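Two of the settings in the list above can be checked with simple arithmetic. The sketch below assumes plain fully connected layers with biases, sized as in Equations (2) to (10) for a 5100-dimensional input, and a step schedule in which the learning rate is halved after every five completed epochs; the exact boundary at which the decay is applied is an assumption, and the helper names are hypothetical.

```python
from typing import List


def dense_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


def stack_params(sizes: List[int]) -> int:
    """Total parameters of a chain of fully connected layers."""
    return sum(dense_params(a, b) for a, b in zip(sizes, sizes[1:]))


def learning_rate(epoch: int, base: float = 1e-3,
                  gamma: float = 0.5, step: int = 5) -> float:
    """Step decay for a 1-based epoch index."""
    return base * gamma ** ((epoch - 1) // step)


input_dim = 100 * 51                      # w * d for SWaT
encoder = stack_params([input_dim, 256, 128, 16])
decoder = stack_params([16, 128, 256, input_dim])
print(encoder, decoder)                   # 1340816 1345900
print(encoder + decoder)                  # 2686716
print(encoder + 2 * decoder)              # 4032616

schedule = [learning_rate(e) for e in range(1, 11)]
print(schedule)                           # 0.001 for epochs 1-5, 0.0005 for 6-10
```

Under these assumptions the encoder plus one decoder comes to about 2.69 million parameters and the encoder plus both decoders to about 4.03 million, so the total depends on how the two decoders are tallied.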

3.3. Hardware-Validated Energy Monitoring

The Green Energy Monitor provides rigorous computational resource tracking and environmental impact quantification during the training and inference phases. Algorithm 2 formalizes the monitoring protocol.

Algorithm 2 Hardware-Validated Energy Monitoring Protocol

Require: Sampling interval $\Delta t$ (100 ms training, 50 ms inference)
Require: Power model parameters: $P_{\mathrm{idle}} = 15$ W, $P_{\mathrm{load}} = 50$ W
Require: Carbon intensity: $I_{\mathrm{carbon}} = 0.5$ kg CO₂/kWh
Ensure: Energy consumption $E_{\mathrm{total}}$, carbon emissions $C_{\mathrm{total}}$

1: Initialize: $E_{\mathrm{total}} \leftarrow 0$, $C_{\mathrm{total}} \leftarrow 0$, $\mathit{measurements} \leftarrow []$
2: Start timestamp $t_{0} \leftarrow$ current_time()
3: while monitoring active do
4:     $t_{i} \leftarrow$ current_time()
5:     Sample CPU utilization: $\mathrm{CPU}(t_{i})$ (%)
6:     Sample memory usage: $\mathrm{MEM}(t_{i})$ (GB)
7:     Estimate instantaneous power: $P(t_{i}) = P_{\mathrm{idle}} + \frac{\mathrm{CPU}(t_{i})}{100} \times P_{\mathrm{load}}$
8:     Compute interval energy: $\Delta E_{i} = P(t_{i}) \times \Delta t$ (in Wh, with $\Delta t$ expressed in hours)
9:     Update cumulative energy: $E_{\mathrm{total}} \leftarrow E_{\mathrm{total}} + \Delta E_{i}$
10:     Store measurement: $\mathit{measurements}$.append($(t_{i}, \mathrm{CPU}(t_{i}), \mathrm{MEM}(t_{i}), P(t_{i}), \Delta E_{i})$)
11:     Sleep($\Delta t$)
12: end while
13: Compute total carbon: $C_{\mathrm{total}} = \frac{E_{\mathrm{total}}}{1000} \times I_{\mathrm{carbon}}$ (kg CO₂, with $E_{\mathrm{total}}$ in Wh)
14: Generate report with temporal analysis and efficiency metrics
15: return $E_{\mathrm{total}}$ (Wh), $C_{\mathrm{total}}$ (kg CO₂; multiply by 1000 for g CO₂), $\mathit{measurements}$
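The accumulation loop of Algorithm 2 can be sketched with explicit unit conversions. To stay self-contained, the example replays a list of CPU utilization samples instead of polling the system; in a live monitor these values might come from a library such as psutil, which is an assumption here. The function names are hypothetical, while the constants are the ones stated in the algorithm.

```python
from typing import Iterable, Tuple

P_IDLE_W = 15.0          # baseline system power (W)
P_LOAD_W = 50.0          # additional power at 100% CPU (W)
CARBON_KG_PER_KWH = 0.5  # grid carbon intensity (kg CO2 per kWh)


def cpu_power_w(cpu_percent: float) -> float:
    """Linear power model: P = P_idle + CPU/100 * P_load."""
    return P_IDLE_W + (cpu_percent / 100.0) * P_LOAD_W


def integrate_energy(cpu_samples: Iterable[float],
                     dt_seconds: float) -> Tuple[float, float]:
    """Return (energy in Wh, carbon in g CO2) for equally spaced samples."""
    energy_joules = sum(cpu_power_w(c) * dt_seconds for c in cpu_samples)
    energy_wh = energy_joules / 3600.0              # 1 Wh = 3600 J
    carbon_kg = energy_wh / 1000.0 * CARBON_KG_PER_KWH
    return energy_wh, carbon_kg * 1000.0            # kg -> g


# One hour of samples at 100 ms intervals with the CPU fully loaded.
samples = [100.0] * 36000
energy_wh, carbon_g = integrate_energy(samples, dt_seconds=0.1)
print(round(energy_wh, 3), round(carbon_g, 3))  # about 65 Wh and 32.5 g CO2
```

Keeping joules, watt-hours, kilograms, and grams as separate named steps makes it harder to mix up the units between the interval energy, the cumulative total, and the reported emissions.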

Resource Utilization Tracking. CPU utilization sampling occurs at configurable intervals (default of 100 ms for training, 50 ms for inference) with timestamps enabling temporal analysis. Memory consumption monitoring identifies memory-intensive operations enabling targeted optimization. For GPU-equipped systems, interfaces such as NVIDIA SMI track GPU utilization and memory usage.

Power Consumption Estimation. Instantaneous power consumption estimation employs linear models based on CPU utilization:

$$P(t) = P_{\mathrm{idle}} + \frac{\mathrm{CPU}(t)}{100} \times P_{\mathrm{load}} \tag{14}$$

where $P_{\mathrm{idle}} = 15$ W represents baseline system power, $\mathrm{CPU}(t)$ denotes CPU utilization percentage at time $t$, and $P_{\mathrm{load}} = 50$ W represents additional load-dependent power consumption. This model provides reasonable estimates for CPU-bound workloads on typical systems, with parameters calibrated through power meter measurements for specific hardware. For GPU workloads, the model extends to:

$$P(t) = P_{\mathrm{idle}} + \frac{\mathrm{CPU}(t)}{100} \times P_{\mathrm{cpu\_load}} + \frac{\mathrm{GPU}(t)}{100} \times P_{\mathrm{gpu\_load}} \tag{15}$$

This linear CPU utilisation model represents a deliberate simplification adopted in alignment with the CPU-only deployment context of Green-USAD, where no discrete GPU is present and memory and I/O energy contributions are comparatively minor relative to processor load. In general, a more comprehensive system-level power model would take the form:

$$P_{system}(t) = P_{idle} + P_{CPU}(t) + P_{memory}(t) + P_{IO}(t) + P_{GPU}(t) \tag{16}$$

where $P_{memory}(t) = α \times MEM(t)$ captures DRAM energy proportional to active memory utilisation, $P_{IO}(t) = β \times IO(t)$ captures storage and network I/O energy, and $P_{GPU}(t) = \frac{GPU(t)}{100} \times P_{gpu\_load}$ captures GPU-specific consumption including core utilisation, memory bandwidth, and thermal variability. For GPU-equipped deployments, GPU power variability arising from Dynamic Voltage and Frequency Scaling (DVFS), thermal throttling, and memory bandwidth saturation can contribute significantly to total system energy and should be captured through hardware-level interfaces such as NVIDIA SMI or Intel RAPL rather than estimated through linear approximation. Here, since all experiments execute on a CPU-only platform with fixed clock speeds and no active GPU, the simplified model in Equation (14) provides a reproducible estimate of training and inference energy, with parameters $P_{idle}$ and $P_{load}$ calibrated through direct power meter measurements on the experimental hardware.

Energy Integration and Carbon Accounting. Total energy consumption integrates instantaneous power over time:

$$E_{total} = \sum_{i = 1}^{N} P(t_{i}) \times Δt_{i} \tag{17}$$

where N represents the number of measurement intervals and $Δt_{i}$ denotes interval duration. Energy metrics include per-epoch training energy, cumulative training energy, per-sample inference energy, and per-batch inference energy.

Carbon emissions combine energy consumption with grid carbon intensity:

$$C_{total} = \frac{E_{total}}{1000} \times I_{carbon} \tag{18}$$

where $E_{total}$ is measured in Wh, $C_{total}$ represents total emissions in kg CO₂, and $I_{carbon}$ denotes carbon intensity in kg CO₂/kWh. We employ $I_{carbon} = 0.5$ kg CO₂/kWh as a global average, with regional adjustability (0.2 for hydroelectric, 0.8 for coal-dominated grids). Carbon is reported in grams given the small quantities involved. Each measurement records timestamp, power draw, energy increment, emissions, CPU/memory utilization, and training loss, enabling comprehensive efficiency analysis.

3.4. Multi-Objective Green Efficiency Metrics

We introduce two novel sustainability metrics that balance security performance with environmental impact, with weights derived through the Analytic Hierarchy Process (AHP).

Green Efficiency Score (GES). The GES synthesizes detection accuracy, energy efficiency, and carbon efficiency on a $[0, 100]$ scale:

$$GES = ω_{1} \times S_{detection} + ω_{2} \times E_{efficiency} + ω_{3} \times C_{efficiency} \tag{19}$$

where AHP-derived weights $ω_{1} = 0.4$, $ω_{2} = 0.3$, $ω_{3} = 0.3$ (summing to 1.0) reflect relative priorities: security performance (40%), energy efficiency (30%), and carbon efficiency (30%).

Detection score normalizes F1-score to $[0, 100]$:

$$S_{detection} = F_{1} \times 100 \tag{20}$$

Energy efficiency penalizes consumption with zero score at 0.5 kWh:

$$E_{efficiency} = \max(0, 100 - E_{kWh} \times 200) \tag{21}$$

Carbon efficiency penalizes emissions with zero score at 500 g CO₂:

$$C_{efficiency} = \max\left(0, 100 - \frac{C_{g}}{5}\right) \tag{22}$$

GES interpretation: 90–100 (excellent green performance), 70–90 (good), 50–70 (moderate), <50 (poor requiring optimization).

Sustainability Index (SI). The SI measures detection performance amplified by environmental improvements:

$$SI = F_{1} \times 100 \times \left(1 + \frac{E_{savings} + C_{savings}}{200}\right) \tag{23}$$

where savings percentages are:

$$E_{savings} = \max\left(0, \left(1 - \frac{E_{current}}{E_{baseline}}\right) \times 100\right) \tag{24}$$

$$C_{savings} = \max\left(0, \left(1 - \frac{C_{current}}{C_{baseline}}\right) \times 100\right) \tag{25}$$

The SI range is $[0, 150]$, where: SI = 100 indicates perfect detection ($F_{1} = 1.0$) with no environmental improvement, SI > 100 indicates environmental savings beyond baseline, and SI < 100 indicates either reduced detection or no environmental improvement. Baselines are $E_{baseline} = 500$ Wh and $C_{baseline} = 250$ g CO₂, representing typical unoptimized deep learning training.

Supporting Efficiency Metrics. Additional metrics provide comprehensive efficiency assessment:

$$E_{savings\%} = \left(1 - \frac{E_{current}}{E_{baseline}}\right) \times 100 \tag{26}$$

$$C_{savings\%} = \left(1 - \frac{C_{current}}{C_{baseline}}\right) \times 100 \tag{27}$$

$$E_{per\_F1} = \frac{E_{total}}{F_{1}} \tag{28}$$

Model efficiency metrics include parameters per F1 point, model size per F1 point, and inference latency.
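The metric definitions in Equations (19)–(25) translate directly into a few small functions. The sketch below follows the formulas as printed; the function names and keyword arguments are hypothetical, and the default weights and baselines are the values stated in this section.

```python
def detection_score(f1):
    """Equation (20): F1 rescaled to [0, 100]."""
    return f1 * 100.0


def energy_efficiency(energy_kwh):
    """Equation (21): reaches zero at 0.5 kWh."""
    return max(0.0, 100.0 - energy_kwh * 200.0)


def carbon_efficiency(carbon_g):
    """Equation (22): reaches zero at 500 g CO2."""
    return max(0.0, 100.0 - carbon_g / 5.0)


def green_efficiency_score(f1, energy_kwh, carbon_g, weights=(0.4, 0.3, 0.3)):
    """Equation (19): weighted sum of the three component scores."""
    w1, w2, w3 = weights
    return (w1 * detection_score(f1)
            + w2 * energy_efficiency(energy_kwh)
            + w3 * carbon_efficiency(carbon_g))


def savings_percent(current, baseline):
    """Equations (24) and (25): percentage saved, floored at zero."""
    return max(0.0, (1.0 - current / baseline) * 100.0)


def sustainability_index(f1, energy_wh, carbon_g,
                         energy_baseline_wh=500.0, carbon_baseline_g=250.0):
    """Equation (23): detection score amplified by the two savings terms."""
    e_sav = savings_percent(energy_wh, energy_baseline_wh)
    c_sav = savings_percent(carbon_g, carbon_baseline_g)
    return f1 * 100.0 * (1.0 + (e_sav + c_sav) / 200.0)


# Example call with the training figures reported for SWaT (3.0 Wh, 1.5 g CO2).
ges = green_efficiency_score(f1=0.9393, energy_kwh=0.003, carbon_g=1.5)
```

Keeping each component as its own function makes it straightforward to swap the weights or baselines when the priorities of a deployment differ.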

3.5. Anomaly Detection Procedure

Algorithm 3 formalizes the complete inference-time anomaly detection procedure.

Window-Level Anomaly Scoring. During inference, anomaly scores combine reconstruction errors from both decoders:

$$S(w) = α \| w - D_{1}(E(w)) \|_{2}^{2} + (1 - α) \| w - D_{2}(E(D_{1}(E(w)))) \|_{2}^{2} \tag{29}$$

We employ $α = 0.5$ to equally weight both reconstructions. The first term measures direct reconstruction error while the second emphasizes discriminative encoding through cascaded decoder processing. Efficient batch processing (default size 512) significantly reduces overhead and enables vectorized operations.
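The window-level score can be illustrated with plain Python callables standing in for the networks. In this sketch the encoder and decoders are toy functions chosen only to make the arithmetic easy to follow; they are assumptions for illustration and bear no relation to the trained Green-USAD weights.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def window_score(w, encoder, decoder1, decoder2, alpha=0.5):
    """Weighted sum of the direct and cascaded reconstruction errors."""
    r1 = decoder1(encoder(w))      # D1(E(w))
    r2 = decoder2(encoder(r1))     # D2(E(D1(E(w))))
    return alpha * squared_l2(w, r1) + (1.0 - alpha) * squared_l2(w, r2)


# Toy stand-ins (hypothetical): an identity map and a map that halves each value.
def identity(v):
    return list(v)


def halve(v):
    return [0.5 * x for x in v]


score = window_score([2.0, 4.0], identity, halve, identity)
```

With a real model the same function would be applied to batches of flattened windows, and a larger score would indicate a window that the model reconstructs poorly.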

Point-Level Score Aggregation. Since sliding windows overlap, multiple windows contribute to anomaly assessment at each time point. We aggregate window scores to point scores:

$$S_{point}(t) = \frac{\sum_{i : t \in W_{i}} S(W_{i})}{|\{i : t \in W_{i}\}|} \tag{30}$$

averaging scores of all windows containing time point t. Note that boundary points appear in fewer windows.

Temporal Smoothing for Robustness. Moving average smoothing reduces noise and improves robustness:

$$\hat{S}(t) = \frac{1}{k} \sum_{j = 0}^{k - 1} S_{point}(t - j) \tag{31}$$

with smoothing window $k = 12$ corresponding to 12 s in SWaT sampling. Smoothing suppresses spurious spikes while preserving sustained anomalies indicative of actual attacks.

Algorithm 3 Anomaly Detection and Scoring

Require: Trained model $M$ with encoder E, decoders $D_{1}, D_{2}$
Require: Test data windows $W_{test}$, threshold $θ$
Require: Parameters: $α = 0.5$, smoothing window $k = 12$
Ensure: Anomaly labels, Point-wise scores, Alerts
1: Phase 1: Window-Level Scoring
2: for each window $w \in W_{test}$ do
3:   Compute reconstruction errors:
4:   $e_{1} = \| w - D_{1}(E(w)) \|_{2}^{2}$
5:   $e_{2} = \| w - D_{2}(E(D_{1}(E(w)))) \|_{2}^{2}$
6:   Combine scores: $S(w) = α \cdot e_{1} + (1 - α) \cdot e_{2}$
7: end for
8: Phase 2: Point-Level Aggregation
9: for each time point t do
10:   Identify windows containing t: $W_{t} = \{w_{i} : t \in w_{i}\}$
11:   Average scores: $S_{point}(t) = \frac{1}{|W_{t}|} \sum_{w \in W_{t}} S(w)$
12: end for
13: Phase 3: Temporal Smoothing
14: for each time point t do
15:   Apply moving average: $\hat{S}(t) = \frac{1}{k} \sum_{j = 0}^{k - 1} S_{point}(t - j)$
16: end for
17: Phase 4: Threshold-Based Detection
18: for each time point t do
19:   if $\hat{S}(t) > θ$ then
20:     Label t as anomaly
21:     Generate alert with score $\hat{S}(t)$
22:   else
23:     Label t as normal
24:   end if
25: end for
26: return Anomaly labels, smoothed scores $\hat{S}$, alerts
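Phases 2 to 4 of Algorithm 3 can be sketched without any deep learning dependency once the window scores are available. The helper names below are hypothetical, windows are assumed to start every `stride` samples, and the first $k - 1$ points are smoothed over the shorter history that exists, which is one possible boundary convention that the text does not specify.

```python
def point_scores(window_scores, window_size, stride=1):
    """Phase 2: average the scores of all windows covering each time point."""
    if not window_scores:
        return []
    n_points = (len(window_scores) - 1) * stride + window_size
    sums = [0.0] * n_points
    counts = [0] * n_points
    for i, score in enumerate(window_scores):
        start = i * stride
        for t in range(start, start + window_size):
            sums[t] += score
            counts[t] += 1
    # Points covered by no window (only possible if stride > window_size) get 0.
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


def smooth(scores, k):
    """Phase 3: trailing moving average over up to k points."""
    smoothed = []
    for t in range(len(scores)):
        history = scores[max(0, t - k + 1):t + 1]
        smoothed.append(sum(history) / len(history))
    return smoothed


def detect(window_scores, window_size, k, theta, stride=1):
    """Phase 4: flag every time point whose smoothed score exceeds theta."""
    smoothed = smooth(point_scores(window_scores, window_size, stride), k)
    labels = [s > theta for s in smoothed]
    return labels, smoothed


# Tiny hypothetical example: four windows of length 2, one of them anomalous.
labels, smoothed = detect([0.0, 0.0, 1.0, 0.0], window_size=2, k=2, theta=0.3)
```

In this example the single high window score is first diluted by overlap averaging and then by smoothing, so only the time point with the highest smoothed score crosses the threshold.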

Threshold Selection Strategies. Threshold selection balances detection rate against false alarm rate through two approaches. Percentile-based thresholding sets threshold as score percentile on training/validation data:

$$θ = \mathrm{Percentile}(S_{train}, p) \tag{32}$$

searching percentiles 80–99 to maximize the validation F1-score. The precision–recall curve approach computes the curve and selects the threshold maximizing F1:

$$θ^{*} = \arg\max_{θ} F_{1}(θ) \tag{33}$$

where $F_{1}(θ) = 2 \cdot \frac{P(θ) \cdot R(θ)}{P(θ) + R(θ)}$. The optimal threshold from validation applies during operational inference.

Following standard time-series anomaly detection practices, we employ point-adjustment evaluation: if any point within a contiguous attack segment is correctly detected, all points in that segment are considered true positives.
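The percentile search and the point-adjustment rule can be sketched together. This is an illustrative sketch, not the authors' code: all function names are hypothetical, the percentile uses a simple nearest-rank rule (other interpolation schemes would give slightly different thresholds), and ties in F1 are resolved in favour of the lowest percentile.

```python
def point_adjust(pred, truth):
    """Mark a whole contiguous attack segment as detected if any point in it is."""
    adjusted = list(pred)
    t, n = 0, len(truth)
    while t < n:
        if truth[t]:
            end = t
            while end < n and truth[end]:
                end += 1
            if any(pred[t:end]):
                for j in range(t, end):
                    adjusted[j] = True
            t = end
        else:
            t += 1
    return adjusted


def f1_score(pred, truth):
    """F1 from boolean predictions and labels; 0.0 when there are no true positives."""
    tp = sum(1 for p, y in zip(pred, truth) if p and y)
    fp = sum(1 for p, y in zip(pred, truth) if p and not y)
    fn = sum(1 for p, y in zip(pred, truth) if not p and y)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def percentile(values, p):
    """Nearest-rank percentile for an integer p in 1..100 (an assumed convention)."""
    ordered = sorted(values)
    rank = max(1, -(-p * len(ordered) // 100))   # integer ceiling of p * n / 100
    return ordered[rank - 1]


def select_threshold(train_scores, val_scores, val_truth, percentiles=range(80, 100)):
    """Pick the training-score percentile that maximizes point-adjusted validation F1."""
    best_f1, best_theta = -1.0, None
    for p in percentiles:
        theta = percentile(train_scores, p)
        pred = [s > theta for s in val_scores]
        f1 = f1_score(point_adjust(pred, val_truth), val_truth)
        if f1 > best_f1:
            best_f1, best_theta = f1, theta
    return best_theta, best_f1


# Hypothetical data: 100 evenly spaced training scores and a short validation trace.
train = [i / 100 for i in range(1, 101)]
theta, f1 = select_threshold(train, [0.1, 0.2, 2.0, 2.0, 0.1],
                             [False, False, True, True, False])
```

Because point adjustment credits an entire segment for a single hit, F1 values computed this way are generally higher than strictly point-wise F1 and should be compared only against results that use the same protocol.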

4. Experimental Setup

4.1. Benchmark Datasets

Green-USAD is evaluated on the Secure Water Treatment (SWaT) [42] and Water Distribution (WADI) [43] datasets, established benchmarks for ICS security research that provide realistic operational data with labeled attack scenarios. Figure 2 presents the complete experimental pipeline adopted in this study. The pipeline begins with the SWaT and WADI benchmark datasets, passes through a systematic preprocessing stage, feeds into the Green-USAD model deployed on a CPU-only platform, and culminates in dual evaluation covering both security performance and green computing metrics.

SWaT Dataset. SWaT is a fully operational water treatment plant scaled to produce five gallons per minute at the Singapore University of Technology and Design iTrust Centre. The testbed implements complete six-stage water treatment: raw water storage and intake, chemical dosing for pre-treatment, ultrafiltration for particle removal, dechlorination via UV treatment, reverse osmosis for dissolved solids removal, and treated water storage and distribution. The system includes PLCs, HMIs, SCADA workstations, and data historians with network segmentation following industrial standards [42]. The SWaT testbed is widely regarded as one of the most realistic and publicly available ICS security benchmarks, as it replicates the full operational behaviour of a real-world water treatment facility rather than relying on synthetic simulations, making it particularly suitable for evaluating anomaly detection models under authentic industrial conditions [42].

The dataset comprises 11 days of continuous operation sampled at one-second intervals: 7 days (496,800 samples) normal training data and 4 days (449,919 samples) testing data with 36 attack scenarios. The dataset contains 51 features including sensor measurements (flow rates, tank levels, pressures, pH, conductivity) and actuator states (pump statuses, valve positions, control signals). All attributes are numerical with physical semantics corresponding to actual process variables. The high sampling frequency of one second per observation ensures that transient anomalies and short-duration attacks are captured with sufficient temporal resolution, while the diversity of sensor modalities across physical, chemical, and control domains introduces realistic multivariate correlation structures that challenge conventional anomaly detection methods.

Attack scenarios span diverse types and targets including single-point attacks targeting individual sensors or actuators, multi-point coordinated attacks, and stealthy attacks employing subtle manipulations. Attack vectors include sensor spoofing through false readings, actuator manipulation via unauthorized control, and network-level packet injection. Attack objectives encompass operational disruption, equipment damage, and product quality degradation. Attack durations range from minutes to hours with varying severity levels. The dataset contains 946,722 total samples: 395,298 normal and 54,621 attack samples in the test set, yielding a 12.14% attack ratio. This pronounced class imbalance, with attack samples constituting only 12.14% of the test set, presents a significant evaluation challenge, as models must maintain high recall across rare attack events without inflating false positive rates during the dominant normal operation periods, a condition that closely mirrors the operational constraints of real-world ICS deployments.

WADI Dataset. WADI represents a comprehensive water distribution testbed developed by the iTrust Centre at the Singapore University of Technology and Design [43]. The dataset comprises 172,803 time-series samples collected from 127 sensors and actuators across a six-stage distribution network over 16 days: 162,826 normal operation instances and 9977 attack samples representing 16 distinct attack categories including sensor spoofing, actuator manipulation, and coordinated multi-stage intrusions. The dataset exhibits significant challenges including extreme class imbalance, high dimensionality, temporal dependencies, and stealthy attack patterns. Measurements include flow rates, water levels, pressure, valve statuses, and chemical properties sampled every second [43]. With attack samples constituting only 5.77% of the total dataset, WADI presents a more severe class imbalance than SWaT, reflecting the rarity of cyber intrusions in normal industrial operations and demanding that detection models remain highly sensitive to subtle deviations without excessive false alarms. The higher dimensionality of 127 features compared to SWaT’s 51 further increases the complexity of learning meaningful normal behaviour representations, making WADI a rigorous and complementary benchmark for assessing the generalisability and robustness of the proposed Green-USAD framework across varying ICS configurations.

4.2. Data Preprocessing Protocol

Raw data undergoes systematic cleaning operations. Missing value handling employs forward-fill imputation for sensor values assuming temporal persistence. Infinite value handling replaces numerical errors with zero. Timestamp processing extracts relevant columns while removing timestamps from features.

Two-stage normalization ensures consistent feature scaling. First, StandardScaler normalization achieves zero mean and unit variance:

$$x' = \frac{x - μ}{σ} \tag{34}$$

Second, min-max scaling bounds values to $[0, 1]$:

$$x'' = \frac{x' - \min(x')}{\max(x') - \min(x')} \tag{35}$$

Scaling parameters fitted on training data apply to test data preventing information leakage.
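The cleaning and two-stage scaling steps can be sketched for a single feature column in plain Python. The function names are hypothetical, a leading missing value is filled with 0.0 (an assumption, since forward-fill has nothing to propagate there), and the standard deviation is the population form, which is what scikit-learn's `StandardScaler` uses.

```python
import math


def clean(column):
    """Forward-fill missing values and replace infinities with zero."""
    cleaned, last = [], 0.0
    for v in column:
        if v is None or (isinstance(v, float) and math.isnan(v)):
            v = last                  # temporal persistence assumption
        elif math.isinf(v):
            v = 0.0
        cleaned.append(float(v))
        last = float(v)
    return cleaned


def fit_scaler(train):
    """Fit both scaling stages on training data only."""
    mu = sum(train) / len(train)
    sigma = math.sqrt(sum((x - mu) ** 2 for x in train) / len(train)) or 1.0
    z = [(x - mu) / sigma for x in train]
    return {"mu": mu, "sigma": sigma, "z_min": min(z), "z_max": max(z)}


def transform(column, params):
    """Apply Equation (34) followed by Equation (35) with the fitted parameters."""
    span = (params["z_max"] - params["z_min"]) or 1.0
    return [((x - params["mu"]) / params["sigma"] - params["z_min"]) / span
            for x in column]


params = fit_scaler([0.0, 5.0, 10.0])
scaled_train = transform([0.0, 5.0, 10.0], params)
scaled_test = transform([15.0], params)   # test values may fall outside [0, 1]
```

Since the parameters come from training data alone, test values beyond the training range map outside $[0, 1]$, which is expected and is part of what makes such points stand out to a reconstruction-based detector.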

Following standard protocols, we use normal data from the training periods exclusively for model training (unsupervised learning of normal patterns) and the complete testing periods, including both normal and attack data, for evaluation. The training set contains only attack-free samples identified by filtering pre-attack annotations.

4.3. Implementation Configuration

Green-USAD is implemented in Python 3.9 with PyTorch 2.0 (deep learning framework), NumPy 1.24 (numerical operations), Pandas 2.0 (data manipulation), Scikit-learn 1.2 (preprocessing and metrics), Matplotlib 3.7 (visualization), and psutil 5.9 (system monitoring). Experiments execute on a representative ICS computing platform: an Intel Core processor with 8 threads, 16 GB RAM, SSD storage, and notably CPU-only execution (no GPU). This CPU-only configuration demonstrates Green-USAD’s effectiveness on standard hardware typical of ICS deployments.

Critical hyperparameters include: window size 100 (100 s of context at 1 Hz sampling, roughly 1.7 min), batch size 256 (optimized hardware utilization), learning rate 0.001 (standard for Adam optimizer), learning rate decay 0.5 every 5 epochs (fine-tuning), maximum 10 training epochs, latent dimension 16 (extreme compression), hidden dimensions 256 and 128 (balanced capacity and efficiency), smoothing window 12 (noise reduction), and score weight $α = 0.5$ (equal decoder contribution).

To ensure an unbiased and transparent selection of the above hyperparameters, we conducted a systematic sensitivity analysis evaluating the impact of each critical parameter on F1-score, energy consumption, and inference time. For the window size, values of 50, 100, and 200 were evaluated; window size 100 yielded the best balance between capturing sufficient temporal context and maintaining computational efficiency, as smaller windows missed slowly evolving attack patterns while larger windows increased memory overhead without meaningful performance gains. For the latent dimension, values of 8, 16, and 32 were compared; a latent dimension of 16 achieved effective compression of normal operational patterns while preserving sufficient representational capacity, whereas a dimension of 8 resulted in underfitting and a dimension of 32 introduced unnecessary model complexity inconsistent with the green computing objective. For the learning rate, standard values of 0.01, 0.001, and 0.0001 were tested with the Adam optimizer; a learning rate of 0.001 consistently produced stable convergence across both SWaT and WADI datasets, while 0.01 caused oscillatory training loss and 0.0001 led to premature convergence at suboptimal solutions. For the score weight $α$, values of 0.3, 0.5, and 0.7 were examined to assess the relative contribution of the two decoders to the final anomaly score; $α = 0.5$ provided equal and complementary decoder contributions, yielding the highest F1-score on both datasets, while asymmetric weights consistently degraded detection performance by over-relying on a single decoder pathway. A smoothing window of 12 was selected after comparing values of 6, 12, and 24; a window of 12 effectively suppressed transient noise in the anomaly score signal without introducing excessive lag that could delay attack detection. All remaining hyperparameters, including batch size 256, hidden dimensions 256 and 128, learning rate decay of 0.5 every five epochs, and a maximum of 10 training epochs, were validated through preliminary experiments and found to be robust across both benchmark datasets, confirming that the reported configuration represents a principled and reproducible experimental setup rather than an arbitrarily tuned one.

The training procedure proceeds as follows: load and preprocess data, extract normal training samples, create the windowed dataset, and initialize the model and optimizer. Then, for each epoch (1 to 10): start the epoch timer, perform forward passes through the encoder and decoders per batch, compute the adversarial loss, execute backpropagation and parameter updates, record epoch time, loss, and energy, evaluate on the test set, and save the best model by F1-score. Finally, stop the energy monitor and report comprehensive metrics including green efficiency scores.
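Two of the configuration items above, the sliding-window construction and the stepwise learning-rate decay, can be sketched in a few lines. The helper names are hypothetical, a stride of 1 is assumed because the text does not state one, and epochs are counted from zero, which mirrors the behaviour of PyTorch's `StepLR` scheduler with `step_size=5` and `gamma=0.5`.

```python
def make_windows(series, window_size=100, stride=1):
    """Slice a sequence into overlapping windows of fixed length."""
    last_start = len(series) - window_size
    return [series[i:i + window_size] for i in range(0, last_start + 1, stride)]


def step_lr(epoch, base_lr=0.001, gamma=0.5, step=5):
    """Learning rate after multiplying by gamma once every `step` epochs."""
    return base_lr * gamma ** (epoch // step)


# Small hypothetical series to show the window layout.
windows = make_windows(list(range(5)), window_size=3)

# Learning rate for each of the 10 training epochs (0-based indexing assumed).
schedule = [step_lr(epoch) for epoch in range(10)]
```

With a stride of 1, a series of N samples yields N − 99 windows of length 100, so the windowed training set is almost as long as the raw series.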

Figure 3 illustrates the step-by-step algorithm pipeline followed during model development and validation. Starting from data loading and normal sample filtering, the procedure advances through a structured 10-epoch training loop, encompassing forward pass, adversarial loss computation, and back propagation, before concluding with test set evaluation and comprehensive metric reporting.

4.4. Evaluation Metrics

Security Performance Metrics. Precision measures the proportion of detected anomalies that are true attacks:

$$P = \frac{TP}{TP + FP} \tag{36}$$

Recall (detection rate) measures the proportion of actual attacks detected:

$$R = \frac{TP}{TP + FN} \tag{37}$$

F1-score computes the harmonic mean of precision and recall:

$$F_{1} = 2 \cdot \frac{P \cdot R}{P + R} \tag{38}$$

Specificity measures the proportion of normal samples correctly identified:

$$Spec = \frac{TN}{TN + FP} \tag{39}$$

AUC-ROC represents the area under the receiver operating characteristic curve, measuring discrimination ability across all thresholds.
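As a worked check of Equations (36)–(39), the sketch below computes the four threshold-dependent metrics from confusion-matrix counts. The function name is hypothetical, the counts are the SWaT values reported in Section 5.1, and the code assumes non-degenerate counts (no zero denominators).

```python
def classification_metrics(tp, fp, fn, tn):
    """Precision, recall, F1, specificity, and false positive rate from counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    specificity = tn / (tn + fp)
    false_positive_rate = fp / (fp + tn)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "specificity": specificity,
        "fpr": false_positive_rate,
    }


# SWaT confusion-matrix counts as reported in Section 5.1.
swat = classification_metrics(tp=39563, fp=363, fn=4746, tn=225280)
```

Specificity and the false positive rate always sum to one, so reporting either is sufficient; both are returned here only for convenience.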

Green Computing Metrics. Environmental metrics include total energy consumption (Wh), total carbon emissions (g CO₂), average and peak power consumption (W), training time (minutes), energy and carbon savings versus baseline (percentages), GES, and SI. Model efficiency metrics include parameter count, model size (KB), inference time per sample (ms), and throughput (samples/second).

Baseline Values. We compare against baselines representing typical unoptimized deep learning models: an energy baseline of 500 Wh (standard USAD or autoencoder training for extended epochs) and a carbon baseline of 250 g CO₂ (corresponding to the energy baseline at 0.5 kg CO₂/kWh).

5. Results and Analysis

This section reports the experimental results: detection performance, energy efficiency, green efficiency metrics, and a comparative analysis against baseline methods.

5.1. Detection Performance

Green-USAD achieves strong detection performance despite its lightweight architecture and limited training, showing that energy savings and security effectiveness can be obtained at the same time. On SWaT, the model attains an F1-score of 0.9393, a precision of 0.9909, a recall of 0.8930, an AUC-ROC of 0.9197, and a specificity of 0.9984. The F1-score of 0.9393 reflects a good balance between precision and recall, and it exceeds that of most state-of-the-art approaches, which require substantially more computational resources.

The precision values of 0.9909 (99.09%) for SWaT and 0.9693 (96.93%) for WADI are critical for ICS deployment: on SWaT, fewer than 1 in 100 detections is a false positive, and on WADI roughly 3 in 100. In operational ICSs, false alarms incur significant costs: unnecessary investigation/response, production interruptions, alert fatigue, and economic losses. High precision minimizes these costs while maintaining attack detection.

The recall values of 0.8930 (89.30%) for SWaT and 0.8234 (82.34%) for WADI indicate successful detection of most attacks. Missed detections fall into two categories: subtle attacks producing minimal deviations (inherently difficult for anomaly methods), and short-duration attacks insufficiently affecting windowed representations. The 89.30% and 82.34% rates provide strong coverage while point-adjusted evaluation ensures response capability.

The upper sections of Figure 4 and Figure 5 illustrate the evolution of training across 10 epochs. The loss decreases rapidly from 0.00024 to 0.00016 and subsequently stabilises, indicating that most learning occurs in the initial epochs, which supports the limited-epoch approach. Detection performance improves quickly and remains consistent, reaching the best $F_{1}$ scores of 0.9393 (SWaT) and 0.8904 (WADI) without further training. Precision and recall remain stable across epochs, indicating that detection behaviour does not change. The AUC increases from 0.915 to 0.920, indicating a modest improvement in discrimination over time.

Confusion matrices: SWaT has 39,563 true positives, 225,280 true negatives, 363 false positives, and 4746 false negatives; WADI has 5463 true positives, 79,593 true negatives, 173 false positives, and 1172 false negatives. The SWaT false positive rate is only 0.16% (363/225,643), indicating suitability for operational deployment. False negatives occur mainly for subtle attacks. Figure 4 and Figure 5 illustrate the distribution of scores. Normal scores are closely clustered around zero (mean ∼0.02), whereas attack scores are bimodal: one mode lies near the normal range for subtle attacks and another, elevated mode corresponds to more overt attacks. The threshold separates the majority of attack points from normal points.

Figure 6 presents the attack detection timeline for SWaT, showing anomaly scores, ground truth labels, and model predictions over time. The model successfully detects the major sustained attack (t ≈ 50,000–85,000) and several sporadic attacks, though some brief attack instances in the mid-to-late timeline remain undetected, indicating room for improved recall at lower anomaly score thresholds. Figure 7 presents the attack detection timeline for the WADI dataset across three panels. The top panel shows anomaly scores fluctuating around 0.4–0.6 throughout the timeline, with notable spikes near t ≈ 5000 and t ≈ 60,000–80,000 indicating elevated model uncertainty during those periods. The middle panel displays the ground truth labels, revealing that actual attacks are sparse in the first half but become more frequent and prolonged after t ≈ 60,000, with several clustered and isolated attack windows extending toward the end. The bottom panel illustrates the model’s predictions, which correctly identify most attack clusters in the latter half, though some false positives in attack-free regions (t ≈ 30,000 and t ≈ 45,000) highlight a trade-off between sensitivity and precision inherent to anomaly-based detection on the WADI dataset.

5.2. Energy Efficiency Results

Green-USAD reduces energy consumption by over 99% compared to baseline deep learning. Training metrics: 3.0 Wh total energy, 0.30 Wh per epoch, 17.8, average power 22.5 W, peak power 22.5 W, training duration 6.2 min, and savings of 99.4%. The 3.0 Wh (Table A5) is 167 times lower than the 500 Wh baseline. This equates to 18 min of 10 W LED usage, one-third of a smartphone charge, or 0.3 cents of electricity expenditure.

Per-epoch consumption (Figure 4, Figure 5, Figure 6, Figure 7, Figure 8 and Figure 9) shows a consistent 17–22 W of power, well below the 65 W target.

Carbon metrics: 1.5 g CO₂ overall, 0.15 g per epoch, and 99.4% savings at 0.5 kg CO₂/kWh intensity. This 1.5 g corresponds to 4 s of driving, one breath, or 0.00015% of the average American’s daily carbon footprint (Figure 10 and Figure 11).

Inference efficiency: 0.89 ms per sample, 1124 samples per second throughput, 0.005 Wh per 1000 samples, and 0.003 g CO₂ per 1000 samples. Sub-millisecond latency facilitates real-time detection, with throughput roughly three orders of magnitude above SWaT’s 1 Hz sampling rate.

Green Efficiency Metrics. Green Efficiency Score:

$$GES = 0.4 \times 93.9 + 0.3 \times 99.4 + 0.3 \times 99.7 = 97.3 \tag{40}$$

The GES of 97.3/100 indicates exceptional performance: detection (93.9 from high $F_{1}$), energy (99.4), and carbon (99.7, from Equation (22) at 1.5 g CO₂). Figure 10 illustrates how the components combine.

Sustainability Index:

$$SI = 93.9 \times \left(1 + \frac{99.4 + 99.4}{200}\right) = 140.2 \tag{41}$$

The SI of 140.2 exceeds 100, indicating a larger impact and an environmental improvement above 99%.

Figure 8 and Figure 9 illustrate the efficiency across five dimensions: $F_{1}$-Score (93.9/100 SWaT, 89.0 WADI), Energy/Carbon Efficiency (99.4/100 SWaT, 89.5 WADI), Speed (∼94/100), and Size (∼89/100). The balanced profile indicates that no single dimension was sacrificed for another.

Model Efficiency. Architecture reductions: 2743 K total parameters vs. 10,000 K+ typical (73% reduction), 1350 K encoder vs. 5000 K+ (73%+), 697 K per decoder vs. 2500 K+ (72%+), latent dimension 16 vs. 64 (75%). Parameter reduction yields lower memory usage, faster processing, and reduced energy consumption.

The model size of 10.5 MB (about 2.7 M parameters at 4 bytes each) facilitates deployment on constrained devices, rapid loading, and swift updates. FLOPs decrease with smaller dimensions: traditional USAD requires approximately 80 MFLOPs per sample, whereas Green-USAD requires around 22 MFLOPs, a 73% reduction. This directly affects training duration and inference latency.

Anomaly scores remain low and stable under regular system operation but increase significantly during attacks, particularly within the range of 50,000 to 80,000; for some attacks the increase is only moderate. The threshold (dashed red) separates most attack periods from normal operation.

Ground truth versus predictions: the primary attack intervals are captured precisely and the predicted and true label profiles closely match; brief or subtle attacks are recognised only partially, and there are minimal false positives. Attacks on flow sensors, tank levels, and pumps were detected reliably. Minor manipulations were only partially identified, and stealthy attacks were missed, consistent with the known limitations of reconstruction-based detection.

Comparative Analysis. Green-USAD achieves the highest $F_{1}$ with minimal resources (Table 2). Performance ($F_{1}$, precision, recall, AUC-ROC): USAD (0.90, 0.98, 0.83, 0.91), LSTM-AE (0.85, 0.92, 0.79, 0.88), MAD-GAN (0.89, 0.95, 0.84, 0.90), CNN (0.86, 0.93, 0.80, 0.87), Green-USAD (0.9393, 0.9909, 0.8930, 0.9197). Its precision of 0.9909 is the highest.

Note: Energy consumption, training time, and parameter counts for baseline methods in Table 2 represent estimated values derived from reported architectural configurations and standard hardware assumptions, as these methods do not report green computing metrics in their original publications. Specifically, parameter counts are computed from published layer configurations, training times are estimated based on reported epoch counts and dataset sizes on comparable CPU hardware, and energy values are approximated using the power model in Equation (14) applied to estimated training durations. Green-USAD values are directly hardware-measured throughout the training lifecycle. Efficiency (Table 2): USAD (10M+, 500 Wh, 2 h), LSTM-AE (5M+, 300 Wh, 1.5 h), AE (3M+, 200 Wh, 1 h), Green-USAD (2.7 M, 3.0 Wh, 6.2 min, GES 97.3). Green-USAD achieves a 100× energy reduction and a 10–20× time reduction.

5.3. Real-World Deployment Considerations

Despite strong benchmark performance, deploying Green-USAD in live ICS environments introduces several practical challenges. Concept drift arising from process reconfiguration may cause the learned normal manifold to diverge over time, necessitating periodic retraining. Stealthy adversarial attacks crafted to remain within reconstruction error thresholds represent a critical failure scenario, suggesting the need for complementary physics-informed detection layers. Training data contamination from previously undetected intrusions in historical operational data may silently degrade detection sensitivity, while high-frequency data streams can generate substantial false alarm volumes even at high specificity, requiring alert correlation and operator-facing dashboards to prevent alarm fatigue in practical deployments.

6. Conclusions

This paper presented Green-USAD, a sustainability-first framework for ICS anomaly detection that addresses the environmental paradox of AI-powered cybersecurity: achieving robust threat detection while minimizing computational and environmental footprint. As critical infrastructure operators confront the dual imperatives of cyber resilience and reduced carbon emissions, Green-USAD demonstrates that security effectiveness and environmental sustainability constitute unified rather than competing objectives through intelligent architectural design.

Green-USAD inverts traditional design paradigms by integrating energy efficiency as a foundational constraint from inception rather than applying compression retrospectively. The compressed USAD-based architecture with validation-guided convergence achieves competitive detection performance, F1-scores of 0.9393 (SWaT) and 0.8904 (WADI), while consuming only 3.0 Wh training energy and producing 1.5 g CO₂ emissions, representing substantial reductions versus conventional approaches. Hardware-validated energy monitoring throughout the machine learning lifecycle, coupled with multi-objective Green Efficiency Score and Sustainability Index metrics, enables systematic quantification of security-sustainability trade-offs. Comprehensive evaluation on benchmark datasets demonstrates environmental impact reductions exceeding 99% while maintaining operational detection accuracy.

This work established a reproducible methodology for green cybersecurity extending beyond specific architectures or domains, demonstrating that sustainability-first design yields both environmental benefits and operational advantages for critical infrastructure protection.

Despite these promising results, several limitations warrant acknowledgement and motivate future research directions. First, Green-USAD is evaluated exclusively on two water infrastructure datasets, SWaT and WADI, and while these represent widely accepted ICS benchmarks, the generalisability of the framework to other critical infrastructure domains such as power grids, oil and gas pipelines, and smart manufacturing systems remains to be empirically validated. Second, the unsupervised training paradigm, while advantageous for real-world deployments where labelled attack data is scarce, may produce elevated false positive rates under novel but legitimate operational conditions not represented in the training data. Third, the current evaluation is conducted on a single CPU-only hardware platform, and the behaviour of Green-USAD under varying computational constraints typical of edge ICS deployments, such as embedded controllers and resource-constrained PLCs, merits further investigation.

Future work will pursue several directions to address these limitations: extending Green-USAD to diverse ICS domains and heterogeneous sensor modalities to validate cross-domain generalisability; incorporating adaptive thresholding mechanisms to dynamically adjust detection sensitivity as operational patterns evolve; exploring federated learning formulations to enable privacy-preserving distributed deployment across geographically separated facilities without centralising sensitive operational data; and investigating model pruning, quantisation, and knowledge distillation techniques to further reduce the carbon footprint while preserving detection accuracy, pushing the sustainability-first paradigm toward practical edge deployment in real-world critical infrastructure protection.

Generative AI Statement: The authors used AI-assisted language editing tools to improve the clarity, readability, and overall quality of the manuscript. Grammarly Premium was employed for grammar correction and linguistic refinement across the text. In addition, limited language editing support was obtained using ChatGPT (version 5.0) for selected sections, including the abstract, introduction, and results.

These tools were used strictly for language enhancement purposes. They did not contribute to the generation of scientific content. All intellectual contributions, analyses, and conclusions presented in this work are solely those of the authors, who take full responsibility for the integrity and originality of the manuscript.

Author Contributions

Conceptualization, M.M.A. and A.T.; Writing, M.M.A., Y.D. and L.C.D.S.; software, M.M.A. and A.T.; validation, R.A.A.H.M.A., Y.D. and L.C.D.S.; formal analysis, M.F.Z., Y.D. and A.T.; investigation, Y.D. and L.C.D.S.; resources, M.M.A. and A.T.; data curation, M.M.A. and A.T.; writing—original draft preparation, M.M.A. and Y.D.; writing—review and editing, A.T., Y.D. and L.C.D.S.; visualization, M.M.A., R.A.A.H.M.A., Y.D. and L.C.D.S.; supervision, A.T., R.A.A.H.M.A. and L.C.D.S.; project administration, A.T., M.F.Z., R.A.A.H.M.A. and L.C.D.S.; funding acquisition, A.T. and M.F.Z. All authors have read and agreed to the published version of the manuscript.

Funding

The research has been funded by Universiti Brunei Darussalam, Brunei Darussalam, and Universiti Kuala Lumpur, Malaysia.

Data Availability Statement

The iTrust centre, Singapore University of Technology and Design, provided the data upon the first author's request. The data are confidential and the authors are not permitted to share them. They can be requested for research purposes from https://itrust.sutd.edu.sg/research/publications/ (accessed on 16 July 2023) by selecting iTrust Labs, then Datasets, and then the required dataset. Further queries can be addressed to the corresponding author.

Acknowledgments

We acknowledge the Singapore University of Technology and Design for providing the dataset for the purpose of research.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Hyperparameters

Table A1. Model Hyperparameters.

Parameter	Value	Rationale
Window size	100	Captures ∼1.5 min of context
Batch size	256	Maximizes hardware utilization
Learning rate	0.001	Standard for Adam optimizer
LR decay	0.5/5 epochs	Enables fine-tuning
Epochs	10	Green training limit
Latent dimension	16	High compression for efficiency
Hidden dims	256, 128	Balanced capacity/efficiency
Smoothing window	12	Reduces noise in scores
Score weight $(α)$	0.5	Equal decoder contribution
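
The settings in Table A1 can be collected into a small configuration object, sketched below together with three helpers. This is a minimal illustration under stated assumptions, not the authors' code: the class and function names are hypothetical, "0.5/5 epochs" is read as a step decay that halves the learning rate every 5 epochs, the second score weight is assumed to be 1 − α, and the smoothing is assumed to be a trailing moving average.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class GreenUsadConfig:
    """Values copied from Table A1; the field names are illustrative."""
    window_size: int = 100
    batch_size: int = 256
    learning_rate: float = 0.001
    lr_decay_factor: float = 0.5
    lr_decay_every: int = 5
    epochs: int = 10
    latent_dim: int = 16
    hidden_dims: Tuple[int, int] = (256, 128)
    smoothing_window: int = 12
    alpha: float = 0.5


def learning_rate_at(cfg: GreenUsadConfig, epoch: int) -> float:
    """Step decay (assumed): multiply by the decay factor every `lr_decay_every` epochs."""
    return cfg.learning_rate * cfg.lr_decay_factor ** (epoch // cfg.lr_decay_every)


def combined_score(cfg: GreenUsadConfig, err_decoder1: float, err_decoder2: float) -> float:
    """Weighted sum of the two decoder reconstruction errors (weights alpha and 1 - alpha, assumed)."""
    return cfg.alpha * err_decoder1 + (1.0 - cfg.alpha) * err_decoder2


def smooth(scores: List[float], window: int) -> List[float]:
    """Trailing moving average; the first entries average over the samples seen so far."""
    smoothed, running = [], 0.0
    for i, score in enumerate(scores):
        running += score
        if i >= window:
            running -= scores[i - window]
        smoothed.append(running / min(i + 1, window))
    return smoothed


cfg = GreenUsadConfig()
schedule = [learning_rate_at(cfg, epoch) for epoch in range(cfg.epochs)]
```

With these assumptions the schedule holds the learning rate at 0.001 for epochs 0 to 4 and at 0.0005 for epochs 5 to 9, and α = 0.5 gives both decoders equal weight in the anomaly score.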

Appendix B. Results

Table A2. Latent Dimension Sensitivity.

Latent Dim	F1-Score	Energy (Wh)	GES
8	0.912	2.7	94.2
16	0.939	3.0	97.3
32	0.941	3.8	95.8
64	0.943	5.2	92.4

Table A3. Epoch Count Sensitivity.

Epochs	F1-Score	Energy (Wh)	GES
5	0.921	1.5	95.8
10	0.939	3.0	97.3
20	0.942	6.1	94.2
50	0.944	15.3	86.7
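
The diminishing returns visible in Tables A2 and A3 can be quantified as the F1 gain obtained per additional watt-hour between consecutive settings. The sketch below uses only the F1 and energy columns of those tables. The helper name `marginal_f1_per_wh` is hypothetical, and this ratio is a simple illustration, not the paper's GES formula.

```python
def marginal_f1_per_wh(rows):
    """F1 gain per extra Wh between consecutive settings.

    rows: list of (setting, f1, energy_wh) tuples ordered by increasing energy.
    """
    gains = []
    for (_, f1_a, wh_a), (_, f1_b, wh_b) in zip(rows, rows[1:]):
        gains.append((f1_b - f1_a) / (wh_b - wh_a))
    return gains


# (latent dimension, F1, energy in Wh) from Table A2
LATENT_ROWS = [(8, 0.912, 2.7), (16, 0.939, 3.0), (32, 0.941, 3.8), (64, 0.943, 5.2)]
# (epochs, F1, energy in Wh) from Table A3
EPOCH_ROWS = [(5, 0.921, 1.5), (10, 0.939, 3.0), (20, 0.942, 6.1), (50, 0.944, 15.3)]

latent_gains = marginal_f1_per_wh(LATENT_ROWS)
epoch_gains = marginal_f1_per_wh(EPOCH_ROWS)

for label, gains in (("latent dim", latent_gains), ("epochs", epoch_gains)):
    print(label, [f"{g:.4f}" for g in gains])
```

In both tables the marginal gain drops by more than an order of magnitude after the chosen setting (latent dimension 16, 10 epochs), which is consistent with those settings having the highest GES.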

Table A4. Model Complexity.

Metric	Green-USAD	USAD	Reduction
Total Params	2743 K	10,000 K	73%
Encoder Params	1350 K	5000 K	73%
Decoder Params (each)	697 K	2500 K	72%
Latent Dim.	16	64	75%
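
The Reduction column in Table A4 follows from the two parameter counts as a relative decrease. The short sketch below recomputes it. The function name `reduction_percent` is illustrative, and the counts are the rounded values from the table.

```python
def reduction_percent(green: float, baseline: float) -> float:
    """Relative reduction of `green` with respect to `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (1.0 - green / baseline) * 100.0


# (Green-USAD, USAD) values from Table A4; parameter counts are in thousands.
MODEL_COMPLEXITY = {
    "Total Params": (2743, 10_000),
    "Encoder Params": (1350, 5000),
    "Decoder Params (each)": (697, 2500),
    "Latent Dim.": (16, 64),
}

reductions = {name: reduction_percent(green, base)
              for name, (green, base) in MODEL_COMPLEXITY.items()}

for name, value in reductions.items():
    print(f"{name}: {value:.1f}% reduction")
```

The total is also consistent with one encoder plus two decoders: 1350 K + 2 × 697 K = 2744 K, which matches the reported 2743 K up to rounding.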

Table A5. Overall Detection, Energy, Carbon, and Inference Metrics.

Metric	Value
Detection Performance
F1-Score	0.9393
Precision	0.9909
Recall	0.8930
AUC-ROC	0.9197
Specificity	0.9984
Energy Efficiency
Total Training Energy	3.0 Wh
Energy per Epoch	0.30 Wh
Average Power	17.8 W
Peak Power	22.5 W
Training Time	6.2 min
Energy Savings	99.4%
Carbon Footprint
Total Carbon Emissions	1.5 g CO₂
Carbon per Epoch	0.15 g CO₂
Carbon Savings	99.4%
Carbon Intensity Used	0.5 kg CO₂/kWh
Inference Efficiency
Inference Time per Sample	0.89 ms
Throughput	1124 samples/sec
Inference Energy per 1000 Samples	0.005 Wh
Inference Carbon per 1000 Samples	0.003 g CO₂
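
The carbon and savings entries in Table A5 can be reproduced from the measured energy with simple arithmetic, sketched below. The function names are illustrative, and the 500 Wh baseline is the estimated USAD training energy listed in Table 2, not a measured value.

```python
def carbon_grams(energy_wh: float, intensity_kg_per_kwh: float) -> float:
    """Convert energy (Wh) to emissions (g CO2) for a given grid carbon intensity."""
    energy_kwh = energy_wh / 1000.0
    return energy_kwh * intensity_kg_per_kwh * 1000.0


def savings_percent(measured: float, baseline: float) -> float:
    """Relative saving of `measured` against `baseline`, in percent."""
    return (1.0 - measured / baseline) * 100.0


TRAINING_ENERGY_WH = 3.0        # Table A5, hardware-measured
BASELINE_ENERGY_WH = 500.0      # Table 2, estimated USAD baseline
CARBON_INTENSITY = 0.5          # kg CO2 per kWh, Table A5
INFERENCE_MS_PER_SAMPLE = 0.89  # Table A5

training_carbon_g = carbon_grams(TRAINING_ENERGY_WH, CARBON_INTENSITY)
baseline_carbon_g = carbon_grams(BASELINE_ENERGY_WH, CARBON_INTENSITY)
energy_savings = savings_percent(TRAINING_ENERGY_WH, BASELINE_ENERGY_WH)
carbon_saved_g = baseline_carbon_g - training_carbon_g
throughput = 1000.0 / INFERENCE_MS_PER_SAMPLE  # samples per second

print(f"training carbon: {training_carbon_g:.1f} g CO2")
print(f"energy savings:  {energy_savings:.1f}%")
print(f"carbon saved:    {carbon_saved_g:.1f} g CO2")
print(f"throughput:      {throughput:.0f} samples/sec")
```

Because emissions scale linearly with energy at a fixed carbon intensity, the carbon savings percentage equals the energy savings percentage, as reported in the table.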

Table A6. Environmental Impact Equivalents.

Equivalent	Value
LED bulb hours	0.3 h
Smartphone charges	0.3 charges
Car driving	0.01 km
CO₂ savings vs. baseline	248.5 g
Trees equivalent (annual)	0.004 trees

References

1. Li, Z.; Sun, Y.; Yang, L.; Zhao, Z.; Chen, X. Unsupervised machine anomaly detection using autoencoder and temporal convolutional network. IEEE Trans. Instrum. Meas. 2022, 71, 3525813.
2. Aslam, M.M.; Tufail, A.; Apong, R.A.A.H.M.; De Silva, L.C.; Raza, M.T. Scrutinizing security in industrial control systems: An architectural vulnerabilities and communication network perspective. IEEE Access 2024, 12, 67537–67573.
3. Feng, C.; Li, T.; Chana, D. Multi-level anomaly detection in industrial control systems via package signatures and LSTM networks. In Proceedings of the 2017 47th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN); IEEE: Piscataway, NJ, USA, 2017; pp. 261–272.
4. Kumar, V.; Sinha, D. Synthetic attack data generation model applying generative adversarial network for intrusion detection. Comput. Secur. 2023, 125, 103054.
5. Shang, W.; Qiu, J.; Shi, H.; Wang, S.; Ding, L.; Xiao, Y. An efficient anomaly detection method for industrial control systems: Deep convolutional autoencoding transformer network. Int. J. Intell. Syst. 2024, 2024, 5459452.
6. Xu, L.; Wang, B.; Zhao, D.; Wu, X. DAN: Neural network based on dual attention for anomaly detection in ICS. Expert Syst. Appl. 2025, 263, 125766.
7. Benka, D.; Horváth, D.; Špendla, L.; Gašpar, G.; Strémy, M. Machine Learning-Based Detection of Anomalies, Intrusions and Threats in Industrial Control Systems. IEEE Access 2025, 13, 12502–12514.
8. Audibert, J.; Michiardi, P.; Guyard, F.; Marti, S.; Zuluaga, M.A. USAD: Unsupervised anomaly detection on multivariate time series. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining; Association for Computing Machinery: New York, NY, USA, 2020; pp. 3395–3404.
9. Ji, Z.; Li, J.; Telgarsky, M. Early-stopped neural networks are consistent. Adv. Neural Inf. Process. Syst. 2021, 34, 1805–1817.
10. Burgetová, I.; Matoušek, P.; Ryšavý, O. Anomaly detection of ICS communication using statistical models. In Proceedings of the 2021 17th International Conference on Network and Service Management (CNSM); IEEE: Piscataway, NJ, USA, 2021; pp. 166–172.
11. Moustafa, N.; Slay, J. The evaluation of Network Anomaly Detection Systems: Statistical analysis of the UNSW-NB15 data set and the comparison with the KDD99 data set. Inf. Secur. J. A Glob. Perspect. 2016, 25, 18–31.
12. Hossain, M.A.; Hasan, T.; Karovic, V., Jr.; Abdeljaber, H.A.; Haque, M.A.; Ahmad, S.; Zafar, A.; Nazeer, J.; Mishra, B. Deep learning and ensemble methods for anomaly detection in ICS security. Int. J. Inf. Technol. 2025, 17, 1761–1775.
13. Sotiris, V.A.; Peter, W.T.; Pecht, M.G. Anomaly detection through a bayesian support vector machine. IEEE Trans. Reliab. 2010, 59, 277–286.
14. Kavzoglu, T.; Teke, A. Predictive performances of ensemble machine learning algorithms in landslide susceptibility mapping using random forest, extreme gradient boosting (XGBoost) and natural gradient boosting (NGBoost). Arab. J. Sci. Eng. 2022, 47, 7367–7385.
15. Rashid, U.; Saleem, M.F.; Rasool, S.; Abdullah, A.; Mustafa, H.; Iqbal, A. Anomaly Detection using Clustering (K-Means with DBSCAN) and SMO. J. Comput. Biomed. Inform. 2024, 7.
16. Kumar, A.; Kumar, A.; Raja, R.; Dewangan, A.K.; Kumar, M.; Soni, A.; Agarwal, D.; Saudagar, A.K.J. Revolutionising anomaly detection: A hybrid framework for anomaly detection integrating isolation forest, autoencoder, and Conv. LSTM. Knowl. Inf. Syst. 2025, 67, 11903–11953.
17. Koniki, R.; Ampapurapu, M.D.; Kollu, P.K. An anomaly based network intrusion detection system using LSTM and GRU. In Proceedings of the 2022 International Conference on Electronic Systems and Intelligent Computing (ICESIC); IEEE: Piscataway, NJ, USA, 2022; pp. 79–84.
18. Nguyen, H.H.; Nguyen, C.N.; Dao, X.T.; Duong, Q.T.; Kim, D.P.T.; Pham, M.T. Variational autoencoder for anomaly detection: A comparative study. arXiv 2024, arXiv:2408.13561.
19. Aslam, M.M.; Tufail, A.; De Silva, L.C.; Apong, R.A.A.H.M. Multi-Feature Hybrid Anomaly Detection in ICS: An Integration of ML, DL, and Statistical Techniques. In Proceedings of the 3rd ACM Workshop on Secure and Trustworthy Deep Learning Systems; Association for Computing Machinery: New York, NY, USA, 2025; pp. 43–51.
20. Hu, Y.; Zhang, D.; Cao, G.; Pan, Q. Network data analysis and anomaly detection using CNN technique for industrial control systems security. In Proceedings of the 2019 IEEE International Conference on Systems, Man and Cybernetics (SMC); IEEE: Piscataway, NJ, USA, 2019; pp. 593–597.
21. Han, C.; Gim, G. Time-Series-Based Anomaly Detection in Industrial Control Systems Using Generative Adversarial Networks. Processes 2025, 13, 2885.
22. Marino, D.L.; Wickramasinghe, C.S.; Rieger, C.; Manic, M. Self-supervised and interpretable anomaly detection using network transformers. IEEE Trans. Ind. Inform. 2025, 21, 4252–4261.
23. Zuo, F.; Liu, J.; Ren, Y.; Wang, L.; Wen, Z. A Reliable Bolt Key-Points Detection Method in Industrial Magnetic Separator Systems. IEEE Trans. Instrum. Meas. 2025, 74, 5015710.
24. Zuo, F.; Liu, J.; Fu, M.; Wang, L.; Zhao, Z. STMA-Net: A Spatial Transformation-Based Multiscale Attention Network for Complex Defect Detection With X-Ray Images. IEEE Trans. Instrum. Meas. 2024, 73, 5014511.
25. Zuo, F.; Liu, J.; Fu, M.; Wang, L.; Zhao, Z. An X-Ray-Based Multiexpert Inspection Method for Automatic Welding Defect Assessment in Intelligent Pipeline Systems. IEEE/ASME Trans. Mechatron. 2025, 30, 1753–1764.
26. Khan, R.Z.; Razak, L.A.; Premaratne, G. Literature Review of Green Management and Advanced Technologies for Firm Performance. In Green Engineering for Optimizing Firm Performance; Chapman and Hall/CRC: Boca Raton, FL, USA, 2025; pp. 1–30.
27. Strubell, E.; Ganesh, A.; McCallum, A. Energy and policy considerations for deep learning in NLP. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics; Association for Computational Linguistics: Stroudsburg, PA, USA, 2019; pp. 3645–3650.
28. Bacenetti, J. Heat and cold production for winemaking using pruning residues: Environmental impact assessment. Appl. Energy 2019, 252, 113464.
29. Krishnan, S.; Lam, M.; Chitlangia, S.; Wan, Z.; Barth-Maron, G.; Faust, A.; Reddi, V.J. QuaRL: Quantization for fast and environmentally sustainable reinforcement learning. arXiv 2019, arXiv:1910.01055.
30. Tripathi, A.M.; Pandey, O.J. Divide and distill: New outlooks on knowledge distillation for environmental sound classification. IEEE/ACM Trans. Audio Speech Lang. Process. 2023, 31, 1100–1113.
31. Xiong, Y.; Lan, L.C.; Chen, X.; Wang, R.; Hsieh, C.J. Learning to schedule learning rate with graph neural networks. In Proceedings of the International Conference on Learning Representations (ICLR), Virtual, 25–29 April 2022.
32. Feng, J.; Huang, D. Optimal gradient checkpoint search for arbitrary computation graphs. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual, 25–29 April 2021; pp. 11433–11442.
33. Li, H.; Wang, Y.; Hong, Y.; Li, F.; Ji, X. Layered mixed-precision training: A new training method for large-scale AI models. J. King Saud. Univ.-Comput. Inf. Sci. 2023, 35, 101656.
34. Vrbančič, G.; Podgorelec, V. Transfer learning with adaptive fine-tuning. IEEE Access 2020, 8, 196197–196211.
35. Patterson, D.; Gonzalez, J.; Le, Q.; Liang, C.; Munguia, L.M.; Rothchild, D.; So, D.; Texier, M.; Dean, J. Carbon emissions and large neural network training. arXiv 2021, arXiv:2104.10350.
36. Liang, Y.; Li, X. Efficient kernel management on GPUs. In Proceedings of the 2016 Design, Automation & Test in Europe Conference & Exhibition (DATE), Dresden, Germany, 14–18 March 2016; Volume 16, pp. 85–90.
37. Schwartz, R.; Dodge, J.; Smith, N.A.; Etzioni, O. Green AI. Commun. ACM 2020, 63, 54–63.
38. Li, D.; Chen, D.; Jin, B.; Shi, L.; Goh, J.; Ng, S.K. MAD-GAN: Multivariate anomaly detection for time series data with generative adversarial networks. In Proceedings of the International Conference on Artificial Neural Networks; Springer: Cham, Switzerland, 2019; pp. 703–716.
39. Bengio, Y.; Courville, A.; Vincent, P. Representation Learning: A Review and New Perspectives. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 35, 1798–1828.
40. Bengio, Y.; Louradour, J.; Collobert, R.; Weston, J. Curriculum Learning. In Proceedings of the 26th Annual International Conference on Machine Learning; Association for Computing Machinery: New York, NY, USA, 2009; pp. 41–48.
41. Vapnik, V. Principles of Risk Minimization for Learning Theory; Morgan Kaufmann: Burlington, MA, USA, 1991; Volume 4, pp. 831–838.
42. Mathur, A.P.; Tippenhauer, N.O. SWaT: A water treatment testbed for research and training on ICS security. In Proceedings of the 2016 International Workshop on Cyber-Physical Systems for Smart Water Networks (CySWater); IEEE: Piscataway, NJ, USA, 2016; pp. 31–36.
43. Ahmed, C.M.; Palleti, V.R.; Mathur, A.P. WADI: A Water Distribution Testbed for Research in the Design of Secure Cyber Physical Systems. In Proceedings of the 3rd International Workshop on Cyber-Physical Systems for Smart Water Networks; ACM: New York, NY, USA, 2017; pp. 25–28.
44. Tuli, S.; Casale, G.; Jennings, N.R. TranAD: Deep Transformer Networks for Anomaly Detection in Multivariate Time Series Data. arXiv 2022, arXiv:2201.07284.
45. Su, Y.; Zhao, Y.; Niu, C.; Liu, R.; Sun, W.; Pei, D. Robust Anomaly Detection for Multivariate Time Series through Stochastic Recurrent Neural Network. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; Association for Computing Machinery: New York, NY, USA, 2019; pp. 2828–2837.

Figure 1. Proposed USAD Methodology Architecture.

Figure 2. End-to-end experimental pipeline of Green-USAD.

Figure 3. Flowchart of the proposed algorithm development process.

Figure 4. Green-USAD results for SWaT: (top) training dynamics, loss convergence, $F_1$ progression, precision/recall trends, AUC-ROC evolution; (middle) detection performance, ROC curve, precision-recall curve, confusion matrix, score distribution; (bottom) green metrics, energy/carbon per epoch, GES, SI.

Figure 5. Green-USAD results for WADI.

Figure 6. Attack timeline for SWaT: anomaly scores with threshold, ground truth labels, predictions, temporal alignment demonstrating major attack detection.

Figure 7. Attack timeline for WADI.

Figure 8. Green analysis for SWaT: energy savings, carbon reduction, efficiency radar, score breakdown, cumulative impact, power/CPU per epoch.

Figure 9. Green analysis for WADI.

Figure 10. Environmental Impact Analysis on SWaT Dataset.

Figure 11. Environmental Impact Analysis on WADI Dataset.

Table 1. Comparison of Green ICS Anomaly Detection Methods.

Ref.	Method	ICS	DL	Energy	Carbon	Green	Light	RT	Unsup.
[8]	USAD	●	●	○	○	○	◐	●	●
[1]	LSTM-AE	●	●	○	○	○	◐	●	●
[20]	CNN	●	●	○	○	○	◐	●	●
[38]	MAD-GAN	●	●	○	○	○	○	●	●
[3]	Multi-lvl	●	●	○	○	○	◐	●	◐
[17]	LSTM	●	●	○	○	○	●	●	●
[18]	VAE	●	●	○	○	○	◐	●	●
[19]	Multi	●	●	○	○	◐	●	●	●
[11]	Analysis	○	●	●	○	○	○	○	◐
[35]	Carbon	○	●	●	◐	●	○	○	○
[1]	Measure	○	●	●	○	○	○	○	○
[37]	Green AI	○	●	◐	●	◐	●	○	○
This work	Green-USAD	●	●	●	●	●	●	●	●

● represents full consideration, ◐ partial consideration, and ○ no consideration. ICS: Industrial Control System; DL: Deep Learning; RT: Real Time; Unsup.: Unsupervised.

Table 2. Performance and Efficiency Comparison.

Method	F1	Prec.	Recall	AUC
[8] USAD	0.90	0.98	0.83	0.91
[16] LSTM+AE	0.85	0.92	0.79	0.88
[38] AE	0.89	0.95	0.84	0.90
[20] CNN	0.86	0.93	0.80	0.87
[44] TranAD (SWaT)	0.81	0.97	0.69	0.84
[44] TranAD (WADI)	0.49	0.35	0.82	0.89
[38] MAD-GAN ** (WADI)	0.37	0.41	0.33	—
[38] MAD-GAN ** (SWaT)	0.77	0.98	0.63	—
[45] OmniAnomaly (total)	0.85	0.77	0.95	—
Green-USAD	0.9393	0.9909	0.8930	0.9197
Efficiency Comparison
Method	Params	Energy	Time	Green
[8] USAD	10M+	∼500 Wh	∼2 h	—
[16] LSTM+AE	5M+	∼300 Wh	∼1.5 h	—
[20] CNN	3M+	∼200 Wh	∼1 h	—
[44] TranAD	12M+	∼600 Wh	∼2.5 h	—
[45] OmniAnomaly	4M+	∼250 Wh	∼1.2 h	—
Green-USAD *	2.7M	3.0 Wh	6.2 min	97.3

* Energy and time values for baselines are estimated from architectural configurations and published training details. Green-USAD values are directly hardware-measured. ** MAD-GAN results are reported separately per dataset due to significant performance variation across benchmarks. Bold digits denote results of the proposed method.
