1. Introduction
Industrial Process Control and Monitoring Systems (PCMS), including Supervisory Control and Data Acquisition (SCADA), Distributed Control Systems (DCS), and Programmable Logic Controllers (PLCs) [
1], constitute the operational backbone of critical infrastructure sectors such as energy production, water treatment, transportation, and manufacturing [
2]. These systems are responsible for real-time monitoring and control of physical processes whose failure can result in severe economic loss, environmental damage, and threats to human safety [
3]. In recent years, the integration of Operational Technology (OT) with Information Technology (IT), driven by Industry 4.0 initiatives and the rapid adoption of the Industrial Internet of Things (IIoT), has significantly enhanced operational efficiency and remote accessibility [
4]. However, this convergence has also expanded the attack surface of industrial environments, exposing PCMS to increasingly sophisticated cyber threats [
5].
Unlike conventional IT systems, PCMS operate under strict real-time constraints, rely heavily on legacy infrastructure, and employ specialized industrial communication protocols such as Modbus, DNP3, and OPC-UA [
5]. These characteristics limit the effectiveness of traditional IT-centric cybersecurity solutions and introduce unique vulnerabilities. The consequences of successful cyberattacks on PCMS extend beyond data breaches to include physical equipment damage, operational shutdowns, and safety incidents [
6]. High-profile attacks such as Stuxnet, Havex, BlackEnergy, and Industroyer have demonstrated the ability of adversaries to manipulate industrial processes and disrupt critical services, highlighting the urgent need for intelligent and adaptive cybersecurity mechanisms tailored to industrial environments. Conventional signature-based intrusion detection systems (IDS) remain widely deployed in PCMS but are inherently reactive and ineffective against zero-day attacks, polymorphic malware, and stealthy intrusions that mimic legitimate operational behavior [
7]. Furthermore, modern PCMS generate large volumes of heterogeneous data, including sensor measurements, actuator signals, control commands, and network traffic logs [
8,
9]. The high dimensionality, temporal dependencies, and severe class imbalance inherent in such data pose significant challenges for traditional rule-based and shallow machine learning approaches, which often struggle to distinguish malicious anomalies from normal operational fluctuations [
10,
11].
Deep learning has emerged as a promising paradigm for addressing these challenges due to its ability to automatically learn hierarchical representations from high-dimensional and multivariate time-series data [
12]. Architectures such as Convolutional Neural Networks (CNNs), Long Short-Term Memory (LSTM) networks, and Transformer models have demonstrated strong potential in modeling spatial and temporal dependencies in industrial data streams [
13]. By leveraging deep neural networks, it becomes possible to continuously monitor industrial processes and detect subtle deviations indicative of cyber intrusions, insider threats, or advanced persistent attacks without relying solely on predefined signatures [
14,
15]. Despite growing research interest, existing deep learning-based ICS security studies exhibit several limitations [
16]. Many rely on single datasets, limiting generalizability across diverse industrial contexts, while others emphasize detection accuracy without adequately addressing class imbalance, computational constraints, interpretability [
17], or deployment feasibility in real-time environments. In addition, insufficient methodological detail in some studies hinders reproducibility and practical adoption in operational PCMS [
18,
19].
To address these gaps, this study presents a systematic evaluation of deep learning architectures for cybersecurity threat detection in industrial process control and monitoring systems. Using the HAI Security Dataset, which reflects realistic industrial operational behaviors and attack scenarios, we assess and compare the performance of CNN, LSTM, and Transformer models on multivariate sensor, actuator, and network data. Beyond detection performance, the study also considers generalization capability and practical deployment considerations relevant to resource-constrained industrial environments [
20,
21]. In contrast to much prior work, particular attention is paid to class imbalance, overfitting, minority-class detection, and computational feasibility, factors that are frequently underreported yet decisive for practical applicability in real-time industrial environments.
To address the identified limitations in existing industrial cybersecurity research, this study makes several key contributions. First, it provides a unified and systematic experimental comparison of CNN, LSTM, and Transformer models for anomaly detection in Industrial Process Control and Monitoring Systems (PCMS) using realistic multivariate time-series data under identical preprocessing, training, and evaluation conditions [
22,
23]. Second, this study extends performance analysis beyond overall accuracy by explicitly examining class imbalance, overfitting behavior, and model generalization, with particular emphasis on the detection of minority (attack) classes [
24]. Third, it discusses practical deployment considerations for deep learning–based intrusion detection in real-time and resource-constrained ICS environments, highlighting the trade-offs between detection performance and computational complexity [
25]. Finally, the study identifies promising future research directions aimed at improving the robustness, scalability, and trustworthiness of AI-driven cybersecurity solutions for industrial process control and monitoring systems.
2. Materials and Methods
This section describes the experimental methodology adopted in this study and explains how each deep learning model is applied and evaluated within the same anomaly detection framework. All models (CNN, LSTM, and Transformer) are trained and tested using identical data preprocessing steps, sliding-window segmentation, and evaluation metrics to ensure a fair and consistent comparison, and their outputs correspond directly to the experimental results presented in
Section 3. The remainder of this section covers the dataset used in this study, the deep learning models employed for anomaly detection, and the overall algorithmic workflow adopted to evaluate cybersecurity threats in Industrial Control Systems (ICS).
2.1. Dataset Description
This study utilizes the HAI Security Dataset, a publicly available dataset designed to emulate realistic Industrial Control System (ICS) operations and cyberattack scenarios [
26]. The dataset consists of multivariate time-series data collected from an industrial process control environment and includes sensor measurements, actuator states, and network-related variables. It comprises more than 50 process variables representing physical and cyber components of the system, such as temperature, pressure, flow rates, valve states, control commands, and network activity indicators. These features capture the dynamic behavior of the industrial process under both normal and abnormal operating conditions. Several types of cyberattacks are represented, including false data injection, command manipulation, and process disruption attacks, which reflect realistic threat scenarios targeting ICS environments. Each data instance is labeled as either normal or attack, resulting in a highly imbalanced class distribution in which attack samples constitute a small fraction of the total data. This imbalance reflects real-world industrial environments, where malicious events are rare but highly impactful. To ensure reliable evaluation, the dataset was divided into training, validation, and testing subsets, preserving temporal order to prevent data leakage and to assess model generalization.
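The preprocessing pipeline described above (sliding-window segmentation, a chronological split, and train-fitted min–max scaling) can be sketched as follows. The window length of 60 and stride of 10 are illustrative assumptions, since the concrete W and S values are left symbolic in Section 2.6:

```python
import numpy as np

def make_windows(X, y, window=60, stride=10):
    """Segment a multivariate time series into fixed-length sliding windows.

    X: (T, F) array of T time steps and F features; y: (T,) binary labels.
    A window is labeled 1 (attack) if any step inside it is an attack.
    """
    Xw, yw = [], []
    for start in range(0, len(X) - window + 1, stride):
        Xw.append(X[start:start + window])
        yw.append(int(y[start:start + window].max()))
    return np.stack(Xw), np.array(yw)

def temporal_split(Xw, yw, train=0.70, val=0.15):
    """Chronological 70/15/15 split; no shuffling, so no leakage across time."""
    n = len(Xw)
    i, j = int(n * train), int(n * (train + val))
    return (Xw[:i], yw[:i]), (Xw[i:j], yw[i:j]), (Xw[j:], yw[j:])

def minmax_scale(train, *others):
    """Min-max scaling with ranges fitted on the training portion only."""
    lo, hi = train.min(axis=0), train.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # guard against constant features
    return [(a - lo) / span for a in (train, *others)]
```

Labeling a window as attack if it contains any attack step is one common convention; the original labeling granularity of the HAI dataset is per time step.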
2.2. CNN-Based Anomaly Detection Model
The Convolutional Neural Network (CNN) model is designed to capture local temporal patterns in multivariate ICS time-series data. The input to the CNN consists of fixed-length sliding windows of normalized time-series data, where each window represents a sequence of observations across all features. Temporal dependencies are handled by applying one-dimensional convolutional filters along the time axis, enabling the model to learn short-term fluctuations, abrupt changes, and localized anomalies in sensor and control signals. Pooling layers are used to reduce dimensionality and improve robustness to noise. The extracted feature representations are passed to fully connected layers, which output a binary anomaly prediction indicating normal or attack behavior. The CNN model described in this subsection is evaluated in the Results Section using accuracy, confusion matrix, ROC–AUC, and classification metrics to assess its effectiveness in detecting cyber anomalies in ICS data.
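A minimal Keras sketch of the 1D-CNN detector described above. The filter counts, kernel sizes, and dense-layer widths are illustrative assumptions, not the exact configuration used in this study:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_cnn(window=60, n_features=50):
    """1D-CNN over sliding windows: Conv1D along time, pooling, dense head."""
    return models.Sequential([
        layers.Input(shape=(window, n_features)),
        # Convolutions along the time axis learn short-term, localized patterns.
        layers.Conv1D(64, kernel_size=5, padding="same", activation="relu"),
        layers.MaxPooling1D(pool_size=2),   # downsampling adds noise robustness
        layers.Conv1D(128, kernel_size=3, padding="same", activation="relu"),
        layers.GlobalMaxPooling1D(),        # collapse the time axis to one vector
        layers.Dense(64, activation="relu"),
        layers.Dropout(0.3),
        layers.Dense(1, activation="sigmoid"),  # P(window is an attack)
    ])

model = build_cnn()
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=[tf.keras.metrics.AUC(name="auc")])
```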
2.3. LSTM-Based Anomaly Detection Model
The Long Short-Term Memory (LSTM) model is employed to model long-term temporal dependencies inherent in industrial processes. Similar to the CNN, the input to the LSTM consists of sliding windows of multivariate time-series data. Unlike convolutional models, the LSTM processes input sequences sequentially and maintains internal memory states that allow it to learn the evolution of system behavior over time. This capability enables the detection of anomalies that manifest as deviations from expected operational sequences rather than instantaneous signal changes. The final hidden state of the LSTM is passed to a dense layer, producing a binary classification output that indicates whether the observed sequence corresponds to normal operation or an attack. The performance of the LSTM-based model is experimentally analyzed in the Results section, where its ability to capture long-term temporal dependencies is compared against CNN and Transformer models.
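A minimal Keras sketch of the LSTM detector described above; the hidden-state size and dense-layer width are illustrative assumptions:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_lstm(window=60, n_features=50):
    """LSTM over sliding windows: the final hidden state feeds a dense classifier."""
    return models.Sequential([
        layers.Input(shape=(window, n_features)),
        # The recurrent cell carries memory across time steps, so the model
        # learns how the process normally evolves, not just point values.
        layers.LSTM(64),                    # returns only the final hidden state
        layers.Dense(32, activation="relu"),
        layers.Dropout(0.3),
        layers.Dense(1, activation="sigmoid"),  # normal vs. attack
    ])
```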
2.4. Transformer-Based Anomaly Detection Model
The Transformer-based model leverages a self-attention mechanism to capture global temporal dependencies across the entire input sequence. The input consists of fixed-length time-series windows, augmented with positional encoding to preserve temporal order. Unlike recurrent architectures, the Transformer processes all time steps in parallel, allowing it to model complex, long-range interactions among features and time points. The self-attention layers compute contextualized representations of the input sequence, which are then aggregated and passed to a classification head. The model outputs a binary anomaly label, identifying whether the sequence represents normal behavior or a cyberattack. Experimental results for the Transformer model are presented in the Results section, highlighting its generalization capability and the trade-offs between detection performance and computational complexity.
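A minimal Keras sketch of the Transformer detector described above, using a single encoder block. The model width, head count, and learned positional embedding are illustrative assumptions rather than the paper's exact configuration:

```python
import tensorflow as tf
from tensorflow.keras import layers

class PositionalEmbedding(layers.Layer):
    """Adds a learned position embedding so temporal order is preserved."""
    def __init__(self, window, d_model, **kwargs):
        super().__init__(**kwargs)
        self.emb = layers.Embedding(window, d_model)
        self.window = window

    def call(self, x):
        return x + self.emb(tf.range(self.window))

def build_transformer(window=60, n_features=50, d_model=64, heads=4):
    inp = layers.Input(shape=(window, n_features))
    x = PositionalEmbedding(window, d_model)(layers.Dense(d_model)(inp))
    # Self-attention: every time step attends to every other, in parallel.
    attn = layers.MultiHeadAttention(num_heads=heads,
                                     key_dim=d_model // heads)(x, x)
    x = layers.LayerNormalization()(x + attn)        # residual + norm
    ff = layers.Dense(d_model, activation="relu")(x)  # position-wise feed-forward
    x = layers.LayerNormalization()(x + layers.Dense(d_model)(ff))
    x = layers.GlobalAveragePooling1D()(x)           # aggregate the sequence
    out = layers.Dense(1, activation="sigmoid")(x)   # normal vs. attack
    return tf.keras.Model(inp, out)
```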
2.5. Algorithmic Workflow
The overall anomaly detection process employed in this study is summarized in Algorithm 1.
| Algorithm 1. Deep Learning-Based ICS Anomaly Detection. |
| Input: Multivariate time-series data X |
| Output: Anomaly label y |
| 1. Normalize raw ICS time-series data. |
| 2. Segment data into fixed-length sliding windows. |
| 3. Train the selected deep learning model (CNN, LSTM, or Transformer) using labeled data. |
| 4. Predict anomaly labels for unseen test sequences. |
| 5. Evaluate detection performance using standard classification metrics. |
2.6. Experimental Setup and Implementation Details
To ensure reproducibility, all experiments were conducted using Python 3.11 with the TensorFlow/Keras deep learning framework. The HAI Security Dataset was normalized using min–max scaling to ensure uniform feature ranges. Time-series data were segmented into fixed-length sliding windows of W time steps with a stride of S to capture temporal dependencies. The dataset was split into training (70%), validation (15%), and testing (15%) sets while preserving temporal order to prevent data leakage. All models were trained for N epochs using the Adam optimizer with a learning rate of η and a batch size of B. Binary cross-entropy was used as the loss function. To mitigate overfitting, early stopping was applied based on validation loss. Model performance was evaluated using accuracy, precision, recall, F1-score, and AUC, which are particularly suitable for imbalanced industrial cybersecurity datasets.
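The training and evaluation procedure above can be sketched as follows. The epoch count, batch size, learning rate, and early-stopping patience are illustrative defaults standing in for the symbolic N, B, and η; `model` is any of the three detectors:

```python
import tensorflow as tf
from sklearn.metrics import classification_report, roc_auc_score

def train_and_evaluate(model, train, val, test,
                       epochs=50, batch_size=64, learning_rate=1e-3):
    """Train with Adam + binary cross-entropy, early-stop on validation loss,
    then report standard classification metrics on the held-out test set."""
    (X_tr, y_tr), (X_va, y_va), (X_te, y_te) = train, val, test
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate),
                  loss="binary_crossentropy")
    stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=5,
                                            restore_best_weights=True)
    model.fit(X_tr, y_tr, validation_data=(X_va, y_va), epochs=epochs,
              batch_size=batch_size, callbacks=[stop], verbose=0)
    prob = model.predict(X_te, verbose=0).ravel()
    pred = (prob >= 0.5).astype(int)
    print(classification_report(y_te, pred, digits=2))
    print("ROC-AUC:", round(roc_auc_score(y_te, prob), 3))
    return pred, prob
```

The 0.5 decision threshold is the conventional default; under heavy class imbalance it can be tuned on the validation set.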
Hyperparameters were selected based on commonly adopted configurations in prior ICS anomaly detection studies and refined through empirical validation on the training and validation sets; dropout was applied alongside early stopping to reduce overfitting.
3. Results
The results of the study show that all three deep learning models, CNN, LSTM, and Transformer, performed well in detecting cyber threats within Industrial Control Systems using the HAI Security Dataset. The Transformer model achieved the highest accuracy at 92%, followed by CNN at 91% and LSTM at 90%, with all models attaining an F1-score of 91%. The LSTM model recorded the highest precision (93%) and AUC (0.98), while the Transformer demonstrated the best overall balance between recall and generalization. However, all models exhibited signs of overfitting and struggled with detecting minority class (attack) instances due to class imbalance. Overall, the Transformer emerged as the most effective model for real-time anomaly detection in ICS environments.
In addition to detection accuracy, inference efficiency was qualitatively analyzed. Transformer-based models demonstrated higher computational overhead compared to CNN and LSTM models due to the self-attention mechanism, which may limit their deployment on resource-constrained edge devices. Nevertheless, their superior generalization performance highlights a trade-off between predictive accuracy and computational cost. It is important to note that this evaluation was conducted using a single dataset, and further validation on additional ICS datasets such as SWAT, WADI, and EPIC is necessary to confirm the generalizability of the models across diverse operational scenarios.
Figure 1 represents the accuracy and loss of a Convolutional Neural Network (CNN) model, clearly illustrating the problem of overfitting. While the training accuracy consistently increases and training loss consistently decreases, indicating the CNN is effectively learning the training data, the validation accuracy plateaus and even drops in later epochs, coupled with a fluctuating or increasing validation loss. This divergence signifies that the CNN is memorizing the training examples rather than learning generalizable features, leading to poor performance on unseen data despite excellent performance on the training set.
3.1. Convolutional Neural Network (CNN)
Figure 2 visually summarizes the performance of a binary classification model, showing the counts of correct and incorrect predictions for two classes, labeled 0 and 1. Out of a total of 166 instances, the model correctly identified 132 true negatives (class 0 predicted as 0) and 19 true positives (class 1 predicted as 1), resulting in 151 accurate predictions. However, it made 10 false positive errors (class 0 incorrectly predicted as 1) and 5 false negative errors (class 1 incorrectly predicted as 0), indicating a stronger performance in identifying class 0 than class 1.
Figure 3 depicts the ROC curve, with an impressive Area Under the Curve (AUC) of 0.97, which signifies that the model is an excellent binary classifier. Its curve hugs the top-left corner of the plot, demonstrating a high true positive rate while maintaining a low false positive rate across various classification thresholds. This indicates a strong capability to distinguish between the two classes, performing significantly better than a random classifier.
Table 1 details the model’s performance across two classes, 0 and 1, highlighting a strong capability in identifying class 0 with 96% precision and 93% recall. Conversely, performance on class 1, which has significantly fewer instances (24 vs. 142 for class 0), is weaker, showing a precision of 66% and a recall of 79%. Although the overall accuracy is high at 91%, the disparity in per-class metrics, particularly the lower precision for class 1, suggests the model struggles more with false positives for this minority class.
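As a consistency check, the per-class figures reported in Table 1 follow directly from the Figure 2 confusion-matrix counts (TN = 132, FP = 10, FN = 5, TP = 19):

```python
def metrics_from_cm(tn, fp, fn, tp):
    """Binary-classification metrics from raw confusion-matrix counts."""
    accuracy = (tn + tp) / (tn + fp + fn + tp)
    precision = tp / (tp + fp)   # class-1 (attack) precision
    recall = tp / (tp + fn)      # class-1 (attack) recall
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# CNN counts from Figure 2: TN=132, FP=10, FN=5, TP=19
acc, prec, rec, f1 = metrics_from_cm(132, 10, 5, 19)
print(f"accuracy={acc:.2f}  precision={prec:.2f}  recall={rec:.2f}")
# → accuracy=0.91  precision=0.66  recall=0.79, matching Table 1
```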
3.2. Long Short-Term Memory (LSTM)
Figure 4 illustrates the training progression of an LSTM model, exhibiting clear signs of overfitting. While the training accuracy consistently improves and training loss continuously decreases across epochs, the validation metrics show a significant divergence. In the initial training phase (first plot), the validation accuracy starts to plateau and fluctuate while validation loss remains higher than training loss, indicating early overfitting. This trend is exacerbated in the longer training run (second plot), where the validation accuracy becomes volatile, and validation loss rises in later epochs, demonstrating the LSTM’s increasing inability to generalize effectively to unseen data despite continued learning on the training set.
Figure 5 indicates the model achieved an accuracy of approximately 90.36%, with 129 true negatives and 21 true positives, demonstrating good overall performance but a lower precision of 61.76% for class 1 compared to its recall of 87.5% for the same class. Concurrently, the accompanying accuracy and loss plots reveal a clear case of overfitting, as the training accuracy consistently improves and loss decreases, while the validation accuracy becomes volatile and validation loss begins to rise in later epochs, signifying a deterioration in the model’s ability to generalize to unseen data.
Figure 6 depicts the ROC curve, whose AUC of 0.98 signifies the model’s excellent ability to distinguish between classes. The accompanying accuracy and loss plots reveal a critical issue of overfitting. As training progresses, the model continues to improve on the training data (rising accuracy, falling loss), but its performance on unseen validation data deteriorates, marked by volatile validation accuracy and a clear increase in validation loss in later epochs. This indicates that despite its strong discriminative power, the model is memorizing the training set and failing to generalize effectively.
Table 2 indicates that the model performs very well on the majority class (class 0, with 142 instances), achieving a high precision of 98% and recall of 91%. However, its performance on the minority class (class 1, with 24 instances) is less robust, demonstrated by a significantly lower precision of 62% despite a decent recall of 88%. While the overall accuracy is 90%, the notable disparity in precision between the two classes highlights that the model frequently makes false positive errors when predicting class 1, suggesting room for improvement in handling the imbalanced dataset.
3.3. Transformer
Figure 7 indicates that the Transformer model is significantly overfitting. While the model demonstrates robust learning on the training data, with consistently increasing accuracy and decreasing loss, its generalization to unseen validation data steadily degrades. This is evident from the widening gap between training and validation accuracy, the plateauing or increasing validation loss, and the overall instability of validation metrics, suggesting the model is memorizing the training set rather than extracting broadly applicable patterns.
Figure 8 summarizes the performance of a binary classification model, showing that out of 166 total instances, the model correctly identified 136 true negatives (class 0 predicted as 0) and 16 true positives (class 1 predicted as 1). However, it also made 6 false positive errors (class 0 incorrectly predicted as 1) and 8 false negative errors (class 1 incorrectly predicted as 0). This indicates that while the model performs reasonably well in correctly identifying class 0, it has a notable challenge with precisely predicting class 1 due to the false positive and false negative errors, despite identifying a fair portion of actual class 1 instances.
Figure 9 displays a Receiver Operating Characteristic (ROC) curve, a standard visualization for evaluating binary classification model performance. The orange curve plots the True Positive Rate (sensitivity) against the False Positive Rate for various classification thresholds. The dashed blue line represents a random classifier, serving as a baseline for comparison. A key metric, the Area Under the Curve (AUC), is reported as 0.95. This high AUC value, close to the ideal of 1.0, indicates that the model exhibits excellent discriminatory power, effectively distinguishing between positive and negative classes with a high likelihood of correctly identifying true positives while keeping false positives low.
Table 3 details a model’s performance across two classes, “0” and “1”, with a notable class imbalance evident from the support values (142 for class 0, 24 for class 1). The model demonstrates strong performance on class 0, achieving high precision (0.94), recall (0.96), and F1-score (0.95). However, its performance significantly drops for class 1, with lower precision (0.73) and recall (0.67), resulting in an F1-score of 0.70, indicating difficulty in accurately identifying this minority class. While the overall accuracy is high at 0.92, this figure is likely boosted by the model’s proficiency with the dominant class 0; therefore, metrics like the lower macro-averaged F1-score (0.82) and particularly the F1-score for class 1 (0.70) offer a more critical view of the model’s limitations on the minority class.
Table 4 compares the performance of three deep learning models, CNN, LSTM, and Transformer, across accuracy, precision, recall, and F1-score. All models demonstrate strong performance, with metrics consistently in the low 90s. The Transformer model shows a slight advantage in accuracy and recall (both 92%), while the LSTM achieves the highest precision (93%). Notably, all three models exhibit an identical F1-score of 91%, indicating a similar balance between precision and recall. Overall, the results suggest that all architectures are highly effective for the task, with only minor performance variations distinguishing them.
3.4. Summary of Experimental Results
All three deep learning models demonstrated strong anomaly detection performance on the HAI Security Dataset, with accuracy values exceeding 90%. The Transformer model achieved the highest overall accuracy (92%) and recall (92%), indicating superior generalization capability in detecting cyber threats. The LSTM model yielded the highest precision (93%), suggesting fewer false alarms, while the CNN model provided competitive performance with lower computational complexity. Despite high overall accuracy, all models exhibited reduced performance on the minority attack class due to severe data imbalance. This limitation is reflected in lower precision and F1-scores for class 1 across all models. ROC–AUC values above 0.95 confirm strong discriminatory power, even in the presence of overfitting. These results highlight a trade-off between detection accuracy, generalization, and computational efficiency in industrial cybersecurity applications.
4. Discussion
This study provides a comparative evaluation of three deep learning architectures: Convolutional Neural Networks (CNN), Long Short-Term Memory (LSTM) networks, and Transformer models for cybersecurity threat detection in Industrial Control Systems (ICS) [
26]. Using the HAI Security Dataset, which reflects realistic ICS operational behavior and attack scenarios, the results demonstrate that deep learning approaches can effectively detect cyber threats in SCADA and DCS environments [
27]. Among the evaluated models, the Transformer achieved the highest overall accuracy (92%) and recall (92%), indicating superior generalization capability, while the LSTM model produced the highest precision (93%). Despite this strong overall performance, all models exhibited reduced effectiveness in detecting minority attack classes, primarily due to data imbalance and overfitting. This limitation highlights the need for improved learning strategies, including advanced regularization techniques, cost-sensitive loss functions, and data balancing methods such as synthetic oversampling or adversarial data generation [
28].
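One of the mitigation strategies mentioned above, cost-sensitive training, can be sketched with inverse-frequency class weights; `model`, `X_train`, and `y_train` are placeholders for the detector and data of Section 2:

```python
import numpy as np

def inverse_frequency_weights(y):
    """Weight each class inversely to its frequency so rare attack samples
    contribute more to the loss (n_samples / (n_classes * class_count))."""
    classes, counts = np.unique(y, return_counts=True)
    total = counts.sum()
    return {int(c): total / (len(classes) * n) for c, n in zip(classes, counts)}

# Usage (Keras): model.fit(X_train, y_train,
#                          class_weight=inverse_frequency_weights(y_train), ...)
```

This is the same "balanced" weighting scheme used by scikit-learn's `compute_class_weight`; more aggressive alternatives include focal loss or synthetic oversampling.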
Beyond model performance, this research addresses practical considerations for deploying AI-based security solutions in industrial environments. Constraints such as limited computational resources at the edge, strict real-time requirements, and the lack of interpretability of deep learning models remain significant barriers to adoption. To mitigate these challenges, this study emphasizes the importance of model compression techniques, including pruning and quantization, as well as the integration of Explainable AI (XAI) methods such as SHAP, LIME, and attention mechanisms to enhance transparency and operator trust. In addition, hybrid approaches that combine deep learning with rule-based logic or domain-specific constraints are identified as promising solutions for improving robustness while maintaining operational reliability. Federated learning is also highlighted as a viable strategy for enabling collaborative model training across distributed industrial sites without exposing sensitive data, while digital twins offer a safe and realistic environment for validating AI models against cyber-physical attack scenarios [
29].
Although deep learning has matured significantly in conventional IT security, its adoption in ICS cybersecurity remains limited due to the unique characteristics of industrial environments, including legacy systems, real-time determinism, and specialized communication protocols. The findings of this study indicate that, when appropriately adapted, deep learning models can outperform traditional intrusion detection systems that rely on static rules and known signatures. In particular, the Transformer’s ability to capture long-range temporal dependencies makes it well-suited for identifying subtle and low-frequency attacks that are difficult to detect using conventional methods. This work contributes to a clearer understanding of how deep learning can be effectively applied to ICS cybersecurity. It underscores the importance of designing solutions that are accurate, explainable, and aligned with operational constraints. It also establishes a foundation for future research exploring sensor fusion, reinforcement learning, and causal inference to further strengthen resilience against sophisticated industrial cyber threats.
5. Conclusions
This study demonstrates the effectiveness of deep learning techniques for cybersecurity threat detection in Industrial Control Systems (ICS), particularly within SCADA and DCS environments. Using the HAI Security Dataset, CNN, LSTM, and Transformer models were evaluated on multivariate time-series data, with the Transformer achieving the highest overall accuracy of 92%. While all models exhibited strong detection capability, performance on minority attack classes was constrained by severe class imbalance and overfitting, highlighting critical challenges in real-world ICS anomaly detection. Beyond performance evaluation, this work underscores key practical considerations for deploying AI-based security solutions in industrial settings, including data scarcity, limited computational resources, explainability requirements, and strict real-time operational constraints. The findings suggest that although Transformer models offer superior generalization, their computational overhead presents trade-offs when deployed in resource-constrained environments. This study is subject to several limitations. First, the evaluation was conducted using a single dataset, which may restrict generalizability across diverse industrial contexts. Second, class imbalance and overfitting were not fully mitigated, potentially affecting the reliable detection of rare attack events. These limitations constrain external validity and motivate future research directions, including multi-dataset validation, advanced data balancing strategies, and federated learning–based training to enhance robustness and privacy. In addition, hybrid modeling approaches and digital twin–based validation frameworks are identified as promising avenues for improving the scalability, trustworthiness, and real-world applicability of AI-driven cybersecurity systems. 
The proposed framework provides a scalable foundation for real-time, intelligent anomaly detection and contributes to the development of more resilient cybersecurity defenses for critical industrial infrastructure.