Avocado: An Interpretable Fine-Grained Intrusion Detection Model for Advanced Industrial Control Network Attacks

Liu, Xin; Liu, Tao; Hu, Ning

doi:10.3390/electronics14214233


Xin Liu ¹, Tao Liu ²,³,* and Ning Hu ²

¹ College of Computer Science and Engineering, Changsha University, Changsha 410022, China
² Peng Cheng Laboratory, Shenzhen 518108, China
³ School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen 518055, China
* Author to whom correspondence should be addressed.

Electronics 2025, 14(21), 4233; https://doi.org/10.3390/electronics14214233

Submission received: 28 August 2025 / Revised: 12 October 2025 / Accepted: 13 October 2025 / Published: 29 October 2025

(This article belongs to the Special Issue Novel Approaches for Deep Learning in Cybersecurity)


Abstract

Industrial control systems (ICS), as critical infrastructure supporting national operations, are increasingly threatened by sophisticated, stealthy network attacks. These attacks often split malicious behavior across multiple highly camouflaged packets and embed them at low frequency in large-scale background traffic. This makes them semantically and temporally indistinguishable from normal traffic and allows them to evade traditional detection. Existing methods largely rely on flow-level statistics or long-sequence modeling, which results in coarse detection granularity, high latency, and poor byte-level interpretability, falling short of industrial demands for real-time, actionable detection. To address these challenges, we propose Avocado, a fine-grained, multi-level intrusion detection model. Avocado’s core innovation is contextual flow-feature fusion: it models each packet jointly with its surrounding packet sequence, enabling anomaly detection and precise localization for every individual packet. Moreover, a shared-query multi-head self-attention mechanism is designed to quantify the importance of individual bytes within each packet. Experimental results on the NGAS and CLIA-M221 datasets show that Avocado significantly outperforms state-of-the-art flow-level methods: it improves packet-level detection accuracy (ACC) by 1.55% on average and reduces the false positive rate (FPR) and false negative rate (FNR) to 3.2% and 3.6% on NGAS and to 3.7% and 4.3% on CLIA-M221, demonstrating superior performance in both detection and interpretability.

Keywords:

AI-based threat detection; intrusion detection systems (IDS); deep learning for cybersecurity; industrial control systems (ICS)

1. Introduction

As a critical component of national infrastructure, ICS are widely deployed in sectors such as energy, electricity, manufacturing, and transportation. With the deep integration of information networks and control networks, the ICS attack surface continues to expand, posing increasingly severe cybersecurity threats [1]. In recent years, numerous cyberattacks targeting ICS—such as the Stuxnet worm in Iran [2], the Ukraine power grid incident [3], and the DarkSide ransomware attack in the United States [4]—have not only disrupted enterprise operations but also severely endangered national infrastructure security. As novel attack methods continue to evolve, traditional security protection systems face unprecedented challenges.

In real-world attack–defense scenarios, adversarial tactics against ICS are becoming increasingly intelligent and stealthy [5,6,7,8]. The core feature of stealthy attacks lies in blending malicious operations into large-scale background traffic, exhibiting low activity frequency and extended time spans, and masquerading as normal atomic operation packets to conceal malicious intent. To achieve such camouflage, attackers often embed malicious behavior into packets that are semantically and statistically indistinguishable from legitimate traffic. For instance, malicious control logic may be split into multiple small packets [7], which are highly similar to normal traffic in both content and timing, thereby evading detection. Additionally, attackers may analyze normal traffic patterns over long periods and inject carefully crafted “normal” packets at critical moments to gradually manipulate physical processes [5,6,8]. Numerous real-world cases demonstrate that such stealthy attacks are extremely difficult for traditional detection methods to identify, significantly increasing the complexity of ICS security protection.

Due to the constraints of ICS environments, bypass (out-of-band) deployment of security measures is essential. ICS devices often cannot be patched or updated without risking process integrity, human safety, or environmental damage. Furthermore, their limited computational and storage resources make them unsuitable for hosting additional security modules. Intrusion detection systems (IDS), which can be deployed without downtime and can leverage external computation, are therefore considered one of the most promising solutions for ICS security [9].

To address ICS security issues, researchers from academia and industry have proposed various intrusion detection techniques, broadly categorized into rule/feature-based methods, statistical learning methods, and representation learning approaches. Rule-based methods rely on static features such as IP addresses, ports, and payloads, assuming the closed and deterministic nature of ICS environments. However, in realistic scenarios, attackers can disguise themselves within legitimate sessions [10], rendering such rule-based systems ineffective. Moreover, stealthy attacks that mimic legitimate packet-level atomic operations can further undermine detection performance [8]. Statistical learning techniques focus on modeling traffic-level metrics—such as packet count, transmission rate, and flow duration—typically using algorithms like Decision Trees [11] and Support Vector Machines [12,13]. While these approaches offer certain detection capabilities, they often rely heavily on complex feature engineering. In contrast, representation learning techniques based on deep neural networks (DNNs) have recently gained prominence, offering automatic extraction of high-dimensional features and complex patterns from traffic data [14], and demonstrating superior performance in detecting advanced stealthy attacks.

Despite advancements in detection techniques, limitations persist in traffic representation schemes themselves. As shown in Figure 1, existing approaches can be classified into packet-level and flow-level representations. Packet-level approaches, also known as Deep Packet Inspection (DPI) [15], focus on analyzing features within a single packet but are often ineffective at identifying well-disguised malicious atomic operations [8]. In contrast, flow-level methods model sequences of packets over time to classify five-tuple flows [16], but their coarse granularity only allows for binary classification of an entire flow, without the ability to pinpoint specific malicious packets or critical fields. Some studies have suggested combining the two approaches to complement each other’s strengths [17,18]. While fusion-based solutions improve detection speed and overall accuracy (ACC) to some extent, they still fall short of achieving the fine-grained granularity and precision necessary for detecting stealthy ICS attacks. Particularly in highly camouflaged scenarios, accurately locating the malicious packet and understanding its intent remain key challenges.

Additionally, existing representation schemes lack the capacity to perform fine-grained modeling and interpretation of protocol-critical bytes, which hampers interpretability. As a result, current detection systems often fail to provide actionable insights for security analysts during anomaly localization and traceability [19]. Fundamentally, these limitations stem from insufficient granularity and depth in the design of traffic feature representations.

To address these challenges, we propose a novel fine-grained, multi-level industrial intrusion detection model, Avocado. The core innovation of Avocado lies in its fusion of packet- and flow-level features. Specifically, a packet feature extraction module is designed to effectively represent each individual packet, while a flow-level fusion module jointly models the packet with its surrounding context in the sequence. This enables per-packet anomaly detection and precise localization, thereby overcoming the low ACC of traditional packet-level methods and the coarse granularity of flow-level classification. Furthermore, Avocado introduces a shared-query multi-head self-attention mechanism to provide byte-level interpretability in ICS traffic, allowing the importance of key bytes within each packet to be quantitatively evaluated. This interpretability offers significant value for security personnel in anomaly localization and root-cause analysis. Overall, Avocado balances detection performance with interpretability and demonstrates strong effectiveness against complex and stealthy ICS attacks.

The main contributions of this paper are as follows:

(1): We propose Avocado, a novel fine-grained intrusion detection framework for ICS. It constructs multi-level byte–packet–flow representations to enhance the ACC and granularity of stealthy attack detection.
(2): A shared-query attention mechanism is introduced to achieve byte-level interpretability in ICS traffic. Compared to conventional attention mechanisms, it enables quantification of individual byte contributions to the classification results, improving localization and behavioral analysis capabilities.
(3): We conduct comprehensive evaluations of Avocado against seven state-of-the-art methods using public datasets NGAS [20] and CLIA-M221 [8]. Experimental results show that Avocado improves ACC by an average of 1.55%, and reduces FPR and FNR to 3.2% and 3.6% (NGAS), and 3.7% and 4.3% (CLIA-M221), outperforming existing mainstream approaches. The source code and preprocessing scripts are publicly available at https://github.com/MysteryObstacle/Avocado.git (accessed on 21 October 2025).
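For reference, assuming the conventional confusion-matrix definitions with attack packets as the positive class (TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives), the reported metrics are:

$$\mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \mathrm{FPR} = \frac{FP}{FP + TN}, \qquad \mathrm{FNR} = \frac{FN}{FN + TP}$$

Under these definitions, an FPR of 3.2% means that roughly 32 of every 1,000 benign packets are flagged as malicious, and an FNR of 3.6% means that roughly 36 of every 1,000 attack packets are missed.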

The rest of the paper is organized as follows: Section 2 reviews related work on ICS intrusion detection. Section 3 presents the motivation and objectives. Section 4 discusses common attack scenarios. Section 5 describes the proposed method in detail. Section 6 provides experimental results and analysis. Section 7 discusses the model’s limitations. Section 8 concludes the paper.

2. Related Work

Intrusion detection research for ICS can be broadly classified into three categories: packet-level representation, flow-level modeling, and packet-flow fusion methods. These approaches differ in terms of feature granularity and modeling capabilities, and have explored trade-offs between detection ACC, real-time performance, and interpretability. However, they still exhibit notable shortcomings in handling highly covert attacks and achieving fine-grained traceability.

2.1. Limitations of Existing Detection Paradigms

Packet-level representation methods focus on analyzing the static structure or content of individual packets and are typically used to identify abnormal communication patterns. Early work by Ke Wang et al. [21] employed high-order n-grams and Bloom filters to model packet payloads. Later approaches introduced machine learning techniques to classify carefully engineered statistical features. For instance, Osho et al. [11] used PCA and Decision Trees to model packet-level traffic characteristics, while Anton et al. [13] applied SVM and Random Forests to classify extracted control field features such as CRC and Setpoint values, thereby improving detection performance. However, these methods rely heavily on manual feature engineering, making them less adaptable to complex, multi-field, and highly disguised traffic commonly seen in ICS environments. While effective in detecting attacks with overt payload anomalies, they fail to incorporate contextual semantics and struggle with identifying covert behaviors that span across multiple packets.

To better capture contextual dependencies, flow-level modeling approaches analyze the temporal structure of entire communication flows. Huda et al. [22] proposed a deep belief network (DBN)-based intrusion recognition framework, which adapts well to flow characteristics over long time windows. Aouedi et al. [23] introduced a federated semi-supervised learning framework that integrates multi-source data while preserving privacy. Yang et al. [24] developed a flow-level model combining Improved Grey Wolf Optimization (IGWO) with GRU to optimize both network architecture and learning rate. Despite these advancements, flow-level methods are limited to coarse-grained binary classification, determining only whether a flow contains an attack, without the capability to locate specific abnormal packets or fields.

Packet–flow fusion methods aim to combine the fine-grained detail of packet-level features with the contextual understanding of flow-level modeling, thereby improving both detection coverage and semantic interpretation. For example, Wang et al. [25] proposed a hierarchical detection model that integrates LeNet-5 and GRU. The former extracts packet features in grayscale image format, while the latter captures temporal sequence dependencies, demonstrating improved performance from the fusion of packet and flow information. However, most fusion-based approaches primarily focus on enhancing classification ACC, and have not yet resolved the cross-granularity localization or byte-level interpretability challenges. As a result, their effectiveness in detecting highly covert industrial attacks remains limited.

2.2. Interpretability in ICS Intrusion Detection

On the topic of model interpretability, recent research has begun incorporating explainable AI (XAI) mechanisms [26] into ICS intrusion detection tasks [19]. Liu et al. [27] introduced the SHAP algorithm [28] into a federated multimodal detection framework to improve interpretability at the sensor level. Khan et al. [29] combined autoencoders with convolutional temporal networks to perform flow-level attack detection and explained model decisions via feature importance weights. However, such methods mainly address feature-level attribution, and cannot pinpoint specific data packets or protocol bytes, which limits their practical utility in industrial scenarios requiring precise anomaly localization and semantic field interpretation.

2.3. Critical Analysis and Identification of Research Gaps

In summary, while existing detection methods have advanced in overall performance, they face two major challenges when addressing ICS attacks characterized by strong camouflage and intricate field dependencies:

(1): Granularity imbalance between packet-level and flow-level detection: Current fusion strategies lack effective mechanisms to simultaneously capture fine-grained structural features and long-term temporal dependencies.
(2): Insufficient interpretability: Most existing models are unable to provide byte-level attribution or understand protocol semantics, limiting their ability to assist security personnel in accurate incident tracing and response.

By introducing a packet–flow feature fusion mechanism and a shared-query vector attention module, the proposed Avocado model directly addresses these two issues. It significantly improves the detection ACC for highly disguised attacks while also enabling fine-grained interpretability at the byte level.

3. Motivation and Objective

ICS cybersecurity is facing increasingly complex and highly covert threats. Through communication that closely mimics legitimate business behavior in both content and timing, attackers can evade traditional detection mechanisms and stealthily gain control without disrupting the overall production process. As previously discussed, such attacks often split malicious control logic into multiple atomic-level packets [8], which are then injected into periodic background flows for execution. Traditional packet-level inspection [11,13,21] struggles to capture contextual semantics, while flow-level analysis can detect overall temporal anomalies but fails to pinpoint specific malicious packets or protocol fields [22,23,24,25]. Existing packet–flow fusion strategies primarily focus on improving classification ACC, yet lack an effective mechanism to bridge detection granularity with contextual modeling capability.

Moreover, most current detection approaches lack byte-level interpretability. Given the complexity and customization of industrial protocols, a binary anomaly prediction (“anomalous” or “not”) is far from sufficient to support practical security response or behavioral traceability [19]. Existing XAI methods typically analyze the importance of sensor data [27] or manually crafted features [29], but fail to provide insight into key fields within protocol structures. As a result, even when anomalies are successfully flagged, the model cannot indicate which bytes are responsible, limiting the system’s usability in real-world forensic analysis.

Based on these observations, the core motivations of this work can be summarized as follows:

First, we aim to break the granularity bottleneck by enabling joint modeling of packet-level and flow-level characteristics. Packet-level detection excels at capturing local spatial structures, while flow-level modeling is better suited for learning temporal dependencies across packets. These strengths are inherently complementary. We design a fusion architecture that aggregates contextual flow-level information using a multi-head self-attention mechanism, allowing each packet to be evaluated in the context of its surrounding sequence, thereby enhancing the model’s ability to detect malicious packets (see Section 6.2 for a comparison with advanced methods). This design also demonstrates robust performance under short-sequence conditions (see Section 6.4 for experimental results).

Second, we introduce byte-level interpretability to enhance the transparency and practicality of the model. A novel shared-query vector attention mechanism is proposed to explicitly label the contribution of individual bytes within a packet to the model’s prediction. This enables field-level localization in complex protocol structures and shows promising results in detecting highly covert attacks (see Section 6.5 for details).

In summary, Avocado is designed to provide an industrial intrusion detection framework that integrates fine-grained anomaly detection with semantic interpretability. It not only achieves high ACC and precise granularity, but also delivers localization and explainability, offering strong technical support for the detection and response to sophisticated ICS threats. The following sections detail the proposed model architecture and its performance evaluation on representative ICS datasets.

4. Attack Scenarios

To clarify the key detection challenges addressed by the proposed model [30], this section focuses on the field control network on the OT side, based on the hierarchical structure of the classic Purdue model. We analyze five representative network-level attack scenarios and highlight the subtle traffic-level characteristics that are often overlooked in detection tasks.

4.1. Industrial Control Network Architecture

A typical ICS network consists of two major components: the Operational Technology (OT) network and the Information Technology (IT) network. The OT network is responsible for the real-time operation and control of industrial devices, such as Programmable Logic Controllers (PLCs) [31], while the IT network handles data storage, information systems, and enterprise-level applications. These systems are usually structured into multiple layers, including enterprise-level networks, process control networks, and field control networks, each with distinct functional responsibilities [14,32].

This study targets the field control layer, which directly connects production equipment with operator interfaces and is often the primary focus of cyber-attacks. We assume that the attacker has already infiltrated the ICS network perimeter and analyze five typical network-level attacks (excluding physical access or insider threats), assessing their potential impact on system operations.

4.2. Attack Techniques

To illustrate the necessity of the proposed model architecture, we select five attack types included in the experimental datasets and analyze their key manifestations across different granularity levels in the network traffic—byte, packet, and flow—as shown in Table 1.

From the above analysis, it is evident that ICS attacks typically exhibit three layers of abnormal characteristics:

Byte-level anomalies refer to irregularities in specific bytes within packets, such as corrupted CRC checks or spoofed response bytes.
Packet-level anomalies concern structural inconsistencies, such as forged control commands or partial malicious logic within individual packets, which may appear legitimate in isolation but deviate from normal patterns.
Flow-level anomalies emerge from abnormal temporal patterns and inter-packet behaviors, including high-frequency scanning or multi-step, delayed injection schemes that unfold across multiple packets.

To address these detection challenges, the proposed Avocado model employs a multi-module design:

The byte-level interpretability module dynamically assigns weights to packet bytes, effectively highlighting those most indicative of malicious activity.
The packet-level feature extractor leverages Convolutional Neural Networks (CNNs) to capture local spatial patterns and internal structural deviations.
The flow-level fusion module applies a multi-head self-attention mechanism to integrate temporal dependencies and contextual relationships between packets, significantly enhancing the detection of cross-packet behaviors.

Through this layered and collaborative architecture, Avocado is capable of accurately detecting a wide spectrum of ICS-specific attack behaviors—from reconnaissance to control logic injection—while providing interpretable insights into the underlying anomalies.

5. Methodology

5.1. Workflow

Figure 2 illustrates the overall workflow of Avocado, which consists of an offline training stage and an online detection stage.

Offline stage: First, real industrial control system (ICS) traffic is collected and labeled based on expert analysis. The labeled dataset is then divided into training, validation, and testing subsets. The training phase involves learning representations at multiple granularities—byte-level feature extraction, packet-level pattern learning, and flow-level temporal modeling (see Section 5.3 for details). After optimization, the trained model is deployed as the core detection engine.

Online stage: In real-time deployment, the system captures live network traffic through a sniffer and converts it into the appropriate input format via the preprocessing module (see Section 5.2). The model then performs per-packet classification. For each packet, the model outputs an anomaly score along with byte-level interpretability information, which is forwarded to the security operations center (SOC) for further analysis and response.

5.2. Preprocessing

Preprocessing plays a crucial role in converting raw network traffic into a format suitable for deep learning models. To ensure high detection ACC and operational efficiency, a six-step pipeline is adopted:

Flow division: Traffic is divided according to the five-tuple (source IP, destination IP, source port, destination port, and protocol type), uniquely identifying each network connection. In ICS environments, communication patterns between devices (e.g., PLC and HMI) are often distinctive. Five-tuple-based separation enables connection-specific analysis, which improves anomaly detection precision by isolating the traffic context of each device pair.
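A minimal sketch of five-tuple flow division, assuming packets have already been parsed into a hypothetical `Packet` record (the field names are illustrative and not taken from the Avocado repository). Packets are grouped by (source IP, destination IP, source port, destination port, protocol) and kept in arrival order. The sketch treats the two directions of a connection as separate flows; the text does not specify whether request and response directions are merged.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    """Hypothetical parsed-packet record; field names are illustrative."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    proto: str
    payload: bytes


def five_tuple(pkt):
    """Flow key: (source IP, destination IP, source port, destination port, protocol)."""
    return (pkt.src_ip, pkt.dst_ip, pkt.src_port, pkt.dst_port, pkt.proto)


def divide_flows(packets):
    """Group packets into flows keyed by five-tuple, keeping arrival order."""
    flows = defaultdict(list)
    for pkt in packets:
        flows[five_tuple(pkt)].append(pkt)
    return dict(flows)


# Example: a request/response exchange on TCP port 502 (Modbus/TCP)
# splits into two directional flows.
packets = [
    Packet("10.0.0.2", "10.0.0.10", 50000, 502, "TCP", b"\x00\x01"),
    Packet("10.0.0.10", "10.0.0.2", 502, 50000, "TCP", b"\x00\x02"),
    Packet("10.0.0.2", "10.0.0.10", 50000, 502, "TCP", b"\x00\x03"),
]
flows = divide_flows(packets)
print(len(flows))  # 2
```

Keying on the full five-tuple is what lets each PLC–HMI conversation be analyzed in isolation, as described above.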

Flow filtering: Only traffic associated with specific industrial devices is retained. ICS traffic generally follows fixed communication routines, such as predefined instruction sets or polling cycles. By filtering out irrelevant traffic (e.g., internal IT traffic), data dimensionality and processing complexity are greatly reduced, enabling the model to focus on critical ICS flows. This step enhances both detection efficiency and ACC by reducing noise and false positives.

Packet filtering: Packets unrelated to ICS control commands, such as DNS packets and bare TCP SYN, ACK, or FIN segments, are filtered out. These packets primarily serve connection management and usually carry no payload relevant to attack detection. Although some attacks may exploit such packets, removing them under the threat scenarios assumed in this study allows a fixed-size packet window to cover a longer span of control traffic. This step should be customized to the specific attack context so that important signatures are not inadvertently removed.
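Flow filtering and packet filtering can be combined into a single pass over the divided flows. The sketch below is illustrative only: the `Packet` record, the set of monitored device IPs, and the use of TCP port 502 (the registered Modbus/TCP port) as the ICS port are assumptions, and connection-management packets are approximated as DNS traffic plus any packet that carries no application payload.

```python
from typing import NamedTuple


class Packet(NamedTuple):
    """Hypothetical parsed-packet record; field names are illustrative."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    proto: str
    payload: bytes


def keep_flow(key, device_ips, ics_ports):
    """Flow filtering: keep flows that involve a monitored device on an ICS port."""
    src_ip, dst_ip, src_port, dst_port, _proto = key
    involves_device = src_ip in device_ips or dst_ip in device_ips
    on_ics_port = src_port in ics_ports or dst_port in ics_ports
    return involves_device and on_ics_port


def keep_packet(pkt, dns_port=53):
    """Packet filtering: drop DNS traffic and packets with no application payload,
    which approximates bare SYN/ACK/FIN segments used for connection management."""
    if dns_port in (pkt.src_port, pkt.dst_port):
        return False
    return len(pkt.payload) > 0


def filter_traffic(flows, device_ips, ics_ports):
    """Apply flow filtering, then packet filtering; drop flows left empty."""
    kept = {}
    for key, pkts in flows.items():
        if not keep_flow(key, device_ips, ics_ports):
            continue
        remaining = [p for p in pkts if keep_packet(p)]
        if remaining:
            kept[key] = remaining
    return kept


modbus = ("10.0.0.2", "10.0.0.10", 50000, 502, "TCP")
dns = ("10.0.0.2", "10.0.0.1", 40000, 53, "UDP")
flows = {
    # A bare ACK, then a payload-carrying packet.
    modbus: [Packet(*modbus, b""), Packet(*modbus, b"\x01\x03\x00\x00\x00\x01")],
    dns: [Packet(*dns, b"\xab\xcd")],
}
filtered = filter_traffic(flows, device_ips={"10.0.0.10"}, ics_ports={502})
```

In practice, the device set and port set would be configured per deployment, which matches the caution above that filtering must be tuned so that attack indicators are not discarded.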

Packet cleaning: Non-essential metadata such as Ethernet headers are removed, retaining only the payload portion of each packet. This avoids overfitting to deployment-specific configurations and enhances the model’s generalization across different ICS environments. Focusing on payload semantics ensures that the model learns protocol- and application-level features relevant to intrusion detection.
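The sketch below illustrates packet cleaning on a raw captured frame. It assumes untagged Ethernet II carrying IPv4, and it interprets "payload" as the application-layer bytes after the TCP or UDP header; if Avocado retains transport-layer headers, the slicing would stop earlier. The IPv4 Total Length field is used to trim any trailer padding that Ethernet appends to short frames. The function name `extract_payload` is hypothetical.

```python
import struct

ETH_HEADER_LEN = 14          # dst MAC (6) + src MAC (6) + EtherType (2); no VLAN tag assumed
ETHERTYPE_IPV4 = 0x0800
IPPROTO_TCP, IPPROTO_UDP = 6, 17


def extract_payload(frame: bytes) -> bytes:
    """Strip Ethernet, IPv4 and TCP/UDP headers; return the application payload.

    Returns b"" for frames that are not IPv4 carrying TCP or UDP.
    """
    if len(frame) < ETH_HEADER_LEN + 20:
        return b""
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != ETHERTYPE_IPV4:
        return b""
    ip = frame[ETH_HEADER_LEN:]
    ihl = (ip[0] & 0x0F) * 4                    # IPv4 header length in bytes
    (total_len,) = struct.unpack("!H", ip[2:4])  # IPv4 header + data, excluding Ethernet trailer
    proto = ip[9]
    l4 = ip[ihl:total_len]                       # also drops Ethernet padding after the datagram
    if proto == IPPROTO_TCP and len(l4) >= 20:
        data_offset = (l4[12] >> 4) * 4         # TCP header length in bytes
        return l4[data_offset:]
    if proto == IPPROTO_UDP and len(l4) >= 8:
        return l4[8:]                            # the UDP header is always 8 bytes
    return b""
```

Reading header lengths from the IHL and Data Offset fields, rather than assuming fixed 20-byte headers, keeps the slicing correct when IP or TCP options are present.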

Truncation and zero-padding: Packets longer than the fixed input length are truncated, and shorter packets are zero-padded, so that every packet has the same length. Consistent input dimensions are critical for deep learning models to facilitate batch processing, memory management, and training stability. This step ensures that the model can handle varied packet lengths without compromising performance.
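A minimal sketch of truncation and zero-padding to a fixed length m. Besides the fixed-length byte list, it returns a 0/1 validity mask; deriving the segment information of Section 5.3.1 from such a mask is an assumption. The value of m used by Avocado is not stated here, so the example uses a small illustrative length, and `truncate_and_pad` is a hypothetical name.

```python
def truncate_and_pad(payload: bytes, m: int):
    """Fix a payload to exactly m bytes and build a matching validity mask.

    Bytes beyond position m are truncated; short payloads are padded with 0x00.
    mask[i] is 1 for a real payload byte and 0 for padding, so a genuine 0x00
    byte in the payload stays distinguishable from padding.
    """
    if m <= 0:
        raise ValueError("m must be positive")
    clipped = list(payload[:m])
    pad = m - len(clipped)
    return clipped + [0] * pad, [1] * len(clipped) + [0] * pad


padded, mask = truncate_and_pad(b"\x01\x03\x00", 5)
print(padded, mask)  # [1, 3, 0, 0, 0] [1, 1, 1, 0, 0]
```

The example shows why the mask matters: the third byte is a genuine 0x00 and is marked valid, while the last two zeros are padding.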

Sliding window technique: An innovative sliding window strategy is introduced to enhance sample diversity and robustness. A window of size N is moved across the packet stream with step size s, and the N consecutive packets covered by each window position form one training instance. This technique brings several benefits:

Sample amplification: Due to the limited size of most ICS datasets, sliding windows significantly increase the number of effective training samples, improving generalization.
Robustness to attack position: In single-packet attacks, malicious packets can appear at any position. Fixed grouping could bias detection performance depending on packet position. The sliding window mitigates this by exposing the model to varying spatial–temporal contexts.
Real-time responsiveness: During inference, the model can process traffic on the fly without waiting for a complete sequence. Classification can be triggered as soon as the required number of packets has been received, reducing response latency and improving operational efficiency.
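The windowing step can be sketched as follows. `sliding_windows` is the offline form with window size N and step s: a stream of L packets yields $\lfloor (L - N)/s \rfloor + 1$ windows when $L \ge N$, so s = 1 produces far more training instances than fixed, non-overlapping grouping (s = N). `stream_windows` is a hypothetical online variant with s = 1 that emits a window as soon as N packets have arrived. Both names are illustrative, and dropping a trailing remainder shorter than N is an assumption of this sketch.

```python
from collections import deque


def sliding_windows(packets, n, s):
    """Offline windowing: every n consecutive packets, advancing by step s.

    A trailing remainder shorter than n is dropped.
    """
    if n <= 0 or s <= 0:
        raise ValueError("window size n and step s must be positive")
    return [packets[i:i + n] for i in range(0, len(packets) - n + 1, s)]


def stream_windows(packet_stream, n):
    """Online windowing with s = 1: yield a window once n packets have arrived."""
    buf = deque(maxlen=n)  # the oldest packet is evicted automatically
    for pkt in packet_stream:
        buf.append(pkt)
        if len(buf) == n:
            yield list(buf)


packets = list(range(10))  # stand-ins for 10 preprocessed packets
print(len(sliding_windows(packets, 4, 1)))  # 7
print(len(sliding_windows(packets, 4, 4)))  # 2
```

With s = 1, each packet away from the stream edges appears in N different windows at N different positions, which is the source of the robustness to attack position noted above.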

Through this comprehensive preprocessing pipeline, raw ICS traffic is transformed into a standardized input suitable for deep learning-based detection. The design ensures retention of key semantic features while improving generalization and computational efficiency. In practice, especially for flow filtering and packet filtering, fine-tuning may be required depending on the attack type and ICS protocol to avoid excluding useful attack indicators.

5.3. Training

ICS networks are increasingly exposed to complex and diverse security threats. Attack vectors range from byte-level manipulations within packets to time-series characteristics across flows, posing severe challenges to IDS. Based on the discussion above, current IDS methods suffer from two critical limitations:

First, existing methods lack byte-level interpretability, making it difficult to effectively identify abnormal key bytes within packets. This severely hampers the ability of security personnel to trace attack sources and analyze abnormal behaviors—especially in advanced attacks where malicious payloads are often hidden in seemingly normal data structures, greatly increasing detection complexity.

Second, current methods struggle to balance packet-level granularity with global flow-level temporal modeling. For advanced cross-packet attacks, such as fragmented malicious logic injection, most traditional techniques rely solely on static packet-level feature extraction and ignore the dynamic temporal correlations across packets. This leads to insufficient temporal modeling and poor recognition of behaviors that span multiple packets.

To address these challenges, we propose the Avocado model, which integrates the following core modules to overcome the above issues:

Byte-level interpretability: To tackle fine-grained byte-level anomalies, Avocado incorporates a Byte-Level Interpretable Module (see Section 5.3.1), introducing a novel shared query vector attention mechanism that dynamically assigns importance weights to individual bytes. This module not only highlights abnormal key bytes but also provides interpretable visualizations to assist security personnel in tracing and analyzing anomalies.
Packet-flow feature fusion: To bridge the gap between packet-level representation and flow-level temporal dynamics, Avocado integrates both a Packet-Level Feature Extraction Module (see Section 5.3.2) and a Flow-Level Feature Fusion Module (see Section 5.3.3). The packet-level module uses CNNs to extract local spatial features from each packet. This choice is based on empirical findings from [14], which demonstrate that in ICS environments—where data packets typically have short and fixed lengths—CNNs offer superior performance in capturing protocol-level patterns while avoiding gradient vanishing or explosion issues that often affect RNN-based models. The flow-level module then employs a multi-head self-attention mechanism to capture inter-packet temporal dependencies, enabling deep fusion of both spatial and sequential features for more accurate and fine-grained detection.
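To make the flow-level fusion step concrete, the pure-Python sketch below applies multi-head scaled dot-product self-attention across a window of N packet feature vectors, producing one context-aware vector per packet. It is a simplification under stated assumptions: learned query/key/value projections, residual connections, and normalization are omitted (each head simply attends over a disjoint slice of every vector), and the input vectors stand in for the output of the CNN packet extractor, which is not shown. The function names are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    mx = max(xs)
    exps = [math.exp(x - mx) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(vectors):
    """Scaled dot-product self-attention with Q = K = V = vectors (no projections)."""
    d = len(vectors[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q in vectors:
        weights = softmax([scale * sum(a * b for a, b in zip(q, k)) for k in vectors])
        out.append([sum(w * v[j] for w, v in zip(weights, vectors)) for j in range(d)])
    return out


def flow_fusion(packet_vecs, num_heads):
    """Contextualize each packet vector with every packet in the window.

    The d feature dimensions are split into num_heads equal slices, each slice
    is attended independently, and the results are concatenated.
    """
    d = len(packet_vecs[0])
    if d % num_heads:
        raise ValueError("feature dimension must be divisible by num_heads")
    hd = d // num_heads
    fused = [[] for _ in packet_vecs]
    for h in range(num_heads):
        head_slice = [v[h * hd:(h + 1) * hd] for v in packet_vecs]
        for i, ctx in enumerate(attend(head_slice)):
            fused[i].extend(ctx)
    return fused


window = [[1.0, 0.0, 2.0, -1.0], [0.5, 0.5, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0]]
fused = flow_fusion(window, num_heads=2)
print(len(fused), len(fused[0]))  # 3 4
```

Because the output keeps one row per packet, a classifier can score each packet individually while still seeing its neighbors, which is what distinguishes this design from flow-level classifiers that emit one label per flow.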

As shown in Figure 3, by combining these modules, Avocado achieves multi-level anomaly detection from bytes to flows, demonstrating significant advantages in addressing the key limitations of existing IDS.

5.3.1. Byte-Level Interpretable Module

In ICS environments, the byte-level interpretability of data packets is crucial for security personnel to conduct post-incident analysis and ensure accurate anomaly detection. However, existing IDS often lack fine-grained analysis of byte-level features, making it difficult to effectively identify potentially malicious bytes within a packet.

To address this issue, Avocado introduces a self-attention mechanism based on a shared query vector, which dynamically captures the weight distribution of each byte in the packet. This mechanism clearly highlights the contribution of each byte to the detection result and significantly improves model interpretability.

Unlike standard self-attention, which computes pairwise relationships between all byte positions, the shared-query attention in Avocado uses a unified query vector to assess how much each byte contributes to the overall packet representation. This design focuses the model’s attention on globally relevant bytes, improving attribution clarity and interpretability at the byte level.
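The difference can be written compactly. Let $X \in \mathbb{R}^{m \times d}$ hold the m byte representations, $W_Q, W_K \in \mathbb{R}^{d \times d_k}$ be query and key projections, and $q \in \mathbb{R}^{1 \times d_k}$ be a single learned query shared by all byte positions. This notation is introduced for illustration, and the paper's exact formulation may differ:

$$A_{\text{std}} = \operatorname{softmax}\!\left(\frac{(XW_Q)(XW_K)^{\top}}{\sqrt{d_k}}\right) \in \mathbb{R}^{m \times m}, \qquad \alpha = \operatorname{softmax}\!\left(\frac{q\,(XW_K)^{\top}}{\sqrt{d_k}}\right) \in \mathbb{R}^{1 \times m}$$

Standard self-attention yields a full matrix of pairwise weights, so no single number states how important byte i is. The shared query yields exactly one weight $\alpha_i$ per byte, with $\sum_i \alpha_i = 1$, which can be read directly as that byte's importance.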

Triple embedding: For each packet byte sequence after preprocessing, the Avocado model first embeds each byte into a multidimensional space to capture different types of information. This embedding consists of three components:

Raw Byte Embedding: Encodes the original byte value (0–255), capturing semantic patterns of protocol fields such as function codes, register addresses, and CRC values.
Position Embedding: Reflects the absolute position of the byte within the packet. This is important because many ICS protocols follow fixed field layouts, and the meaning of a byte often depends on its location.
Segment Embedding: Differentiates between valid content bytes and zero-padded bytes (used to standardize packet length). This prevents the model from attributing importance to artificial padding that carries no semantic information.

This embedding design is motivated by the structural characteristics of ICS traffic, where payloads are often short, structured, and position-sensitive. By fusing these three embeddings, the model can more accurately capture the spatial patterns of meaningful protocol structures and avoid the influence of irrelevant noise or padding.

Assume the preprocessed byte sequence of each packet has a length of m. The embedding matrices are defined as follows:

Raw Byte Embedding: After preprocessing, each packet is represented as a one-dimensional byte embedding with an embedding dimension of 1. The raw byte embedding matrix is defined as:

$$E_{\text{byte}} \in \mathbb{R}^{1 \times m} \tag{1}$$
Position Embedding: Since the positional order of bytes within a packet strongly affects their semantic meaning, a one-dimensional position embedding is used to encode byte positions. The position embedding matrix is defined as:

$$E_{\text{pos}} \in \mathbb{R}^{1 \times m} \tag{2}$$
Segment Embedding: Segment embedding differentiates valid bytes from zero-padded bytes, ensuring that padding does not contribute to feature learning. The segment embedding matrix is defined as:

$$E_{\text{seg}} \in \mathbb{R}^{1 \times m} \tag{3}$$

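Finally, by stacking these three embeddings row-wise, each byte is represented as a three-dimensional vector, and the complete packet representation is expressed as:

$$E = \begin{bmatrix} E_{\text{byte}} \\ E_{\text{pos}} \\ E_{\text{seg}} \end{bmatrix} \in \mathbb{R}^{3 \times m} \tag{4}$$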
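The construction of $E$ can be sketched as follows. This is a minimal illustration rather than the authors' implementation: because each embedding has dimension 1, it assumes simple scalar encodings (the byte value scaled to [0, 1], the position scaled to [0, 1], and a 1/0 validity flag), whereas the actual model may learn these embeddings. The function name `triple_embedding` is hypothetical.

```python
import numpy as np


def triple_embedding(packet: bytes, m: int) -> np.ndarray:
    """Return E = [E_byte; E_pos; E_seg] with shape (3, m) for one packet.

    Hypothetical scalar encodings (assumed for illustration only):
      E_byte: byte value scaled to [0, 1]
      E_pos:  absolute byte position scaled to [0, 1]
      E_seg:  1.0 for valid content bytes, 0.0 for zero padding
    """
    content = list(packet[:m])          # truncate packets longer than m
    n_valid = len(content)
    padded = np.zeros(m)
    padded[:n_valid] = content          # zero-pad to the fixed length m

    e_byte = padded / 255.0
    e_pos = np.arange(m) / max(m - 1, 1)
    e_seg = (np.arange(m) < n_valid).astype(float)
    return np.stack([e_byte, e_pos, e_seg])   # rows: byte, position, segment


# Illustrative 12-byte Modbus/TCP-style read request, padded to m = 16.
request = bytes([0x00, 0x01, 0x00, 0x00, 0x00, 0x06,
                 0x01, 0x03, 0x00, 0x00, 0x00, 0x0A])
E = triple_embedding(request, m=16)
print(E.shape)  # (3, 16)
```

The segment row is what separates a genuine 0x00 field (such as the protocol identifier bytes above) from padding, since both map to 0 in the byte row.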

Shared-Query-Vector Self-Attention Mechanism: Self-attention has achieved remarkable results in natural language processing. Standard self-attention generates weights by computing the relationship between each byte and every other byte. In intrusion detection, however, it is more meaningful to measure how much each byte of the packet contributes to the overall detection result. Avocado therefore proposes an attention mechanism with a shared query vector, which generates an importance weight for each byte.

Compute the $K$, $Q$, and $V$ matrices: The embedding matrix $E$ is transposed and passed through three learnable linear projections $W_{K}$, $W_{Q}$, and $W_{V}$ to generate the key ($K$), query ($Q$), and value ($V$) matrices:

$$K = E^{T} W_{K}, \quad Q = E^{T} W_{Q}, \quad V = E^{T} W_{V} \tag{5}$$

where $Q, K, V \in \mathbb{R}^{m \times d}$, $W_{K}, W_{Q}, W_{V} \in \mathbb{R}^{3 \times d}$, and $d$ is the hidden dimension.
Generate the Shared Query Vector: The shared query vector is computed by averaging the query matrix along the byte dimension:

$$Q_{\text{packet}} = \frac{1}{m} \sum_{i=1}^{m} Q_{i} \tag{6}$$

where $Q_{i}$ is the $i$-th row of $Q$ and $Q_{\text{packet}} \in \mathbb{R}^{1 \times d}$.
Shared query vector weight calculation: The shared query vector is multiplied by the key matrix and normalized using the Softmax function to generate attention weights for each byte:

$$W_{B} = \mathrm{Softmax}\left(\frac{Q_{\text{packet}} K^{T}}{\sqrt{d_{k}}}\right) \tag{7}$$

where $W_{B} \in \mathbb{R}^{1 \times m}$, and $d_{k}$ is the dimension of the key vectors (here $d_{k} = d$).
The output calculation: The attention weights are applied element-wise to the value matrix to obtain the weighted byte representation:

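$$Z = V \odot W_{B}^{T} \tag{8}$$

where $W_{B}^{T} \in \mathbb{R}^{m \times 1}$ is broadcast across the $d$ columns of $V$, so that the value vector of each byte is scaled by that byte's attention weight and $Z \in \mathbb{R}^{m \times d}$.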

Then, a linear transformation followed by residual addition with the transposed original embedding is performed:

$$B = Z W_{O} + E^{T} \tag{9}$$

where $W_{O} \in \mathbb{R}^{d \times 3}$ is the output projection matrix and $B \in \mathbb{R}^{m \times 3}$ has the same shape as the transposed input $E^{T}$.
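Equations (5) to (9) can be traced end to end with the NumPy sketch below. It is a minimal forward pass under stated assumptions: random matrices stand in for the learned projections, and the layer normalization and dropout described in Section 5.3.5 are omitted. The function name `shared_query_attention` is hypothetical.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    z = x - x.max(axis=axis, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def shared_query_attention(E, W_Q, W_K, W_V, W_O):
    """E: (3, m); W_Q, W_K, W_V: (3, d); W_O: (d, 3). Returns B (m, 3) and W_B (1, m)."""
    X = E.T                                    # (m, 3): one row per byte
    Q, K, V = X @ W_Q, X @ W_K, X @ W_V        # Eq. (5): each (m, d)
    q_packet = Q.mean(axis=0, keepdims=True)   # Eq. (6): shared query, (1, d)
    d_k = K.shape[1]
    W_B = softmax(q_packet @ K.T / np.sqrt(d_k))   # Eq. (7): byte weights, (1, m)
    Z = V * W_B.T                              # Eq. (8): row i of V scaled by W_B[0, i]
    B = Z @ W_O + X                            # Eq. (9): projection plus residual, (m, 3)
    return B, W_B


rng = np.random.default_rng(0)
m, d = 16, 8
E = rng.random((3, m))
W_Q, W_K, W_V = (rng.normal(size=(3, d)) for _ in range(3))
W_O = rng.normal(size=(d, 3))
B, W_B = shared_query_attention(E, W_Q, W_K, W_V, W_O)
print(B.shape, W_B.shape)  # (16, 3) (1, 16)
```

Only $m$ scores are computed (one per byte) instead of the $m \times m$ matrix of standard self-attention, which is why $W_{B}$ can be read directly as a byte-importance vector.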

By combining triple embedding with a shared query vector attention mechanism, the Byte-Level Interpretable Module adaptively assigns importance scores to each byte in the packet. This approach not only improves the model’s capability in detecting complex attacks, but also provides high interpretability, enabling security analysts to quickly locate the critical byte positions involved in the intrusion, and to better understand and investigate abnormal packets.

5.3.2. Packet-Level Feature Extraction Module

This module is designed to perform deep feature extraction on each individual packet. Given that each packet in an ICS network exhibits complex spatial and temporal correlations, Convolutional Neural Networks (CNNs) are employed to effectively capture these local patterns, particularly the inter-byte dependencies within the packet structure.

By leveraging CNNs for spatial feature extraction, the model not only improves its representational capability of individual packets, but also benefits from parameter sharing inherent in CNNs, which reduces computational complexity and meets the requirements of real-time intrusion detection in ICS environments.

Input transformation: Before the convolutional operations, the byte embedding matrix $B \in \mathbb{R}^{m \times 3}$ output by the Byte-Level Interpretable Module must be reshaped to match the input format expected by the CNN. Specifically, a Permute operation (together with the insertion of a singleton axis) transforms it into $B' \in \mathbb{R}^{3 \times 1 \times m}$. Here, the dimension “3” corresponds to the three byte embedding channels, and “1” is a virtual height dimension introduced to preserve the one-dimensional nature of the convolution.

One-dimensional convolution layers: The transformed data then passes through two successive one-dimensional convolutional layers, each extracting local spatial features, with the following hyperparameters:

$C_{1}$ and $C_{2}$ : Number of output channels (filters) for the first and second convolutional layers;
$k_{1}$ and $k_{2}$ : Kernel sizes;
$s_{1}$ and $s_{2}$ : Strides.

The convolution operations are defined as:

$$X_{1} = \mathrm{Conv1D}(B', C_{1}, k_{1}, s_{1}) \tag{10}$$

$$X_{2} = \mathrm{Conv1D}(X_{1}, C_{2}, k_{2}, s_{2}) \tag{11}$$

Max Pooling layer: After the convolutional operations, the output is passed through a Max Pooling layer to further extract critical features:

$$X = \mathrm{MaxPool}(X_{2}) \tag{12}$$

Max pooling reduces redundant information by shrinking the feature map size, improving computational efficiency while preserving the most prominent features.

Feature Flattening: The pooled features are then flattened into a one-dimensional vector to obtain a compact packet-level feature representation:

$$P = \mathrm{Flatten}(X) \tag{13}$$
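Equations (10) to (13), including the reshape of $B$ into $B'$, can be sketched in NumPy as follows. Assumptions: the convolutions are unpadded ("valid") cross-correlations as in common deep learning frameworks, no activation function is applied because none appears in Equations (10) and (11), max pooling uses a window and stride of 2, and the layer sizes in the example are arbitrary. All function names are hypothetical.

```python
import numpy as np


def conv1d(x, weight, bias, stride):
    """Valid 1-D cross-correlation. x: (C_in, L); weight: (C_out, C_in, k); bias: (C_out,)."""
    c_out, _, k = weight.shape
    L_out = (x.shape[1] - k) // stride + 1
    out = np.empty((c_out, L_out))
    for t in range(L_out):
        window = x[:, t * stride : t * stride + k]                       # (C_in, k)
        out[:, t] = np.tensordot(weight, window, axes=([1, 2], [0, 1])) + bias
    return out


def max_pool1d(x, size=2):
    """Non-overlapping max pooling along the length axis (window = stride = size)."""
    L_out = x.shape[1] // size
    return x[:, : L_out * size].reshape(x.shape[0], L_out, size).max(axis=2)


def packet_features(B, w1, b1, s1, w2, b2, s2):
    """B: (m, 3) from the byte-level module -> flattened packet vector P."""
    B_prime = B.T[:, np.newaxis, :]     # (3, 1, m): channels x virtual height x length
    x = B_prime[:, 0, :]                # the height of 1 keeps the convolution 1-D
    X1 = conv1d(x, w1, b1, s1)          # Eq. (10)
    X2 = conv1d(X1, w2, b2, s2)         # Eq. (11)
    X = max_pool1d(X2)                  # Eq. (12)
    return X.reshape(-1)                # Eq. (13): flatten


rng = np.random.default_rng(0)
m, C1, k1, s1, C2, k2, s2 = 64, 16, 3, 1, 32, 3, 2
P = packet_features(rng.random((m, 3)),
                    rng.normal(size=(C1, 3, k1)), np.zeros(C1), s1,
                    rng.normal(size=(C2, C1, k2)), np.zeros(C2), s2)
print(P.shape)  # (480,): lengths 64 -> 62 -> 30 -> 15, times 32 channels
```

Each convolution shortens the sequence to $\lfloor (L - k)/s \rfloor + 1$, so the flattened size $d_{p}$ of $P$ is fixed by $m$ and the chosen kernels, strides, and pooling window.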

Through this module, the model leverages convolutional neural networks to extract local spatial features within each packet. Max pooling is then used to reduce dimensionality effectively, and flattening prepares the features for subsequent processing by the flow-level feature fusion module.

The convolutional layers capture the spatial dependencies in byte sequences, and when combined with pooling and flattening, produce a compact, high-dimensional representation. This design enhances both the expressive power of the extracted features and the computational efficiency of the model, making it well-suited for real-time ICS intrusion detection tasks.

5.3.3. Flow-Level Feature Fusion Module

The objective of this module is to enhance the model’s capability to detect advanced industrial control network attacks by fusing temporal features across packets. Single-packet detection often proves ineffective against sophisticated attacks in which malicious payloads are either fragmented or embedded within seemingly normal traffic. Temporal characteristics of the data stream play a critical role in identifying such threats.

Attackers may scatter malicious code across multiple packets, making each payload appear indistinguishable from normal ones. Moreover, they may insert seemingly benign packets at critical stages of industrial processes, further complicating detection. Traditional methods struggle to accurately identify such attacks due to a lack of temporal context modeling.

To effectively capture these temporal dependencies, this study adopts a Multi-Head Self-Attention Mechanism in the Flow-Level Feature Fusion Module. The input to this module is a sequence of packet-level features extracted by the Packet-Level Feature Extraction Module. Suppose there are $n$ consecutive packets in a group, and each packet feature has a hidden dimension of $d_{p}$. Then the input is:

$$F \in \mathbb{R}^{n \times d_{p}} \tag{14}$$

The self-attention mechanism calculates inter-packet correlations using matrices of queries ($Q$), keys ($K$), and values ($V$). Unlike Section 5.3.1, where a shared query vector is used, here each packet has its own query vector, forming a query matrix $Q$. The attention mechanism computes the relevance between each pair of packets, allowing the model to capture potential temporal patterns across the entire sequence. This enables the model to incorporate flow-level information into each individual packet’s feature representation, facilitating accurate detection of complex attack behaviors.

The query, key, and value matrices are calculated as follows:

$$K = F W_{K}, \quad Q = F W_{Q}, \quad V = F W_{V} \tag{15}$$

where $W_{K}$, $W_{Q}$, and $W_{V}$ are learnable weight matrices.

The attention-weighted feature matrix $Z$ is computed using scaled dot-product attention:

$$Z = \mathrm{Softmax}\left(\frac{Q K^{T}}{\sqrt{d_{k}}}\right) V \tag{16}$$

Here, $d_{k}$ is the dimension of the key vectors, used for scaling to prevent excessively large inner product values.

Each matrix $Z$ corresponds to a single attention head. The multi-head attention mechanism enhances the model’s capacity to handle complex temporal structures by computing multiple independent attention heads in parallel. Each head focuses on different temporal aspects or dependencies among packets, thereby enriching the overall representation.

The outputs from all attention heads are concatenated and linearly transformed, followed by a residual connection with the original input $F$. The final output is calculated as:

$$F' = \mathrm{MultiHead}(F) + F = \mathrm{Concat}(Z_{1}, \ldots, Z_{h})\, W_{O} + F \tag{17}$$

where $F' \in \mathbb{R}^{n \times d_{p}}$ has the same shape as the input $F$, $h$ is the number of attention heads, and $W_{O}$ is a learnable weight matrix for linear projection.
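The flow-level fusion of Equations (14) to (17) can be sketched as below. Following the common multi-head formulation, each head is assumed to have its own projection matrices of width $d_{k} = d_{p}/h$; Equation (15) writes a single set of matrices, so this per-head split, the random weights, and the function name `multi_head_self_attention` are assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)   # numerically stable softmax
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_self_attention(F, W_Q, W_K, W_V, W_O):
    """F: (n, d_p). W_Q, W_K, W_V: lists of h matrices (d_p, d_k). W_O: (h * d_k, d_p)."""
    heads = []
    for Wq, Wk, Wv in zip(W_Q, W_K, W_V):
        Q, K, V = F @ Wq, F @ Wk, F @ Wv          # Eq. (15) for one head
        d_k = K.shape[1]
        A = softmax(Q @ K.T / np.sqrt(d_k))       # (n, n): packet-to-packet weights
        heads.append(A @ V)                       # Eq. (16): Z_i, (n, d_k)
    return np.concatenate(heads, axis=1) @ W_O + F   # Eq. (17): concat, project, residual


rng = np.random.default_rng(0)
n, d_p, h = 8, 32, 4
d_k = d_p // h
F = rng.normal(size=(n, d_p))                     # n packet feature vectors
W_Q, W_K, W_V = ([rng.normal(size=(d_p, d_k)) for _ in range(h)] for _ in range(3))
W_O = rng.normal(size=(h * d_k, d_p))
F_prime = multi_head_self_attention(F, W_Q, W_K, W_V, W_O)
print(F_prime.shape)  # (8, 32)
```

Unlike Equation (7), every packet here has its own query row, so each head produces an $n \times n$ attention matrix. Because this sketch adds no packet-position encoding, permuting the input packets simply permutes the output rows; this section does not state whether Avocado encodes packet order.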

5.3.4. Detection Result Output Module

The objective of this module is to compute class probabilities and a predicted label for each packet from the feature matrix $F' \in \mathbb{R}^{n \times d_{p}}$ generated by the Flow-Level Feature Fusion Module, where each row is a packet feature vector $P' \in \mathbb{R}^{1 \times d_{p}}$.

To classify each packet’s feature vector, the model uses a feedforward neural network (FFNN), i.e., a stack of fully connected layers, to perform nonlinear transformation and further feature extraction. The FFNN enhances the expressive capability of the model and helps it capture deeper feature relationships.

Fully connected network: Each packet’s feature vector $P'$ is passed through several fully connected layers sequentially. The output of the $k$-th fully connected layer is given by:

$$H_{k} = \sigma(H_{k-1} W_{k} + b_{k}) \tag{18}$$

where
$H_{k-1}$ is the output from the previous layer, with $H_{0} = P'$;
$W_{k} \in \mathbb{R}^{d_{k-1} \times d_{k}}$ is the weight matrix of the $k$-th layer, where $d_{k-1}$ and $d_{k}$ denote the input and output widths of that layer (not the key dimension of Equation (16));
$b_{k}$ is the bias term;
$\sigma$ denotes the activation function.
The first fully connected layer takes the fused features from $F'$ as input, and the output dimension of the final layer matches the number of target classes in the classification task.
Softmax: The output of the final fully connected layer is fed into a Softmax function, which maps each packet’s feature vector to a probability vector $p \in \mathbb{R}^{1 \times C}$ over the $C$ classification categories, indicating the likelihood that the packet belongs to each class.
Final classification results: For each packet, the final predicted label is determined by selecting the class with the highest probability:

$$\hat{y} = \arg\max(p) \tag{19}$$

where $\hat{y}$ is the predicted class label.
Additionally, the byte-level attention weights $W_{B} \in \mathbb{R}^{1 \times m}$ computed in Section 5.3.1 serve as the interpretability weights for the bytes of the corresponding packet, complementing the predicted label.
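The output stage in Equations (18) and (19) reduces to a few matrix operations. In this sketch, ReLU is assumed for $\sigma$ in the hidden layers and the final layer is kept linear so that Softmax receives unrestricted scores; the layer sizes, weights, and the function name `classify_packets` are hypothetical.

```python
import numpy as np


def classify_packets(F_prime, weights, biases):
    """F_prime: (n, d_p), one row P' per packet. Returns probabilities (n, C) and labels (n,)."""
    H = F_prime                                     # H_0 = P' for every packet
    for k, (W, b) in enumerate(zip(weights, biases)):
        H = H @ W + b                               # Eq. (18), affine part
        if k < len(weights) - 1:
            H = np.maximum(H, 0.0)                  # sigma = ReLU on hidden layers (assumed)
    scores = H - H.max(axis=1, keepdims=True)       # stabilise the exponentials
    p = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)   # Softmax per packet
    return p, p.argmax(axis=1)                      # Eq. (19): y_hat = argmax(p)


rng = np.random.default_rng(0)
n, d_p, hidden, C = 8, 32, 16, 5
weights = [rng.normal(size=(d_p, hidden)), rng.normal(size=(hidden, C))]
biases = [np.zeros(hidden), np.zeros(C)]
p, y_hat = classify_packets(rng.normal(size=(n, d_p)), weights, biases)
print(p.shape, y_hat.shape)  # (8, 5) (8,)
```

Since every row of $F'$ is classified independently, each packet in the window receives its own label, which is what enables packet-level localization.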

5.3.5. Model Enhancement

To further improve the performance of the model, Avocado introduces several enhancement mechanisms aimed at optimizing feature extraction, stabilizing the training process, and enhancing the generalization capability. The specific enhancement strategies are as follows:

Normalization layers: Normalization balances the numerical scales of features, leading to more stable training and faster convergence. Avocado incorporates layer normalization into multiple modules:
In the Byte-Level Interpretable Module (Section 5.3.1), the byte embedding matrix $E \in \mathbb{R}^{3 \times m}$ is normalized so that the resulting byte attention weights $W_{B} \in \mathbb{R}^{1 \times m}$ are better calibrated and attention does not concentrate excessively on a few specific bytes.
In the Packet-Level Feature Extraction Module (Section 5.3.2), each packet feature vector $P \in \mathbb{R}^{1 \times d_{p}}$ is normalized to enhance the stability of packet-level features.
In the Flow-Level Feature Fusion Module (Section 5.3.3), the flow-level feature matrix $F' \in \mathbb{R}^{n \times d_{p}}$ is normalized to ensure effective fusion of temporal features.
Dropout: Dropout is used to prevent overfitting and improve generalization. By randomly deactivating a fraction of neurons during training, it helps the model adapt to different input distributions and avoid excessive dependence on specific features. Avocado applies dropout at four points: in the shared-query-vector attention mechanism, to prevent over-reliance on individual byte-level features; after the max-pooling layer in packet-level feature extraction, to enhance robustness; in the flow-level multi-head self-attention mechanism, to avoid over-reliance on particular inter-packet dependencies when capturing temporal relationships; and in the fully connected layers (FFNNs), to further improve generalization across input scenarios. Placing dropout at these stages helps the model remain flexible and stable throughout feature extraction.
One-dimensional parallel convolution: In the Packet-Level Feature Extraction Module (Section 5.3.2), Avocado uses a two-dimensional convolution to implement one-dimensional convolutions over a group of packets in parallel. Because each window input contains multiple consecutive packets from the ICS network, shaping the 2D convolution kernel so that it spans a single packet and several adjacent bytes lets the byte sequence of each packet be processed independently and in parallel, capturing local patterns between adjacent bytes. This strengthens feature extraction in the convolutional layers while reducing computational cost and improving efficiency.
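The equivalence behind this parallel-convolution trick can be checked numerically. The sketch below stacks $n$ packets along the height axis and applies a 2D cross-correlation whose kernel has height 1, then confirms that the result matches running the same 1D convolution on each packet separately. The tensor layout (channels, packets, bytes), the omission of bias, and the function names are assumptions, not details taken from the paper.

```python
import numpy as np


def conv1d(x, w, stride):
    """Valid 1-D cross-correlation without bias. x: (C_in, L); w: (C_out, C_in, k)."""
    k = w.shape[2]
    L_out = (x.shape[1] - k) // stride + 1
    return np.stack([np.einsum("oik,ik->o", w, x[:, t * stride : t * stride + k])
                     for t in range(L_out)], axis=1)            # (C_out, L_out)


def conv2d_1xk(x, w, stride):
    """Valid 2-D cross-correlation with a (1, k) kernel.

    x: (C_in, n, m), i.e. n packets of m bytes stacked on the height axis.
    w: (C_out, C_in, 1, k). A kernel height of 1 never mixes two packets.
    """
    assert w.shape[2] == 1, "kernel height must be 1 to keep packets independent"
    k = w.shape[3]
    L_out = (x.shape[2] - k) // stride + 1
    return np.stack([np.einsum("oik,ink->on", w[:, :, 0, :], x[:, :, t * stride : t * stride + k])
                     for t in range(L_out)], axis=2)            # (C_out, n, L_out)


rng = np.random.default_rng(0)
x = rng.normal(size=(3, 5, 20))        # 3 channels, 5 packets, 20 bytes each
w = rng.normal(size=(4, 3, 1, 3))      # 4 filters spanning 1 packet x 3 bytes
batched = conv2d_1xk(x, w, stride=2)   # one call covers all 5 packets
per_packet = np.stack([conv1d(x[:, j, :], w[:, :, 0, :], 2) for j in range(5)], axis=1)
print(batched.shape, np.allclose(batched, per_packet))  # (4, 5, 9) True
```

In a deep learning framework the same effect comes from a 2D convolution layer with kernel size (1, k), which processes the whole window in one batched call instead of a Python loop over packets.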

6. Experiments and Results Analysis

6.1. Experimental Setup

To comprehensively evaluate the intrusion detection performance of the proposed Avocado model in industrial control scenarios, a series of experiments were conducted. Evaluations were performed on two publicly available datasets (see Section 6.1.2), covering five representative types of ICS attacks. The effectiveness of Avocado was compared against six state-of-the-art industrial intrusion detection methods (see Section 6.2). Additionally, the rationality of the sliding window preprocessing scheme was validated by varying the step size (see Section 6.3). An ablation study was conducted to investigate the impact of different flow-level feature fusion strategies on short packet sequences (see Section 6.4). Finally, we assessed the byte-level interpretability performance of Avocado (see Section 6.5).

6.1.1. Experimental Environment

All experiments were conducted on a workstation equipped with an NVIDIA RTX 4090 GPU and 128 GB RAM. The model was trained using the AdamW optimizer with weight decay regularization to prevent overfitting, and early stopping was applied to monitor convergence. A cosine learning rate schedule was adopted for stable optimization. To ensure reproducibility, all random processes were fixed with the same seed. The complete set of hyperparameters is summarized in Table 2.

6.1.2. Datasets

To evaluate the effectiveness and generalization capabilities of Avocado in ICS attack detection, we selected two open-source datasets, which together encompass five major categories of attacks (as introduced in Section 4.2) and 37 distinct attack techniques.

Dataset1: Control Logic Injection Attacks on Schneider Modicon M221 (CLIA-M221)

This dataset, released by the University of New Orleans in 2019, covers remote control-logic injection attacks on a PLC. It includes 22 traffic captures of normal control-logic programs uploaded from the engineering station, 29 captures of conventional control-logic injection attacks, and 29 captures of stealthy injection attacks that use fragmentation and noise padding. The dataset therefore allows a more effective comparison of ICS anomaly traffic detection schemes against highly covert industrial control attacks. The corresponding attack classification and data statistics are shown in Table 3.

Dataset2: The New Natural Gas Pipeline Control System Dataset (NGAS)

The NGAS dataset is a revised version of the natural gas pipeline control dataset released by Mississippi State University in 2015 [37]. It improves upon the earlier dataset proposed by Morris et al., which contained unrealistic patterns—such as fixed destination addresses for certain attacks (e.g., address 4 for DoS attacks, address 19 for scanning)—that led to models achieving overly optimistic results. The revised NGAS dataset corrects these anomalies and better reflects realistic ICS behavior, offering a more objective benchmark for evaluating intrusion detection models.

The NGAS dataset includes four major attack categories: response injection, command injection, DoS, and reconnaissance attacks, encompassing 35 specific attack instances. The dataset statistics are summarized in Table 4.

6.2. Comparison with the State-of-the-Art Methods

To evaluate the overall performance of Avocado in industrial control system intrusion detection, and in particular its advantages over current state-of-the-art methods, seven representative comparison methods are selected. These include traditional machine learning methods (DT [11] and SVM [12,13]), a mainstream deep learning approach (GRU [24,25]), a GRU model improved with an intelligent optimization algorithm (IGWO-GRU), an integrated deep belief network model (DBN [22]), and a federated semi-supervised learning method (FL-AE [23]). These methods have shown strong performance in a number of real industrial scenarios, are representative and advanced, and provide an effective reference for assessing the performance of the proposed model.

Experiments are carried out on two typical industrial datasets, NGAS [20] and CLIA-M221 [8], to assess general attack recognition and the detection of highly covert attacks, respectively. For a comprehensive evaluation, we report multiple performance metrics, including accuracy (ACC), False-Positive Rate (FPR), False-Negative Rate (FNR), Precision (P), Recall (R), and F1-score (F1). The results are summarized in Table 5, and the corresponding confusion matrices are visualized as percentage heatmaps in Figure 4 (NGAS) and Figure 5 (CLIA-M221).
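For reference, the standard binary definitions of these metrics are sketched below, treating attack traffic as the positive class. How the paper aggregates them over multiple attack classes (for example, macro or weighted averaging) is not stated in this section, and the counts in the example are made up for illustration only.

```python
def detection_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """ACC, FPR, FNR, precision, recall, and F1 from binary confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)                        # detection rate; FNR = 1 - recall
    return {
        "ACC": (tp + tn) / (tp + fp + tn + fn),
        "FPR": fp / (fp + tn),                     # normal traffic wrongly flagged
        "FNR": fn / (fn + tp),                     # attacks missed
        "P": precision,
        "R": recall,
        "F1": 2 * precision * recall / (precision + recall),
    }


# Hypothetical counts: 100 attack packets and 100 normal packets.
metrics = detection_metrics(tp=90, fp=5, tn=95, fn=10)
print({name: round(value, 3) for name, value in metrics.items()})
# {'ACC': 0.925, 'FPR': 0.05, 'FNR': 0.1, 'P': 0.947, 'R': 0.9, 'F1': 0.923}
```

Reporting FPR and FNR alongside ACC matters here because, under class imbalance, a high ACC can coexist with many missed attacks or many false alarms.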

On the NGAS [20] dataset, the proposed model achieves an ACC of 95.7%, which is the highest among all baselines. Importantly, it also achieves the best balance between precision, recall, and F1-score, indicating that it can not only reduce false alarms but also avoid missing stealthy attacks. Some attacks in NGAS (e.g., DoS, CRC anomalies) exhibit strong protocol-level signatures, where even traditional methods perform reasonably well; however, by introducing a byte-level attention mechanism and CNN-based packet feature extraction, our model maintains fine-grained recognition while further lowering both FPR and FNR.

On the CLIA-M221 [8] dataset, which mainly includes highly covert control logic injection attacks, the superiority of our model is even more evident. Traditional methods suffer significant ACC drops (e.g., DT only 80.4%), whereas our model, with its flow-level feature fusion, effectively captures temporal dependencies across fragmented packets. It achieves 94.5% ACC with an F1-score of 96.9%, while FPR and FNR are reduced to 3.7% and 4.3%, respectively. This demonstrates that the model not only improves overall detection ACC but also achieves strong consistency across all key metrics (ACC, P, R, and F1), verifying its practical value in detecting covert attacks in industrial control systems.

6.3. Comparison Experiment of Flow Window Step Size

To verify the effectiveness of the sliding window preprocessing strategy proposed in Section 5.2—particularly the impact of different step sizes on model performance—this section conducts comparative experiments based on the Avocado model using the NGAS dataset. Four sliding window configurations with step sizes of 1, 4, 7, and 10 are evaluated in terms of training ACC and convergence speed.

As shown in Figure 6, the 1-step sliding window configuration achieves the best performance in both training ACC and convergence speed. As the step size increases (e.g., 4-step, 7-step, 10-step), both the training ACC and the convergence rate noticeably decline. Larger step sizes lead to a reduction in the number of training samples, resulting in the model being less capable of capturing the nuanced variations in attack patterns during training, which ultimately impacts detection performance.

The core advantage of the sliding window technique lies in its ability to generate a large number of new samples by gradually sliding over the traffic stream, effectively expanding the dataset. This enhances the model’s generalization ability and robustness to packet position variations within flows. However, the choice of step size must balance between sample volume and computational efficiency. The quantitative impact of step size on sample generation and computational cost is described by the following formulas:

$$T = \left\lfloor \frac{N - W}{s} \right\rfloor + 1 \tag{20}$$

where

$N$: number of packets in the dataset;
$W$: window size;
$s$: step size;
$T$: total number of generated samples.

Assuming the cost to generate one sample is $C_{\text{sample}}$, the total computational cost is:

$$C_{\text{total}} = T \cdot C_{\text{sample}} \tag{21}$$
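The effect of the step size on the sample count can be checked with a short sketch that generates the windows explicitly and compares their number with Equation (20). The window size $W = 10$ and stream length $N = 100$ are hypothetical values chosen for illustration; this section does not restate the window size used in the experiments.

```python
def sliding_windows(packets, W, s):
    """All windows of W consecutive packets, starting every s packets."""
    return [packets[i : i + W] for i in range(0, len(packets) - W + 1, s)]


def num_windows(N, W, s):
    """Eq. (20): T = floor((N - W) / s) + 1, valid for N >= W."""
    return (N - W) // s + 1


N, W = 100, 10                          # hypothetical stream length and window size
packets = list(range(N))                # stand-ins for packet indices
for s in (1, 4, 7, 10):                 # step sizes compared in Figure 6
    T = num_windows(N, W, s)
    assert T == len(sliding_windows(packets, W, s))
    print(f"s={s:2d}: T={T:3d}, cost relative to s=1: {T / num_windows(N, W, 1):.2f}")
```

With these values, $T$ drops from 91 windows at $s = 1$ to 10 at $s = 10$, so by Equation (21) the sample-generation cost falls roughly ninefold.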

As seen from the formulas, with an increasing step size s, the total number of samples T decreases, leading to lower computational cost. However, this may cause insufficient sample coverage, thereby affecting the detection performance. Experimental results further confirm that the 1-step sliding window strategy, by generating more fine-grained samples, enhances the model’s ability to capture attack features and significantly improves performance in complex industrial control network scenarios. These findings suggest that in real-world applications, a smaller step size should be prioritized to accurately detect advanced attacks. Future research may explore adaptive step size strategies, optimizing sliding window parameters based on specific attack scenarios and resource constraints, to achieve an optimal trade-off between detection performance and efficiency.

6.4. Comparison Experiment of Flow Feature Fusion Module

The flow-level feature fusion module is a critical component of the proposed model, designed to capture temporal dependencies between packets and improve the model’s ability to detect complex attack behaviors. To evaluate its effectiveness and adaptability to different temporal modeling strategies, four comparative experiments were conducted on the NGAS [20] dataset, covering representative sequence modeling approaches:

LSTM: A classic temporal modeling technique capable of capturing long-term dependencies within data streams. In this setting, the sequence of packet features extracted from the packet-level module is sequentially fed into LSTM time steps to generate temporal representations. Although effective for long dependencies, LSTM may underperform in detecting rapid, short-term attacks.
GRU: A simplified variant of LSTM with fewer parameters and faster training. GRU performs well in capturing short- and mid-term sequence patterns but is slightly less capable of modeling long-range dependencies.
w/o FFFM: A baseline model that removes the flow feature fusion module, relying solely on packet-level classification. This setup ignores temporal correlations between packets and is used to isolate the contribution of the fusion module.
Full Model: The complete Avocado model with the proposed flow feature fusion module, which computes inter-packet relationships and integrates packet-level features with temporal context. This mechanism effectively captures multi-level temporal patterns, especially suited for short, rapidly changing attacks in ICS environments.

As shown in Figure 7, the experimental results reveal significant differences in both training ACC and convergence speed across the four models:

The w/o FFFM model achieves the lowest training ACC, approximately 89%, and exhibits the slowest convergence. This highlights that relying solely on packet-level features is insufficient to capture temporal dependencies, particularly in the presence of dispersed or covert attacks.
Both LSTM and GRU significantly improve detection performance, reaching approximately 92% and 93% training ACC, respectively. LSTM shows an advantage in modeling long-term dependencies but at the cost of higher computational overhead. GRU, with its simpler structure, offers faster training while maintaining competitive ACC.
However, both LSTM and GRU show limitations in detecting short-term, high-frequency attacks common in ICS networks.
The Full Model, equipped with the proposed multi-head self-attention mechanism, outperforms all other methods with a training ACC of approximately 95%, demonstrating faster convergence and superior modeling of complex, short-duration attacks.

These findings confirm that flow-level feature fusion is essential for accurately detecting sophisticated attack behaviors. While RNN-based methods can capture certain temporal patterns, they often suffer from low training efficiency and limited sensitivity to short-term variations. In contrast, the proposed multi-head attention mechanism can effectively model cross-packet relationships within short sequences, enhancing the detection of stealthy attacks and fulfilling the dual requirements of ACC and real-time performance in industrial control networks.

6.5. Byte-Level Interpretive Ability Effect

As shown in the heatmap results in Figure 8, each row represents an individual packet with its original bytes and the corresponding attention distribution. The first column indicates the packet class, where 0 denotes normal packets, and the following columns correspond to raw bytes. The color intensity reflects the relative importance of each byte in the model’s prediction. The model assigns significantly higher attention weights to the 8th to 12th byte regions across multiple packets. This indicates that the model can accurately focus on critical fields in industrial control protocols, such as function codes and register addresses. Additionally, the attention distribution is concentrated in the earlier packets of the flow, suggesting that the model effectively identifies potential anomaly carriers at an early stage. In contrast, the model assigns consistently low attention weights to invalid or padded bytes (e.g., repeated 0x00 values), effectively filtering out redundant or non-informative content and reducing the risk of false correlations.
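For analysts, reading such a heatmap amounts to ranking the byte weights $W_{B}$ of a packet while ignoring padding. A minimal sketch follows, assuming the number of valid (non-padding) bytes is known from preprocessing; the helper name `top_bytes` and the toy weights are hypothetical, and positions are 0-indexed.

```python
import numpy as np


def top_bytes(W_B, n_valid, k=5):
    """Return the k highest-weighted (position, weight) pairs among the valid bytes.

    W_B: (1, m) byte attention weights; n_valid: number of non-padding bytes.
    """
    w = np.asarray(W_B, dtype=float).reshape(-1)[:n_valid]   # drop padding positions
    order = np.argsort(w)[::-1][:k]                          # descending by weight
    return [(int(i), float(w[i])) for i in order]


# Toy weights for a packet with 6 valid bytes followed by 2 padding bytes.
W_B = np.array([[0.02, 0.05, 0.10, 0.40, 0.25, 0.08, 0.06, 0.04]])
print(top_bytes(W_B, n_valid=6, k=3))  # [(3, 0.4), (4, 0.25), (2, 0.1)]
```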

Figure 9 shows the per-class byte-level interpretability heatmap corresponding to the categories defined in Table 3 (CLIA-M221 Dataset). The results highlight that different traffic classes exhibit distinct byte-level attention patterns, which demonstrates that the model captures class-specific semantic cues while maintaining robust interpretability across different attack types.

Overall, the byte-level interpretability module not only demonstrates strong attack recognition capabilities, but also provides intuitive and actionable insights for security analysts. It facilitates protocol auditing, attack forensics, and policy refinement, thereby offering a high degree of interpretability and practical applicability in real-world industrial control security scenarios.

7. Discussion

The Avocado model demonstrates outstanding performance in detecting advanced industrial control network attacks, with particular innovations in byte-level interpretability, packet-level feature extraction, and flow-level feature fusion. However, its practical deployment still faces the following challenges:

Limitations in capturing long-range stealthy attacks: Although Avocado employs a sliding window strategy to model local temporal dependencies across packets, its effectiveness may decrease when stealthy attack behaviors span across a sequence of packets longer than the current window length. In such cases, important cross-packet correlations may be missed, potentially reducing detection sensitivity. Future improvements could explore hierarchical temporal modeling or dynamic windowing mechanisms to better capture long-range dependencies without incurring significant computational overhead.

Limitations of byte-level interpretability: Although Avocado uses the attention mechanism to achieve byte-level feature importance annotation, its interpretability relies on the internal weight distribution of the deep model, which may be limited in complex attack scenarios. For example, in hybrid attacks (e.g., command injection combined with time delay attack), the byte weight analysis of a single packet may not fully reveal the attack behavior. Future research can explore combining expert knowledge, building visual attack path graphs, or adopting multi-level explanation methods to enhance the interpretability and intuitiveness of the model.

Adaptability to highly covert attacks: While experiments validate Avocado’s effectiveness against highly covert attacks, the complexity and variability of such attacks impose greater demands on training data. In real-world industrial environments, attack samples are often scarce and costly to label. To address this challenge, future work may consider using Generative Adversarial Networks (GANs) to augment the dataset, or exploring unsupervised learning and adaptive training strategies to reduce the dependence on large-scale labeled data and improve the model’s generalization capability.

8. Conclusions

This paper proposes Avocado, a fine-grained intrusion detection model tailored for advanced industrial control network attacks. By integrating a byte-level interpretability module, a packet-level feature extraction module, and a flow-level feature fusion module, the model significantly enhances both detection accuracy and explainability.

Experimental results on two public datasets demonstrate that Avocado outperforms existing methods, especially in detecting highly covert and fragmented attacks.

Future research can focus on the following directions:

(1): Enhancing interpretability through multi-granularity explanation techniques that go beyond single-packet views;
(2): Improving the detection of long-range stealthy attacks. Although Avocado utilizes a sliding window to capture local temporal patterns, its effectiveness may degrade when attack behaviors span sequences longer than the current window. Future efforts may explore hierarchical or recurrent fusion strategies to expand the temporal receptive field without incurring excessive computational overhead;
(3): Adaptability to evolving traffic patterns can be increased by incorporating lifelong learning or unsupervised updating mechanisms, thereby ensuring long-term applicability.

This work presents Avocado, which offers new insights into, and a technical foundation for, the security protection of industrial control systems, and lays the groundwork for future developments in intrusion detection for industrial control networks.

Author Contributions

Conceptualization, X.L.; Methodology, X.L. and T.L.; Software, T.L.; Formal analysis, X.L.; Writing—original draft, X.L. and T.L.; Writing—review & editing, X.L. and N.H.; Visualization, T.L.; Supervision, N.H.; Project administration, N.H.; Funding acquisition, N.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Major Key Project of PCL (Grant No. PCL2024A05-1); the Academician Fang Binxing Workstation in Hainan Province, China (Grant No. YSGZZ2023003); and the Specific Research Fund of the Innovation Platform for Academicians of Hainan Province, China (Grant No. YSPTZX202506). The APC was funded by these projects.

Data Availability Statement

Data are contained within the article. The datasets NGAS [20] and CLIA-M221 [8] used in this study are publicly available from their original publications.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Goranin, N.; Čeponis, D.; Čenys, A. A Systematic Literature Review of Current Research Trends in Operational and Related Technology Threats, Threat Detection, and Security Insurance. Appl. Sci. 2025, 15, 2316. [Google Scholar] [CrossRef]
2. Farwell, J.P.; Rohozinski, R. Stuxnet and the future of cyber war. Survival 2011, 53, 23–40. [Google Scholar] [CrossRef]
3. Khan, R.; Maynard, P.; McLaughlin, K.; Laverty, D.; Sezer, S. Threat analysis of blackenergy malware for synchrophasor based real-time control and monitoring in smart grid. In Proceedings of the 4th International Symposium for ICS & SCADA Cyber Security Research 2016, Belfast, UK, 23–25 August 2016; BCS: Singapore, 2016; pp. 53–63. [Google Scholar]
4. Beerman, J.; Berent, D.; Falter, Z.; Bhunia, S. A review of colonial pipeline ransomware attack. In Proceedings of the 2023 IEEE/ACM 23rd International Symposium on Cluster, Cloud and Internet Computing Workshops (CCGridW), Bangalore, India, 1–4 May 2023; IEEE: New York, NY, USA, 2023; pp. 8–15. [Google Scholar]
5. de Sá, A.O.; da Costa Carmo, L.F.R.; Machado, R.C. Covert attacks in cyber-physical control systems. IEEE Trans. Ind. Inform. 2017, 13, 1641–1651. [Google Scholar] [CrossRef]
6. Li, W.; Xie, L.; Wang, Z. Two-loop covert attacks against constant value control of industrial control systems. IEEE Trans. Ind. Inform. 2018, 15, 663–676. [Google Scholar] [CrossRef]
7. Yoo, H.; Ahmed, I. Control logic injection attacks on industrial control systems. In Proceedings of the IFIP International Conference on ICT Systems Security and Privacy Protection, Lisbon, Portugal, 25–27 June 2019; Springer International Publishing: Cham, Switzerland, 2019; pp. 33–48. [Google Scholar]
8. Cárdenas, A.A.; Amin, S.; Lin, Z.S.; Huang, Y.L.; Huang, C.Y.; Sastry, S. Attacks against process control systems: Risk assessment, detection, and response. In Proceedings of the 6th ACM Symposium on Information, Computer and Communications Security, Hong Kong, China, 22–24 March 2011; pp. 355–366. [Google Scholar]
9. Scarfone, K.; Mell, P. Guide to intrusion detection and prevention systems (IDPS). NIST Spec. Publ. 2007, 800, 94. [Google Scholar]
10. Alladi, T.; Chamola, V.; Zeadally, S. Industrial control systems: Cyberattack trends and countermeasures. Comput. Commun. 2020, 155, 1–8. [Google Scholar] [CrossRef]
11. Osho, O.; Hong, S.; Kwembe, T.A. Network intrusion detection system using principal component analysis algorithm and decision tree classifier. In Proceedings of the 2021 International Conference on Computational Science and Computational Intelligence (CSCI), Las Vegas, NV, USA, 15–17 December 2021; IEEE: New York, NY, USA, 2021; pp. 273–279. [Google Scholar]
12. Sandeep, V.; Kondappan, S.; Jone, A.A. Anomaly intrusion detection using SVM and C4.5 classification with an improved particle swarm optimization (I-PSO). Int. J. Inf. Secur. Priv. (IJISP) 2021, 15, 113–130. [Google Scholar] [CrossRef]
13. Anton, S.D.D.; Sinha, S.; Schotten, H.D. Anomaly-based intrusion detection in industrial data with SVM and random forests. In Proceedings of the 2019 International Conference on Software, Telecommunications and Computer Networks (SoftCOM), Split, Croatia, 19–21 September 2019; IEEE: New York, NY, USA, 2019; pp. 1–6. [Google Scholar]
14. Li, L.; Fu, Z.; Zou, G.; Mu, Z.; Zhang, Q.; Wang, G.; Wang, P. Survey on methodology of intrusion detection in industrial control system based on artificial intelligence. In Proceedings of the 2022 International Conference on Computers and Artificial Intelligence Technologies (CAIT), Zhejiang, China, 4–6 November 2022; IEEE: New York, NY, USA, 2022; pp. 93–103. [Google Scholar]
15. AbuHmed, T.; Mohaisen, A.; Nyang, D. A survey on deep packet inspection for intrusion detection systems. arXiv 2008, arXiv:0803.0037. [Google Scholar] [CrossRef]
16. Umer, M.F.; Sher, M.; Bi, Y. Flow-based intrusion detection: Techniques and challenges. Comput. Secur. 2017, 70, 238–254. [Google Scholar] [CrossRef]
17. Sperotto, A.; Schaffrath, G.; Sadre, R.; Morariu, C.; Pras, A.; Stiller, B. An overview of IP flow-based intrusion detection. IEEE Commun. Surv. Tutor. 2010, 12, 343–356. [Google Scholar] [CrossRef]
18. Golling, M.; Hofstede, R.; Koch, R. Towards multi-layered intrusion detection in high-speed networks. In Proceedings of the 2014 6th International Conference on Cyber Conflict (CyCon 2014), Tallinn, Estonia, 3–6 June 2014; IEEE: New York, NY, USA, 2014; pp. 191–206. [Google Scholar]
19. Mutalib, N.H.A.; Sabri, A.Q.M.; Wahab, A.W.A.; Abdullah, E.R.M.F.; AlDahoul, N. Explainable deep learning approach for advanced persistent threats (APTs) detection in cybersecurity: A review. Artif. Intell. Rev. 2024, 57, 297. [Google Scholar] [CrossRef]
20. Morris, T.H.; Thornton, Z.; Turnipseed, I. Industrial control system simulation and data logging for intrusion detection system research. In Proceedings of the 7th Annual Southeastern Cyber Security Summit, Huntsville, AL, USA, 3–4 June 2015. [Google Scholar]
21. Wang, K.; Parekh, J.J.; Stolfo, S.J. Anagram: A content anomaly detector resistant to mimicry attack. In Proceedings of the International Workshop on Recent Advances in Intrusion Detection, Hamburg, Germany, 20–22 September 2006; Springer: Berlin/Heidelberg, Germany, 2006; pp. 226–248. [Google Scholar]
22. Huda, S.; Yearwood, J.; Hassan, M.M.; Almogren, A. Securing the operations in SCADA-IoT platform based industrial control system using ensemble of deep belief networks. Appl. Soft Comput. 2018, 71, 66–77. [Google Scholar] [CrossRef]
23. Aouedi, O.; Piamrat, K.; Muller, G.; Singh, K. Federated semisupervised learning for attack detection in industrial internet of things. IEEE Trans. Ind. Inform. 2022, 19, 286–295. [Google Scholar] [CrossRef]
24. Yang, W.; Shan, Y.; Wang, J.; Yao, Y. An industrial network intrusion detection algorithm based on IGWO-GRU. Clust. Comput. 2024, 27, 7199–7217. [Google Scholar] [CrossRef]
25. Wang, Z.; Wang, Z.; Yi, F.; Zeng, C. Attack traffic detection based on LetNet-5 and GRU hierarchical deep neural network. In Proceedings of the International Conference on Wireless Algorithms, Systems, and Applications, Nanjing, China, 25–27 June 2021; Springer International Publishing: Cham, Switzerland, 2021; pp. 327–334. [Google Scholar]
26. Saeed, W.; Omlin, C. Explainable AI (XAI): A systematic meta-survey of current challenges and future opportunities. Knowl.-Based Syst. 2023, 263, 110273. [Google Scholar] [CrossRef]
27. Bahadoripour, S.; Karimipour, H.; Jahromi, A.N.; Islam, A. An explainable multi-modal model for advanced cyber-attack detection in industrial control systems. Internet Things 2024, 25, 101092. [Google Scholar] [CrossRef]
28. Lundberg, S.M.; Lee, S.I. A unified approach to interpreting model predictions. In Proceedings of the 31st International Conference on Neural Information Processing Systems (NeurIPS 2017), Long Beach, CA, USA, 4–9 December 2017; Curran Associates, Inc.: Red Hook, NY, USA, 2017; pp. 4765–4774. [Google Scholar]
29. Khan, I.A.; Moustafa, N.; Pi, D.; Sallam, K.M.; Zomaya, A.Y.; Li, B. A new explainable deep learning framework for cyber threat discovery in industrial IoT networks. IEEE Internet Things J. 2021, 9, 11604–11613. [Google Scholar] [CrossRef]
30. Williams, T.J. The Purdue enterprise reference architecture. Comput. Ind. 1994, 24, 141–158. [Google Scholar] [CrossRef]
31. Ayub, A.; Jo, W.; Qasim, S.A.; Ahmed, I. How are industrial control systems insecure by design? A deeper insight into real-world programmable logic controllers. IEEE Secur. Priv. 2023, 21, 10–19. [Google Scholar] [CrossRef]
32. Canonico, R.; Sperlì, G. Industrial cyber-physical systems protection: A methodological review. Comput. Secur. 2023, 135, 103531. [Google Scholar] [CrossRef]
33. Sheng, C.; Yao, Y.; Zhao, L.; Zeng, P.; Zhao, J. Scanner-hunter: An effective ICS scanning group identification system. IEEE Trans. Inf. Forensics Secur. 2024, 19, 3077–3092. [Google Scholar] [CrossRef]
34. Yılmaz, E.N.; Ciylan, B.; Gönen, S.; Sindiren, E.; Karacayılmaz, G. Cyber security in industrial control systems: Analysis of DoS attacks against PLCs and the insider effect. In Proceedings of the 2018 6th International Istanbul Smart Grids and Cities Congress and Fair (ICSG), Istanbul, Turkey, 25–26 April 2018; IEEE: New York, NY, USA, 2018; pp. 81–85. [Google Scholar]
35. Eke, H.; Petrovski, A.; Ahriz, H. Detection of false command and response injection attacks for cyber physical systems security and resilience. In Proceedings of the 13th International Conference on Security of Information and Networks, Istanbul, Turkey, 4–6 November 2020; pp. 1–8. [Google Scholar]
36. Rasapour, F.; Serra, E.; Mehrpouyan, H. Framework for detecting control command injection attacks on industrial control systems (ICS). In Proceedings of the 2019 Seventh International Symposium on Computing and Networking (CANDAR), Nagasaki, Japan, 26–29 November 2019; IEEE: New York, NY, USA, 2019; pp. 211–217. [Google Scholar]
37. Morris, T.; Gao, W. Industrial control system traffic data sets for intrusion detection research. In Proceedings of the International Conference on Critical Infrastructure Protection, Arlington, VA, USA, 19–20 March 2014; Springer: Berlin/Heidelberg, Germany, 2014; pp. 65–78. [Google Scholar]

Figure 1. Schematic diagram of an IDS scheme based on network traffic.

Figure 2. Application workflow of the Avocado model.

Figure 3. Overview of Avocado model framework.

Figure 4. NGAS confusion matrix.

Figure 5. CLIA-M221 confusion matrix.

Figure 6. Sliding window step control experiment.

Figure 7. Ablation study of the flow feature fusion module.

Figure 8. Byte-level interpretability visualization.

Figure 9. Per-class byte-level interpretability heatmap.

Table 1. Characteristics and anomaly levels of five typical network attacks.

Attack Types	Network Characteristics	Anomaly Hierarchy
Reconnaissance [33]	High-frequency scanning; numerous small flows; rapid traversal across IPs/ports	Flow level
Denial of Service (DoS) [34]	Bursty traffic; malformed fields (e.g., CRC errors); resource exhaustion patterns	Byte/Packet/Flow level
Response Injection [35]	Tampered feedback packets; syntactically valid but semantically inconsistent fields	Byte/Flow level
Command Injection [36]	Imitated control commands; structurally similar to normal traffic but with malicious intent	Byte/Packet/Flow level
Control Logic Injection [8]	Malicious logic split across multiple packets; delayed or fragmented injection	Byte/Packet/Flow level

Table 2. Key training hyperparameters and random seed.

Parameter	Value	Description
Random seed	9876	Ensures reproducibility of data splits and model initialization
Window size (N)	10	Number of packets per sliding window
Step size (s)	1	Window stride between samples
Packet length (m)	128 bytes	Truncated or padded to fixed length
Batch size	128	Number of samples per iteration
Dropout	0.2	Dropout rate (regularization)
Optimizer	AdamW	With weight decay = 1 × 10⁻⁴
Learning rate	1 × 10⁻³ (cosine schedule)	Base learning rate, adjusted by a cosine schedule during training
Early stopping	patience = 5	Stops training if no validation improvement
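The two dynamic entries in Table 2, the cosine learning-rate schedule and early stopping with patience = 5, can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the authors' training code: the total number of epochs (50 here), a minimum learning rate of 0, validation loss as the monitored quantity, and patience counted in epochs are all assumptions, and the helper names are hypothetical. AdamW itself is not re-implemented; in a framework such as PyTorch it would be `torch.optim.AdamW` with `weight_decay=1e-4`.

```python
import math


def cosine_lr(epoch: int, total_epochs: int, base_lr: float = 1e-3, min_lr: float = 0.0) -> float:
    """Cosine-annealed learning rate: base_lr at epoch 0, min_lr at total_epochs."""
    progress = min(epoch, total_epochs) / total_epochs
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


class EarlyStopping:
    """Hypothetical helper: stop after `patience` consecutive epochs without a lower validation loss."""

    def __init__(self, patience: int = 5, min_delta: float = 0.0) -> None:
        self.patience = patience
        self.min_delta = min_delta
        self.best = math.inf
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Toy run: the loss stops improving after epoch 2, so five non-improving epochs end training at epoch 7.
stopper = EarlyStopping(patience=5)
val_losses = [0.90, 0.70, 0.60, 0.61, 0.62, 0.60, 0.65, 0.63, 0.58]
stop_epoch = None
for epoch, loss in enumerate(val_losses):
    print(epoch, round(cosine_lr(epoch, total_epochs=50), 6), loss)
    if stopper.step(loss):
        stop_epoch = epoch
        break
```

Note that a loss equal to the current best (0.60 at epoch 5) does not reset the counter here; whether ties count as improvement is another assumption that real implementations handle differently.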

Table 3. Statistics of the CLIA-M221 dataset.

Categories	Tags	Size (MB)	Amount of Control Logic	Number of M221 Packets	Number of Write Request Packets
Normal control logic download	0	2.1	22	10,148	1101
Malicious control logic injection	1	3.7	29	11,092	1535
Malicious control logic	2	2.2	29	8168	5362

Table 4. Statistics of NGAS dataset.

Categories	Tags	Number of Samples
Normal	0	214,580
NMRI	1	20,412
CMRI	2	13,035
MSCI	3	7900
MPCI	4	7753
MFCI	5	4898
DoS	6	3874
Recon	7	2176

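Table 5. Comparison with state-of-the-art methods on the NGAS [20] and CLIA-M221 [8] datasets.

Methods	NGAS ACC (%)	NGAS FPR (%)	NGAS FNR (%)	NGAS P (%)	NGAS R (%)	NGAS F1 (%)	CLIA-M221 ACC (%)	CLIA-M221 FPR (%)	CLIA-M221 FNR (%)	CLIA-M221 P (%)	CLIA-M221 R (%)	CLIA-M221 F1 (%)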
Osho et al. [11]	83.6	8.3	12.5	74.7	87.5	80.7	80.4	9.1	14.7	94.7	85.3	89.6
Sandeep et al. [12]	88.9	5.7	8.9	81.8	91.1	86.1	86.3	7.4	9.5	95.9	90.5	93.0
Anton et al. [13]	92.1	5.3	4.6	83.5	95.4	89.1	90.2	6.5	7.1	96.4	92.9	94.6
Wang et al. [25]	91.8	4.3	5.9	85.9	94.1	89.8	90.2	5.9	7.0	96.8	93.0	94.7
Huda et al. [22]	93.7	4.0	6.2	86.8	93.8	90.1	92.6	5.3	6.3	97.1	93.7	95.3
Aouedi et al. [23]	94.2	4.0	4.7	87.0	95.3	91.0	91.6	5.4	6.4	97.1	93.6	95.3
Yang et al. [24]	94.3	3.8	4.5	87.6	95.5	91.4	92.8	4.6	6.1	97.5	93.9	95.7
Our model	95.7	3.2	3.6	89.4	96.4	92.7	94.5	3.7	4.3	98.0	95.7	96.9
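For readers checking the columns of Table 5, the following minimal sketch computes the six reported metrics from confusion counts under the standard definitions. It assumes a binary attack-versus-normal framing with attack as the positive class; how the authors aggregate the multi-class NGAS and CLIA-M221 labels into such counts (all attack classes versus normal, or per-class averaging) is not stated in this section, so that framing and the helper names are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    tp: int  # attack samples flagged as attack
    fp: int  # normal samples flagged as attack
    tn: int  # normal samples passed as normal
    fn: int  # attack samples missed


def detection_metrics(c: Counts) -> dict:
    """Binary (attack vs. normal) detection metrics, each returned as a fraction."""
    precision = c.tp / (c.tp + c.fp)
    recall = c.tp / (c.tp + c.fn)
    return {
        "ACC": (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn),
        "FPR": c.fp / (c.fp + c.tn),
        "FNR": c.fn / (c.fn + c.tp),  # equals 1 - recall
        "P": precision,
        "R": recall,
        "F1": 2 * precision * recall / (precision + recall),
    }


# Toy counts, not taken from the paper.
result = detection_metrics(Counts(tp=90, fp=10, tn=880, fn=20))
print({k: round(v, 4) for k, v in result.items()})
```

Because FNR = FN/(FN + TP) = 1 − R, every row of Table 5 satisfies R = 100 − FNR (for example, 96.4 = 100 − 3.6 for Avocado on NGAS), which is a quick consistency check on the reported numbers.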

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

