Article

MeeDet: Efficient Malicious Traffic Detection Method via Mamba-Based Early-Exit Mechanism in IIoT Scenarios

1 School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510275, China
2 Guangdong Province Key Laboratory of Information Security Technology, Guangzhou 510275, China
* Author to whom correspondence should be addressed.
Electronics 2026, 15(5), 1017; https://doi.org/10.3390/electronics15051017
Submission received: 19 January 2026 / Revised: 10 February 2026 / Accepted: 13 February 2026 / Published: 28 February 2026

Abstract

Malicious traffic detection in the Industrial Internet of Things (IIoT) faces significant challenges, primarily due to the scarcity of labeled data, high inference latency on resource-constrained edge devices, and the lack of comprehensibility in deep learning models. To overcome these limitations, this paper proposes MeeDet, a novel detection framework that integrates Mamba-based state-space modeling, a dynamic early-exit mechanism, and Large Language Model (LLM)-driven comprehensibility. The proposed MeeDet operates through a four-stage pipeline. First, raw packet captures are preprocessed into header-only, standardized stride-based sequences. Second, a 12-layer unidirectional Mamba backbone is pretrained on unlabeled data using two complementary tasks: Masked Byte Modeling for byte-level semantics and Next-Flow Prediction for long-range flow-level temporal coherence. Third, the model is fine-tuned by attaching lightweight binary heads to each Mamba layer, allowing for the early termination of high-confidence benign samples and adaptive routing of ambiguous flows to deeper layers. Finally, for detected malicious samples, structured prompts containing key network traffic features are processed by an LLM to generate human-readable diagnostic reports, without affecting real-time detection latency. Extensive experiments on five public IIoT datasets demonstrate the superiority of MeeDet over existing baselines. MeeDet achieves F1-scores exceeding 0.98 on key benchmarks while significantly reducing computational overhead. Specifically, at a 1% malicious traffic ratio, MeeDet requires only 1.7 MFLOPs and 1.58 ms of average inference latency, representing a reduction of over 70% in computational cost compared to strong pretrained baselines.

1. Introduction

In Industrial Internet of Things (IIoT) environments, malicious network traffic detection is a key technology for ensuring the industrial control system’s security. Continuously monitoring network communications in IIoT environments safeguards data security and mitigates malicious attacks (e.g., malicious firmware delivery, command-and-control communications, unauthorized write operations, lateral movement, scanning, and resource abuse) [1,2,3]. These activities are often concealed within seemingly normal traffic patterns to evade traditional intrusion detection systems based on deep packet inspection. Unlike traditional IT networks, IIoT traffic exhibits a set of distinctive technical characteristics that fundamentally affect intrusion detection design. First, IIoT environments rely heavily on industrial control protocols (e.g., Modbus, OPC UA, DNP3) that are command-driven, stateful, and often lack encryption or authentication. Their semantics are tightly coupled with physical processes, making malicious deviations subtle and difficult to detect using generic IT-oriented features. Second, periodic and deterministic communication patterns dominate IIoT traffic. Field devices typically engage in fixed-interval polling and reporting, resulting in highly regular inter-arrival times and packet structures. While this regularity provides strong baselines for detection, it also implies that low-rate or “low-and-slow” attacks can closely mimic benign behavior and evade short-context models. Third, real-time and resource constraints are significantly stricter in IIoT deployments. Detection systems are often deployed on edge gateways co-located with control equipment, where inference latency, memory footprint, and power consumption are tightly bounded. Even small increases in average processing delay may interfere with control loops or safety monitoring. 
These characteristics provide exploitable baselines and fingerprints for malicious traffic detection, while simultaneously imposing stricter requirements on model generalization, low-latency inference, and comprehensibility. According to the 2022 Industrial Internet Security Situation Report [4], network attacks in IIoT contexts show a fluctuating upward trend, posing new threats to industrial production safety.
Existing studies on network traffic detection in IIoT scenarios can be broadly categorized into three lines. (i) Anomaly detection methods: These methods construct baseline models of normal operating conditions in IIoT environments and identify suspicious traffic through deviations in statistical features and temporal behaviors [5,6,7,8,9]. Ref. [5] uses label information as a constraint on a Conditional Variational Autoencoder to control the category of generated samples, mitigates gradient bias through unbiased estimation with Categorical Features Gradient Boosting, and improves prediction accuracy while reducing the risk of overfitting by adopting various tree growth modes. Ref. [6] presents the first lightweight time-frequency anomaly detection model (LTFAD), built on a shallow All-MLP architecture for IIoT time series, which integrates dual-branch global-local reconstruction with time-frequency joint learning to achieve high timeliness and accuracy on edge devices. Ref. [8] proposes a full graph autoencoder for one-class group anomaly detection in large-scale IIoT systems, tackling multisource data coupling and the lack of physical knowledge; full graph reconstruction, variational learning, and graph augmentation improve accuracy and robustness on LRE multisensor datasets. However, these methods are prone to false alarms for infrequent yet legitimate activities such as maintenance operations and batch transitions. In addition, most anomaly detection methods emphasize binary classification, making it difficult to refine detections to specific attack categories. (ii) Machine learning-based methods: These methods extract traffic features (e.g., packet length distributions, inter-arrival time statistics, session duration) from raw network traffic and train machine learning models on them, achieving better performance [10,11,12,13].
Ref. [11] proposes a Genetic Algorithm-Random Forest-based IDS for IIoT intrusion detection, which selects features via a Genetic Algorithm (with RF as the fitness function) and achieves superior performance (87.61% TAC and 0.98 AUC) on the UNSW-NB15 dataset. Ref. [12] proposes an optimized traditional machine learning-based IoT malicious traffic detection model, which reduces model complexity via feature engineering and Bayesian optimization for flow-level classification and achieves an F1-score exceeding 0.75 in practical evaluations. Nevertheless, labeled data from real IIoT environments are scarce and difficult to acquire. Moreover, data distributions drift with changing operating conditions, challenging model stability. (iii) Deep learning methods: These studies learn representations directly from raw network traffic, commonly leveraging CNNs, Transformers, and graph neural networks to capture inter-device communication patterns [14,15,16,17,18]. However, inference budgets in resource-constrained IIoT environments are limited, making it difficult to balance detection performance and efficiency. Furthermore, deep learning-based methods are often considered "black boxes" whose decision-making processes are hard to interpret; this weak comprehensibility often fails to meet IIoT maintenance requirements.
In summary, existing studies have addressed certain issues in malicious traffic detection for IIoT, but they still face challenges in real-world network environments. Although deep learning methods achieve remarkable detection performance, they overlook the data distributions of IIoT environments and impose significantly higher computational requirements, making it challenging to attain an optimal balance between detection performance and efficiency. Consequently, most IDSs continue to rely on machine learning-based or anomaly detection methods as a trade-off, rather than on more computationally intensive deep learning methods. Building a detection model for IIoT scenarios that simultaneously achieves strong performance, efficiency, and comprehensibility remains an open research challenge.
To address these issues, we propose MeeDet, which combines Mamba blocks with an early-exit mechanism for malicious traffic detection in IIoT scenarios. While balancing detection performance and efficiency, MeeDet also achieves comprehensibility by leveraging an LLM. MeeDet comprises three stages: (i) large-scale self-supervised pretraining of a Mamba-based representation model on unlabeled IIoT network traffic to learn robust traffic representations; (ii) supervised fine-tuning with dynamic early exits, so that most high-confidence benign samples are classified at shallow layers, improving average latency and computational efficiency while maintaining global accuracy; and (iii) feature distillation and evidence summarization for flagged malicious samples, augmented with LLM-based explanations to assist network administrators in investigation and response.
The main contributions of this paper are summarized as follows:
  • Our proposed MeeDet achieves detection performance comparable to the state-of-the-art methods with a Mamba-based backbone network structure.
  • MeeDet substantially improves detection efficiency and reduces computational overhead while maintaining detection accuracy with an early-exit mechanism.
  • MeeDet provides comprehensibility for detection outcomes with an LLM, thereby assisting network administrators in incident analysis and decision-making.

2. Related Works

2.1. Mamba Structure

State space models (SSMs) have emerged as competitive sequence learners for long-context modeling. Early selective SSMs introduced stable continuous-time parameterizations and efficient scanning, which paved the way for hardware-friendly variants suitable for real-world applications [19,20]. The Mamba architecture extends this lineage by incorporating input-dependent selection and fused kernels. This design achieves linear complexity and favorable latency–throughput trade-offs compared to self-attention mechanisms, while retaining strong performance on long sequences [21]. Such properties are particularly appealing for network analytics, where network flow traces are inherently long, bursty, and noisy. Recent studies in traffic analysis have begun replacing attention or RNN backbones with Mamba blocks for per-flow classification and anomaly detection. These works report improved recall on low-and-slow behaviors and increased robustness to missing packets, attributed to the model’s gated scanning and convolutional warm-up mechanisms [22,23].
In the context of the IIoT, which is characterized by proprietary protocols, periodic beacons, and strict real-time constraints, Mamba has been integrated with protocol-aware tokenizers and lightweight MLP heads. This combination facilitates early-exit mechanisms on edge gateways, thereby reducing inference costs while maintaining accuracy under conditions of class imbalance. Intrusion detection at IT/OT boundaries necessitates long-horizon temporal reasoning and compute-efficient models, requirements that align closely with the design objectives of Mamba.

2.2. Early-Exit Mechanisms

Early-exit mechanisms enable conditional computation by terminating inference once predictions reach a sufficient level of confidence, thereby reducing average latency. This paradigm not only accelerates the inference process but also lowers computational costs, making it particularly beneficial for deployment in resource-constrained environments. To optimize this efficiency, mechanisms can be employed to dynamically determine confidence thresholds based on network performance and the nature of the input data [24,25]. Specific architectures, such as BranchyNet, incorporate side branches that allow for early termination during both training and inference phases [26]. This design has demonstrated substantial computational savings with minimal impact on overall accuracy. Beyond standard confidence scores, alternative approaches leverage uncertainty estimation to guide exit decisions. Providing a measure of uncertainty is crucial when dealing with out-of-distribution inputs, where the network’s raw confidence may be unreliable [27,28]. Furthermore, multi-scale designs like MSDNet process inputs at varying resolutions and introduce classification layers at different depths [29]. Advanced implementations of such dynamic routing may utilize reinforcement learning to determine the optimal inference path based on input characteristics and the desired trade-off between accuracy and latency.
Collectively, early-exit mechanisms represent a significant shift toward more efficient deep learning architectures, specifically in scenarios where computational resources are limited or inference speed is critical, such as malicious traffic detection, autonomous driving, and medical image analysis. These strategies continue to expand the applicability of deep learning, rendering complex models more practical for real-world deployment.

3. MeeDet

3.1. Overview

In the context of IIoT environments, network infrastructures inherently exhibit significant vulnerabilities. As illustrated in Figure 1, we consider a smart manufacturing system where shop-floor devices communicate with an on-premise edge server, which subsequently exchanges data summaries with a cloud service. In this architecture, devices generate telemetry and control traffic directed to the edge, while selected summaries are transmitted to the cloud. Within this framework, an external network attacker targets cloud-facing APIs and the factory perimeter to compromise credentials, inject malicious traffic toward the edge, and move laterally to disrupt controllers or exfiltrate sensitive data. To mitigate these threats, MeeDet is deployed to passively inspect flows at the edge, extracting network traffic features and performing online inference to flag suspicious behaviors in real time.
Our proposed MeeDet is an efficient detector composed of four phases: (i) pre-processing, where raw traffic flows undergo splitting and parsing to be tokenized and mapped into embeddings, constructing stride-based sequences with identity anonymization; (ii) pre-training, which utilizes a Mamba-based backbone to perform large-scale self-supervised learning via Masked Byte Modeling (MBM) and Next-Flow Prediction (NFP) to capture intrinsic traffic dependencies; (iii) fine-tuning, where the pre-trained model is adapted via a dynamic early-exit strategy, employing intermediate MLP heads to allow high-confidence benign samples to exit early while routing complex attack patterns (e.g., Injection, Flood) through deeper layers; and (iv) analysis, which executes post-detection forensics by extracting structured features from flagged samples and constructing prompts for Large Language Model (LLM)-driven reporting.
Unlike a simple combination of existing techniques, MeeDet is designed around a unified objective: enabling accurate, low-latency, and explainable detection under IIoT-specific constraints. The Mamba backbone is selected not merely as an efficient sequence model, but because its state-space formulation aligns with the long-horizon, periodic dynamics of industrial protocols. The early-exit mechanism is not an auxiliary optimization; instead, it is explicitly calibrated to exploit the extreme class imbalance typical of IIoT traffic, ensuring that benign flows consume minimal computation while preserving recall for rare attacks. Finally, the LLM module is decoupled from the real-time path and constrained by structured prompts, transforming low-level traffic evidence into operator-oriented insights without compromising system safety. This tight coupling of architectural choices differentiates MeeDet from prior works that address efficiency, accuracy, or interpretability in isolation.

3.2. Data Pre-Processing

In the data pre-processing phase, we convert raw packet captures into compact sequences that focus exclusively on protocol interaction patterns while suppressing identity leakage and encrypted noise. Packets are grouped into flows based on the standard 5-tuple. Given that the payload of encrypted traffic is statistically indistinguishable from random noise, we discard the payload entirely and utilize only the protocol headers, as these contain rich state transition information.
We adopt a header-only preprocessing strategy to ensure protocol generality and deployment feasibility in IIoT environments. Payload contents are excluded because many industrial protocols transmit unencrypted but device-specific traffic formats, which may introduce overfitting to particular datasets and raise privacy concerns in real deployments. Each packet header is normalized into a fixed-length byte sequence, and identity-related fields (e.g., IP addresses and MAC addresses) are masked to prevent the model from learning dataset-specific shortcuts. This design encourages the model to focus on temporal and structural traffic patterns rather than static identifiers.
For each packet $p_i$ in a flow, we extract the first $N_h$ bytes of its headers, which typically include the Ethernet, IP, and transport-layer headers. To prevent the model from learning spurious correlations based on specific network identities, we strictly mask identity-bearing fields: the source and destination IP addresses and MAC addresses are set to zero. The standardized header vector for packet $p_i$ is defined as

$$\tilde{h}_i = \mathrm{Mask}\big(\mathrm{hdr}(p_i)\big)_{1:N_h} \in \{0,\dots,255\}^{N_h},$$

where $\mathrm{hdr}(\cdot)$ denotes the raw header byte sequence and $\mathrm{Mask}(\cdot)$ represents the operation that zeros out the identity fields. We set $N_h = 54$ to cover the standard Ethernet, IP, and TCP or UDP headers.

We collect the first $M$ packets of each flow and concatenate their masked headers into a continuous byte sequence $b \in \{0,\dots,255\}^{L_b}$, where $L_b = M \times N_h$. This concatenation preserves the temporal structure of the protocol exchange. To capture local structural patterns across packet boundaries, we partition $b$ into non-overlapping strides of length $L_s$:

$$s_t = [\,b_{(t-1)L_s + 1}, \dots, b_{t L_s}\,] \in \mathbb{R}^{L_s}, \quad t = 1, \dots, N_s,$$

where $N_s = L_b / L_s$. We normalize the byte values to the range $[0, 1]$ so that $s_t$ can be treated as a real-valued feature vector.

Each stride is subsequently projected linearly into a $D$-dimensional latent space and augmented with a positional embedding. A learnable class token $x_{\mathrm{cls}}$ is appended to aggregate global context. The input embedding process is formulated as follows:

$$x_t = W s_t + e_t, \quad W \in \mathbb{R}^{D \times L_s}, \; e_t \in \mathbb{R}^{D},$$

$$X^{(0)} = [\,x_1, \dots, x_{N_s}, x_{\mathrm{cls}}\,] \in \mathbb{R}^{(N_s + 1) \times D}.$$

Here, $W$ is the learnable projection matrix and $e_t$ is the positional embedding for the $t$-th stride. Finally, $X^{(0)}$ serves as the input to the backbone model. Traffic flows are segmented into stride-based sequences of fixed length to balance temporal coverage and computational efficiency. Partitioning into strides provides local contextual windows that improve the model's ability to capture periodic communication patterns commonly observed in IIoT traffic, while avoiding the excessive sequence lengths that would increase inference latency. The sequence length and stride are selected empirically based on validation performance and edge-device resource constraints.
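The masking and stride construction above can be sketched in a few lines of numpy. The identity-field byte offsets below assume an untagged Ethernet II + IPv4 header layout, and the stride length `L_S` is a hypothetical illustrative value; neither is specified in the paper.

```python
import numpy as np

N_H = 54      # bytes kept per packet header (Ethernet + IP + TCP/UDP)
L_S = 6       # stride length in bytes (hypothetical value for illustration)

# Byte offsets of identity fields inside the 54-byte header, assuming
# untagged Ethernet II + IPv4: dst/src MAC (0-11), src/dst IP (26-33).
IDENTITY_OFFSETS = list(range(0, 12)) + list(range(26, 34))

def mask_header(pkt: bytes) -> np.ndarray:
    """Truncate/zero-pad a packet to N_H header bytes and zero identity fields."""
    h = np.zeros(N_H, dtype=np.uint8)
    n = min(len(pkt), N_H)
    h[:n] = np.frombuffer(pkt[:n], dtype=np.uint8)
    h[IDENTITY_OFFSETS] = 0
    return h

def flow_to_strides(packets: list, m: int) -> np.ndarray:
    """Concatenate the first m masked headers into one byte sequence, then
    split it into non-overlapping L_S-byte strides normalized to [0, 1]."""
    b = np.concatenate([mask_header(p) for p in packets[:m]])  # length m * N_H
    n_s = len(b) // L_S
    return b[: n_s * L_S].reshape(n_s, L_S).astype(np.float32) / 255.0
```

The resulting `(N_s, L_s)` matrix corresponds to the strides $s_t$ before the linear projection and positional embedding.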

3.3. Pretraining

MeeDet employs a stack of 12 unidirectional Mamba blocks to encode header stride sequences efficiently. Unlike standard Transformers that scale quadratically, Mamba utilizes a content-conditioned state space model to capture long-range dynamics with linear complexity. Each block processes the sequence in a causal (forward-only) manner, while the state-space formulation still provides the long-range context needed for masked modeling without requiring backward passes over the sequence. The selective scan mechanism of Mamba is inherently suitable for IIoT traffic analysis: unlike standard Transformers, which treat all tokens equally via attention matrices, Mamba's input-dependent selection allows the model to filter out irrelevant protocol noise (e.g., padding bytes or constant header fields) while selectively propagating essential state information (e.g., TCP flag sequences or specific function codes in Modbus). This capability is crucial for distinguishing normal periodic polling from low-rate denial-of-service attacks, as the model retains long-term memory of flow states without the computational burden of storing the entire attention history. A 12-layer unidirectional Mamba backbone is employed as a compromise between modeling capacity and edge-side efficiency: deeper configurations yielded diminishing performance gains at noticeably higher latency, whereas shallower models lacked the capacity to model long-range protocol dependencies. The unidirectional setting aligns with real-time detection scenarios, where future packets are not available at inference time.
We analyze the computational complexity of the Mamba backbone with respect to sequence length and model size. Given an input sequence of length $L$ and hidden dimension $d$, each Mamba layer processes the sequence using a state-space formulation with causal convolution and selective state updates. As a result, both the time and memory complexity scale linearly with sequence length, i.e., $O(L \cdot d)$ per layer. Importantly, the number of model parameters in Mamba is independent of the input sequence length and depends only on the model width and depth. In contrast, Transformer-based architectures rely on self-attention mechanisms whose computational and memory complexity scale quadratically with sequence length, i.e., $O(L^2 \cdot d)$, which becomes prohibitive for long traffic sequences. This linear-complexity property makes Mamba particularly suitable for modeling long-range and periodic IIoT traffic under strict edge-side resource constraints.
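To make the linear-complexity claim concrete, the toy diagonal state-space recurrence below runs a single pass over the sequence, carrying a fixed-size state. This is an illustrative sketch, not the fused, input-dependent selective-scan kernel used by Mamba itself.

```python
import numpy as np

def selective_scan(x, A, B, C):
    """Minimal diagonal state-space recurrence.
    One forward pass: O(L * d * n) time, state size independent of L.
    x: (L, d) inputs; A: (d, n) per-channel decay; B, C: (d, n) input/output maps.
    (A toy illustration of the linear-time scan, not the real Mamba kernel.)"""
    L, d = x.shape
    n = A.shape[1]
    h = np.zeros((d, n))                 # recurrent state, carried across steps
    y = np.empty((L, d))
    for t in range(L):
        h = A * h + B * x[t][:, None]    # elementwise (diagonal) state update
        y[t] = (C * h).sum(axis=1)       # readout: project state to output
    return y
```

The loop visits each of the $L$ positions exactly once and never materializes an $L \times L$ interaction matrix, which is the source of the $O(L \cdot d)$ versus $O(L^2 \cdot d)$ gap discussed above.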
Although bidirectional architectures are commonly used in representation learning, MeeDet adopts a strictly unidirectional Mamba backbone. This choice is motivated by both empirical performance and deployment considerations. In preliminary experiments, we evaluated bidirectional variants but observed increased inference latency with no consistent performance gains, particularly on encrypted traffic where future context contributes limited discriminative information. In contrast, the unidirectional design achieves faster inference and better aligns with real-time IIoT scenarios, where traffic arrives sequentially and future packets are unavailable. Therefore, all experiments in this work use a 12-layer unidirectional Mamba backbone.
To balance representation capacity and edge-deployment efficiency, we configure the model with a hidden dimension D = 192 , a state expansion factor E = 2 , and a state dimension N = 16 . The 1D convolution kernel size is set to 4. We adopt two widely used self-supervised tasks adapted for header-only traffic analysis:
Masked Byte Modeling (MBM): We employ a reconstruction objective to learn token-level semantics from unidirectional context. A fixed mask ratio $r$ is applied to the header stride sequence, yielding a masked index set $\mathcal{B}$. For each masked position $t \in \mathcal{B}$, the model predicts the original header byte vector. Let $s_t$ denote the ground-truth byte vector (normalized to $[0, 1]$) and $h_t$ the encoder's latent output at position $t$. The reconstruction is obtained via a prediction head $\psi(\cdot)$ (a linear layer). The MBM loss is the mean squared error (MSE) between the prediction and the normalized ground truth:

$$\mathcal{L}_{\mathrm{MBM}} = \frac{1}{|\mathcal{B}|} \sum_{t \in \mathcal{B}} \big\| s_t - \psi(h_t) \big\|_2^2.$$

Here, $r$ controls the masking density; $s_t$ is the fixed, normalized raw header stride; and $\psi(h_t)$ is the predicted reconstruction. This objective forces the encoder to infer missing protocol fields (e.g., IP addresses, TCP flags, port numbers) by aggregating information from preceding strides and the long-range state transitions captured by the state-space dynamics, effectively learning the syntax of network protocols.
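A minimal numpy sketch of the MBM loss, together with a hypothetical helper for sampling the masked index set $\mathcal{B}$ at ratio $r$ (the paper does not specify its sampling routine):

```python
import numpy as np

def sample_mask(n_strides, r, rng):
    """Sample |B| = round(r * n_strides) masked positions without replacement.
    (Hypothetical helper; the paper only states that a fixed ratio r is used.)"""
    k = max(1, int(round(r * n_strides)))
    return rng.choice(n_strides, size=k, replace=False)

def mbm_loss(s, pred, masked_idx):
    """MBM objective: mean squared L2 reconstruction error over masked strides.
    s, pred: (N_s, L_s) normalized ground-truth strides and reconstructions."""
    diff = s[masked_idx] - pred[masked_idx]          # (|B|, L_s)
    return float((diff ** 2).sum(axis=1).mean())     # average squared L2 norm
```

In training, `pred` would be $\psi(h_t)$ from the prediction head; here both tensors are plain arrays so the loss itself can be checked in isolation.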
Next-Flow Prediction (NFP): To capture temporal coherence at the flow level, we form ordered flow pairs $(F_a, F_b)$ sampled from the same device window. Since our input is restricted to headers, $F$ represents the sequence of header strides for a given flow. A binary label $y$ indicates whether $F_b$ physically follows $F_a$ ($y = 1$) or is a randomly sampled negative ($y = 0$). We use a Siamese architecture in which the encoder processes each flow's header sequence independently. We pool the class token from the final block to obtain embeddings $e_a$ and $e_b$. Their similarity is measured via a bilinear or dot-product interaction followed by a sigmoid activation $\sigma(\cdot)$. The NFP objective is a binary cross-entropy loss:

$$\mathcal{L}_{\mathrm{NFP}} = -y \log \sigma(e_a^{\top} e_b) - (1 - y) \log\big(1 - \sigma(e_a^{\top} e_b)\big).$$

This task encourages the model to learn device-specific temporal patterns and flow transition logic based solely on header signatures.
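The NFP objective reduces to binary cross-entropy on a sigmoid of the dot product of the two flow embeddings; a small numpy sketch (dot-product variant, with a clamp added for numerical stability):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nfp_loss(e_a, e_b, y, eps=1e-7):
    """Next-Flow Prediction loss: BCE on sigmoid(e_a . e_b).
    y = 1 if flow b physically follows flow a, else 0."""
    p = sigmoid(float(e_a @ e_b))
    p = min(max(p, eps), 1.0 - eps)     # clamp for numerical stability
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))
```

With orthogonal (zero-similarity) embeddings the predicted probability is 0.5, so the loss is $\log 2$ regardless of the label, which is a convenient sanity check.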
The pretraining loss linearly combines MBM and NFP via non-negative weights $\lambda_{\mathrm{MBM}}$ and $\lambda_{\mathrm{NFP}}$:

$$\mathcal{L}_{\mathrm{pre}} = \lambda_{\mathrm{MBM}} \mathcal{L}_{\mathrm{MBM}} + \lambda_{\mathrm{NFP}} \mathcal{L}_{\mathrm{NFP}}.$$

In practice, MBM stabilizes early feature extraction of protocol fields, while NFP strengthens sequence-level alignment of traffic behaviors.

3.4. Fine-Tuning with Early Exits

To reduce latency in benign-dominant IIoT traffic, we attach a lightweight binary head to each of the $l = 1, \dots, 12$ Mamba layers. Let $H^{(l)}$ denote the token-sequence output at layer $l$ and $c^{(l)}$ the corresponding class-token embedding. The $l$-th head produces a benign probability $s^{(l)} \in (0, 1)$ via a single-layer MLP with a sigmoid. With a calibrated threshold $\tau^{(l)}$, the decision rule is: exit early as benign if $s^{(l)} \geq \tau^{(l)}$; otherwise, inference proceeds to the next layer. If no early exit is triggered, a terminal softmax head outputs a probability vector $p$ over the $C$ malicious classes. The thresholds $\tau^{(l)}$ are not fixed arbitrarily; they are calibrated on a validation set after training, independently for each dataset and class-imbalance setting. Specifically, we employ a greedy search that selects, for each layer, the $\tau^{(l)}$ maximizing the benign exit rate subject to a user-defined tolerance on performance degradation, defined as a maximum relative F1-score drop of 1% compared to full-depth inference. This ensures that the early-exit mechanism strictly adheres to the safety-critical requirements of the IIoT environment.
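A minimal sketch of the per-layer threshold search. For tractability it substitutes a direct false-exit-rate constraint (malicious samples wrongly exited early) for the paper's full F1-drop criterion, so it should be read as a simplified illustration of the greedy calibration, not the actual procedure.

```python
import numpy as np

def calibrate_threshold(scores, is_benign, max_fnr=0.01):
    """Pick the smallest benign-exit threshold tau such that the fraction of
    malicious validation samples wrongly exited early stays within max_fnr.
    scores: one head's benign probabilities on the validation set.
    Lower tau -> more benign samples exit early -> cheaper average inference.
    (Per-layer simplification of the paper's F1-constrained greedy search.)"""
    candidates = np.unique(scores)[::-1]           # sweep tau from high to low
    best = 1.0                                     # default: never exit early
    for tau in candidates:
        exited = scores >= tau
        mal_exited = np.logical_and(exited, ~is_benign).sum()
        n_mal = (~is_benign).sum()
        if n_mal and mal_exited / n_mal > max_fnr:
            break                                  # tolerance violated; stop
        best = tau
    return float(best)
```

Running this independently per head mirrors the layer-by-layer calibration described above; in the full system the constraint would be evaluated on end-to-end F1 rather than a single head's error rate.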
We optimize three complementary components. First, an average focal binary cross-entropy $\mathcal{L}_{\mathrm{bin}}$ trains the per-layer benign/continue heads using the label $y_b \in \{0, 1\}$ (1 for benign, 0 for malicious); the focal parameters $\alpha$ and $\gamma$ balance the classes and emphasize hard examples, respectively. Second, the terminal multi-class head is trained with a cross-entropy loss $\mathcal{L}_{\mathrm{mc}}$ over the $C$ attack categories, producing $p$. Third, to ensure that early exits do not degrade performance relative to the full model, we introduce a consistency penalty $\mathcal{L}_{\mathrm{cons}}$. This term encourages the probability distributions of the intermediate heads to align with the prediction of the final layer, particularly for hard samples. We formulate it using the Kullback–Leibler (KL) divergence between the output distribution of the $l$-th early-exit head, denoted $P^{(l)}$, and the final head's distribution $P^{(\mathrm{final})}$. The consistency loss is the average KL divergence across all intermediate layers:

$$\mathcal{L}_{\mathrm{cons}} = \frac{1}{L - 1} \sum_{l=1}^{L-1} \mathrm{KL}\big(P^{(\mathrm{final})} \,\|\, P^{(l)}\big).$$
By minimizing this divergence, the intermediate layers learn to approximate the rich semantic representation of the final layer, thereby improving the reliability of early decisions. The overall fine-tuning objective is

$$\mathcal{L}_{\mathrm{fine}} = \mathcal{L}_{\mathrm{bin}} + \beta \mathcal{L}_{\mathrm{mc}} + \eta \mathcal{L}_{\mathrm{cons}}.$$
During pretraining, Masked Byte Modeling and Next-Flow Prediction are jointly optimized to capture both fine-grained protocol semantics and coarse-grained temporal dependencies across flows. In the fine-tuning stage, binary cross-entropy loss is applied to each early-exit head, with shared backbone parameters to ensure consistent representations across layers. This multi-head supervision encourages earlier layers to learn separable representations for benign traffic, which is essential for the effectiveness of the early-exit mechanism.
We optimize the model using the AdamW optimizer with a weight decay of 0.01. The learning rate is initialized at $5 \times 10^{-4}$ and decays following a cosine annealing schedule. For the loss components, we empirically set the balancing coefficients $\beta = 1.0$ and $\eta = 0.5$ so that the multi-class classification and consistency regularization contribute effectively without overwhelming the binary early-exit objective. The focal loss hyperparameters are set to $\alpha = 0.25$ and $\gamma = 2.0$ to address class imbalance. The model is fine-tuned for 20 epochs with a batch size of 128.
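The three fine-tuning components can be sketched in numpy as below, using the coefficient values quoted above ($\alpha = 0.25$, $\gamma = 2.0$, $\beta = 1.0$, $\eta = 0.5$). This is an illustrative, framework-free version of the losses, not the training implementation.

```python
import numpy as np

def focal_bce(p, y, alpha=0.25, gamma=2.0, eps=1e-7):
    """Focal binary cross-entropy for one benign/continue head.
    p: predicted benign probabilities; y: labels (1 benign, 0 malicious)."""
    p = np.clip(p, eps, 1 - eps)
    pt = np.where(y == 1, p, 1 - p)        # probability of the true class
    a = np.where(y == 1, alpha, 1 - alpha)
    return float((-a * (1 - pt) ** gamma * np.log(pt)).mean())

def kl_div(p_final, p_l, eps=1e-7):
    """KL(P_final || P_l): final-head distribution vs. one intermediate head."""
    p_final = np.clip(p_final, eps, 1)
    p_l = np.clip(p_l, eps, 1)
    return float((p_final * np.log(p_final / p_l)).sum())

def fine_tune_loss(l_bin, l_mc, l_cons, beta=1.0, eta=0.5):
    """Overall objective: L_fine = L_bin + beta * L_mc + eta * L_cons."""
    return l_bin + beta * l_mc + eta * l_cons
```

Averaging `kl_div` over the eleven intermediate heads gives $\mathcal{L}_{\mathrm{cons}}$; the cross-entropy term $\mathcal{L}_{\mathrm{mc}}$ is standard and omitted for brevity.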

3.5. LLM-Driven Analysis

When a sample is labeled as any malicious type, MeeDet emits structured network traffic features following the feature extraction method documented in [30]. The structured features are transformed into a controlled natural-language prompt for an LLM, containing a brief context, ranked evidence with normalized units, and explicit instructions to return a short narrative plus machine-readable actions. The LLM produces a risk assessment, likely causes, and concrete mitigations. This workflow turns model predictions into actionable recommendations, reducing time-to-triage while preserving privacy and operational safeguards. To mitigate the hallucination risks inherent in generative models, we implement a constraint-based prompting strategy. Rather than asking for an open-ended analysis, the prompt injects a retrieved context snippet from a verified knowledge base (e.g., the MITRE ATT&CK for ICS matrix) corresponding to the detected attack class. The LLM is explicitly instructed to ground its reasoning solely on the provided structured features and the injected knowledge context. Furthermore, output generation is constrained to a strict JSON schema, and any response containing non-existent protocol fields or undefined action codes is automatically rejected and flagged for manual review.
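A sketch of the constraint-based prompting and schema validation described above. The field names, action codes, and JSON schema here are hypothetical examples chosen for illustration, not the system's actual interface.

```python
import json

# Hypothetical whitelist of machine-readable action codes and required keys.
ALLOWED_ACTIONS = {"isolate_host", "block_flow", "rotate_credentials",
                   "manual_review"}
REQUIRED_KEYS = {"risk_level", "likely_cause", "actions"}

def build_prompt(features: dict, knowledge_snippet: str) -> str:
    """Assemble a constraint-based prompt: structured evidence plus a retrieved
    knowledge snippet, with instructions to answer only in a fixed JSON schema."""
    return (
        "You are a network-security analyst. Ground your answer ONLY on the "
        "evidence and context below.\n"
        f"Evidence (JSON): {json.dumps(features, sort_keys=True)}\n"
        f"Context: {knowledge_snippet}\n"
        'Respond with JSON only: {"risk_level": "low|medium|high", '
        '"likely_cause": "<one sentence>", "actions": ["<action code>", ...]}'
    )

def validate_report(raw: str) -> bool:
    """Reject any LLM response that is not valid JSON, misses required keys,
    or uses undefined action codes; rejected samples go to manual review."""
    try:
        report = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        return False
    return (isinstance(report, dict)
            and REQUIRED_KEYS <= set(report)
            and set(report.get("actions", [])) <= ALLOWED_ACTIONS)
```

Because this path runs asynchronously after detection, the validation step can afford to be strict: anything outside the whitelist is dropped rather than forwarded to operators.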
It is important to note that MeeDet operates in a decoupled manner to ensure real-time performance. The Mamba-based detector acts as a synchronous gatekeeper, processing high-throughput traffic with millisecond-level latency. The LLM module operates asynchronously and is triggered only when the detector flags a sample as malicious. This “human-in-the-loop” design ensures that the heavy computational cost of LLM generation (typically seconds) does not impede the network’s packet forwarding or the detector’s real-time inference capabilities.

4. Experiments

In this section, we conduct three malicious traffic detection tasks to demonstrate the effectiveness of MeeDet under different IIoT scenarios. We then compare our model with four baselines and perform a layer-wise analysis of MeeDet. We further provide an interpretative analysis of the strong performance obtained by MeeDet.

4.1. Experiment Setup

4.1.1. Datasets and Downstream Tasks

To evaluate the performance of MeeDet, we conduct experiments across three malicious traffic detection tasks in IIoT on five public datasets. The datasets are shown in Table 1. Crucially, the self-supervised pretraining phase utilizes only the unlabeled packets from the training split. The testing set is strictly reserved for the final performance evaluation and is never exposed to the model during either the pretraining or fine-tuning phases.
The selected datasets collectively reflect key aspects of real-world IIoT environments. Edge-IIoTset and X-IIoTID capture heterogeneous device interactions and mixed industrial/IoT protocols commonly observed at OT/IT boundaries. TON-IoT emphasizes telemetry-driven industrial workloads with realistic background traffic, while CICIIoT2023 introduces large-scale, multi-protocol attack scenarios under controlled conditions. CICAPT-IIoT2024 further models long-lived, multi-stage APT campaigns aligned with industrial threat models. Although no public dataset fully captures proprietary industrial deployments, the diversity of protocol types, attack stages, and traffic periodicity across these datasets provides a reasonable approximation of operational IIoT conditions.
Task 1: General Malicious Traffic Detection. In IIoT environments, malicious traffic exhibits characteristics that differ markedly from traditional IT settings. IIoT traffic is inherently multi-protocol, combining industrial-specific protocols (e.g., Modbus, OPC UA) and IoT protocols (e.g., MQTT, CoAP), and often features the coexistence of legacy and modern communication paradigms. To address these challenges, we construct a comprehensive dataset by integrating multiple sources (CICIIoT2023, Edge-IIoTset, X-IIoTID, and TON-IoT) containing both IoT and IIoT traffic. The resulting dataset spans diverse protocols and multiple malicious-traffic categories, enabling rigorous evaluation of model performance for IIoT malicious traffic detection.
Task 2: APT Detection in IIoT. In IIoT environments, Advanced Persistent Threat (APT) attacks exhibit distinctive characteristics and introduce new challenges. The long operational lifecycles of IIoT systems render APT campaigns more stealthy and persistent, and more deeply intertwined with industrial processes. Adversaries can further exploit the heterogeneity and comparatively weak security mechanisms of IIoT devices to orchestrate multi-stage intrusions. The CICAPT-IIoT2024 dataset captures network traffic spanning all canonical APT phases—collection, exfiltration, command and control, persistence, discovery, credential access, lateral movement, and defense evasion. Building on this dataset, we design experiments to evaluate and improve the effectiveness of APT detection within IIoT environments.
Task 3: Detection Efficiency Evaluation. Edge devices in IIoT environments are resource-constrained, making detection efficiency paramount. In most industrial sites, normal traffic accounts for over 90% of the total volume, while malicious traffic typically constitutes only a small fraction, often less than 5% [32,33,34]. Unfortunately, existing detection methods frequently overlook the data distribution of real IIoT scenarios, a critical oversight that leads to multiple practical issues. To address this gap, we constructed datasets with different malicious traffic proportions (1%, 5%, 10%, and 20%) based on the aforementioned multi-protocol IIoT traffic datasets. These proportions are designed to mimic the varying degrees of imbalance observed in real scenarios. For example, 1% malicious traffic simulates well-protected industrial sites with minimal attack attempts, while 20% represents high-risk environments (e.g., unpatched legacy IIoT systems in critical infrastructure) facing frequent threats. By evaluating detection models on these scenario-specific datasets, we can more accurately assess their actual performance in the real IIoT world, such as their ability to minimize false negatives (avoid missing rare malicious traffic) while controlling false positives (preventing normal traffic from being incorrectly flagged as malicious) in industrial practice.
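The scenario-specific imbalanced splits can be constructed as in the sketch below: all benign flows are kept, and just enough malicious flows are drawn to hit the target ratio. This is an illustrative routine under a fixed seed; the paper does not specify its exact sampling procedure:

```python
import random

def subsample_to_ratio(benign, malicious, mal_ratio, seed=42):
    """Build an imbalanced split in which malicious flows form `mal_ratio`
    of the result (e.g., 0.01 for the 1% setting). Benign flows are kept
    in full; malicious flows are subsampled. Illustrative sketch."""
    rng = random.Random(seed)
    # Solve n_mal / (n_benign + n_mal) = mal_ratio for n_mal.
    n_mal = round(len(benign) * mal_ratio / (1.0 - mal_ratio))
    sampled = rng.sample(malicious, min(n_mal, len(malicious)))
    dataset = [(x, 0) for x in benign] + [(x, 1) for x in sampled]
    rng.shuffle(dataset)
    return dataset
```

With 990 benign flows and a 1% target, the routine draws 10 malicious flows, so the resulting 1000-flow split has exactly the intended imbalance.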

4.1.2. Evaluation Metrics

We evaluate detection performance using four typical classification metrics: Accuracy (AC), Precision (PR), Recall (RC), and F1-score (F1). These metrics capture complementary aspects of detection and are widely adopted in classification settings; hence, their calculations are not repeated here.
To evaluate the detection efficiency, we measure Floating Point Operations (FLOPs) and inference time. FLOPs approximate computational complexity and resource demand (larger values typically imply higher compute cost and longer latency), while inference time denotes the wall-clock time required to generate a prediction for a single input. Since MeeDet employs early-exit inference, samples may terminate at different off-ramps; therefore, we report the average FLOPs and average inference time over all test samples.
For an $L$-layer MLP with widths $d_0, \ldots, d_L$, the dominant cost of layer $i$ is the dense product $X_{i-1} W_i$ with $W_i \in \mathbb{R}^{d_{i-1} \times d_i}$, requiring approximately $2 d_{i-1} d_i$ FLOPs; bias additions and elementwise activations contribute only $O(d_i)$ overhead and are neglected in the main term. Consequently, the total MLP cost is $\sum_{i=1}^{L} 2 d_{i-1} d_i$, and for the two-layer feed-forward network in our setting with dimensions $d \to h \to d$, the complexity simplifies to approximately $4dh$, again omitting bias and activation as lower-order terms.
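This MLP cost model is a two-line computation, sketched below for reference (function names are ours):

```python
def mlp_flops(widths):
    """Dominant FLOPs of an MLP with layer widths d0..dL:
    sum over layers of 2 * d_{i-1} * d_i (biases/activations omitted)."""
    return sum(2 * a * b for a, b in zip(widths, widths[1:]))

def ffn_flops(d, h):
    """Two-layer feed-forward block d -> h -> d: approximately 4*d*h FLOPs,
    i.e., the general formula specialized to widths [d, h, d]."""
    return mlp_flops([d, h, d])
```

For example, a block with d = 256 and h = 1024 costs about 4 · 256 · 1024 ≈ 1.05 MFLOPs under this model.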
For a Mamba block processing a sequence of length $T$ with model width $d$, expansion width $p$, and $q$ parameter-generating projections, the leading operations are linear projections to and from the SSM pathway and the parameter heads. The input projection $d \to p$ and output projection $p \to d$ each cost about $2Tdp$ FLOPs, while the $q$ parameter projections each add approximately $2Tdp$, yielding a dominant term of $2(q+2)Tdp$ FLOPs. The diagonal, first-order selective SSM scan contributes an additional $O(Tp)$ FLOPs with a small constant, and elementwise gating/activations and lightweight normalization similarly add lower-order $O(Tp)$ or $O(Td)$ terms; these are negligible relative to the matrix multiplications when $Tdp$ is large.
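The leading-order term can likewise be captured in one function (the lower-order scan and gating terms are deliberately dropped, as in the text):

```python
def mamba_block_flops(T, d, p, q):
    """Leading-order FLOPs of one Mamba block: input (d -> p) and output
    (p -> d) projections plus q parameter projections, each ~2*T*d*p.
    The O(T*p) selective-scan and gating terms are omitted as lower order."""
    return 2 * (q + 2) * T * d * p
```

Summing this over the 12 backbone layers (counting only the layers a sample actually traverses before its exit head fires) gives the per-sample cost that the averaged MFLOPs figures in Section 4 are built from.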
In our experiments, we omit detailed FLOPs derivations for the remaining network components. In practice, layers with negligible computational impact, such as activation functions, are excluded from the count. Although this may introduce minor discrepancies, it does not materially affect the overall complexity estimates.

4.1.3. Implementation Details

Our experiments were implemented using PyTorch 2.1.0 and Python 3.10. The model training and performance evaluation were conducted on a high-performance server equipped with an Intel Xeon Gold 6226R CPU @ 2.90 GHz, 128 GB RAM, and a single NVIDIA GeForce RTX 3090 GPU (24 GB VRAM). For all datasets, we follow a standard train/validation/test split. Specifically, 70% of the data is used for training, 10% for validation, and 20% for testing. All splits are performed at the flow level to avoid information leakage. For robustness experiments under class imbalance, the test set remains fixed while imbalance ratios are applied only to the training set. MeeDet is trained in two phases. In the pretraining phase, the unidirectional Mamba backbone is trained for 50 epochs using unlabeled traffic. In the supervised fine-tuning phase, the full model is trained for 30 epochs. Early stopping is applied based on validation performance. Unless otherwise specified, we use a batch size of 256 for pretraining and 128 for supervised fine-tuning. To evaluate the efficiency metrics (FLOPs and latency), we utilized the thop library and measured the wall-clock time with a batch size of 1 to simulate real-time streaming inference. For the LLM component, we utilized 4-bit quantization (NF4) to reduce memory usage, enabling deployment on devices with limited VRAM (approx. 6 GB required), though the primary latency benchmarks focus on the Mamba detection backbone.
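The batch-size-1 latency measurement can be sketched as follows; `infer_fn` stands in for any single-sample predictor, and the warmup count is an assumption of this sketch rather than a reported setting:

```python
import statistics
import time

def measure_latency(infer_fn, inputs, warmup=10):
    """Average wall-clock inference time (ms) at batch size 1, mimicking
    streaming traffic. A few warmup calls are discarded to exclude
    one-off setup costs (allocator warmup, kernel compilation, etc.)."""
    for x in inputs[:warmup]:
        infer_fn(x)
    times_ms = []
    for x in inputs:
        t0 = time.perf_counter()
        infer_fn(x)
        times_ms.append((time.perf_counter() - t0) * 1000.0)
    return statistics.mean(times_ms)
```

In a GPU setting, one would additionally synchronize the device before reading the clock (e.g., `torch.cuda.synchronize()`), since kernel launches are asynchronous; the sketch above shows only the CPU-side timing structure.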

4.2. Comparison with State-of-the-Art Methods

We compare MeeDet with various state-of-the-art (SOTA) methods in IIoT/IT environments, including (1) the anomaly detection-based method, CVAE-CatBoost [5]; (2) the machine learning-based method, CART [10]; (3) the deep learning-based method, CNN-LSTM [14]; and (4) the pretraining-based method, ET-BERT [15]. These methods span several paradigms, providing a broad baseline for a comprehensive evaluation of our proposed model. Baselines were selected to represent four dominant paradigms in IIoT traffic detection: anomaly detection (CVAE-CatBoost), classical machine learning (CART), sequence-based deep learning (CNN-LSTM), and large-scale pretrained models (ET-BERT). ET-BERT serves as a particularly strong reference point, as it represents the state of the art in pretrained traffic modeling. Comparisons against these baselines highlight not only accuracy gains but also the efficiency advantages of conditional computation, which are underexplored in recent IIoT-focused works.

4.2.1. General Malicious Traffic Detection in IIoT

In Table 2, we present a comparative evaluation of five methods for general malicious traffic detection in IIoT. CVAE-CatBoost prioritizes RC but yields the weakest overall balance; on Edge-IIoTset, its F1 is approximately 0.88, notably below the stronger baselines and reflecting the PR–RC trade-off typical of reconstruction-based detectors under heterogeneous traffic. CART improves upon CVAE-CatBoost on simpler traffic, reaching about 0.88 AC on Edge-IIoTset, but its F1 declines to roughly 0.87 on temporally rich data such as X-IIoTID, suggesting limited capacity to model long-range dependencies. CNN-LSTM leverages convolutional features and recurrent sequence modeling; it attains nearly 0.93 F1 on Edge-IIoTset yet remains below 0.90 F1 on X-IIoTID, indicating sensitivity to distributional shifts and complex temporal patterns. ET-BERT consistently ranks at or near the top, with around 0.94 F1 on CICIIoT2023 and close to 0.99 F1 on X-IIoTID, highlighting the benefits of large-scale pretraining and contextual embeddings for IIoT traffic. MeeDet achieves competitive overall performance, with the strongest results on several datasets and metrics, reaching about 0.97 F1 on Edge-IIoTset while maintaining F1 above 0.93 on TON-IoT. In particular, on the CICIIoT2023 dataset, MeeDet’s F1-score (0.9312) is slightly lower than that of ET-BERT (0.9365). These findings indicate three trends. Anomaly-detection pipelines emphasize RC but can underperform on balanced metrics in mixed traffic. Classical learners and standard deep models benefit from temporal cues but struggle with long-horizon or distributionally complex regimes. Pretrained contextual modeling, augmented with efficient state-space sequence modeling and early exits, improves robustness and maintains accuracy even when high-confidence samples terminate early. Overall, MeeDet offers a favorable accuracy–efficiency trade-off for practical IIoT deployment.

4.2.2. APT Traffic Detection in IIoT

Figure 2 presents row-normalized confusion matrices for four methods (CART, CNN-LSTM, ET-BERT, and MeeDet) on the CICAPT-IIoT2024 dataset. The results reflect key properties of APT campaigns, including multi-stage progression, stealthy command-and-control, and temporally extended behavior. All methods achieve strong recall on Collection and Defense Evasion, where traffic patterns are comparatively distinctive; ET-BERT and MeeDet exceed 90% and display the sharpest diagonals. Systematic confusion is most evident between C2 and Exfiltration, consistent with low-and-slow beaconing and staged data transfers that blur the operational boundary between control and leakage. CART and CNN-LSTM exhibit additional leakage from Persistence and Discovery, suggesting limited ability to capture long-range temporal context and to separate these stages from benign maintenance-like activity. Errors are also concentrated between Credential Access and Lateral Movement, which often occur in rapid succession during APT pivots; pretrained contextual representations in ET-BERT reduce this coupling, while MeeDet further mitigates it through state-space sequence modeling and improved temporal aggregation. Despite class imbalance across stages, ET-BERT and MeeDet maintain higher per-class recall than CART and CNN-LSTM, indicating better robustness. Overall, the matrices demonstrate that pretrained contextual modeling substantially improves discrimination among APT stages, and that MeeDet provides the most consistent stage-level separation for APT traffic detection. Note that CVAE-CatBoost is formulated as a binary anomaly detector rather than a multi-class classifier. Consequently, it is not included in the stage-wise analysis presented in this figure and is omitted from the discussion in this section.

4.2.3. Detection Efficiency Evaluation

As summarized in Table 3, we evaluate robustness to class imbalance by varying the malicious traffic ratio from 1% to 20%. The proposed MeeDet, which couples a Mamba state-space backbone with an adaptive early-exit mechanism, maintains state-of-the-art detection accuracy while markedly improving efficiency. Across all ratios, MeeDet sustains F1 around 0.98 (for example, 0.9775 at 1% and 0.9825 at 20%), matching or slightly surpassing ET-BERT (0.9766–0.9850). The efficiency gap is substantial. At 1% malicious traffic, MeeDet requires about 1.7 MFLOPs and 1.58 ms per inference, whereas ET-BERT consumes about 6.26 MFLOPs and 10.16 ms; at 20%, MeeDet remains at about 5.6 MFLOPs and 6.81 ms. Relative to CNN-LSTM and CART, MeeDet delivers higher F1 with comparable or lower latency. These results indicate that integrating Mamba with adaptive early exiting preserves accuracy under varying class imbalance while significantly reducing compute and inference time, supporting real-time IIoT deployment.
Moreover, the adaptive early-exit architecture is central to balancing efficiency and accuracy in MeeDet, particularly under class imbalance. Specifically, by attaching lightweight classifiers to successive layers of the Mamba state-space backbone, the model performs conditional computation. In this process, “easy” inputs terminate at shallow depths once they meet a calibrated confidence threshold. Conversely, “hard” inputs, which are often minority or ambiguous malicious cases, are automatically propagated to deeper layers for richer temporal modeling before a final decision. This design preserves accuracy because uncertain samples still receive full-depth processing. At the same time, it improves average-case efficiency because a large portion of benign flows exit early when their representations become linearly separable. Consistent with this behavior, the measured MFLOPs increase from the 1% to 20% malicious settings. This trend indicates that more inputs traverse deeper exits as task difficulty rises, rather than reflecting a fixed-budget speedup. In addition, intermediate heads act as regularizers that enhance probability calibration at shallow stages. This reduces overconfident errors while the final head safeguards recall on rare classes. Collectively, these properties explain why MeeDet maintains near-constant F1 scores with markedly lower computational cost and latency. Thus, it provides a principled and practical mechanism for real-time IIoT deployment under dynamically shifting class ratios.
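The conditional-computation loop described above reduces to a simple control structure, sketched here with stand-in callables rather than the paper's actual Mamba layers and exit heads:

```python
def early_exit_forward(layers, heads, x, tau=0.95):
    """Conditional computation sketch: after each backbone layer, a
    lightweight head emits a benign-confidence score; the sample exits
    as soon as that confidence clears the calibrated threshold tau,
    otherwise it is routed to the next, deeper layer.
    Returns (predicted_label, exit_depth). `layers` and `heads` are
    illustrative callables, not the paper's modules."""
    h = x
    for depth, (layer, head) in enumerate(zip(layers, heads), start=1):
        h = layer(h)                 # refine the representation
        p_benign = head(h)           # lightweight per-layer classifier
        if p_benign >= tau:
            return 0, depth          # high-confidence benign: stop early
    return 1, len(layers)            # ambiguous: full depth, flag malicious
```

Average cost is then the exit-depth-weighted sum of per-layer FLOPs, which is why measured MFLOPs grow with the malicious ratio: harder traffic mixes push more samples toward deeper exits.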
All results in Table 3 are averaged over multiple runs with different random seeds. We observe a minor F1-score fluctuation at the 10% imbalance ratio, where the performance is slightly lower than at 5% and 20%. However, the difference is small and the F1-score remains comparable to that observed at the extreme 1% imbalance setting. We attribute this variation to the randomness in sample and class selection under different imbalance configurations, which may lead to differences in the specific attack category composition. Overall, MeeDet demonstrates stable performance across a wide range of class imbalance ratios, indicating robustness to skewed traffic distributions.
The early-exit strategy is calibrated using validation-driven thresholds rather than fixed heuristics. When the malicious ratio decreases from 20% to 1%, the average exit depth shifts toward shallower layers, yielding substantial efficiency gains without degrading F1-score. This behavior indicates robustness to extreme class imbalance, as uncertain or minority samples are naturally routed to deeper layers. Moreover, the monotonic increase in FLOPs with higher malicious ratios confirms that efficiency improvements stem from conditional computation, not from under-processing difficult cases. This adaptive behavior is particularly important in IIoT environments, where attack rates fluctuate over time.

4.3. Ablation Study

To investigate the individual contributions of the core components in MeeDet, specifically the Mamba-based architecture, self-supervised pre-training, and the early-exit mechanism, we conducted an ablation study on the Edge-IIoTset dataset. We compared the full MeeDet model against three distinct variants. Variant A removes the pre-training stage and relies solely on supervised fine-tuning. Variant B discards the dynamic early-exit mechanism, forcing all samples to traverse the entire network. Variant C replaces the Mamba architecture with a Transformer encoder of similar size while retaining the other strategies. The quantitative results are summarized in Table 4.
First, the impact of the self-supervised pre-training strategy is significant. As shown in the comparison between MeeDet and Variant A (w/o Pre-training), the F1-score declines sharply from 98.34% to 94.12%. This degradation indicates that the Masked Byte Modeling (MBM) and Next-Flow Prediction (NFP) tasks are crucial for learning robust feature representations from unlabeled traffic. Without these tasks, the model struggles to generalize effectively and is prone to overfitting on the limited labeled data during fine-tuning.
Second, the efficacy of the early-exit mechanism is demonstrated by comparing MeeDet with Variant B (w/o Early Exit). Although Variant B achieves a marginally higher F1-score of 98.45%, it incurs a substantial computational penalty. The average FLOPs increase from 1.75 M to 12.40 M, an approximately 7× increase in computational cost. MeeDet achieves comparable detection accuracy, within a 0.11% margin, while drastically reducing computational overhead. This result validates the premise that the majority of traffic samples can be reliably classified at shallow layers without traversing the full network depth.
Finally, the architectural advantage of Mamba over traditional attention mechanisms is highlighted by Variant C (Transformer Backbone). Despite utilizing the same pre-training and early-exit mechanisms, the Transformer-based variant yields a lower F1-score of 96.88% and higher latency of 3.85 ms compared to MeeDet. This suggests that the linear complexity and selective state-space modeling of Mamba are better suited for capturing the long-range temporal dependencies inherent in IIoT traffic sequences than standard self-attention mechanisms. Collectively, these results confirm that the integration of all three components is essential for achieving the optimal balance between accuracy and efficiency. The ablation results reveal structural insights beyond raw performance changes. Removing pretraining degrades F1 substantially, indicating that MBM and NFP help the model internalize protocol syntax and inter-flow temporal logic before supervision. Disabling early exit preserves accuracy but incurs a sevenfold increase in computation, confirming that most benign flows become linearly separable at shallow layers. Replacing Mamba with a Transformer increases latency and reduces accuracy, suggesting that state-space dynamics are better aligned with IIoT traffic regularity than attention-based token interactions.
The ablation study focuses on understanding the overall trade-off between accuracy, false-positive behavior, and inference efficiency introduced by the early-exit mechanism. While a per-exit-layer breakdown could provide additional granularity, the aggregated results already capture the dominant trend: Earlier exits reduce computation with marginal impact on detection performance.

4.4. Layer-Wise Analyses

Feature Evolution across Early-Exit Layers. We select Flow Duration, Total Length of Packets, and Flow IAT Mean as representative features according to [35]. Figure 3 visualizes their distributions across MeeDet’s early exits, showing that deeper layers yield progressively sharper and more separable patterns, indicating refined temporal representations by the Mamba backbone. Benign traffic forms compact, near-unimodal clusters at shallow exits, enabling high-confidence early termination, while malicious or ambiguous samples maintain broader or multimodal spreads and are routed to deeper layers for additional context. This stratified behavior aligns with conditional computation: Many benign flows exit early, reducing FLOPs and latency, whereas hard cases receive full-depth processing to preserve accuracy. Combined with Table 2, these results demonstrate that early exiting improves average-case efficiency without degrading detection under class imbalance.
False Alarm Analysis. In Figure 4, we report simulated false positive rates (FPR) for early-exit layers under three confidence thresholds τ ∈ {0.80, 0.90, 0.95}. For an exit layer l, FPR is measured on a held-out validation set by forcing a decision at head l and classifying a flow as benign when its maximum posterior probability is greater than τ; otherwise, the flow is labeled malicious. Two consistent trends emerge. First, FPR decreases with depth, reflecting better-calibrated representations and improved benign–malicious separability in deeper layers. Second, FPR decreases as τ increases, indicating that stricter early-exit policies reduce false alarms at the cost of fewer early terminations and higher average computation. These operating characteristics guide the choice of τ to meet deployment-specific constraints on efficiency and false-alarm budgets. The thresholds shown here are for sensitivity analysis only and are not the calibrated deployment thresholds used in the system.
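The forced-exit FPR measurement follows directly from this decision rule, as the sketch below shows for a single head, given that head's benign posteriors and the ground-truth labels (0 = benign, 1 = malicious):

```python
def forced_exit_fpr(benign_probs, labels, tau):
    """FPR when a decision is forced at one exit head: a flow is called
    benign only if its posterior exceeds tau, else malicious. FPR is the
    fraction of truly benign flows (label 0) that get flagged malicious."""
    false_pos = sum(1 for p, y in zip(benign_probs, labels)
                    if y == 0 and p <= tau)
    n_benign = sum(1 for y in labels if y == 0)
    return false_pos / n_benign if n_benign else 0.0
```

Sweeping `tau` over {0.80, 0.90, 0.95} per head reproduces the operating curves in Figure 4: raising the threshold lowers FPR at a given depth but leaves more flows unexited, raising average compute.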

4.5. Comprehensibility

While MeeDet efficiently identifies malicious traffic flows, numerical classification labels lack semantic comprehensibility for security analysts. To address this, we integrate a Large Language Model (LLM) module designed to generate actionable threat intelligence reports based on the detected anomalies. We employ the Llama-3-8B-Instruct model as the core semantic engine. This model was selected for its balance between reasoning capability and computational efficiency. To ensure feasible deployment on edge servers alongside the detection module, we utilize 4-bit NormalFloat (NF4) quantization, which reduces the memory footprint by approximately 70% with negligible degradation in reasoning performance. The model is deployed using the vLLM library to optimize inference throughput.
To mitigate hallucinations and ensure domain relevance, we design a structured prompt template incorporating Chain-of-Thought (CoT) reasoning. The input to the LLM consists of three components: (1) the encrypted network traffic features extracted by MeeDet, which are limited to header-only and payload-derived metadata (e.g., packet lengths, timing, and protocol handshake fields), with no payload content retained; (2) the predicted attack class (e.g., DDoS, XSS); (3) a retrieved context snippet from the MITRE ATT&CK framework corresponding to the predicted class. We explicitly instruct the model to follow a three-step reasoning process: Analyze the traffic syntax, correlate it with the known attack signature, and finally, generate a concise remediation suggestion. The temperature parameter is set to 0.1 to maximize determinism in the generated responses. Given the generative nature of LLMs, ensuring the factual accuracy of the output is critical. We utilize a regex-based parser to verify that the output adheres to the specified JSON schema (e.g., containing fields for “Root Cause” and “Action Item”). Outputs failing this check are automatically rejected and regenerated.
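The schema gate can be sketched as below. The required field names ("Root Cause", "Action Item") follow the text; the whitelist of action codes is an illustrative assumption, and the real system additionally applies regex checks against non-existent protocol fields:

```python
import json

REQUIRED_FIELDS = {"Root Cause", "Action Item"}
# Hypothetical action-code whitelist; undefined codes trigger rejection.
ALLOWED_ACTIONS = {"isolate_host", "block_sni", "rate_limit", "escalate"}

def validate_report(raw):
    """Schema gate for LLM output: must parse as JSON, contain the
    required fields, and use only whitelisted action codes. Anything
    else is rejected and queued for regeneration or manual review."""
    try:
        report = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(report, dict) or not REQUIRED_FIELDS.issubset(report):
        return False
    action = report.get("Action Item", {})
    code = action.get("code") if isinstance(action, dict) else None
    return code in ALLOWED_ACTIONS
```

Because the check is purely syntactic, it runs in microseconds and can sit inline in the asynchronous reporting path without adding meaningful latency.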
Figure 5 depicts the closed-loop after a flow is flagged as malicious. The system extracts salient fields from the flow record—five-tuple, timing/periodicity and burstiness, beacon score, SNI/certificate metadata, destination AS/ASN, and novelty indicators—and serializes them into a structured prompt. The LLM returns a context-aware report with a verdict and confidence, evidence tied to specific fields, recommended countermeasures for the egress setting, potential false-positive causes, and required follow-up telemetry. This evidence-conditioned exchange provides actionable, auditable guidance that accelerates analyst response.
The LLM-based analysis module is intentionally designed as an asynchronous, non-blocking component. Its output latency (seconds) does not affect packet forwarding or real-time detection, which remain governed solely by the Mamba-based backbone. In practice, this module is suitable for edge servers or SOC-side deployments rather than ultra-constrained field devices. Its role is to assist post-alert triage, root-cause analysis, and response planning, rather than inline enforcement. This separation ensures that MeeDet remains deployable in real-time IIoT environments while still benefiting from high-level semantic reasoning.
We conduct an expert-judgment evaluation on 120 LLM-generated incident analysis reports, randomly sampled from the test set and covering diverse attack categories. Each report is independently evaluated by three domain experts along usefulness and faithfulness dimensions using a 5-point Likert scale. Inter-rater agreement, measured using Cohen’s κ, is 0.71, indicating substantial agreement among evaluators beyond chance-level consistency. The generated reports achieve an average usefulness score of 4.18 (95% CI: [4.06, 4.30]), where the CI (Confidence Interval) reflects estimation uncertainty across evaluated samples, and a faithfulness score of 4.05 (95% CI: [3.92, 4.18]). Factual consistency analysis further shows that 92.3% of the evaluated reports are fully grounded in the extracted traffic features (95% CI: [88.1%, 95.4%]), with no unsupported claims detected. Stability checks across multiple random seeds and decoding temperatures show no statistically significant degradation in evaluation scores. We emphasize that this expert-judgment evaluation is advisory rather than ground truth, and is intended to provide a structured and reproducible assessment of the interpretability and practical utility of the generated explanations.

5. Conclusions

This paper presents MeeDet, a dedicated solution for IIoT malicious traffic detection that addresses the core challenges of labeled data scarcity, high inference cost, and poor comprehensibility through three key innovations: First, the self-supervised pretraining framework (MBM + NFP) effectively leverages unlabeled IIoT traffic to learn discriminative multimodal features, mitigating the reliance on scarce labeled data—a critical pain point in industrial scenarios. Second, the dynamic early-exit mechanism terminates over 85% of benign samples in shallow Mamba layers, drastically reducing average latency and computational overhead while preserving detection accuracy, enabling deployment on resource-constrained edge devices. Third, the LLM-assisted analysis module transforms opaque model predictions into structured reports with evidence chains (e.g., periodic IAT for C2 beaconing) and concrete mitigations (e.g., host isolation, SNI blocking), solving the “black-box” limitation of deep learning for IIoT maintenance. Experimental results validate MeeDet’s practical value: It achieves state-of-the-art accuracy in general malicious traffic and APT detection, maintains robustness under class imbalance (1–20% malicious ratios), and delivers significant efficiency gains over existing methods. These properties confirm MeeDet as a viable approach for securing IIoT environments.

Author Contributions

Conceptualization, J.S. and P.J.; methodology, J.S.; software, J.S.; validation, J.S., Y.W. and S.J.; formal analysis, J.S.; writing—original draft preparation, J.S.; writing—review and editing, J.S. and S.J.; visualization, J.S.; supervision, Y.W.; project administration, S.J.; funding acquisition, S.J. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, grant number 62472456.

Institutional Review Board Statement

This study exclusively utilizes publicly available Industrial Internet of Things (IIoT) traffic datasets that are released for research purposes. All datasets used in this research are fully anonymized and contain no personally identifiable information (PII). The proposed detection framework is designed solely for defensive cybersecurity applications, including intrusion detection and operational analysis, and does not involve user profiling or content inspection. Since no human subjects were involved in this research, ethical approval and informed consent were not required, and no review by an Institutional Review Board (IRB) or ethics committee was necessary.

Data Availability Statement

The data analyzed in this study are publicly available IIoT traffic datasets released for research purposes. No new data were created in this study.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Ghiasvand, E.; Ray, S.; Iqbal, S.; Dadkhah, S.; Ghorbani, A.A. CICAPT-IIOT: A provenance-based APT attack dataset for IIoT environment. arXiv 2024, arXiv:2407.11278. [Google Scholar] [CrossRef]
  2. Ferrag, M.A.; Friha, O.; Hamouda, D.; Maglaras, L.; Janicke, H. Edge-IIoTset: A New Comprehensive Realistic Cyber Security Dataset of IoT and IIoT Applications for Centralized and Federated Learning. IEEE Access 2022, 10, 40281–40306. [Google Scholar] [CrossRef]
  3. Al-Hawawreh, M.; Sitnikova, E.; Aboutorab, N. X-IIoTID: A Connectivity-Agnostic and Device-Agnostic Intrusion Data Set for Industrial Internet of Things. IEEE Internet Things J. 2022, 9, 3962–3977. [Google Scholar] [CrossRef]
  4. Barracuda Networks. The State of Industrial Security in 2022; Technical Report; Barracuda Networks: Campbell, CA, USA, 2022; Available online: https://www.barracuda.com/products/network-protection/industrial-security (accessed on 12 January 2026).
  5. Zhang, Z.; Zong, X.; He, K.; Lian, L. Research on Abnormal Traffic Detection in Industrial Control Network Based on CVAE-CatBoost. Comput. Eng. 2023, 49, 173–180. [Google Scholar] [CrossRef]
  6. Chen, L.; Cao, X.; He, T.; Xu, Y.; Liu, X.; Hu, B. A lightweight All-MLP time-frequency anomaly detection for IIoT time series. Neural Netw. 2025, 187, 107400. [Google Scholar] [CrossRef] [PubMed]
  7. Poorazad, S.K.; Benzaïd, C.; Taleb, T. A Novel Buffered Federated Learning Framework for Privacy-Driven Anomaly Detection in IIoT. In Proceedings of the 2024 IEEE Global Communications Conference, GLOBECOM 2024, Cape Town, South Africa, 8–12 December 2024; IEEE: New York, NY, USA, 2024; pp. 1725–1730. [Google Scholar] [CrossRef]
  8. Feng, Y.; Chen, J.; Liu, Z.; Lv, H.; Wang, J. Full Graph Autoencoder for One-Class Group Anomaly Detection of IIoT System. IEEE Internet Things J. 2022, 9, 21886–21898. [Google Scholar] [CrossRef]
  9. Han, G.; Tu, J.; Liu, L.; Martínez-García, M.; Peng, Y. Anomaly Detection Based on Multidimensional Data Processing for Protecting Vital Devices in 6G-Enabled Massive IIoT. IEEE Internet Things J. 2021, 8, 5219–5229. [Google Scholar] [CrossRef]
  10. Sangodoyin, A.; Akinsolu, M.O.; Pillai, P.; Grout, V. Detection and Classification of DDoS Flooding Attacks on Software-Defined Networks: A Case Study for the Application of Machine Learning. IEEE Access 2021, 9, 122495–122508. [Google Scholar] [CrossRef]
  11. Kasongo, S.M. An Advanced Intrusion Detection System for IIoT Based on GA and Tree Based Algorithms. IEEE Access 2021, 9, 113199–113212. [Google Scholar] [CrossRef]
  12. Gao, C.; Zhao, X.; Wang, X.; Wang, L.; Fan, Z.; Yao, Y.; Jiang, Z. A Variant and Flow-Level AutoML Method for IoT Malicious Traffic Detection. In Proceedings of the 28th International Conference on Computer Supported Cooperative Work in Design, CSCWD 2025, Compiegne, France, 5–7 May 2025; Shen, W., Abel, M., Matta, N., Barthès, J.A., Luo, J., Zhang, J., Zhu, H., Peng, K., Eds.; IEEE: New York, NY, USA, 2025; pp. 177–182. [Google Scholar] [CrossRef]
  13. Wang, C.; Gao, C.; He, F.; He, S.; Liu, R.; Li, Q.; Chen, W.; Wang, X. Exploring the Effectiveness of Traditional Machine Learning Models in IoT Malicious Traffic Detection. In Proceedings of the 28th International Conference on Computer Supported Cooperative Work in Design, CSCWD 2025, Compiegne, France, 5–7 May 2025; Shen, W., Abel, M., Matta, N., Barthès, J.A., Luo, J., Zhang, J., Zhu, H., Peng, K., Eds.; IEEE: New York, NY, USA, 2025; pp. 740–745. [Google Scholar] [CrossRef]
  14. Zainudin, A.; Ahakonye, L.A.C.; Akter, R.; Kim, D.; Lee, J. An Efficient Hybrid-DNN for DDoS Detection and Classification in Software-Defined IIoT Networks. IEEE Internet Things J. 2023, 10, 8491–8504. [Google Scholar] [CrossRef]
  15. Lin, X.; Xiong, G.; Gou, G.; Li, Z.; Shi, J.; Yu, J. ET-BERT: A Contextualized Datagram Representation with Pre-training Transformers for Encrypted Traffic Classification. In Proceedings of the WWW ’22: The ACM Web Conference 2022, Virtual Event, Lyon, France, 25–29 April 2022; Laforest, F., Troncy, R., Simperl, E., Agarwal, D., Gionis, A., Herman, I., Médini, L., Eds.; ACM: New York, NY, USA, 2022; pp. 633–642. [Google Scholar] [CrossRef]
  16. Dai, J.; Xu, X.; Gao, H.; Xiao, F. CMFTC: Cross Modality Fusion Efficient Multitask Encrypt Traffic Classification in IIoT Environment. IEEE Trans. Netw. Sci. Eng. 2023, 10, 3989–4009. [Google Scholar] [CrossRef]
  17. Ge, Y.; Gao, Y.; Li, X.; Cai, B.; Xi, J.; Yu, S. EMTD-SSC: An Enhanced Malicious Traffic Detection Model Using Transfer Learning Under Small Sample Conditions in IoT. IEEE Internet Things J. 2024, 11, 30725–30741. [Google Scholar] [CrossRef]
  18. Luo, Y.; Chen, X.; Sun, H.; Li, X.; Ge, N.; Feng, W.; Lu, J. Securing 5G/6G IoT Using Transformer and Personalized Federated Learning: An Access-Side Distributed Malicious Traffic Detection Framework. IEEE Open J. Commun. Soc. 2024, 5, 1325–1339. [Google Scholar] [CrossRef]
  19. Gu, A.; Goel, K.; Ré, C. Efficiently Modeling Long Sequences with Structured State Spaces. In Proceedings of the Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, 25–29 April 2022. [Google Scholar]
  20. Guarino, I.; Wang, C.; Finamore, A.; Pescapè, A.; Rossi, D. Many or Few Samples?: Comparing Transfer, Contrastive and Meta-Learning in Encrypted Traffic Classification. In Proceedings of the 7th Network Traffic Measurement and Analysis Conference, TMA 2023, Naples, Italy, 26–29 June 2023; IEEE: New York, NY, USA, 2023; pp. 1–10. [Google Scholar] [CrossRef]
  21. Dao, T.; Gu, A. Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality. In Proceedings of the Forty-First International Conference on Machine Learning, ICML 2024, Vienna, Austria, 21–27 July 2024. [Google Scholar]
  22. Zhang, P.; Chen, F.; Yue, H. Detection and utilization of new-type encrypted network traffic in distributed scenarios. Eng. Appl. Artif. Intell. 2024, 127, 107196. [Google Scholar] [CrossRef]
  23. Wang, T.; Xie, X.; Wang, W.; Wang, C.; Zhao, Y.; Cui, Y. Netmamba: Efficient Network Traffic Classification Via Pre-Training Unidirectional Mamba. In Proceedings of the 32nd IEEE International Conference on Network Protocols, ICNP 2024, Charleroi, Belgium, 28–31 October 2024; IEEE: New York, NY, USA, 2024; pp. 1–11. [Google Scholar] [CrossRef]
  24. Wang, Z.; Thing, V.L.L. Feature mining for encrypted malicious traffic detection with deep learning and other machine learning algorithms. Comput. Secur. 2023, 128, 103143. [Google Scholar] [CrossRef]
  25. Ucci, D.; Sobrero, F.; Bisio, F.; Zorzino, M. Near-real-time Anomaly Detection in Encrypted Traffic using Machine Learning Techniques. In Proceedings of the 2021 IEEE Symposium Series on Computational Intelligence (SSCI), Orlando, FL, USA, 5–7 December 2021; pp. 1–8. [Google Scholar] [CrossRef]
  26. Teerapittayanon, S.; McDanel, B.; Kung, H.T. Branchynet: Fast inference via early exiting from deep neural networks. In Proceedings of the 2016 23rd International Conference on Pattern Recognition (ICPR), Cancun, Mexico, 4–8 December 2016; IEEE: New York, NY, USA, 2016; pp. 2464–2469. [Google Scholar]
  27. Bachman, P.; Hjelm, R.D.; Buchwalter, W. Learning representations by maximizing mutual information across views. Adv. Neural Inf. Process. Syst. 2019, 32, 15535–15545. [Google Scholar]
  28. He, K.; Fan, H.; Wu, Y.; Xie, S.; Girshick, R. Momentum contrast for unsupervised visual representation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 9729–9738. [Google Scholar]
  29. Huang, G.; Chen, D.; Li, T.; Wu, F.; Van Der Maaten, L.; Weinberger, K.Q. Multi-scale dense convolutional networks for efficient prediction. arXiv 2017, arXiv:1703.09844. [Google Scholar]
  30. Neto, E.C.P.; Dadkhah, S.; Ferreira, R.; Zohourian, A.; Lu, R.; Ghorbani, A.A. CICIoT2023: A Real-Time Dataset and Benchmark for Large-Scale Attacks in IoT Environment. Sensors 2023, 23, 5941. [Google Scholar] [CrossRef]
  31. Alsaedi, A.; Moustafa, N.; Tari, Z.; Mahmood, A.N.; Anwar, A. TON_IoT Telemetry Dataset: A New Generation Dataset of IoT and IIoT for Data-Driven Intrusion Detection Systems. IEEE Access 2020, 8, 165130–165150. [Google Scholar] [CrossRef]
  32. Mao, J.; Wei, Z.; Li, B.; Zhang, R.; Song, L. Toward Ever-Evolution Network Threats: A Hierarchical Federated Class-Incremental Learning Approach for Network Intrusion Detection in IIoT. IEEE Internet Things J. 2024, 11, 29864–29877. [Google Scholar] [CrossRef]
  33. Chang, Y.; Chen, J.; Su, R.; Xie, J.; Li, A. Two-Phase Dual-Adversarial Agents with Multivariate Information for Unsupervised Anomaly Detection of IIoT-Edge Devices. IEEE Internet Things J. 2024, 11, 23577–23591. [Google Scholar] [CrossRef]
  34. Zhao, M.; Fink, O. DyEdgeGAT: Dynamic Edge via Graph Attention for Early Fault Detection in IIoT Systems. IEEE Internet Things J. 2024, 11, 22950–22965. [Google Scholar] [CrossRef]
  35. Zhang, X.; Lu, J.; Sun, J.; Xiao, R.; Jin, S. MEMTD: Encrypted Malware Traffic Detection Using Multimodal Deep Learning. In Web Engineering; Di Noia, T., Ko, I.Y., Schedl, M., Ardito, C., Eds.; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2022; pp. 357–372. [Google Scholar] [CrossRef]
Figure 1. The overall architecture of the proposed MeeDet framework. It consists of four phases: pre-processing for traffic tokenization, self-supervised pre-training using a Mamba backbone, fine-tuning with a dynamic early-exit mechanism, and LLM-assisted analysis for report generation. The figure illustrates how different samples dynamically follow different inference paths and exit at different layers based on calibrated confidence thresholds.
Figure 2. Confusion matrices for APT traffic detection across CART, CNN-LSTM, ET-BERT, and MeeDet.
Figure 3. Distribution of normalized flow features at different early-exit depths. (a) Flow Duration, (b) Total Length of Packets, and (c) Flow IAT Mean across Exit Layers 1, 3, 5, 7, 9, 11, and the Final Exit.
Figure 4. Exit-layer FPR under confidence thresholds τ ∈ {0.80, 0.90, 0.95}; a larger τ is stricter, and lower indices denote shallower layers.
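To make the exit policy behind Figure 4 concrete, the following is a minimal illustrative sketch (not the authors' implementation) of a confidence-threshold early-exit rule: a sample leaves the network at the first per-layer binary head whose confidence reaches τ, and falls back to the deepest head otherwise. The names `early_exit_layer`, `exit_probs`, and `tau` are illustrative assumptions.

```python
# Hedged sketch of a confidence-threshold early-exit rule, as evaluated
# in Figure 4. exit_probs[i] is assumed to be P(malicious) produced by
# the lightweight binary head attached to layer i.

def early_exit_layer(exit_probs, tau=0.95):
    """Return (exit_layer_index, predicted_label) for the first head
    whose confidence reaches tau; fall back to the deepest head."""
    for i, p in enumerate(exit_probs):
        confidence = max(p, 1.0 - p)      # confidence of a binary head
        if confidence >= tau:
            return i, int(p >= 0.5)       # exit early at layer i
    p = exit_probs[-1]                    # no head confident enough:
    return len(exit_probs) - 1, int(p >= 0.5)  # use the final head

# Example: a flow that becomes confidently benign at the 4th head (index 3)
layer, label = early_exit_layer([0.40, 0.30, 0.15, 0.03, 0.01], tau=0.95)
print(layer, label)  # -> 3 0
```

Under this rule a larger τ is stricter, so fewer samples exit at shallow layers, matching the trend the figure reports.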
Figure 5. Post-detection workflow with LLM-assisted triage for malicious samples.
Table 1. The statistical information of the datasets.
| Dataset | #Flow | #Label |
|---|---|---|
| CICIIoT2023 [30] | 5,214,625 | 7 |
| Edge-IIoTset [2] | 2,287,781 | 5 |
| X-IIoTID [3] | 973,213 | 10 |
| TON-IoT [31] | 896,097 | 9 |
| CICAPT-IIoT2024 [1] | 1,463,863 | 9 |
Table 2. Balanced performance comparison of different methods on various datasets.
| Dataset | Metric | CVAE-CatBoost | CART | CNN-LSTM | ET-BERT | MeeDet |
|---|---|---|---|---|---|---|
| Edge-IIoTset | AC | 0.8520 | 0.8840 | 0.9010 | **0.9890** | 0.9870 |
| | PR | 0.8230 | 0.8510 | 0.8750 | 0.9030 | **0.9760** |
| | RC | 0.8810 | 0.9020 | 0.9230 | 0.9350 | **0.9910** |
| | F1 | 0.8512 | 0.8761 | 0.8986 | 0.9188 | **0.9834** |
| CICIoT2023 | AC | 0.8310 | 0.8630 | 0.8920 | 0.9150 | **0.9780** |
| | PR | 0.8020 | 0.8340 | 0.8610 | **0.9720** | 0.9680 |
| | RC | 0.8630 | 0.8850 | 0.9040 | 0.9210 | **0.9820** |
| | F1 | 0.8315 | 0.8589 | 0.8821 | **0.9365** | 0.9312 |
| X-IIoTID | AC | 0.8430 | 0.8720 | 0.9030 | 0.9340 | **0.9850** |
| | PR | 0.8140 | 0.8420 | 0.8730 | **0.9790** | 0.9750 |
| | RC | 0.8720 | 0.8930 | 0.9120 | **0.9930** | 0.9920 |
| | F1 | 0.8423 | 0.8671 | 0.8922 | **0.9859** | 0.9834 |
| TON-IoT | AC | 0.8240 | 0.8520 | 0.8830 | **0.9810** | 0.9760 |
| | PR | 0.7930 | 0.8210 | 0.8520 | 0.8830 | **0.9670** |
| | RC | 0.8520 | 0.8730 | 0.8940 | **0.9850** | 0.9830 |
| | F1 | 0.8218 | 0.8464 | 0.8727 | **0.9326** | 0.9285 |
Note: Metrics in bold indicate the optimal value.
Table 3. Performance and efficiency under varying malicious ratios.
| Method | Mal. Ratio | FLOPs (M) | Time-Cost (ms) | F1 |
|---|---|---|---|---|
| CVAE-CatBoost | 1% | – | 0.75 | 0.8572 |
| | 5% | | | 0.8625 |
| | 10% | | | 0.8650 |
| | 20% | | | 0.8703 |
| CART | 1% | – | 3.17 | 0.8975 |
| | 5% | | | 0.9030 |
| | 10% | | | 0.9058 |
| | 20% | | | 0.9112 |
| CNN-LSTM | 1% | 2.72 | 0.96 | 0.9175 |
| | 5% | | | 0.9220 |
| | 10% | | | 0.9250 |
| | 20% | | | 0.9315 |
| ET-BERT | 1% | 6.26 | 10.16 | 0.9766 |
| | 5% | | | 0.9720 |
| | 10% | | | **0.9850** |
| | 20% | | | 0.9815 |
| MeeDet | 1% | 1.75 | 1.58 | **0.9775** |
| | 5% | 2.12 | 2.82 | **0.9830** |
| | 10% | 3.99 | 3.91 | 0.9750 |
| | 20% | 5.63 | 6.81 | **0.9825** |
Note: Metrics in bold indicate the optimal value.
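As a quick sanity check, the abstract's claim of a computational reduction of over 70% can be reproduced from the Table 3 values for the 1% malicious ratio, comparing MeeDet (1.75 MFLOPs) against the strongest pretrained baseline, ET-BERT (6.26 MFLOPs):

```python
# Reduction in per-sample compute at a 1% malicious ratio,
# using the FLOPs values reported in Table 3.
meedet_mflops = 1.75   # MeeDet, 1% malicious ratio
etbert_mflops = 6.26   # ET-BERT baseline, 1% malicious ratio
reduction = 1.0 - meedet_mflops / etbert_mflops
print(f"{reduction:.1%}")  # -> 72.0%
```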
Table 4. Ablation study of component effectiveness on Edge-IIoTset.
| Model Variant | Backbone | Pre-Train | Early Exit | F1 (%) | FLOPs (M) | Time (ms) |
|---|---|---|---|---|---|---|
| MeeDet (Ours) | Mamba | Yes | Yes | 98.34 | **1.75** | **1.58** |
| Variant A | Mamba | No | Yes | 94.12 | 2.15 | 1.92 |
| Variant B | Mamba | Yes | No | **98.45** | 12.40 | 11.20 |
| Variant C | Transformer | Yes | Yes | 96.88 | 4.25 | 3.85 |
Note: Metrics in bold indicate the optimal value.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Sun, J.; Jin, P.; Wang, Y.; Jin, S. MeeDet: Efficient Malicious Traffic Detection Method via Mamba-Based Early-Exit Mechanism in IIoT Scenarios. Electronics 2026, 15, 1017. https://doi.org/10.3390/electronics15051017


