1. Introduction
Industry 4.0 is rapidly evolving to create a highly digital environment that will soon become part of our daily lives. The Industrial Internet of Services (IIoS), along with the Industrial Internet of Things (IIoT), is a significant pillar of Industry 4.0, adding value and serving human interests. In a general context, the IIoT acts as the “eyes” of a system, providing visibility through connected sensors and devices that collect and transmit data. In contrast, the IIoS functions as the “brain,” interpreting data and providing intelligent services to improve industrial efficiency. The switch from traditional networks to the IoT has recently revolutionized the global economy. IIoS infrastructure can support smart manufacturing, agriculture, health, government, cities, logistics, and home automation. The IIoS is still an emerging concept, one that ultimately allows products and services to work together more seamlessly for humans.
Figure 1 depicts the overall architecture of Industry 4.0, which blends the IIoT and cloud computing under an IIoS framework that offers smarter, more intelligent services, such as smart manufacturing systems. In Industry 4.0, incoming data from smart connected devices and sensors are transmitted via the IIoT to cloud/edge-based platforms, which enables advanced analytics, real-time monitoring, and centralized control. Organizations can optimize processes, improve efficiency, and rapidly adapt to changing market or operational demands by layering service-oriented technologies, such as smart manufacturing and intelligent service delivery, atop this infrastructure.
According to a recent statistical report, the market value of the IoT was $1.90 billion in 2018, $25 billion in 2020, and $925.2 billion as of 2023, and it is forecasted to hit $6 trillion in 2025, with a reported compound annual growth rate (CAGR) of approximately 15.12% from 2018 to 2025. The report declared that 25 billion devices were connected before 2020, with 50 billion permanent connections and over 200 billion intermittent connections, and it projected 29.4 billion connected devices by 2023 [1]. These trends suggest that IoT adoption and investment continue to accelerate, with manufacturing poised to capture a significant share. Intel projected that the market value of the IoT could hit 6.2 trillion dollars by 2025, with a substantial percentage in manufacturing [2,3]. These growing trends establish that rapid growth and high connectivity underscore the increasing importance of the IIoS. As factories adopt more sensors, edge devices, and data-driven solutions, opportunities arise for advanced analytics, real-time monitoring, and predictive smart services. These projected shifts increase operational efficiency and create new revenue models that ultimately transform manufacturing processes through connectivity-based industrial service paradigms.
The IIoS can underpin numerous smart manufacturing services and operations, connecting machines and sensors and controlling real-time systems. However, these systems remain vulnerable to cyber attacks that can disrupt production lines, damage machinery, or even endanger human operators. An unsecured IIoS architecture risks costly downtime, supply-chain disruption, and theft of sensitive intellectual property (IP). Attackers who compromise these complex networks can sabotage manufacturing runs, produce defective products, or halt operations, leading to significant financial and reputational losses. Therefore, robust cybersecurity measures are essential for maintaining smooth, safe, and resilient industrial operations.
Although intrusion detection within the industrial ecosystem has progressed in recent years, existing solutions often rely on static architectures that struggle to adapt to rapidly evolving, sophisticated threats and typically neglect inherent data-imbalance problems that degrade overall performance. In industrial landscapes, benign network traffic vastly outnumbers malicious events, so data imbalance must be addressed to ensure that rare yet highly consequential attacks are not overlooked. Moreover, industrial data contain problematic noise generated by equipment, sensors, and complex network topologies; in IIoS/IIoT-enabled industries, robotic arms, sporadic sensor calibrations, and inconsistent power supplies introduce random noise into operational data that can disrupt conventional machine learning-based cyber defenses, which are highly dependent on data quality. Furthermore, traditional solutions often struggle under dynamic conditions, lacking mechanisms to adapt swiftly as new threats emerge and the underlying data shift.
Considering these cybersecurity challenges, the proposed DA-MBA model addresses noise and shifting data distributions by learning to reconstruct clean data from corrupted inputs. In smart industries, sensor malfunctions, environmental disturbances, and complex network protocols can introduce anomalies that degrade the reliability of conventional machine-learning classifiers. The proposed solution overcomes this problem by training the autoencoder on noisy samples created via Gaussian noise injection, so that it learns a compressed, more consistent representation of network traffic that highlights essential features while canceling out noise. This denoising autoencoder (DAE) process yields a feature space that is significantly more robust to changes and irregularities in the data and allows the classifier to focus on meaningful patterns rather than spurious fluctuations. DA-MBA is made further adaptive through hyperparameter tuning with the Optuna framework and suitable regularization, allowing the model to keep pace with the dynamic nature of threats. After feature extraction shapes the incoming feature space, the pipeline employs a two-branch hybrid classifier. The MLP branch captures direct, non-sequential relationships among the denoised features and quickly learns global patterns and correlations characteristic of malicious behavior, which is critical for detecting attacks that exhibit clear anomaly patterns. The BiLSTM branch, in contrast, handles the sequential and contextual aspects of the data: many attack signatures evolve even when data points are not strictly chronological, such as the escalation of suspicious ports or the gradual increase in abnormal traffic. Unfolding and identifying such interrelated patterns is difficult, but the BiLSTM's recurrent structure is well suited to the task. The model integrates the outputs of both branches to combine immediate feature-level insights with sequential, context-based insights, resulting in a more comprehensive view of potential cyber attacks.
The proposed model also integrates automated hyperparameter search, dropout, L2 regularization, and class weighting, which allows the hybrid classifier to adapt to shifting conditions by retraining and refining its parameters as new threats or data distributions emerge. This flexibility is crucial for remaining effective in continuously evolving industrial networks, where complex, sophisticated exploits and rare attack variations can rapidly defeat static detection methods. The proposed model was trained and tested on two widely recognized IIoT-based datasets, Edge-IIoTset and WUSTL-IIOT-2021, and achieved over 99% accuracy with a loss near 0.02, underscoring its effectiveness in detecting the most sophisticated attacks.
The proposed study incorporates multiple regularization strategies throughout the training process, which make the model resistant to overfitting and maintain robust performance on unseen data. The first step is to apply dropout and L2 regularization in key layers to penalize large weights and minimize co-adaptation among neurons. Furthermore, the DAE intervenes by adding Gaussian noise, forcing the encoder to learn fundamental feature representations rather than memorizing noisy, transient patterns. The Gaussian-noise process also makes the model more adaptive to evolving cyber attacks, which reshape their behavior to exploit conventional machine-learning methods. The study also employs early stopping based on validation loss, halting training once performance no longer improves and preventing DA-MBA from overfitting to erroneous fluctuations in the training set. Moreover, automated hyperparameter optimization via the Optuna framework further refines the dropout rates, learning rates, and encoding dimensions of the network to strike an optimal balance between capacity and generalization. The datasets were split into training, validation, and test sets to ensure unbiased performance estimation and robust hyperparameter tuning. Furthermore, DA-MBA was statistically validated by comparing training and test accuracy, loss, F1 scores, and ROC-AUC curves.
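To make the regularization recipe concrete, the following minimal Keras sketch combines dropout, an L2 weight penalty, and validation-loss early stopping as described above; the layer sizes, rates, and patience values are illustrative placeholders rather than the Optuna-tuned settings reported later.

```python
# Minimal sketch of the regularization stack described above (Keras).
# Layer sizes, rates, and patience are illustrative placeholders.
from tensorflow.keras import layers, regularizers, callbacks

def regularized_dense_block(x, units: int = 64,
                            dropout_rate: float = 0.3,
                            l2_strength: float = 1e-4):
    """A dense block with an L2 weight penalty (penalizes large weights) and
    dropout (reduces co-adaptation among neurons), as applied in key layers."""
    x = layers.Dense(units, activation="relu",
                     kernel_regularizer=regularizers.l2(l2_strength))(x)
    return layers.Dropout(dropout_rate)(x)

# Early stopping on validation loss halts training once performance plateaus.
early_stop = callbacks.EarlyStopping(monitor="val_loss", patience=10,
                                     restore_best_weights=True)
```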
Need for Specialized Cyber Attack Detection in IIoS
The Industrial Internet of Services (IIoS) represents the convergence of advanced technologies and industrial systems, creating a paradigm shift in how industries operate and deliver services.
Figure 2 depicts the overall architecture, comprising five layers: the perception, network, service, application, and business layers.
In Industry 4.0, numerous enterprises are adopting layered industrial architectures to leverage the synergy between the IIoT and IIoS. As illustrated in
Figure 2, the perception layer comprises field-level devices and sensors for data capture: a network of sensors, actuators, and smart devices continuously streams precise data on equipment status, environmental conditions, and operational process variables. Above it, the network layer comprises communication protocols, routers, and gateways that securely transmit the collected information; low-latency communication protocols, edge computing nodes, and smart network hardware such as intelligent routers and next-generation firewalls ensure secure and efficient data routing. Networked data then travel to the IIoS-based service layer, where cloud-based services, virtualized microservices, and adaptive computing resources facilitate advanced analytics, real-time monitoring and reporting, and on-demand scalability. Moreover, the application layer merges sophisticated services from the service layer with end-user dashboards, SCADA systems, and machine-to-machine APIs, thereby promoting remote visibility, advanced process automation, and automated decision-making. Finally, the business layer relies on consolidated analytics to optimize production procedures, improve supply-chain management, and discover novel revenue streams.
A complete smart industrial architecture requires robust cybersecurity measures across all layers. However, the IIoS imposes additional challenges that necessitate specialized network-based cyber attack detection solutions. Compared with conventional industrial networks, the IIoS relies heavily on cloud-based smart services, as it fundamentally provides industrial services through cloud platforms. The IIoS offers virtualized microservices and dynamic resource configurations, often spanning multiple geographic regions and heterogeneous, complex network domains. Moreover, this complex ecosystem invites complicated communication patterns and a broader attack surface, making the IIoS vulnerable to cyber threats such as distributed denial-of-service (DDoS) attacks, supply-chain vulnerabilities, and man-in-the-middle (MITM) attacks. Furthermore, the IIoS merges real-time data feeds from sensors and smart edge devices, making anomalies more challenging to detect within high-volume traffic. Implementing a network-based intrusion detection solution that incorporates technologies such as deep packet inspection, behavioral analytics, and machine learning-driven anomaly detection can enable industries to continuously monitor critical data flows, isolate suspicious activity, and adjust to emerging cyber exploits.
This paper is organized as follows:
Section 2 describes the literature.
Section 3 discusses the research methodology.
Section 4 discusses the results and findings of this work, and finally,
Section 5 concludes the work presented in this paper and gives recommendations for future work.
2. Literature Review
The rapid expansion of the IIoS, where modern service-centric architectures and industrial processes converge, has significantly increased the connectivity and complexity of modern industrial ecosystems. Although previous research on cybersecurity in industrial contexts has predominantly focused on the IIoT, the unique operational dynamics of the IIoS introduce additional attack vectors and vulnerabilities that necessitate more sophisticated defense methods. For example, service-level reliance, real-time resource management, and cross-domain data interactions significantly increase the attack surface and create new opportunities for intruders to launch sophisticated exploits. Researchers are working on continual learning and transformers, but conventional pipelines remain inadequate against evolved and zero-day cyber attacks. In the following section (Table 1), key advances in intrusion detection methods are critically reviewed, highlighting their contributions and results. Existing studies have highlighted the effectiveness of machine learning, deep learning, and hybrid ML/DL methods for intrusion detection in IIoT infrastructures. Taking advantage of these approaches, ref. [
4] systematically investigated DL approaches and explored how convolutional autoencoders (CAEs) can be utilized to enhance side-channel attacks, which are a critical threat vector in cryptographic systems. The authors compared multiple CAE architectures and hyperparameter configurations, providing a better understanding of the design choices that best capture and compress high-dimensional patterns without compromising critical leakage information and detecting network-based cyber attacks. However, despite the promising results of this research, a principal limitation of this study is its focus on a relatively constrained set of experimental settings (e.g., a single device type or single cryptographic algorithm), which may limit the broader applicability of the findings. Another study [
5] presented a more robust approach by proposing an adaptive cyber attack prediction framework that integrates an enhanced genetic algorithm (GA) with deep learning (DL) techniques to dynamically identify and respond to emerging cybersecurity threats. The proposed method optimizes both feature selection and hyperparameter configuration settings in a deep neural network (DNN), resulting in improved predictive accuracy and reduced false positives compared with baseline models. However, a key limitation is that this approach, which combines a genetic algorithm and a deep learning pipeline, may become computationally demanding when scaled to large datasets.
Further studies [
6,
7] have concentrated on automating and optimizing security mechanisms to mitigate sophisticated cyber threats. The author [
6] implemented automated machine-learning (AutoML) pipelines specifically for DL-based malware detection, highlighting the reduced manual overhead in setting hyperparameter configurations and selecting the appropriate architecture. Through the systematic exploration of optimal network arrangements, the proposed approach facilitates rapid adaptation to evolving malware variations. In parallel, ref. [
7] presented an intrusion-detection framework precisely designed for IoT domains that combines selective feature engineering with lightweight classification to achieve a balance between accuracy and computational efficiency. Although both proposed solutions originate from different applications, they work on the same objective of streamlining model development to generate quicker and more effective responses to cyber hazards. However, questions remain regarding their scalability and adaptability in highly dynamic scenarios. Other studies [
8,
9,
10] focus on a shared objective: strengthening IoT and IIoT ecosystems against emerging cyber threats through advanced DL techniques. The study [
8] highlights hybrid deep learning techniques that combine various neural architectures, thereby improving detection robustness while considering the heterogeneity often inherent in IoT infrastructures. Moreover, ref. [
9] disseminated this perspective by proposing a hybrid deep random neural network for the industrial IoT (IIoT), demonstrating how specialized randomization mechanisms can mitigate overfitting and adapt quickly to new attack patterns. In parallel, ref. [
10] introduces a distributed attack detection framework powered by a deep learning (DL) architecture, improving scalability by enabling intrusion detection tasks to be distributed and coordinated across multiple nodes in an Internet of Things (IoT) environment. Overall, these studies demonstrate that deep learning strategies, whether hybrid or distributed, can yield substantial improvements in detecting sophisticated cyber attacks in complex, interconnected IoT environments. However, they also illustrate the ongoing challenges of computational overhead, real-time flexibility, and the evolving nature of threat profiles.
Moreover, further studies [11,12,13] attempt to overcome recent challenges in cyber attack detection. Ref. [11] proposes a DL-based intrusion detection pipeline tailored for the IIoT, focusing on high detection accuracy and real-time responsiveness in typically under-resourced operational environments. Ref. [
12] expanded this perspective by emphasizing robust feature selection and overfitting mitigation through hybrid machine-learning approaches, which are crucial for handling large-scale noisy datasets that can weaken intrusion indicators. Similarly, ref. [
13] proposed a hybrid deep learning approach for network security, highlighting the potential of various neural network architectures to improve threat detection rates. Overall, these studies demonstrate that sophisticated learning pipelines, whether strictly deep learning-based or hybrid, benefit from careful data preprocessing, feature engineering, and attention to computational efficiency. Moreover, they demonstrate the persistent need for adaptable, self-optimizing models that can withstand evolving cyber hazards, particularly in complex IIoT scenarios, where even minor security breaches can lead to significant operational and financial consequences. These studies [
11,
12,
13] investigate more robust architectures to overcome existing research gaps and detect evolved cyber hazards. The author [
11] leveraged a CNN to enhance network-based intrusion detection, revealing notable progress in detection accuracy while maintaining manageable computational complexity. Moreover, ref. [
12] extended the paradigm by merging neural networks with ML algorithms, effectively capturing network traffic data and detecting cyber hazards. Their proposed model not only improves the detection precision for varying threat types but also supports high performance. Meanwhile, ref. [
13] focused on a hyper-tuned, compact LSTM framework for hybrid intrusion detection, aiming to optimize resource utilization without compromising performance. Together, these studies demonstrate the shift toward specialized and hybrid deep learning models that tackle both the high dimensionality and the real-time constraints of complex networks, signifying that further improvements in architecture design, hyperparameter tuning, and scalability may yield even more robust intrusion-detection capabilities.
However, despite the substantial benefits of the previously mentioned deep learning (DL) and hybrid learning pipelines for intrusion detection, these approaches exhibit limitations such as susceptibility to overfitting on noisy, imbalanced datasets, inadequate hyperparameter tuning, and insufficient measurement of real-time inference performance, particularly in resource-constrained IIoT contexts. In contrast, the model proposed in this study integrates a denoising autoencoder (DAE) for robust noise handling and utilizes a bidirectional long short-term memory (BiLSTM) network alongside a multilayer perceptron (MLP) for complementary spatial-temporal feature extraction. The study also includes automated hyperparameter optimization via the Optuna framework, ensuring that the proposed pipeline adapts effectively to diverse network conditions while minimizing manual intervention. Furthermore, the proposed research systematically measures the decision time per detection, explicitly addressing operational latency concerns often overlooked in prior work and making the framework a more comprehensive and practical solution for real-world IIoT cyber defense.
State-of-the-Art Cyber Attacks and Available Datasets
A comprehensive list of cyber attacks, given in Figure 3, provides an organized and detailed catalog of known cyber threats, vulnerabilities, and attack techniques observed across different domains and industries. Such a list typically includes attack types such as phishing, ransomware, distributed denial-of-service (DDoS), man-in-the-middle (MITM), SQL injection, and advanced persistent threats (APTs), among others. For each attack type, the list provides an overview of the methodology, impact, targeted systems, and historical examples of incidents. Such a resource is crucial for understanding the evolving landscape of cybersecurity threats, enabling organizations to identify potential risks, prioritize defense strategies, and implement robust incident response plans. A comprehensive attack list also serves as a foundation for researchers and practitioners to simulate real-world scenarios and test the resilience of their systems against emerging threats.
A comprehensive list of available datasets, given in Table 2, offers an exhaustive inventory of datasets relevant to a variety of research and application domains, including cybersecurity, machine learning, healthcare, and industrial systems. This list typically includes information about datasets that are publicly available or accessible through partnerships, specifying their focus areas, formats, and intended use cases. For instance, in cybersecurity, datasets might cover network traffic logs, malware samples, or intrusion detection system (IDS) events, while other datasets may focus on areas such as IoT, financial fraud, or natural language processing. By providing detailed descriptions, licensing terms, and potential applications, such a list supports researchers, developers, and analysts in identifying the most appropriate datasets for their needs, fostering innovation and reproducibility in research while accelerating the development of advanced solutions.
3. Research Methodology
This section presents the proposed DA-MBA, designed for generalized and robust intrusion and zero-day attack detection in the Industrial Internet of Services. The proposed pipeline integrates noise-robust feature learning, temporal analysis, mutual information (MI)-based feature selection, and adaptive thresholding, aided by explainability analysis.
3.1. Problem Formulation
In the era of the IIoS, data are generated from heterogeneous multivariate telemetry, including sensor readings, protocol metadata, actuator commands, and device-level events. Cyber attacks blend into benign operations and traffic, appearing both as instantaneous anomalies and as patterns that unfold over time. Given a dataset $D = \{(x_i, y_i)\}_{i=1}^{N}$, where $x_i \in \mathbb{R}^d$ represents the input features and $y_i \in \{0, 1\}$ indicates benign or malicious behavior, the ultimate goal is to learn the mapping

$$f_\theta : \mathbb{R}^d \rightarrow [0, 1] \tag{1}$$

Equation (1) must generalize well to unseen families of attacks.
3.2. Datasets
The proposed model was evaluated on two industry-standard intrusion detection datasets from IIoT environments.
Edge IIoT dataset (
Table 3): A renowned IIoT cyber range dataset containing cross-protocol network traffic (such as MQTT, CoAP, and Modbus), device synchronization, and more than 14 attack families, including scanning, spoofing, DDoS, data modification, and malware spreading. The Edge IIoT dataset includes “Attack type” as a family identifier and “Attack label” as the binary target.
The Edge IIoT Dataset is an extensive and diverse dataset specifically designed to facilitate research in the field of the Industrial Internet of Things (IIoT), as described in
Table 2. This dataset captures the operational data from edge devices deployed in industrial settings, including sensors, actuators, and controllers. It often includes real-time data streams and structured data for monitoring and analyzing industrial processes such as manufacturing, energy management, and predictive maintenance. The Edge IIoT dataset is valuable for analyzing key areas such as anomaly detection, fault prediction, and cybersecurity in industrial IoT systems. A combination of temporal, spatial, and contextual features serves as a robust foundation for testing machine learning models, validating edge analytics solutions, and improving system reliability and scalability in industrial Internet of Things networks.
WUSTL-IIoT dataset (
Table 4): Another well-known multi-domain IIoT dataset with operational traffic from industrial devices, such as PLCs, RTUs, and sensors, comprising both benign and malicious events. In the WUSTL-IIoT dataset, the metadata include “Traffic” (family type) and “Target” (label). The WUSTL-IIoT dataset is significantly imbalanced and mimics real factory conditions.
The WUSTL Dataset serves as a benchmark for evaluating novel machine learning pipelines, enabling applications in diagnostics, real-time monitoring, and adaptive systems.
For each dataset, the proposed model has two main evaluation settings. Baseline detection (seen-attack): the model sees samples from all attack families during training for binary classification. Zero-day detection (leave-one-family-out): for each attack family f, excluding normal traffic, the model builds a leave-one-family-out split. All samples belonging to family f form the unknown (zero-day) set and are entirely excluded from training. The remaining samples are considered known, and this known set is split into training, validation, and test subsets. Two test cases are assessed for each held-out family, as sketched below: Strict Zero-Day, which contains only normal traffic and zero-day attacks of family f; and Mixed, which contains normal traffic, known attack families, and the zero-day attacks together.
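The following pandas/scikit-learn sketch illustrates one way to construct these leave-one-family-out splits; the column names "family" and "label" are hypothetical stand-ins for the datasets' family identifiers ("Attack type"/"Traffic") and binary targets ("Attack label"/"Target"), and the proportions follow the 72/8/20 split described in Section 3.3.

```python
# Illustrative leave-one-family-out (LOFO) split; column names are assumptions.
import pandas as pd
from sklearn.model_selection import train_test_split

def lofo_split(df: pd.DataFrame, held_out_family: str, seed: int = 42):
    zero_day = df[df["family"] == held_out_family]   # entirely excluded from training
    known = df[df["family"] != held_out_family]

    # Known samples follow roughly 72/8/20 train/val/test proportions.
    train, rest = train_test_split(known, test_size=0.28,
                                   random_state=seed, stratify=known["label"])
    val, test = train_test_split(rest, test_size=20 / 28,
                                 random_state=seed, stratify=rest["label"])

    # Strict zero-day: normal test traffic plus the held-out family only.
    strict_test = pd.concat([test[test["label"] == 0], zero_day])
    # Mixed: normal traffic, known attack families, and the zero-day family.
    mixed_test = pd.concat([test, zero_day])
    return train, val, strict_test, mixed_test
```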
3.3. Machine Learning Process
This study employs a comprehensive machine learning pipeline to optimize and evaluate a hybrid model for binary classification. A crucial design principle of DA-MBA is leakage prevention: no transformation is allowed to “observe” test data during fitting. The preprocessing pipeline consists of the following steps. Split first: each dataset was split into three subsets, namely training, validation, and testing, to ensure robust model implementation and reliable performance assessment. The validation subset was used strictly for early stopping, hyperparameter tuning, and monitoring progress without influencing the final test results. The most effective data distribution was approximately 72% training, 8% validation, and 20% testing, which ensured a robust, leakage-free evaluation pipeline. To prevent cross-family information leakage, the pipeline removes identifier-related fields, reduces high-cardinality features, and transforms sensitive flow attributes. Leakage audits based on uniqueness ratios and mutual information scores verified that, after sanitization, no identifier-like or high-cardinality fields remained that could bias the evaluation. Handling missing values: numerical features (such as packet counts, byte counts, and durations) were imputed with the median of the training distribution for each feature, and categorical features (such as protocol types and event status codes) were imputed with their most frequent training values. The proposed process (Figure 4) preserves the empirical distribution while avoiding bias from test-set statistics.
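A minimal sketch of this "fit on train only" imputation rule is given below, assuming pandas DataFrames; the example feature lists are illustrative, not the exact dataset columns.

```python
# Sketch of train-only imputation: statistics come from the training split.
import pandas as pd

def fit_imputers(train_df: pd.DataFrame, num_cols: list, cat_cols: list):
    """Learn imputation statistics from the training split alone."""
    medians = train_df[num_cols].median()        # e.g., packet/byte counts, durations
    modes = train_df[cat_cols].mode().iloc[0]    # e.g., protocol type, status code
    return medians, modes

def apply_imputers(df: pd.DataFrame, medians: pd.Series, modes: pd.Series):
    """Apply the training-split statistics to any split without refitting."""
    df = df.copy()
    df[medians.index] = df[medians.index].fillna(medians)
    df[modes.index] = df[modes.index].fillna(modes)
    return df
```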
Encoding categorical attributes: categorical features were encoded as numerical values using ordinal encoding. The zero-day sets map unknown categories to a dedicated “unknown” code, which ensures that the model does not fail when processing unseen values. Feature selection using mutual information (MI): following imputation and encoding, all numerical and encoded categorical features were consolidated, and each feature's MI with the binary label was evaluated. Only the top-K features with the highest MI are retained, where K is a tunable hyperparameter. This setup reduces dimensionality, eliminates irrelevant and noisy features, and makes training more reliable and efficient. Standardization: the selected features were standardized using the mean and standard deviation of the training set, giving each feature zero mean and unit variance. This is advantageous for the subsequent deep learning stages, which assume comparable feature scales.
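The selection and scaling steps can be sketched as follows, assuming NumPy arrays produced by the imputation and encoding stages; K is the tunable hyperparameter named above, and the value shown is a placeholder.

```python
# Sketch of MI-based top-K selection and train-fitted standardization.
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.preprocessing import StandardScaler

def select_and_scale(X_train, y_train, X_val, X_test, k: int = 30):
    mi = mutual_info_classif(X_train, y_train, random_state=42)
    top_k = np.argsort(mi)[-k:]                 # indices of the K most informative features
    scaler = StandardScaler().fit(X_train[:, top_k])  # statistics from training data only
    return (scaler.transform(X_train[:, top_k]),
            scaler.transform(X_val[:, top_k]),
            scaler.transform(X_test[:, top_k]),
            top_k)
```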
The DA-MBA model combines multiple components to optimize robustness and reliability under realistic IIoS operating conditions. First, the model acts as an implicit ensemble, combining a DAE for compressed latent representations with parallel MLP and bidirectional LSTM pathways. This design allows the model to capture both static feature patterns and short-range temporal dependencies, significantly improving generalization across diverse traffic events. Furthermore, to strengthen decision reliability, DA-MBA employs threshold calibration, in which validation-time normal samples are used to derive an adaptive decision boundary that achieves a specified false-positive rate, ensuring that the classifier remains stable and interpretable in environments where benign traffic may vary over time.
In the proposed DA-MBA pipeline, the calibrated decision module refines the raw classifier outputs to ensure reliable predictions under IIoS operating conditions. After the hybrid deep learning MLP-BiLSTM classifier generates the attack probabilities, the predictions are calibrated using normal validation samples to adjust the decision threshold. This calibration was implemented through an FPR-controlled quantile calibration step that aligns the model's output with the desired false-positive bounds. The calibrated predictions are then passed to an adaptive thresholding mechanism that selects the optimal cutoff for distinguishing benign from malicious events in both strict zero-day and mixed evaluation modes. This final calibration layer strengthens deployment reliability by minimizing unnecessary alerts while preserving sensitivity to genuinely anomalous patterns, ensuring that DA-MBA maintains precise and explainable decision-making across unseen or evolving IIoS attack patterns.
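A minimal sketch of FPR-controlled quantile calibration follows: the threshold is the (1 − target FPR) quantile of the attack probabilities that the classifier assigns to benign validation samples, so that roughly the desired fraction of benign traffic is flagged. The target rate shown is an assumed placeholder, not the paper's configured value.

```python
# Sketch of FPR-controlled quantile threshold calibration.
import numpy as np

def calibrate_threshold(benign_val_scores: np.ndarray,
                        target_fpr: float = 0.01) -> float:
    """benign_val_scores: model probabilities for known-benign validation samples.
    Returns a cutoff such that ~target_fpr of benign traffic scores above it."""
    return float(np.quantile(benign_val_scores, 1.0 - target_fpr))

# Usage: flag a sample as malicious when its predicted probability exceeds the
# calibrated threshold rather than a fixed 0.5 cutoff.
```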
Finally, the framework performs an extensive zero-day assessment using a leave-one-family-out (LOFO) approach, in which each attack family is omitted during training and later treated as entirely unseen during testing. This strict approach assesses the model's ability to detect previously unseen attacks, emphasizing the suitability of DA-MBA for real-world IIoS security scenarios, where adversaries continually introduce new and evolving threat patterns.
3.4. DA-MBA Model Architecture
The proposed DA-MBA architecture includes two main modules: a denoising autoencoder, which develops a reliable and concise representation of the input features, and a hybrid classifier, which integrates a fully connected (MLP) branch with a bidirectional LSTM branch.
DAE: the DAE takes the standardized top-K feature matrix as input and learns to reconstruct it. To enhance robustness, small Gaussian noise is injected into the input layer during the training phase. The encoder is composed of dense layers with nonlinear activations and L2 regularization, producing a latent code. The decoder mirrors this structure, mapping the latent code back to the original feature dimension. The DAE is trained to minimize the reconstruction error, enabling it to discover a low-dimensional manifold that captures the most salient structure of both benign and malicious traffic. After convergence, only the encoder is kept and used to transform each feature vector into a latent vector.
Hybrid classifier: the proposed hybrid classifier takes the latent representation and outputs an estimate of the probability of an attack. The latent vector is fed into a stack of fully connected layers with nonlinear activations and dropout, producing a transformed feature vector. In parallel, the same latent vector is reshaped into a length-1 sequence and passed through a BiLSTM layer. Despite having a single time step, the BiLSTM branch provides increased representational power and models complex nonlinear transformations that differ from those of the purely feedforward branch. The outputs of the dense and BiLSTM branches are concatenated and fed into a final dense layer that yields a single scalar in [0, 1], expressing the probability of an attack. This hybrid design exploits the strengths of both fully connected and recurrent layers while keeping the overall model lightweight enough for edge deployment. The hybrid classifier was trained with class weighting and early stopping, monitoring the AUC on the validation set.
A denoising autoencoder is designed to extract robust features from the input data as given in
Figure 5. Gaussian noise is added to the training data to simulate real-world disturbances, and the autoencoder is trained to reconstruct the original data. The encoder component extracts compressed representations of the high-dimensional input, which are further utilized as input features for the hybrid model. Regularization techniques, such as dropout and L2 regularization, are incorporated to enhance generalization and prevent overfitting.
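The DAE described above can be sketched in Keras as follows; the hidden width, latent dimension, noise level, and regularization strength are illustrative placeholders rather than the Optuna-selected values.

```python
# Keras sketch of the denoising autoencoder: Gaussian noise corrupts the
# standardized top-K inputs during training, dense layers with L2 penalties
# encode/decode them, and only the trained encoder is kept afterwards.
from tensorflow.keras import layers, regularizers, Model

def build_dae(input_dim: int, encoding_dim: int = 32,
              noise_std: float = 0.1, l2_strength: float = 1e-4):
    inputs = layers.Input(shape=(input_dim,))
    noisy = layers.GaussianNoise(noise_std)(inputs)   # active in training only
    h = layers.Dense(64, activation="relu",
                     kernel_regularizer=regularizers.l2(l2_strength))(noisy)
    latent = layers.Dense(encoding_dim, activation="relu",
                          kernel_regularizer=regularizers.l2(l2_strength))(h)
    h_dec = layers.Dense(64, activation="relu")(latent)  # decoder mirrors encoder
    outputs = layers.Dense(input_dim, activation="linear")(h_dec)

    autoencoder = Model(inputs, outputs)
    encoder = Model(inputs, latent)                   # retained after convergence
    autoencoder.compile(optimizer="adam", loss="mse") # minimize reconstruction error
    return autoencoder, encoder
```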
3.5. Training and Evaluation Process
The hybrid model combines the strengths of Multi-Layer Perceptron (MLP) and Bidirectional Long Short-Term Memory (BiLSTM) networks. The MLP pathway processes the encoded features, while the BiLSTM pathway models temporal relationships in reshaped encoded features. Outputs from both pathways are concatenated and passed through a dense layer with sigmoid activation for binary classification. Dropout layers are strategically integrated in both pathways to regularize the model.
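A Keras sketch of this hybrid head is given below; the unit counts and dropout rates are placeholders, and the compiled AUC metric mirrors the validation monitoring described in Section 3.4.

```python
# Sketch of the hybrid head: the latent code feeds an MLP branch directly and,
# reshaped into a length-1 sequence, a BiLSTM branch; the concatenated outputs
# pass through a sigmoid unit that emits the attack probability.
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_hybrid_classifier(encoding_dim: int, mlp_units: int = 64,
                            lstm_units: int = 32, dropout_rate: float = 0.3):
    latent = layers.Input(shape=(encoding_dim,))

    # MLP branch: direct, non-sequential relationships among denoised features.
    mlp = layers.Dense(mlp_units, activation="relu")(latent)
    mlp = layers.Dropout(dropout_rate)(mlp)

    # BiLSTM branch: the latent vector is reshaped into a single time step.
    seq = layers.Reshape((1, encoding_dim))(latent)
    lstm = layers.Bidirectional(layers.LSTM(lstm_units))(seq)
    lstm = layers.Dropout(dropout_rate)(lstm)

    merged = layers.Concatenate()([mlp, lstm])
    prob = layers.Dense(1, activation="sigmoid")(merged)  # attack probability in [0, 1]

    model = Model(latent, prob)
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=[tf.keras.metrics.AUC(name="auc")])
    return model
```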
Optuna, an advanced hyperparameter optimization framework, is employed to identify the optimal set of hyperparameters for the denoising autoencoder and the hybrid model. The optimization process dynamically explores the hyperparameter search space, including encoding dimensions, noise factors, dropout rates, LSTM and MLP units, L2 regularization strength, and learning rate. Optuna leverages the Tree-Structured Parzen Estimator (TPE) to balance exploration and exploitation, while early stopping mechanisms prune underperforming trials to minimize computational overhead.
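The search can be sketched as follows; the parameter ranges are illustrative, and train_and_score() is a hypothetical helper that builds and trains the DAE plus hybrid model from a sampled configuration and returns the validation AUC.

```python
# Sketch of TPE-based hyperparameter search with median pruning (Optuna).
import optuna

def objective(trial: optuna.Trial) -> float:
    config = {
        "encoding_dim": trial.suggest_int("encoding_dim", 16, 64),
        "noise_std": trial.suggest_float("noise_std", 0.05, 0.3),
        "dropout_rate": trial.suggest_float("dropout_rate", 0.1, 0.5),
        "lstm_units": trial.suggest_int("lstm_units", 16, 128),
        "mlp_units": trial.suggest_int("mlp_units", 32, 256),
        "l2_strength": trial.suggest_float("l2_strength", 1e-6, 1e-3, log=True),
        "learning_rate": trial.suggest_float("learning_rate", 1e-4, 1e-2, log=True),
    }
    return train_and_score(config, trial)  # hypothetical helper; returns validation AUC

study = optuna.create_study(direction="maximize",
                            sampler=optuna.samplers.TPESampler(),
                            pruner=optuna.pruners.MedianPruner())  # prunes weak trials
study.optimize(objective, n_trials=50)
```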
4. Results and Discussion
The performance evaluation of our proposed method was conducted on two benchmark datasets, WUSTL and EdgeIIoT. Both datasets demonstrated exceptional results, with accuracy metrics reaching nearly 99% across training, testing, and validation phases. These results underline the robustness and reliability of the approach in addressing the challenges inherent in the datasets and highlight its applicability to real-world IoT and edge computing environments.
Table 5 lists the resources, environment, and libraries used for this hybrid model.
Table 6 lists the hyperparameters and configurations that were automatically optimized and selected by the Optuna framework.
For the WUSTL dataset, the training, testing, and validation phases showed minimal variance, indicating the model’s ability to effectively generalize across different data splits. The training accuracy remained consistently high, reflecting the model’s capability to learn intricate patterns within the dataset. Similarly, the testing and validation accuracies validate the predictive power of the model and its potential to perform well on unseen data, as shown in
Table 7. This uniform performance across all phases suggests that the model is free from overfitting and is well optimized for this dataset. The EdgeIIoT dataset, which presents unique challenges owing to its high-dimensional features and noise levels, also yielded nearly 99% accuracy across the training, testing, and validation phases. The results for this dataset confirm the scalability and adaptability of the proposed approach. The minimal loss observed during the training phase indicates efficient convergence of the model, whereas the low testing and validation losses corroborate its robustness against overfitting and its capability to retain high performance on diverse data distributions.
The results achieved in this study significantly outperformed those reported in similar studies using the WUSTL and EdgeIIoT datasets.
Table 8 also presents comparisons of the baseline MLP and BiLSTM models with the proposed model. Although previous approaches have often been limited by issues such as overfitting, scalability, or inadequate performance on noisy datasets, our method demonstrates a high level of accuracy and stability. The standalone MLP and BiLSTM baselines achieved approximately 92% to 97% accuracy, whereas the proposed model reached approximately 99%. This underscores the importance of incorporating innovative features and training techniques. The high accuracy and low loss across diverse phases suggest that this approach is both theoretically robust and practically viable. In real-world applications, such reliability is crucial for decision-making in IoT and edge computing environments, where data integrity and rapid processing are paramount. The proposed study provides a strong foundation for further exploration and improvement, presenting a significant step forward in addressing the challenges associated with edge computing and IoT applications. The near-perfect accuracy metrics demonstrate the potential of the proposed approaches and point toward innovative, practical solutions in the field.
The leave-one-family-out (LOFO) method yields
Table 9, which shows that DA-MBA sustains strong zero-day performance on the Edge-IIoT dataset. The model exhibits strong generalization, yielding ROC-AUC and PR-AUC scores of ≈0.99–1.00 across nearly all attack types. These results indicate that the learned latent representation and hybrid classifier clearly isolate the majority of malicious behaviors from benign events, even under the LOFO protocol. A notable exception is the MITM attack class, which shows a slightly lower recall (0.813) despite retaining a high ROC-AUC (0.9996) and PR-AUC (0.9883). This pattern suggests that the model ranks the MITM samples correctly (high separability), but a fixed decision threshold of 0.5 yields a modest number of missed detections. The lower recall arises because MITM traffic often overlaps with benign communication patterns and exhibits fewer statistical deviations than volumetric attacks such as DDoS and scanning; the limited number of MITM training samples also contributes. Although MITM attacks are more challenging than other attacks, DA-MBA maintains strong detection performance (>80% recall) and near-perfect ranking metrics, confirming the model's ability to detect both high-volume and stealthy attacks.
Table 10 presents the per-attack-class results of the proposed DA-MBA pipeline on the WUSTL-IIoT dataset under strict zero-day and mixed evaluation conditions. The analysis demonstrates consistently strong detection ability across most attack classes, with ROC-AUC and PR-AUC values near 1.0, indicating excellent separation of malicious and benign events. The model offers high recall for DoS (0.95) and solid recall for Backdoor (0.76415) and Command Injection (0.80309), demonstrating its ability to correctly identify a wide range of previously unseen attack types. Effectiveness is marginally lower for the Reconnaissance class, which shows reduced recall at a threshold of 0.5; nevertheless, this class still retains a high ROC-AUC (0.95) and PR-AUC (0.87), indicating that the model ranks these samples well despite the complex characteristics of the dataset. Overall, the table shows that DA-MBA delivers robust and reliable zero-day detection across diverse attack classes, with strong results against high-impact attack events such as Backdoor, Command Injection, and DoS.
Table 11 summarizes the computing environment used to execute the DA-MBA model.
Table 12 shows the computational footprint of the DA-MBA model. Despite its small trainable parameter count (90,881) and compact FP32 footprint (0.35 MB), the overall process memory is high because of the Python runtime overheads, data loading, and temporary arrays generated during preprocessing and batch inference. The given values represent the execution environment of the entire pipeline. Based on the results, the proposed model appears to be lightweight and suitable for edge deployment.
Table 13 presents the latency metrics for the full DA-MBA pipeline, which consists of preprocessing, the encoder forward pass, and classification in a batch-based analysis setting. Both industrial datasets exhibit extremely low processing times, with median latencies of 4.23 µs (Edge-IIoT) and 2.35 µs (WUSTL-IIoT). Even at the 99th percentile, latencies remain within a few microseconds, demonstrating highly stable real-time behavior.
The measured throughput was also high, surpassing 230 k samples/s on Edge-IIoT and 425 k samples/s on WUSTL-IIoT. These statistics indicate that the proposed DA-MBA model is well suited to real-time industrial IoT environments that require fast and consistent detection across large data streams.
Figure 6 and
Figure 7 present the SHAP summary plots illustrating the contribution of the top features to the prediction behavior of the DA-MBA model on the WUSTL-IIoT and Edge-IIoT datasets. The visualization shows a clear and interpretable distribution of feature influences. In the WUSTL-IIoT dataset, high-impact features such as DIntPkt, DstJitter, Dport, and various byte- and packet-level attributes largely drive the model output. The well-balanced distribution of red (high feature values) and blue (low feature values) points illustrates that the model uses both large and small fluctuations in network statistics to distinguish between benign and malicious events. This pattern demonstrates that the model does not depend on a single dominant feature; instead, it integrates information from multiple network-level feature spaces, suggesting stable and consistent learning. For the Edge-IIoT dataset, the plot highlights the features that most influence the model's predictions. The distribution of SHAP values indicates that protocol-level attributes such as mqtt.prtoname, mqtt.topic, dns.qry.name.len, and mqtt.conack.flags contribute strongly to distinguishing benign from malicious events, whereas HTTP- and ICMP-related fields also play significant roles. The balanced distribution of high and low feature values across both positive and negative SHAP contributions indicates that the model learns complex patterns rather than relying on any single dominant feature, drawing on a variety of contextually appropriate IIoT communication attributes.
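For orientation, a hedged sketch of how such SHAP summary plots are typically generated for this kind of pipeline is shown below; the KernelExplainer choice, the sample sizes, and the variable names (hybrid_model, encoder, X_train_scaled, X_test_scaled, selected_feature_names) are assumptions for illustration, not the authors' exact configuration.

```python
# Sketch of SHAP attribution over the full pipeline (encoder + classifier).
import numpy as np
import shap

# Wrapping both stages in one function lets attributions map back to the
# named input features rather than to the latent dimensions.
def pipeline_predict(X: np.ndarray) -> np.ndarray:
    return hybrid_model.predict(encoder.predict(X, verbose=0), verbose=0).ravel()

background = shap.sample(X_train_scaled, 100)      # small background set for the explainer
explainer = shap.KernelExplainer(pipeline_predict, background)
shap_values = explainer.shap_values(X_test_scaled[:200])
shap.summary_plot(shap_values, X_test_scaled[:200],
                  feature_names=selected_feature_names)
```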
The training process of the proposed DA-MBA model offers essential insights into its stability and generalization capability across heterogeneous IIoT environments, as reflected in both industrial datasets. As shown in Figure 8 and Figure 9, the training and validation accuracy curves for both renowned industrial datasets stabilize quickly and demonstrate highly consistent learning behavior.
In both datasets, the training and validation curves remained closely aligned throughout the training, indicating the absence of overfitting and confirming that the model learned robust features that generalize well to unseen events. Considering the complexity and volume of IIoS traffic, where temporal fluctuations, unreliable protocol characteristics, and class imbalance are critical challenges to the training stability of industrial intrusion detection models, such consistency is significant.
Complementing the ROC analysis, the PR curves presented in
Figure 10 and
Figure 11 for the WUSTL-IIoT dataset and
Figure 12 and
Figure 13 for the Edge-IIoT dataset provide a comprehensive analysis of the performance under class-imbalanced circumstances, which is a significant challenge in IIoS security. In summary, visual analyses demonstrated that DA-MBA delivers consistent and robust detection performance across both datasets, positioning it as a reliable deployment-ready solution for safeguarding modern IIoS infrastructures.