Optimizing IoMT Security: Performance Trade-Offs Between Neural Network Architectural Design, Dimensionality Reduction, and Class Imbalance Handling

Ammar, Heyfa; Cherif, Asma

doi:10.3390/iot6040074

by Heyfa Ammar ^1,2,3,* and Asma Cherif ^4,5,*

¹ Computer Science Department, College of Computer and Information Sciences, Prince Sultan University, Riyadh 11586, Saudi Arabia

² RIOTU Lab, Prince Sultan University, Riyadh 11586, Saudi Arabia

³ RISC-ENIT Lab, National Engineering School of Tunis, University of Tunis El Manar, Tunis 1002, Tunisia

⁴ Information Technology Department, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah 21589, Saudi Arabia

⁵ Center of Excellence in Smart Environment Research, King Abdulaziz University, Jeddah 21589, Saudi Arabia

^* Authors to whom correspondence should be addressed.

IoT 2025, 6(4), 74; https://doi.org/10.3390/iot6040074 (registering DOI)

Submission received: 20 October 2025 / Revised: 20 November 2025 / Accepted: 26 November 2025 / Published: 29 November 2025

Abstract

The proliferation of Internet of Medical Things (IoMT) devices in healthcare requires robust intrusion detection systems to protect sensitive data and ensure patient safety. While existing neural network-based Intrusion Detection Systems have shown considerable effectiveness, significant challenges persist—particularly class imbalance and high data dimensionality. Although various approaches have been proposed to mitigate these issues, their actual impact on detection accuracy remains insufficiently explored. This study investigates advanced Artificial Neural Network (ANN) architectures and preprocessing strategies for intrusion detection in IoMT environments, addressing critical challenges of feature dimensionality and class imbalance. Leveraging the WUSTL-EHMS-2020 dataset—a specialized dataset specifically designed for IoMT cybersecurity research—this research systematically examines the performance of multiple neural network designs. Our research implements and evaluates five distinct ANN architectures: the Standard Feedforward Network, the Enhanced Channel ANN, Dual-Branch Addition and Concatenation ANNs, and the Shortcut Connection ANN. To mitigate the class imbalance challenge, we compare three balancing approaches: the Synthetic Minority Over-sampling Technique (SMOTE), Hybrid Over-Under Sampling, and the Weighted Cross-Entropy Loss Function. Performance analysis reveals nuanced insights across different architectures and balancing strategies. SMOTE-based models achieved average AUC scores ranging from 0.8491 to 0.8766. Hybrid sampling strategies improved performance, with AUC increasing to 0.8750. The weighted cross-entropy loss function demonstrated the most consistent performance. The most significant finding emerges from the Dual-Branch ANN with addition operations and a weighted loss function, which achieved 0.9403 Accuracy, 0.8786 AUC, a 0.8716 F1-Score, 0.8650 Precision, and 0.8786 Recall. 
Compared with the baseline from related work, this configuration increases the F1-Score by 8.45% and improves both AUC and Recall by 18.67%, highlighting the model’s superiority at identifying potential security threats and minimizing false negatives.

Keywords: intrusion detection; IoT; healthcare security; neural networks; ANN; autoencoders; class imbalance

1. Introduction

The extensive integration of Internet-connected technologies into healthcare delivery systems has fostered the emergence of the Internet of Medical Things (IoMT). This specialized digital framework has transformed both patient care provision and medical information management [1]. It has reshaped healthcare paradigms by creating interconnected environments where medical devices, healthcare professionals, and patients communicate seamlessly, generating continuous data flows that enable informed clinical decisions and timely interventions. The impact of the IoMT is reflected in its market expansion: the global sector was valued at approximately USD 230 billion in 2024 and is anticipated to reach between USD 419 billion and USD 814 billion by 2030–2032, according to analyses by Grand View Research [2] and Mordor Intelligence [3]. This growth, while beneficial, introduces significant security vulnerabilities that could compromise patient safety and data confidentiality. Unlike general IoT systems, where connectivity and security failures typically lead to privacy breaches or financial losses, IoMT security failures pose uniquely critical risks, directly threatening patient lives and care outcomes. The sensitive nature of healthcare data makes these systems particularly attractive targets for cyber attackers, necessitating robust security mechanisms that extend beyond conventional IoT protections. For instance, in 2023 alone, 725 healthcare data breaches were reported to the U.S. Department of Health and Human Services, exposing over 133 million patient records, representing a threefold increase from the previous year [4]. The security landscape is further complicated by inherent vulnerabilities in IoMT infrastructure. Recent studies reveal that healthcare delivery organizations face an average of 43 cyberattacks annually, with more than six vulnerabilities per IoMT device [5].
To address these growing security risks, Intrusion Detection Systems (IDSs) serve as critical defensive components in healthcare cybersecurity frameworks, continuously monitoring network communications to identify suspicious activities and security breaches [6]. These systems employ various detection methodologies, including signature-based, behavior-based, and anomaly-based approaches [7,8,9,10]. Among these, anomaly-based detection has emerged as particularly valuable for identifying novel threats by detecting deviations from established baseline behaviors. However, this approach faces challenges related to high false-positive rates and vulnerability to sophisticated attack vectors [11].

The advent of advanced machine learning techniques has created new opportunities for enhancing intrusion detection capabilities. Previous research has demonstrated the efficacy of Extreme Learning Machines (ELMs) for IoT security applications [12,13], highlighting their fast training speeds and reasonable generalization capabilities. However, more sophisticated Artificial Neural Network (ANN) architectures may offer superior performance for the complex patterns characteristic of network intrusions, particularly in specialized IoMT environments.

Contemporary neural network research has produced numerous architectural innovations, including multi-branch networks [14], concatenation operations, and shortcut connections [15], which have demonstrated remarkable success in computer vision and natural language processing. These advanced designs have yet to be thoroughly explored in the context of healthcare intrusion detection, where their capacity to model complex relationships between features could potentially yield significant performance improvements [16].

A parallel challenge in developing effective intrusion detection systems is managing the high dimensionality of network traffic data. Autoencoders (AEs), a class of neural networks designed for unsupervised learning, offer a promising approach to dimensionality reduction by learning compressed representations of input data. Their ability to capture essential patterns while filtering noise makes them potentially valuable for preprocessing network traffic features before classification [17].

Despite these advances, several critical challenges persist in IoMT intrusion detection. First, class imbalance, where normal traffic significantly outnumbers attack instances, presents a major obstacle to the development of effective detection models. This imbalance typically biases models toward the majority class, resulting in poor detection of the minority attack instances that are of greatest interest. Second, the high dimensionality and potential redundancy in network traffic features may impede model performance and increase computational requirements. Third, the relationship between feature dimensionality reduction and detection accuracy remains insufficiently explored, particularly in healthcare settings.

This study addresses these challenges by presenting a comprehensive evaluation of advanced ANN architectures for intrusion detection in IoMT environments, with particular attention to the impact of autoencoder preprocessing for dimensionality reduction. We leverage the WUSTL-EHMS-2020 dataset [18], specifically designed for IoMT systems, to ensure the relevance and applicability of our findings. Our research focuses on implementing and evaluating multiple ANN architectures, from standard feedforward networks to sophisticated designs featuring dual-branch inputs, concatenation operations, and shortcut connections. We compare these architectures both with and without autoencoder dimensionality reduction to assess the trade-offs between feature compression and detection performance.

To mitigate the class imbalance inherent in network traffic data, we employ and compare three distinct approaches: Synthetic Minority Over-sampling Technique (SMOTE), weighted loss functions, and a hybrid over- and under-sampling strategy [19]. This comprehensive methodology enables us to identify optimal combinations of preprocessing techniques, neural architectures, and class balancing strategies for IoMT intrusion detection.

Contributions. This work advances the development of secure IoMT frameworks through improved intrusion detection techniques based on sophisticated neural network approaches. The key contributions of this research include:

1. A systematic evaluation of multiple advanced ANN architectures for intrusion detection in IoMT environments, including standard feedforward networks, dual-branch models with addition and concatenation operations, and networks incorporating shortcut connections.
2. A comprehensive assessment of autoencoder preprocessing for dimensionality reduction in intrusion detection, revealing critical trade-offs between feature compression and detection performance.
3. A comparative analysis of three class imbalance mitigation strategies (SMOTE, weighted loss functions, and hybrid sampling) across different neural architectures, identifying optimal combinations for effective attack detection.

Outline. The remainder of this paper is structured as follows. Section 2 discusses related work in intrusion detection. Section 3 presents our comprehensive research methodology. Section 4 details the WUSTL-EHMS-2020 dataset selection, preprocessing steps, and feature engineering techniques employed. Section 5 introduces our proposed neural network architectures, highlighting their structural elements and theoretical advantages. Section 6 provides a detailed analysis of experimental findings, comparing performance across architectures and preprocessing approaches, with statistical validation of key results. Finally, Section 7 synthesizes our contributions, discusses limitations, and outlines promising directions for future research in IoMT intrusion detection.

2. Related Work

The growing adoption of IoMT devices has heightened the need for robust security mechanisms, particularly intrusion detection systems. This section reviews significant research contributions in this domain, highlighting various approaches to enhance intrusion detection in IoMT and healthcare-specific environments.

2.1. Traditional Machine Learning Approaches

Early research on network intrusion detection extensively utilized the KDD’99 dataset [20]. Zhang et al. [21] proposed a Random Forest (RF) model for anomaly detection, achieving 95% accuracy with a 1% false-positive rate. However, as noted by Hady et al. [22], this dataset lacks healthcare specificity and fails to represent contemporary network environments.

Li et al. [23] employed an integrated approach combining clustering methods, ant colony algorithms, and support vector machines (SVM) on the KDD’99 dataset. Their classifier achieved 98.6% accuracy in cross-validation with a Matthews correlation coefficient of 0.861. To address the high dimensionality challenge, researchers explored feature reduction techniques. Tesfahun et al. [24] applied Information Gain (IG) with random forests, while Shah et al. [25] demonstrated that reduced feature sets improved Back Propagation Neural Network performance in terms of size, complexity, and generalization ability.

The NSL-KDD dataset [26] was introduced to overcome KDD’99 limitations, though it still does not fully represent modern network environments. Kale et al. [27] utilized this dataset in their three-phase deep learning framework combining K-means clustering, GANomaly, and Convolutional Neural Networks. Albulayhi et al. [28] proposed an innovative feature selection approach using mathematical set theory, achieving a remarkable 99.98% classification accuracy.

Iwendi et al. [29] integrated RF with genetic algorithms for feature optimization, achieving a 98.81% detection rate with only a 0.8% false alarm rate across multiple datasets. Their approach demonstrated the critical importance of feature selection in enhancing intrusion detection system performance.

2.2. IoMT-Specific Approaches

Research specifically targeting IoMT environments has gained momentum recently. Hady et al. [22] explored the integration of both medical and network data to enhance intrusion detection in healthcare. They developed a real-time Enhanced Healthcare Monitoring System (EHMS) testbed and created a specialized IoMT dataset comprising over 16,000 records. Their findings demonstrated that utilizing both network and biometric metrics as features in IDSs yields superior performance compared to using either type alone, with improvements ranging from 7% to 25%. To address class imbalance, they employed SMOTE as a resampling technique, with their SVM implementation achieving 92.46% accuracy, while the ANN exhibited a 92.98% AUC score.

Mohammed et al. [30] introduced an ensemble-based IDS for IoMT environments using the WUSTL-EHMS-2020 dataset. To mitigate class imbalance, they implemented random over-sampling, creating a balanced corpus of 28,540 entries. Their system achieved a 99.80% accuracy rate in testing and 99.96% average accuracy through 10-fold cross-validation, with a 0.9980 F1 score. However, their use of random oversampling potentially introduces overfitting risks by duplicating existing minority samples without introducing new information.

Neural networks have gained significant traction in intrusion detection research due to their adaptive learning capabilities. Cherif et al. [13] presented a comparative study of Extreme Learning Machine (ELM) architectures, leveraging the WUSTL-EHMS-2020 dataset. The study addressed the critical challenge of class imbalance in network traffic data by comparing three mitigation strategies: Synthetic Minority Over-sampling Technique (SMOTE), weighted loss functions, and a hybrid over- and under-sampling method. The authors implemented multiple ELM architectures and evaluated their performance across key metrics including accuracy, precision, recall, F1-score, and area under the ROC curve (AUC-ROC). Results demonstrated that the weighted loss function significantly enhanced detection capabilities, achieving high accuracy and precision suitable for real-time intrusion detection in healthcare. The study achieved a maximum accuracy of 0.9305, a precision of 0.9518, and a maximum recall of 0.7789. Nayak et al. [31] proposed a hybrid model combining Bayesian optimization and ELM for IoMT security. Using the ToN_IoT dataset [32], their approach achieved impressive results: 0.990 for precision, recall, and F1 score. However, their study did not address the critical class imbalance problem inherent in intrusion detection datasets.

Ramya et al. [33] introduced an intrusion detection framework that integrates Particle Swarm Optimization for feature selection with a Probabilistic Neural Network for classification. Their system leverages a hybrid dataset combining network traffic and patient biometric data to enable real-time attack detection, achieving an average accuracy of 96.4% and an F1-score up to 95.67%. Despite these promising results, the study does not explicitly address the challenge of class imbalance.

Akar et al. [34] developed a Long Short-Term Memory (LSTM)–based deep learning model comprising two LSTM and two Dense layers, optimized with the AdamW optimizer to improve generalization and convergence. The authors investigated the class imbalance problem in their model by testing SMOTE, weight balancing, and focal loss [35] on the CIC-IoMT2024 dataset [36]. However, their experimental results demonstrated that none of these approaches improved the model’s performance, with SMOTE achieving 95% accuracy (no improvement over baseline), weight balancing reaching 96% (marginal improvement), and focal loss actually degrading performance to 94%. Consequently, their final model achieved the best results (98% accuracy for 19-class classification) without employing any class balancing technique and using the full set of 45 features without any dimensionality reduction.

A hybrid IDS designed for the IoMT is presented in [37]. The proposed model combines a Fully Connected Convolutional Neural Network (FCNN) to identify image-based intrusions with an LSTM network to analyze textual network traffic, achieving an accuracy of 97.66% and an F1-score of 97.85% on the CIC-IoMT2023 [38] and Malimg [39] datasets. However, these datasets do not capture IoMT-specific characteristics.

The summary of related works is provided in Table 1.

2.3. Research Gaps and Opportunities

Table 2 provides a comparative analysis of reviewed research works. The table evaluates existing studies across four critical dimensions: advanced architectural approaches, utilization of IoMT-specific datasets, handling of class imbalance, and use of feature selection techniques.

The analysis reveals a progressive trend in research, with earlier works primarily focusing on traditional machine learning techniques. More recent works report significant improvements from incorporating IoMT-specific datasets and class imbalance handling techniques. Although feature selection has been applied in some works, advanced dimensionality reduction methods such as autoencoders remain unexplored.

Despite significant advances in intrusion detection systems for IoMT, several research gaps remain:

Most studies focus on either traditional machine learning or basic neural network structures, with limited exploration of advanced neural network architectures specifically designed for healthcare intrusion detection.
While class imbalance is acknowledged as a challenge, comprehensive comparisons of different balancing techniques and their impact on various neural network architectures are scarce.
The interaction between dimensionality reduction techniques and different neural network designs remains underexplored, particularly in healthcare-specific contexts and with advanced techniques like Autoencoders.
The impact of channel number optimization on neural network performance for intrusion detection has received insufficient attention.

Building upon the identified gaps, this research seeks to answer the following questions:

RQ1: How do different neural network architectures compare in their effectiveness for IoMT intrusion detection relative to traditional machine learning models?
RQ2: What is the impact of various class balancing techniques on the detection accuracy and robustness of neural network–based intrusion detection systems in healthcare environments?
RQ3: How do Autoencoders for dimensionality reduction affect detection performance and computational efficiency across neural network architectures?

To answer these questions, this research addresses the previously mentioned gaps by providing a systematic evaluation of multiple neural network architectures across different class balancing techniques, offering insights into optimization strategies for IoMT intrusion detection systems. Moreover, it investigates the effect of the autoencoder neural network as an advanced feature reduction technique on the overall performance of the intrusion detection system.

3. Methodology

Our research methodology follows a systematic approach to evaluate the effectiveness of various ANN architectures and autoencoder preprocessing methods for intrusion detection in IoMT environments. Figure 1 illustrates the comprehensive workflow, highlighting the parallel processing paths with and without dimensionality reduction.

Our methodology encompasses six key phases:

1. Dataset Selection and Preparation: We utilize the WUSTL-EHMS-2020 dataset specifically designed for IoMT environments, performing initial cleaning and feature standardization.

2. Feature Processing: We implement two parallel processing paths:

Direct Feature Processing: Original features are standardized and used directly for model training.
Autoencoder Preprocessing: Features undergo dimensionality reduction through an autoencoder network before being fed to classification models.

3. Class Imbalance Handling: We implement and compare three distinct strategies:

Synthetic Minority Over-sampling Technique (SMOTE)
Weighted loss function approach
Hybrid over-under sampling method

4. Neural Network Architecture Design: We implement five distinct ANN architectures:

Standard ANN (baseline)
Enhanced Channel ANN (ANN_v1)
Dual-Branch Addition ANN (ANN_v2)
Dual-Branch Concatenation ANN (ANN_v3)
Shortcut Connection ANN (ANN_v4)

5. Model Training and Validation: Each architecture is trained with consistent hyperparameters across multiple class balancing configurations, using evaluation at regular intervals to assess performance.

6. Performance Evaluation: Models are evaluated using multiple metrics (Area Under the Curve (AUC), Accuracy, Precision, Recall, and F1-score) to provide a comprehensive assessment of their detection capabilities.

This structured approach enables systematic comparison of different architectural designs and preprocessing strategies, facilitating identification of optimal configurations for IoMT intrusion detection.
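As an illustration of the performance evaluation phase, the five metrics could be computed as in the sketch below. It assumes scikit-learn is available, that the classifier outputs an attack-class score per sample, and that Precision, Recall, and F1-score are macro-averaged over the two classes; the averaging mode and the 0.5 decision threshold are assumptions, since the text does not specify them.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)


def evaluate(y_true, y_score, threshold=0.5, average="macro"):
    """Compute AUC, Accuracy, Precision, Recall, and F1 for a binary detector.

    y_score holds the predicted attack-class score for each sample.
    `threshold` and `average` are illustrative assumptions.
    """
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    y_pred = (y_score >= threshold).astype(int)  # hard decisions from scores
    return {
        "auc": roc_auc_score(y_true, y_score),
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred, average=average, zero_division=0),
        "recall": recall_score(y_true, y_pred, average=average, zero_division=0),
        "f1": f1_score(y_true, y_pred, average=average, zero_division=0),
    }


# Toy example: two normal samples (0) and two attacks (1).
metrics = evaluate([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

On imbalanced data, accuracy alone can look high even when most attacks are missed, which is why the recall and AUC entries matter most for comparing balancing strategies.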

4. Dataset Selection and Preprocessing

The WUSTL-EHMS-2020 dataset was specifically developed for IoMT cybersecurity research using an Enhanced Healthcare Monitoring System (EHMS) testbed [18,22]. This testbed captures both network flow metrics and patient biometric data in real time, creating a realistic representation of IoMT environments. The selection of the WUSTL-EHMS-2020 dataset addresses critical limitations in existing cybersecurity research for IoMT environments [13]. Many previous studies relied on outdated or generic datasets that fail to capture the unique characteristics of healthcare systems. In contrast, this dataset incorporates patient biometric features alongside network traffic, providing a more comprehensive view of IoMT security challenges.

The dataset incorporates two types of man-in-the-middle attacks:

Spoofing attacks: Intercept communications between gateway and server, potentially exposing confidential patient information.
Data injection attacks: Alter packets in transit, compromising data integrity.

Table 3 presents the statistical distribution of the dataset, highlighting the significant class imbalance that must be addressed (see Section 4.3).

4.1. Data Preprocessing

The dataset comprises 44 features: 35 network flow metrics, 8 patient biometric indicators, and 1 label feature (1 for anomalous samples, 0 for normal samples). Samples containing MAC addresses associated with the attacker’s laptop are labeled as attacks. After preprocessing and feature selection, our final input vector contains 34 dimensions, consisting of 26 network features and 8 biometric features.

The feature selection process involved strategic removal of network identifiers to enhance the intrusion detection system’s effectiveness and generalizability. By eliminating the metadata columns related to network traffic (Dir, Flgs, SrcAddr, DstAddr, Sport, Dport) along with MAC addresses and packet sequence numbers, we focused the model on behavioral patterns rather than environment-specific identifiers. This approach prevents overfitting to particular network configurations and avoids data leakage from features used in attack labeling. Indeed, removing the attacker’s MAC addresses from the dataset is critical for developing a valid intrusion detection system. Since samples were labeled as attacks based on their association with the attacker’s MAC address, including these identifiers would create significant data leakage as the model would simply learn to detect specific MAC addresses rather than actual attack behaviors. This shortcut would produce artificially high accuracy during testing that cannot be generalized to real-world environments where attackers use different or spoofed MAC addresses. The final 34-dimensional feature vector provides sufficient information for effective intrusion detection while maintaining applicability across diverse network environments.
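The column removal described above could look like the following pandas sketch. The six metadata names come from the text; the MAC address, packet sequence, label, and remaining feature column names (`SrcMac`, `DstMac`, `Packet_num`, `Label`, `SrcBytes`, `Temp`) are hypothetical placeholders used only for illustration and may not match the dataset's actual headers.

```python
import pandas as pd

# Metadata columns named in the text.
NAMED_METADATA = ["Dir", "Flgs", "SrcAddr", "DstAddr", "Sport", "Dport"]
# Hypothetical names for the MAC address and packet sequence columns.
HYPOTHETICAL_IDENTIFIERS = ["SrcMac", "DstMac", "Packet_num"]


def drop_identifier_columns(df, label_col="Label"):
    """Split a frame into behavioral features and labels.

    Identifier columns are removed so the model cannot learn the
    labeling shortcut (attacker MAC address -> attack label).
    `label_col` is a hypothetical column name.
    """
    to_drop = [c for c in NAMED_METADATA + HYPOTHETICAL_IDENTIFIERS if c in df.columns]
    features = df.drop(columns=to_drop + [label_col])
    labels = df[label_col]
    return features, labels


# Tiny made-up frame; values and feature names are illustrative only.
demo = pd.DataFrame({
    "SrcAddr": ["10.0.0.1", "10.0.0.2"],
    "Dport": [80, 443],
    "SrcMac": ["aa:bb", "cc:dd"],
    "SrcBytes": [310, 496],
    "Temp": [36.6, 37.1],
    "Label": [0, 1],
})
X_demo, y_demo = drop_identifier_columns(demo)
```

If the identifiers are counted as six metadata columns, two MAC addresses, and one sequence number, removing them from the 35 network flow metrics leaves the 26 network features reported in the text; that breakdown is an inference, not something the source states explicitly.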

Normalization was also applied as a preprocessing step following feature selection. Using scikit-learn’s StandardScaler, the features were standardized by removing the mean and scaling to unit variance. This scaler is fit on the training data and then applied to both training and testing sets to ensure consistent scaling.
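A minimal sketch of this fit-on-train, apply-to-both pattern is shown below. The data are synthetic stand-ins with 34 columns, and the 80/20 stratified split is an assumption made for the example rather than the split used in the study.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic stand-in for the 34-dimensional feature matrix (not real data).
X = rng.normal(loc=5.0, scale=3.0, size=(1000, 34))
y = (rng.random(1000) < 0.125).astype(int)

# Assumed 80/20 stratified split; the source does not state the ratio here.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Statistics (mean, variance) are estimated from the training split only,
# so no information from the test split leaks into preprocessing.
scaler = StandardScaler().fit(X_train)
X_train_std = scaler.transform(X_train)
X_test_std = scaler.transform(X_test)
```

The training features end up with zero mean and unit variance by construction; the test features are only approximately standardized, which is expected because they are scaled with the training statistics.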

4.2. Dimensionality Reduction

A critical component of our methodology is the exploration of autoencoder networks for dimensionality reduction in preprocessing. Autoencoders are neural networks designed to learn efficient encodings of input data in an unsupervised manner. Figure 2 illustrates our autoencoder architecture.

The autoencoder consists of:

Encoder: Compresses the 34-dimensional input features to a 16-dimensional bottleneck representation
Decoder: Reconstructs the original 34-dimensional features from the bottleneck representation

During training, the autoencoder minimizes reconstruction error, learning to preserve the most important information while discarding noise and redundancy. After training, we discard the decoder and use only the encoder portion to transform input features before feeding them to classification models.
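The encoder and decoder roles can be illustrated with a small PyTorch sketch. Only the 34-dimensional input and 16-dimensional bottleneck come from the text; the single 24-unit hidden layer, the mean squared error reconstruction loss, the Adam optimizer, and the training schedule are assumptions, and the layout in Figure 2 may differ.

```python
import torch
from torch import nn


class Autoencoder(nn.Module):
    """34 -> 16 -> 34 autoencoder; the hidden width is an assumed value."""

    def __init__(self, in_dim=34, bottleneck=16, hidden=24):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, bottleneck)
        )
        self.decoder = nn.Sequential(
            nn.Linear(bottleneck, hidden), nn.ReLU(), nn.Linear(hidden, in_dim)
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))


def train_autoencoder(model, X, epochs=20, lr=1e-3):
    """Full-batch training that minimizes reconstruction error (MSE, assumed)."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = loss_fn(model(X), X)  # the target is the input itself
        loss.backward()
        optimizer.step()
    return loss.item()


torch.manual_seed(0)
X = torch.randn(256, 34)  # synthetic stand-in for standardized features
model = Autoencoder()
final_loss = train_autoencoder(model, X)

# After training, only the encoder is kept to produce classifier inputs.
with torch.no_grad():
    Z = model.encoder(X)
```

Here `Z` plays the role of the compressed feature matrix passed to the classification models in the autoencoder preprocessing path.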

4.3. Class Imbalance Strategies

Class imbalance is a recurrent challenge in machine learning, particularly pronounced in domains such as healthcare, fraud detection, and security where minority classes often represent critical cases. In our dataset, attack samples constitute only 12.5% of the total, creating a significant imbalance that can lead to biased models favoring the majority class. To address this challenge, we implement and compare three distinct approaches:

4.3.1. Synthetic Minority Over-Sampling Technique (SMOTE)

This method generates synthetic samples in the feature space of the minority class, creating new instances that are not exact copies but share characteristics with existing minority samples. SMOTE operates by selecting a minority class instance and its k-nearest neighbors, then generating new samples along the line segments connecting the selected instance to its neighbors [19]. This approach increases the minority class representation while introducing greater variability than simple duplication.
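The interpolation step can be sketched in plain NumPy as follows. This is a simplified illustration of the idea, not the implementation used in the study; the neighbor count k = 5 is an assumed default, and the pairwise distance matrix makes it suitable only for small minority sets.

```python
import numpy as np


def smote_sketch(X_min, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples by neighbor interpolation.

    X_min must contain at least two minority samples.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    k = min(k, n - 1)

    # Pairwise squared distances among minority samples (O(n^2) memory).
    d2 = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbor
    neighbors = np.argsort(d2, axis=1)[:, :k]

    base = rng.integers(0, n, size=n_new)                 # selected instances
    picked = neighbors[base, rng.integers(0, k, size=n_new)]  # one neighbor each
    gap = rng.random((n_new, 1))                          # position on the segment
    return X_min[base] + gap * (X_min[picked] - X_min[base])


rng = np.random.default_rng(1)
X_min = rng.normal(size=(20, 3))  # synthetic minority samples
synthetic = smote_sketch(X_min, n_new=50)
```

Because every synthetic point lies on a segment between two existing minority samples, it stays inside the region already occupied by the minority class while not duplicating any original sample exactly.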

4.3.2. Hybrid Over-Under Sampling

This strategy combines under-sampling of the majority class with over-sampling of the minority class. Specifically, we select 50% of the negative (normal) samples and then duplicate the positive (attack) samples to achieve an even distribution between classes. This balanced approach reduces computational burden while maintaining sufficient representation of normal network behavior [42].
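A possible NumPy sketch of this procedure is given below. It assumes the retained negatives are chosen uniformly at random and that the extra positives are drawn with replacement from the original attacks; the text does not specify either detail.

```python
import numpy as np


def hybrid_resample(X, y, keep_fraction=0.5, seed=0):
    """Under-sample normal samples, then duplicate attacks up to the same count."""
    rng = np.random.default_rng(seed)
    neg_idx = np.flatnonzero(y == 0)
    pos_idx = np.flatnonzero(y == 1)

    # Keep half of the normal samples (random selection is an assumption).
    n_keep = int(len(neg_idx) * keep_fraction)
    kept_neg = rng.choice(neg_idx, size=n_keep, replace=False)

    # Every original attack is kept once; the shortfall is filled with duplicates.
    n_extra = max(n_keep - len(pos_idx), 0)
    extra_pos = rng.choice(pos_idx, size=n_extra, replace=True)

    idx = rng.permutation(np.concatenate([kept_neg, pos_idx, extra_pos]))
    return X[idx], y[idx]


# Synthetic example with 12.5% attacks, mirroring the ratio quoted in the text.
rng = np.random.default_rng(0)
y = np.array([0] * 875 + [1] * 125)
X = rng.normal(size=(1000, 34))
X_bal, y_bal = hybrid_resample(X, y)
```

In this example the 875 normal samples are reduced to 437 and the 125 attacks are duplicated up to 437, so the balanced set is smaller than the original one.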

4.3.3. Weighted Cross-Entropy Loss Function

Rather than altering the dataset distribution, this method maintains the original class distribution but assigns different weights to each class in the cross-entropy loss function. The weight for each class is inversely proportional to its frequency in the training set, effectively penalizing misclassification of minority class samples more heavily. The weighted cross-entropy loss function is formulated as

L_{CE}^{w}(y, \hat{y}) = -\sum_{i=1}^{N} \left[ w_{1} y_{i} \log(\hat{y}_{i}) + w_{0} (1 - y_{i}) \log(1 - \hat{y}_{i}) \right]    (1)

where y_{i} is the true label, \hat{y}_{i} is the predicted probability, and w_{1} and w_{0} are the weights for the positive (attack) and negative (normal) classes, respectively. In our implementation, we use weights of w_{0} = 0.15 and w_{1} = 0.8, assigning approximately 5.3 times more importance to correctly classifying attack instances compared to normal instances.

This approach preserves the natural data distribution while still addressing the learning bias [35], which is particularly valuable in intrusion detection where maintaining the realistic ratio of normal to attack traffic is important for model generalization.
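Equation (1) can be checked numerically with the short NumPy sketch below, using the stated weights w_{0} = 0.15 and w_{1} = 0.8. The clipping constant `eps` is an assumption added only to avoid log(0) and is not part of the formula.

```python
import numpy as np

W0, W1 = 0.15, 0.8  # class weights stated in the text (normal, attack)


def weighted_cross_entropy(y_true, y_prob, w0=W0, w1=W1, eps=1e-12):
    """Weighted binary cross-entropy summed over samples, following Eq. (1)."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1.0 - eps)
    terms = w1 * y_true * np.log(y_prob) + w0 * (1.0 - y_true) * np.log(1.0 - y_prob)
    return -terms.sum()


# Two equally confident mistakes, one per class.
missed_attack = weighted_cross_entropy([1], [0.1])  # attack scored as 0.1
false_alarm = weighted_cross_entropy([0], [0.9])    # normal scored as 0.9
ratio = missed_attack / false_alarm                 # equals w1 / w0
```

For equally confident errors, the penalty ratio is w_{1} / w_{0} = 0.8 / 0.15, which is the factor of roughly 5.3 quoted above.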

5. Neural Network Architectures

This study implements and evaluates five distinct neural network architectures for intrusion detection: a baseline Standard ANN with progressively narrowing layers; an Enhanced Channel ANN (ANN_v1) with increased layer widths; two Dual-Branch models (ANN_v2 and ANN_v3) that separately process network and biometric features with different fusion methods; and a Shortcut Connection ANN (ANN_v4) incorporating residual-like skip connections. Each architecture is designed to investigate different structural strategies and their impact on classification performance.

5.1. Standard and Enhanced Channel Neural Network Architectures

The baseline model is a deep multi-layer perceptron consisting of eight layers that processes a 34-dimensional input feature vector, which encapsulates key network and biometric parameters. The architecture begins with an input layer of 34 features, followed by seven fully connected hidden layers with progressively decreasing dimensions of (40, 40, 20, 10, 10, 10, 10) neurons, respectively. The output layer comprises two neurons designed for binary classification. ReLU activation functions are applied to all hidden layers. This design facilitates a gradual reduction in dimensionality, encouraging hierarchical feature abstraction by compressing the feature representation from 40 neurons in the initial hidden layers down to 10 neurons in the deeper layers.

The Enhanced Channel ANN (ANN_v1) expands the representational capacity of the baseline model by significantly increasing the width of each hidden layer while preserving the original depth. It processes a 34-dimensional input feature vector through seven fully connected hidden layers with dimensions (256, 256, 128, 64, 64, 64, 64), followed by a two-neuron output layer for binary classification. ReLU activation functions are applied to all hidden layers. This architecture enables the network to model more complex feature interactions, potentially improving classification accuracy.

Both the ANN and ANN_v1 are illustrated in Figure 3.
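The two layer configurations can be written compactly in PyTorch, as in the sketch below. The layer widths and ReLU activations follow the text; returning raw two-class logits (with the softmax left to the loss function) is an assumption, as is the helper name `make_mlp`.

```python
import torch
from torch import nn


def make_mlp(in_dim, hidden_dims, n_classes=2):
    """Fully connected network with ReLU after every hidden layer."""
    layers, prev = [], in_dim
    for width in hidden_dims:
        layers += [nn.Linear(prev, width), nn.ReLU()]
        prev = width
    layers.append(nn.Linear(prev, n_classes))  # two-neuron output layer
    return nn.Sequential(*layers)


def count_params(model):
    """Total number of trainable weights and biases."""
    return sum(p.numel() for p in model.parameters())


# Baseline Standard ANN and the wider Enhanced Channel ANN (ANN_v1).
standard_ann = make_mlp(34, (40, 40, 20, 10, 10, 10, 10))
ann_v1 = make_mlp(34, (256, 256, 128, 64, 64, 64, 64))

logits = standard_ann(torch.randn(8, 34))  # batch of 8 synthetic samples
```

With these widths, the baseline has 4,422 parameters and ANN_v1 has 128,514, so the widening multiplies model capacity by roughly 29 while the depth stays the same.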

5.2. Dual-Branch Architectures (ANN_v2 and ANN_v3)

To leverage the multimodal nature of IoMT data, the dual-branch models process network and biometric features through separate pathways before fusion. The input vector is partitioned into two subsets: 26 network-related features and 8 biometric features. Each subset is processed independently through a dedicated branch, enabling modality-specific feature extraction (see Figure 4). Both models follow the same preprocessing approach:

Data Splitting: The 34-dimensional input is divided into network metrics (first 26 features) and biometric parameters (remaining 8 features)
Specialized Processing: Each feature type is processed through dedicated network branches
Different Fusion Mechanisms: The two models differ in how they combine branch outputs

The Dual-Branch Addition ANN (ANN_v2) model combines the outputs of the two branches via element-wise addition scaled by a factor of 0.5, effectively averaging the representations. As illustrated in Figure 4, the network branch and the biometric branch each consist of two fully connected layers with dimensions (256, 256). Following fusion, the combined representation passes through three shared fully connected layers with dimensions (128, 64, 64), each followed by ReLU activation and dropout with a rate of 0.4. The output layer consists of 2 neurons for binary classification. The 0.5 scaling keeps the magnitude of the fused features consistent with that of each branch, preventing feature magnitude explosion while still allowing direct interaction between the two feature types. Averaging also encourages the branches to learn complementary representations, yielding a unified feature representation that captures interactions between network traffic metrics and patient biometric signals. By maintaining a balanced contribution from both branches and applying ReLU activation after fusion, the model can learn complex relationships between the modalities, potentially improving the intrusion detection system’s ability to identify subtle anomalies in IoMT environments.

The Dual-Branch Concatenation ANN (ANN_v3) differs from the addition-based model by preserving the full information from both branches through concatenation of their outputs, resulting in a 512-dimensional feature vector. Similar to the addition model, both the network and the biometric branches consist of two fully connected layers with dimensions (256, 256). However, instead of element-wise addition, the fusion mechanism concatenates the branch outputs, allowing the model to retain all modality-specific features without averaging. The concatenated representation is then processed by three shared fully connected layers with dimensions (256, 128, 64), each followed by ReLU activation and dropout with a rate of 0.4. The output layer consists of 2 neurons for binary classification. This concatenation approach enables the subsequent layers to learn more complex relationships between the modalities, unlike the addition model, which blends the features earlier and encourages complementary representations through averaging, as illustrated in Figure 4.

This architectural design exploits domain knowledge by explicitly modeling the distinct characteristics of network traffic and biometric signals, allowing the model to capture modality-specific patterns that may be obscured in unified architectures.
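The difference between the two fusion mechanisms can be seen in a few lines of plain Python. This is only a sketch of the feature split and the fusion step: the branch outputs are constant stand-in vectors rather than the result of trained layers, and the function names are hypothetical.

```python
NETWORK_DIM, BIOMETRIC_DIM, BRANCH_WIDTH = 26, 8, 256


def split_features(sample):
    """Split a 34-feature sample into network and biometric parts."""
    if len(sample) != NETWORK_DIM + BIOMETRIC_DIM:
        raise ValueError("expected 34 features")
    return sample[:NETWORK_DIM], sample[NETWORK_DIM:]


def fuse_by_addition(net_out, bio_out, scale=0.5):
    """ANN_v2-style fusion: element-wise sum scaled by 0.5 (an average)."""
    return [scale * (a + b) for a, b in zip(net_out, bio_out)]


def fuse_by_concatenation(net_out, bio_out):
    """ANN_v3-style fusion: keep both vectors side by side."""
    return list(net_out) + list(bio_out)


sample = [float(i) for i in range(34)]
network_part, biometric_part = split_features(sample)

# Stand-ins for the 256-dimensional outputs of the two branches.
net_out = [1.0] * BRANCH_WIDTH
bio_out = [3.0] * BRANCH_WIDTH

added = fuse_by_addition(net_out, bio_out)
concatenated = fuse_by_concatenation(net_out, bio_out)
print(len(added), len(concatenated))
```

Addition keeps the fused vector at 256 dimensions, so the first shared layer can be narrower, whereas concatenation doubles the width to 512 and leaves the mixing of modalities to the shared layers.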

5.3. Shortcut Connection ANN (ANN_v4)

The Shortcut Connection ANN architecture processes the entire 34-dimensional input vector, which combines network and biometric parameters, without domain-specific feature splitting. It consists of seven fully connected hidden layers with dimensions (256, 256, 128, 64, 64, 64, 64), each employing ReLU activation functions. A key feature of this architecture is the incorporation of four identity shortcut connections forming residual blocks: the output $x_1$ of Layer 1 is added to Layer 2, the output $x_4$ of Layer 4 is added to Layer 5, the output $x_5$ of Layer 5 (including the previous shortcut) is added to Layer 6, and the output $x_6$ of Layer 6 (including the previous shortcut) is added to Layer 7. Each shortcut joins two layers of equal width, so no projection is required. These shortcut connections facilitate gradient flow during backpropagation by creating multiple pathways for information to bypass intermediate layers. Because identity shortcuts add no trainable parameters, this design maintains a parameter count comparable to the Enhanced Channel ANN (ANN_v1) while enhancing the network’s ability to refine feature representations at various abstraction levels, potentially improving its capacity to learn complex patterns. The architecture culminates in an output layer with 2 neurons for binary classification (see Figure 5).
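The shortcut pattern described above can be sketched as a forward pass in plain Python. This is an illustration with random weights, not the authors' PyTorch model: it assumes the shortcut is added after the ReLU of the receiving layer (the text does not say whether the addition happens before or after activation), and it omits the final two-neuron output layer.

```python
import random

WIDTHS = (256, 256, 128, 64, 64, 64, 64)
# 1-based indices of layers whose output receives the previous layer's output.
SHORTCUT_INTO = {2, 5, 6, 7}


def init_layer(n_in, n_out, rng):
    """Random weights (scaled Gaussian) and zero biases for one dense layer."""
    scale = (2.0 / n_in) ** 0.5
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense_relu(x, layer):
    """Fully connected layer followed by ReLU."""
    weights, biases = layer
    return [max(0.0, sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


def forward(x, layers):
    h = x
    for idx, layer in enumerate(layers, start=1):
        out = dense_relu(h, layer)
        if idx in SHORTCUT_INTO:
            # Identity shortcut: valid only because both widths are equal.
            out = [o + p for o, p in zip(out, h)]
        h = out
    return h


rng = random.Random(0)
dims = (34,) + WIDTHS
layers = [init_layer(a, b, rng) for a, b in zip(dims, dims[1:])]
features = forward([rng.random() for _ in range(34)], layers)
print(len(features))
```

Because Layer 6 receives the output of Layer 5, which already contains the shortcut from Layer 4, the last three shortcuts form a chain, matching the "including the previous shortcut" wording above.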

Table 4 summarizes the architectures, activation functions, dropout usage, fusion mechanisms, and shortcut connections of the different neural network models.

6. Testing and Discussion

The training and testing environment was set up in Google Colab using Python 3 with key libraries including pandas and NumPy for data handling, scikit-learn for preprocessing, train-test splitting, and evaluation metrics, and PyTorch 2.9.0+cu126 for building and training neural network models. The environment utilized the Colab machine GPU-enabled runtime, which typically includes NVIDIA T4 GPU and approximately 12 GB of RAM, without GPU acceleration. Data preprocessing involved reading the dataset with pandas, feature scaling using scikit-learn’s StandardScaler, and balancing the dataset with a custom SMOTE implementation. The dataset is split into training and testing sets using an 80–20 split, where a hold-out validation method is used.

The training process is conducted for 200–500 epochs depending on the model architecture, with evaluation performed every 50 epochs to monitor performance metrics including AUC, accuracy, precision, recall, and F1-score. This facilitates early detection of overfitting trends and selection of the best model checkpoint.

All neural network models are trained with consistent hyperparameter settings to enable fair comparison, as summarized in Table 5. The learning rate is set to $1 \times 10^{-3}$, with a batch size of 64 and weight decay of $5 \times 10^{-4}$ for regularization. We employ the AdamW optimizer, which provides adaptive learning rates with improved weight decay regularization.

The batch size of 64 strikes an optimal balance between gradient stability and computational resource utilization. To validate our hyperparameter choices, a grid search was conducted, exploring alternative configurations such as a batch size of 32 and a learning rate of $4 \times 10^{-4}$, which helped confirm the superiority of our selected parameters. The batch size of 64 is suitable for the medium size of the dataset used in this study and the available RAM. By choosing 200 to 500 epochs, we ensure comprehensive learning potential for the complex architectures, allowing the model sufficient iterations to capture intricate dataset patterns while mitigating the risk of underfitting. The learning rate of $1 \times 10^{-3}$ facilitates smooth gradient descent, enabling meaningful weight updates without risking parameter divergence. Complementing this, a weight decay of $5 \times 10^{-4}$ introduces regularization, helping to prevent overfitting by penalizing complex model configurations and promoting generalization.
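To illustrate how the learning rate and weight decay interact in AdamW, the following sketch applies one update to a single scalar weight. The learning rate and weight decay are the values stated above; the `betas` and `eps` values are commonly used defaults and are an assumption here, as the text does not report them. The function is a hand-written illustration, not a library API.

```python
import math


def adamw_step(param, grad, state, lr=1e-3, weight_decay=5e-4,
               betas=(0.9, 0.999), eps=1e-8):
    """One AdamW update for a scalar parameter (decoupled weight decay)."""
    state["step"] += 1
    t = state["step"]
    # Decoupled decay: shrink the weight directly, independent of the gradient.
    param = param * (1.0 - lr * weight_decay)
    # Exponential moving averages of the gradient and its square.
    state["m"] = betas[0] * state["m"] + (1 - betas[0]) * grad
    state["v"] = betas[1] * state["v"] + (1 - betas[1]) * grad * grad
    # Bias correction for the zero-initialized averages.
    m_hat = state["m"] / (1 - betas[0] ** t)
    v_hat = state["v"] / (1 - betas[1] ** t)
    return param - lr * m_hat / (math.sqrt(v_hat) + eps)


# With a zero gradient, only the weight decay acts on the parameter.
decay_only = adamw_step(1.0, 0.0, {"step": 0, "m": 0.0, "v": 0.0})
# With a non-zero gradient, the first step also moves the weight by about lr.
with_grad = adamw_step(1.0, 2.0, {"step": 0, "m": 0.0, "v": 0.0})
print(decay_only, with_grad)
```

With these settings each step shrinks a weight by a factor of $1 - 5 \times 10^{-7}$ before the adaptive gradient step is applied.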

For addressing class imbalance using the weighted loss approach, we modify the standard cross-entropy loss by applying class weights of [0.15, 0.8] for normal and attack classes, respectively. This assigns approximately 5.3 times more importance to correctly classifying attack instances compared to normal instances.
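As a minimal sketch of how these class weights enter the loss, the function below computes a weighted cross-entropy over raw two-class logits in plain Python. Normalizing by the sum of the sample weights follows a common convention (it is, for example, the default mean reduction of PyTorch's `CrossEntropyLoss` when class weights are given); whether the authors used exactly this reduction is an assumption.

```python
import math

CLASS_WEIGHTS = (0.15, 0.8)  # (normal, attack), as stated in the text


def weighted_cross_entropy(logits, targets, weights=CLASS_WEIGHTS):
    """Weighted mean of per-sample negative log-likelihoods."""
    total, norm = 0.0, 0.0
    for z, y in zip(logits, targets):
        m = max(z)  # subtract the max for a numerically stable log-sum-exp
        log_sum = m + math.log(sum(math.exp(v - m) for v in z))
        total += weights[y] * (log_sum - z[y])
        norm += weights[y]
    return total / norm


attack_to_normal_ratio = CLASS_WEIGHTS[1] / CLASS_WEIGHTS[0]
uniform_loss = weighted_cross_entropy([[0.0, 0.0], [0.0, 0.0]], [0, 1])
print(round(attack_to_normal_ratio, 2), uniform_loss)
```

The ratio 0.8 / 0.15 is about 5.33, which is the "approximately 5.3 times" emphasis on attack samples mentioned above.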

6.1. Evaluation Metrics

To comprehensively assess model performance in the imbalanced dataset context of intrusion detection, we employ five complementary evaluation metrics:

Area Under the ROC Curve (AUC): Measures the model’s discrimination capability across all possible classification thresholds. AUC represents the probability that the classifier will rank a randomly chosen positive instance higher than a randomly chosen negative instance. Mathematically:

$AUC = \int_{0}^{1} TPR \, d(FPR)$

(2)

where TPR is the true positive rate and FPR is the false positive rate, both traced out as the classification threshold t varies. AUC values range from 0 to 1, with 0.5 corresponding to random classification and 1.0 to perfect classification. This metric is particularly valuable for imbalanced datasets as it is insensitive to class distribution.
Accuracy (ACC): The proportion of correctly classified instances among all instances:

$ACC = \frac{TP + TN}{TP + TN + FP + FN}$

(3)

While intuitive, accuracy can be misleading in imbalanced datasets, as high accuracy can be achieved by simply classifying all instances as the majority class.
Precision (PR): The proportion of true positive predictions among all positive predictions:

$PR = \frac{TP}{TP + FP}$

(4)

High precision indicates that few of the raised alerts are false positives, which is particularly important in intrusion detection systems where false alarms can lead to alert fatigue and reduced trust in the system.
Recall (RC): Also known as sensitivity or true positive rate, recall measures the proportion of actual positives that are correctly identified:

$RC = \frac{TP}{TP + FN}$

(5)

High recall indicates that the model successfully captures most attack instances, which is critical in security applications where missing an attack (false negative) can have severe consequences.
F1-score (F1): The harmonic mean of precision and recall, providing a balance between these two potentially competing metrics:

$F 1 = 2 \cdot \frac{PR \cdot RC}{PR + RC}$

(6)

F1-score ranges from 0 to 1, with higher values indicating better performance. This metric is particularly useful when seeking a balance between precision and recall.
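The probabilistic reading of AUC given above (the chance that a random positive is ranked above a random negative) can be computed directly by comparing all positive-negative pairs. The sketch below does this in plain Python on a small made-up score list; it illustrates the definition and is not the evaluation code used in the study.

```python
def pairwise_auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical scores: one attack (0.3) is ranked below one normal sample (0.8).
example_auc = pairwise_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
print(example_auc)
```

Three of the four positive-negative pairs are ordered correctly, so the example yields 0.75.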

In the context of intrusion detection, these metrics offer complementary perspectives on model performance. High precision is important to avoid overwhelming security administrators with false alarms, while high recall is essential to ensure that actual attacks are not missed. The F1-score helps balance these requirements, while AUC provides a threshold-independent assessment of discriminative ability. We report all these metrics using the macro-averaging approach, which calculates metrics independently for each class and then takes the average, treating all classes equally regardless of their frequency in the dataset.

By examining these metrics collectively rather than focusing on any single measure, we gain a more comprehensive understanding of each model’s strengths and weaknesses in detecting intrusions in IoMT environments.
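Macro-averaging for the binary case can be sketched from a single confusion matrix: each metric is computed once with the attack class as positive and once with the normal class as positive, and the two values are averaged. The confusion-matrix counts below are hypothetical and chosen only for illustration.

```python
def macro_metrics(tp, fp, fn, tn):
    """Accuracy plus macro-averaged precision, recall, and F1 (two classes)."""
    def per_class(hit, false_alarm, miss):
        precision = hit / (hit + false_alarm) if hit + false_alarm else 0.0
        recall = hit / (hit + miss) if hit + miss else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        return precision, recall, f1

    attack = per_class(tp, fp, fn)   # attack treated as the positive class
    normal = per_class(tn, fn, fp)   # normal treated as the positive class
    precision, recall, f1 = ((a + n) / 2 for a, n in zip(attack, normal))
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts: 100 attacks, 900 normal samples.
report = macro_metrics(tp=80, fp=10, fn=20, tn=890)
print(report)
```

In this example accuracy is 0.97 while macro recall is about 0.894, which shows how macro-averaging exposes weaker performance on the minority class that accuracy alone hides.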

6.2. Training Dynamics

The training dynamics of the six evaluated models (the five ANN architectures and AE+ANN_v4) are presented in Figure 6. Figure 6a shows a smooth loss reduction for the base ANN model, with a rapid initial decline and minor fluctuations toward the end, indicating stable convergence. Figure 6b shows that ANN_v1 converges faster than the baseline, with a sharp drop in loss within the first 100 epochs; the loss then stabilizes near zero with minor noise, suggesting that the wider layers help the model learn more efficiently. Figure 6c shows that ANN_v2 behaves similarly to ANN_v1, with rapid early loss reduction followed by a very low loss. For ANN_v3 (Figure 6d), the loss decreases quickly in the first 50 epochs and remains low, apart from a noticeable spike around epoch 150 that may indicate temporary instability. Figure 6e shows that ANN_v4 converges very quickly, with the loss dropping within the first 50 epochs and then stabilizing at a very low value with minimal fluctuations; the shortcut connections likely facilitate gradient flow, improving convergence speed and stability. Finally, Figure 6f presents the AE+ANN_v4 training loss. This model starts with a higher initial loss than ANN_v4 and decreases steadily over 400 epochs. The convergence is slower and less smooth, with more fluctuations throughout training, and the final loss is higher than that of ANN_v4, suggesting that the autoencoder-based feature reduction may have introduced some information loss or complexity affecting training.

6.3. Testing Results

We evaluate all model architectures using five key performance metrics: Area Under the ROC Curve (AUC), Accuracy (ACC), Precision (PR), Recall (RC), and F1-score (F1). The performance results depicted in Table 6 and Figure 7 reveal critical insights for deploying intrusion detection, where patient safety and system reliability are paramount.

6.3.1. Impact of Neural Network Architecture

The results demonstrate a clear progression in performance as we move from standard architectures to more sophisticated neural network designs:

Standard vs. Enhanced ANNs: The enhancement through increased channel numbers (ANN_v1) consistently improves performance, confirming that greater parametrization enables better feature learning for this task.
Dual-Branch Architectures: The dual-branch models (ANN_v2 and ANN_v3) consistently achieve the highest performance across all balancing methods. The addition-based combination (ANN_v2) generally outperforms the concatenation approach (ANN_v3), suggesting that the summation of features from parallel branches provides more effective feature integration for intrusion detection.
Shortcut Connections: The ANN_v4 model with shortcut connections shows comparable performance to ANN_v1, indicating that for this particular task and dataset size, shortcut connections do not provide substantial additional benefits over simply increasing channel numbers.

Consequently, the dual-branch architecture with addition operations (ANN_v2) emerges as the most effective design, achieving the highest performance metrics across different balancing methods (0.90–0.94 accuracy) and showing its effectiveness in capturing the complex temporal and behavioral patterns characteristic of IoMT attacks. The poorest performers, ANN_v3 combined with weighted loss (0.82 accuracy, 0.69 precision) and ANN_v4 combined with weighted loss (0.84 accuracy, 0.71 precision), are unacceptable for IoMT intrusion detection as their low precision would generate excessive false alarms disrupting medical workflows, while their reduced accuracy creates dangerous security gaps that attackers could exploit to compromise patient care systems undetected.

6.3.2. Effectiveness of Class Balancing Methods

Our comparison of three class balancing approaches reveals important insights. First, SMOTE generally yields lower accuracy and precision compared to the other balancing methods. However, it maintains reasonably good recall, indicating its ability to identify attack instances.

Second, the Hybrid Over-Under Sampling approach outperforms SMOTE across all architectures, except for the Accuracy and Precision of AE+ANN_v4. It also achieves a better balance between precision and recall. The improved performance suggests that selective under-sampling of majority class instances combined with minority class duplication provides an effective balance for intrusion detection. Finally, the weighted cross-entropy loss function yields the highest overall performance, particularly when combined with the ANN_v2 architecture (0.8786 AUC, 0.9403 accuracy, 0.8786 recall, and a 0.8716 F1-score), making it the optimal choice for comprehensive IoMT threat detection where both false positives (unnecessary alerts causing alarm fatigue for medical staff) and false negatives (missed attacks endangering patient safety) must be minimized. This method demonstrates superior results compared to other methods while maintaining competitive precision. This suggests that preserving the original data distribution while adjusting the learning process is more effective than artificially altering the dataset distribution.
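The idea behind hybrid over-under sampling, as characterized above, can be sketched as random under-sampling of the majority class together with duplication of minority samples until both classes reach a common size. The target size, the random selection rule, and the function name are assumptions made for illustration; the text does not specify how the study chose them.

```python
import random


def hybrid_resample(majority, minority, target_per_class, seed=0):
    """Under-sample the majority and duplicate minority samples to a target size."""
    rng = random.Random(seed)
    # Under-sampling: keep a random subset of the majority, without replacement.
    kept_majority = rng.sample(majority, min(target_per_class, len(majority)))
    # Over-sampling: append random duplicates of existing minority samples.
    extra = max(0, target_per_class - len(minority))
    grown_minority = list(minority) + [rng.choice(minority) for _ in range(extra)]
    return kept_majority, grown_minority


# Hypothetical sample identifiers: 1000 normal and 100 attack records.
normal_ids = list(range(1000))
attack_ids = list(range(1000, 1100))
kept_normal, grown_attack = hybrid_resample(normal_ids, attack_ids, 400)
print(len(kept_normal), len(grown_attack))
```

Unlike SMOTE, this scheme creates no synthetic feature vectors; every training sample is an original record, at the cost of discarding part of the majority class.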

6.3.3. Impact of Dimensionality Reduction

The autoencoder-based dimensionality reduction (AE+ANN_v4) shows mixed results. With SMOTE, the AE+ANN_v4 model achieves the highest precision (0.9383) among all SMOTE-based models, which means when it flags malicious activity targeting critical medical devices (pacemakers, insulin pumps, ventilators), the alert is highly reliable—a crucial characteristic for preventing false alarms that could lead clinicians to ignore genuine security warnings. The trade-off is lower recall (0.7485) compared to other ANN architectures, indicating some attacks may go undetected.

With the weighted cross-entropy loss function, the AE+ANN_v4 model shows similar trends: high precision (0.8705) but reduced recall (0.7463), resulting in a lower overall F1-score (0.7909) compared to other ANN architectures.

These results suggest that while dimensionality reduction through autoencoders can improve precision by creating a more compact feature representation, it typically results in information loss that negatively impacts recall. For intrusion detection in IoMT environments, where detecting all potential attacks is critical, this trade-off may not be desirable.

Although this approach showed lower overall accuracy and recall compared to the ANN_v2 + Weighted Loss model, it offers valuable insights into the trade-offs between feature compression and detection effectiveness. Specifically, the autoencoder reduces the input feature dimensionality, which leads to faster inference times as demonstrated by the computational complexity analysis (Section 6.4), an important consideration for resource-constrained IoMT devices. Additionally, the AE+ANN_v4 + SMOTE model demonstrated higher precision in some cases, indicating its ability to reduce false positives, which is useful in practical security applications to avoid alert fatigue.

6.3.4. Robustness Against Overfitting

The models implemented in this study incorporate several measures to reduce overfitting and ensure generalization to unseen data. Key strategies include the use of dropout regularization and weighted cross-entropy loss.

Indeed, in the dual-branch models (ANN_v2 and ANN_v3), dropout with a rate of 0.4 is applied after several shared layers. This technique randomly disables neurons during training, preventing co-adaptation of features and reducing overfitting.

Moreover, instead of relying solely on data augmentation techniques like SMOTE, the best-performing model uses a weighted loss function that assigns higher importance to the minority (attack) class. This approach helps the model learn balanced representations without artificially inflating the minority class, which can sometimes lead to overfitting.

Finally, the training process includes regular evaluation every 50 epochs, allowing monitoring of performance metrics such as loss, AUC, precision, recall, and F1-score. This facilitates early detection of overfitting trends and selection of the best model checkpoint.

The results demonstrate that models trained with weighted cross-entropy loss achieve high accuracy and balanced performance metrics, indicating effective generalization rather than overfitting. The consistent evaluation and regularization techniques contribute to these acceptable results, supporting the reliability of the models in intrusion detection tasks within IoMT environments.

6.4. Computational Complexity

Given the resource constraints typical of IoMT devices, computational complexity is a critical factor in the practical deployment of intrusion detection models.

We analyzed execution time as it is crucial for an energy-efficient deployment of machine learning models in the context of resource-constrained IoMT environments. The performance analysis of our neural network models reveals significant variations in computational efficiency, as detailed in Table 7. The training times increase across the model architectures, with ANN requiring 234.992 s, ANN_v1 471.676 s, ANN_v2 730.803 s, ANN_v3 950.360 s, and AE+ANN_v4 1110.085 s. These increasing training durations suggest growing model complexity and potentially more sophisticated feature extraction mechanisms. Inference time, while comparatively brief, also exhibits model-specific characteristics, with ANN_v4 showing the most efficient inference time of 0.0331 s. The achieved results remain acceptable, as the training is intended to be performed on edge nodes.

The architectures evaluated in this study vary significantly in terms of model size and computational demands. For instance, the Enhanced Channel ANN (ANN_v1) and Shortcut Connection ANN (ANN_v4) feature wider layers with up to 256 neurons per layer, resulting in larger parameter counts and increased memory requirements compared to the Standard ANN (ANN) with narrower layers. Dual-branch models (ANN_v2 and ANN_v3) introduce additional complexity due to parallel processing paths and fusion mechanisms, which can increase both training time and inference latency.

Training, conducted on a standard Google Colab environment with GPU acceleration, ran for 200 to 500 epochs depending on the model architecture, with larger models naturally requiring longer training durations. While this training overhead is acceptable in offline settings, it highlights the need for efficient training strategies or hardware acceleration when deploying such models in real-world IoMT environments.

Inference delay is particularly critical for IoMT devices that operate in real-time or near-real-time scenarios. Models with extensive layers and shortcut connections, such as ANN_v4, may introduce additional computational overhead during inference due to the residual operations, while dual-branch architectures require processing two separate input streams before fusion, potentially increasing latency. However, the modular design of dual-branch models may allow for parallel computation optimizations.

Model size also impacts storage and memory footprint on IoMT devices, which often have limited capacity. The Standard ANN, with its smaller layer sizes, offers a more lightweight solution but at the cost of reduced detection performance. Conversely, the more complex architectures provide improved accuracy and robustness but require careful consideration of deployment constraints.

While advanced neural network architectures demonstrate superior intrusion detection capabilities, their computational complexity poses challenges for resource-constrained IoMT devices. This study serves as an initial but comprehensive evaluation of advanced neural network architectures for IoMT intrusion detection, providing critical performance benchmarks and architectural insights that can inform real deployment decision-making. While the current work focuses on centralized offline training and evaluation, its findings lay the groundwork for integrating more scalable and privacy-preserving techniques such as federated learning, which enables collaborative model training across distributed devices without sharing sensitive data. Additionally, batch processing frameworks like Apache Spark can be leveraged to optimize training and inference workflows at scale, enhancing computational efficiency in real-world settings. Thus, this study not only advances the state of the art but also offers a practical foundation for future integration with these advanced deployment strategies.

6.5. Comparative Analysis with Previous Work

In [13], the application of ELM architectures for intrusion detection in IoMT environments was explored using the same dataset. That study established baseline performance metrics for ELM models with varying hidden layer sizes (64, 128, and 256 nodes) and demonstrated that ELM (256) with SMOTE could achieve an AUC of 0.7789 and an F1-score of 0.7223. While promising, these results indicated limitations in the ELM’s ability to capture the complex patterns necessary for optimal intrusion detection.

The current study represents a significant advancement over this previous work in several key aspects:

Performance Improvement: The suggested dual-branch ANN architecture with addition operations (ANN_v2) combined with a weighted loss function achieves an AUC of 0.8786 and an F1-score of 0.8716, representing relative improvements of 12.8% and 20.7% respectively over the best ELM model from [13].
Architectural Sophistication: Moving beyond the single hidden layer constraint of ELM, our current work explores multi-layer architectures with various connectivity patterns, demonstrating that architectural design choices significantly impact detection performance.
Dimensionality Reduction Analysis: While the work in [13] focused on direct classification of input features, this study provides critical insights into the trade-offs associated with autoencoder preprocessing, revealing that the information loss during dimensionality reduction compromises recall, a crucial metric for security applications.
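The relative improvements quoted in the first item above follow directly from the reported scores. The short check below reproduces the arithmetic from the AUC and F1 values stated in the text (0.7789 and 0.7223 for the best ELM model, 0.8786 and 0.8716 for ANN_v2 with weighted loss); the helper name is introduced here for illustration.

```python
def relative_improvement(new, baseline):
    """Relative change of `new` over `baseline`, in percent."""
    return 100.0 * (new - baseline) / baseline


auc_gain = relative_improvement(0.8786, 0.7789)
f1_gain = relative_improvement(0.8716, 0.7223)
print(f"AUC: {auc_gain:.1f}%  F1: {f1_gain:.1f}%")
```

Rounded to one decimal place, this gives 12.8% for AUC and 20.7% for F1-score.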

Compared with other recent studies using the same dataset, this work offers distinctive contributions. Hady et al. [22] achieved an accuracy of 92.46% with SVM and an AUC of 92.98% with ANN by integrating both medical and network data. While their approach uses the same dataset, our best model achieves superior accuracy (94.03%) through architectural innovation. Moreover, the use of weighted loss offers robustness against overfitting compared to the SMOTE approach used in [22].

Mohammed et al. [30] reported impressive results using an ensemble-based approach with random over-sampling, achieving 99.80% accuracy and a 0.9980 F1-score. However, their use of random over-sampling (direct duplication of minority samples) potentially introduces overfitting risks, as the model may memorize specific attack instances rather than learning generalizable patterns. In contrast, our approach employs more sophisticated class balancing techniques and achieves robust performance without ensemble methods, offering a more streamlined deployment path.

Table 8 summarizes these comparisons, highlighting the progression from ELM-based approaches to our current advanced ANN architectures.

This comparative analysis highlights the evolutionary progression in IoMT intrusion detection techniques. While the previous ELM-based work established important baselines, the current study’s advanced ANN architectures offer substantial performance improvements. The comparison shows an increase of 8.45% in F1-score, an improvement of 18.67% in AUC and Recall, a slight increase of 1.05% in Accuracy, and a trade-off of 9.12% in Precision. The increase in Recall is particularly noteworthy: in security-critical applications, the ability to detect potential attacks is paramount, and an 18.67% improvement in Recall means that the model is significantly more effective at identifying potential security threats and minimizing false negatives. Moreover, the analysis of autoencoder preprocessing provides crucial insights into the limitations of dimensionality reduction in security applications, where the preservation of all potentially relevant features may be more important than computational efficiency. These findings contribute valuable knowledge to guide future development of intrusion detection systems for healthcare environments, where the balance between detection accuracy and computational efficiency must be carefully optimized.

7. Conclusions

This study conducted a comprehensive evaluation of advanced neural network architectures and autoencoder preprocessing for intrusion detection in Internet of Medical Things (IoMT) environments. Our systematic investigation has yielded several significant findings with important implications for the design and implementation of security systems in healthcare contexts.

The results demonstrate that architectural design significantly impacts intrusion detection performance. Our dual-branch neural network with addition operations (ANN_v2) combined with a weighted cross-entropy loss function achieved superior performance (0.9403 accuracy, 0.8786 AUC, and a 0.8716 F1-score), substantially outperforming conventional architectures. This finding underscores the value of specialized network designs that explicitly account for the heterogeneous nature of IoMT data by processing network and biometric features through separate pathways before integration.

Our analysis of dimensionality reduction through autoencoders revealed an important trade-off: while autoencoder preprocessing improved precision (up to 93.83%), it consistently reduced recall (down to 74.85%). This precision–recall trade-off is particularly problematic in security-critical applications like intrusion detection, where missed attacks (false negatives) can have severe consequences. The results suggest that preserving the full feature space is preferable for IoMT intrusion detection, as the information loss during compression appears to disproportionately affect the model’s ability to identify attack instances.

Among class imbalance mitigation strategies, weighted loss functions consistently outperformed both SMOTE and hybrid sampling approaches across most architectures. This indicates that maintaining the natural distribution of network traffic while adjusting the learning objective provides more effective model training than artificially altering the dataset distribution. This approach better preserves the realistic context in which intrusion detection systems must operate, where normal traffic significantly outnumbers attack instances.

Several limitations of this study present opportunities for future research. First, while our models achieve high performance on the WUSTL-EHMS-2020 dataset, validation across multiple IoMT datasets would strengthen the generalizability of our findings. Second, the current work focused primarily on binary classification (normal vs. attack); future research should extend to multi-class classification to distinguish between different attack types, enabling more targeted security responses.

Author Contributions

Conceptualization, A.C. and H.A.; methodology, A.C.; software, A.C.; validation, A.C. and H.A.; formal analysis, A.C. and H.A.; investigation, A.C. and H.A.; resources, A.C.; data curation, A.C.; writing—original draft preparation, A.C.; writing—review and editing, A.C. and H.A.; visualization, A.C. and H.A.; supervision, A.C. and H.A.; project administration, A.C.; funding acquisition, A.C. All authors have read and agreed to the published version of the manuscript.

Funding

This project was funded by the Deanship of Scientific Research (DSR) at King Abdulaziz University, Jeddah, Saudi Arabia under grant no. (IPP: 1163-612-2025). The authors, therefore, acknowledge with thanks DSR for technical and financial support.

Data Availability Statement

The original data used in the study are openly available at https://www.cse.wustl.edu/~jain/ehms/index.html (accessed on 19 October 2025).

Acknowledgments

The authors acknowledge with thanks the technical and financial support provided by Deanship of Scientific Research (DSR) at King Abdulaziz University, Jeddah, Saudi Arabia.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Islam, S.R.; Kwak, D.; Kabir, M.H.; Hossain, M.; Kwak, K.S. The Internet of Things for health care: A comprehensive survey. IEEE Access 2015, 3, 678–708.
2. Grand View Research. Internet of Medical Things Market Size, Share & Trends Analysis Report by Component (Hardware, Software, Services), by Deployment (On-Premise, Cloud), by Application (Telemedicine, Clinical Operations, Medication Management), by End Use (Hospitals, Clinics, Research Institutes), by Region, and Segment Forecasts, 2025–2030; Technical Report; Report ID: GVR-4-68040-134-8; Grand View Research, Inc.: San Francisco, CA, USA, 2025.
3. Mordor Intelligence. Internet of Medical Things Market Size & Share Analysis—Growth Trends & Forecasts (2025–2030); Technical Report; Mordor Intelligence: Hyderabad, India, 2025.
4. Armis Security. Mastering Patient-Centric Cybersecurity for Healthcare: A Playbook for Healthcare Delivery Organizations; Armis Security: San Francisco, CA, USA, 2025.
5. Federal Bureau of Investigation. Unpatched and Outdated Medical Devices Provide Cyber Attack Opportunities; Private Industry Notification PIN 20220912-001; Federal Bureau of Investigation, Cyber Division: Washington, DC, USA, 2022.
6. Zarpelão, B.B.; Miani, R.S.; Kawakani, C.T.; de Alvarenga, S.C. A survey of intrusion detection in Internet of Things. J. Netw. Comput. Appl. 2017, 84, 25–37.
7. Butun, I.; Morgera, S.D.; Sankar, R. A survey of intrusion detection systems in wireless sensor networks. IEEE Commun. Surv. Tutor. 2013, 16, 266–282.
8. Mujahid, M.; Mirdad, A.R.; Alamri, F.S.; Ara, A.; Khan, A. Software defined network intrusion system to detect malicious attacks in computer Internet of Things security using deep extractor supervised random forest technique. PeerJ Comput. Sci. 2025, 11, e3103.
9. Farhan, S.; Mubashir, J.; Haq, Y.U.; Mahmood, T.; Rehman, A. Enhancing network security: An intrusion detection system using residual network-based convolutional neural network. Clust. Comput. 2025, 28, 251.
10. Alrayes, F.S.; Zakariah, M.; Amin, S.U.; Khan, Z.I.; Alqurni, J.S. CNN Channel Attention Intrusion Detection System Using NSL-KDD Dataset. Comput. Mater. Contin. 2024, 79, 4319.
11. Mitchell, R.; Chen, I.R. A survey of intrusion detection techniques for cyber-physical systems. ACM Comput. Surv. (CSUR) 2014, 46, 55.
12. Farnaaz, N.; Jabbar, M. Random forest modeling for network intrusion detection system. Procedia Comput. Sci. 2016, 89, 213–217.
13. Cherif, A. Intrusion Detection for Internet of Medical Things (IoMT) using Extreme Learning Machine. In Proceedings of the 2025 2nd International Conference on Advanced Innovations in Smart Cities (ICAISC), Jeddah, Saudi Arabia, 9–11 February 2025; pp. 1–7.
14. Yamashita, T.; Hirasawa, K.; Hu, J.; Murata, J. Multi-branch structure of layered neural networks. In Proceedings of the 9th International Conference on Neural Information Processing (ICONIP’02), Singapore, 18–22 November 2002; Volume 1, pp. 243–247.
15. Geirhos, R.; Jacobsen, J.H.; Michaelis, C.; Zemel, R.; Brendel, W.; Bethge, M.; Wichmann, F.A. Shortcut learning in deep neural networks. Nat. Mach. Intell. 2020, 2, 665–673.
16. Diro, A.A.; Chilamkurti, N. Distributed attack detection scheme using deep learning approach for Internet of Things. Future Gener. Comput. Syst. 2018, 82, 761–768.
17. Hinton, G.E.; Salakhutdinov, R.R. Reducing the dimensionality of data with neural networks. Science 2006, 313, 504–507.
18. Hady, A.A. WUSTL-EHMS-2020. 2020. Available online: https://www.cse.wustl.edu/~jain/ehms/index.html (accessed on 20 October 2024).
19. Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic minority over-sampling technique. J. Artif. Intell. Res. 2002, 16, 321–357.
20. Information and Computer Science, University of California, Irvine. KDD Cup 1999 Data. 2007. Available online: http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html (accessed on 19 October 2024).
21. Zhang, J.; Zulkernine, M.; Haque, A. Random-Forests-Based Network Intrusion Detection Systems. IEEE Trans. Syst. Man Cybern. Part C (Appl. Rev.) 2008, 38, 649–659.
22. Hady, A.A.; Ghubaish, A.; Salman, T.; Unal, D.; Jain, R. Intrusion Detection System for Healthcare Systems Using Medical and Network Data: A Comparison Study. IEEE Access 2020, 8, 106576–106584.
23. Li, Y.; Xia, J.; Zhang, S.; Yan, J.; Ai, X.; Dai, K. An efficient intrusion detection system based on support vector machines and gradually feature removal method. Expert Syst. Appl. 2012, 39, 424–430.
24. Tesfahun, A.; Bhaskari, D.L. Intrusion Detection Using Random Forests Classifier with SMOTE and Feature Reduction. In Proceedings of the 2013 International Conference on Cloud & Ubiquitous Computing & Emerging Technologies, Pune, India, 15–16 November 2013; pp. 127–132.
25. Shah, B.; Trivedi, B.H. Reducing Features of KDD CUP 1999 Dataset for Anomaly Detection Using Back Propagation Neural Network. In Proceedings of the 2015 Fifth International Conference on Advanced Computing & Communication Technologies, Haryana, India, 21–22 February 2015; pp. 247–251.
26. Zaib, M.H. NSL KDD Dataset. 2024. Available online: https://www.kaggle.com/datasets/hassan06/nslkdd (accessed on 19 July 2024).
27. Kale, R.; Lu, Z.; Fok, K.W.; Thing, V.L.L. A Hybrid Deep Learning Anomaly Detection Framework for Intrusion Detection. In Proceedings of the 2022 IEEE 8th Intl Conference on Big Data Security on Cloud (BigDataSecurity), IEEE Intl Conference on High Performance and Smart Computing, (HPSC) and IEEE Intl Conference on Intelligent Data and Security (IDS), Jinan, China, 6–8 May 2022; pp. 137–142.
28. Albulayhi, K.; Abu Al-Haija, Q.; Alsuhibany, S.A.; Jillepalli, A.A.; Ashrafuzzaman, M.; Sheldon, F.T. IoT Intrusion Detection Using Machine Learning with a Novel High Performing Feature Selection Method. Appl. Sci. 2022, 12, 5015.
29. Iwendi, C.; Anajemba, J.H.; Biamba, C.; Ngabo, D. Security of Things Intrusion Detection System for Smart Healthcare. Electronics 2021, 10, 1375.
30. Alani, M.M.; Mashatan, A.; Miri, A. Explainable Ensemble-Based Detection of Cyber Attacks on Internet of Medical Things. In Proceedings of the 2023 IEEE Intl Conf on Dependable, Autonomic and Secure Computing, Intl Conf on Pervasive Intelligence and Computing, Intl Conf on Cloud and Big Data Computing, Intl Conf on Cyber Science and Technology Congress (DASC/PiCom/CBDCom/CyberSciTech), Abu Dhabi, United Arab Emirates, 14–17 November 2023; pp. 0609–0614.
31. Nayak, J.; Meher, S.K.; Souri, A.; Naik, B.; Vimal, S. Extreme learning machine and bayesian optimization-driven intelligent framework for IoMT cyber-attack detection. J. Supercomput. 2022, 78, 14866–14891.
32. Nour, M. ToN_IoT Datasets. 2024. Available online: https://research.unsw.edu.au/projects/toniot-datasets (accessed on 19 July 2024).
33. Ramya, M.; Sudhakaran, P.; Sivagnanam, Y.; Santhana Krishnan, C. Advanced intrusion detection technique (AIDT) for secure communication among devices in internet of medical things (IoMT). EURASIP J. Wirel. Commun. Netw. 2025, 2025, 34.
34. Akar, G.; Sahmoud, S.; Onat, M.; Cavusoglu, Ü.; Malondo, E. L2D2: A Novel LSTM Model for Multi-Class Intrusion Detection Systems in the Era of IoMT. IEEE Access 2025, 13, 7002–7013.
35. Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988.
36. Dadkhah, S.; Neto, E.C.P.; Ferreira, R.; Molokwu, R.C.; Sadeghi, S.; Ghorbani, A.A. CICIoMT2024: A Benchmark Dataset for Multi-Protocol Security Assessment in IoMT. Internet Things 2024, 28, 101351.
37. Alazab, M.; Awajan, A.; Obeidat, A.; Faruqui, N.; Rehman, H.U. IntruSafe: A FCNN-LSTM hybrid IoMT intrusion detection system for both string and 2D-spatial data using sandwich architecture. Neural Comput. Appl. 2025, 37, 23395–23422.
38. Wardhani, R.W.; Putranto, D.S.C.; Jo, U.; Kim, H. Toward enhanced attack detection and explanation in intrusion detection system-based IoT environment data. IEEE Access 2023, 11, 131661–131676.
39. Nataraj, L.; Karthikeyan, S.; Jacob, G.; Manjunath, B.S. Malware images: Visualization and automatic classification. In Proceedings of the 8th International Symposium on Visualization for Cyber Security, Pittsburgh, PA, USA, 20 July 2011; pp. 1–7.
40. Binbusayyis, A.; Alaskar, H.; Vaiyapuri, T.; Dinesh, M. An investigation and comparison of machine learning approaches for intrusion detection in IoMT network. J. Supercomput. 2022, 78, 17403–17422.
41. Alrayes, F.S.; Zakariah, M.; Amin, S.U.; Khan, Z.I.; Helal, M. Hybrid Cross-Temporal Contrastive Model with Spiking Energy-Efficient Network Intrusion Detection in IOMT. Int. J. Comput. Intell. Syst. 2025, 18, 270.
42. Liu, X.Y.; Wu, J.; Zhou, Z.H. Exploratory undersampling for class-imbalance learning. IEEE Trans. Syst. Man Cybern. Part B (Cybern.) 2020, 39, 539–550.

Figure 1. Methodology flowchart illustrating the parallel evaluation paths for neural network architectures with and without autoencoder dimensionality reduction.

Figure 2. Autoencoder architecture for dimensionality reduction, converting 34-dimensional input to 16-dimensional representation.

Figure 3. Standard ANN and Enhanced Channel ANN (ANN_v1) architectures. Both models process 34 input features through multiple fully connected layers with ReLU activations, differing primarily in the width of hidden layers. (a) Standard ANN architecture with progressively decreasing hidden layer sizes. (b) Enhanced Channel ANN (ANN_v1) with wider hidden layers to increase representational capacity.

Figure 4. Architecture diagram of the dual-branch ANN models (ANN_v2 and ANN_v3), illustrating feature splitting, parallel processing paths, and different fusion mechanisms. The left model (ANN_v2) uses element-wise addition to combine features from both branches, while the right model (ANN_v3) uses concatenation to preserve all information from both branches. Both architectures employ dropout regularization after each shared layer to prevent overfitting.

Figure 5. Architecture diagram of ANN_v4, illustrating ReLU activations and residual-like shortcut connections. The network comprises eight layers with four skip connections that facilitate gradient flow by providing alternative information pathways during training. Each shortcut corresponds to an addition operation in the forward pass.

Figure 6. Training loss progression for the proposed neural network architectures. The plots show diverse convergence behaviors, with all models exhibiting rapid initial loss reduction followed by stabilization. Variations in convergence speed and pattern highlight the impact of architectural modifications on learning dynamics. (a) ANN; (b) ANN_v1; (c) ANN_v2; (d) ANN_v3; (e) ANN_v4; (f) AE+ANN_v4.

Figure 7. Heatmap illustrating the performance of different ANN-based models under three class balancing methods (SMOTE, Hybrid, and Weighted Cross-Entropy Loss). The color intensity reflects metric values across ACC, AUC, PR, RC and F1-score.

Table 1. Summary of Related Work.

Authors	Dataset	Methodology	Results
Zhang et al. [21]	KDD 1999	Random Forest (RF) for anomaly detection	95% accuracy, 1% false-positive rate
Li et al. [23]	KDD 1999	Clustering, Ant Colony Algorithm, SVM	98.62% accuracy, MCC of 0.861
Shah et al. [25]	KDD 1999	Information Gain (IG) for feature reduction	Improved model performance with reduced dataset
Tesfahun et al. [24]	KDD 1999	Random Forest with IG	Enhanced generalization capacity
Kale et al. [27]	NSL-KDD, CIC-IDS2018, TON IoT	Three-stage deep learning framework (K-means, GANomaly, CNN)	91.6% accuracy on NSL-KDD
Albulayhi et al. [28]	NSL-KDD	Feature selection using set theory	99.98% classification accuracy
Iwendi et al. [29]	NSL-KDD	RF with Genetic Algorithm for feature optimization	98.81% detection rate, 0.8% false alarm rate
Nayak et al. [31]	ToN_IoT	Bayesian Optimization and ELM	High precision and recall, but no class imbalance solution
Hady et al. [22]	Custom dataset WUSTL-EHMS-2020 (16,000 records)	Integration of medical and network data using EHMS testbed	Improved performance by 7% to 25%; SVM accuracy 92.46%, ANN AUC 92.98%
Alani et al. [30]	WUSTL-EHMS-2020	Ensemble learning and explainable AI with random over-sampling	99.96% accuracy and 0.998 F1 score
Cherif [13]	WUSTL-EHMS-2020	Multiple neural network architectures with three class balancing approaches	93.05% accuracy, 0.9518 precision and 0.8037 F1-score with weighted loss
Ramya et al. [33]	Dataset proposed in [40]	Particle Swarm Optimization combined with a probabilistic neural network	Accuracy: 96.4%, F1-score: up to 95.67%
Akar et al. [34]	CIC-IoMT2024 [36]	LSTM-based model	Accuracy: 98%, F1-score: 98%
Alazab et al. [37]	CIC-IoMT2023 and Malimg datasets	FCNN combined with LSTM	Accuracy: 97.66%, F1-score: 97.85%

Table 2. Comparative Analysis of Related Works.

Ref.	Advanced Architectures	IoMT Dataset	Class Imbalance Handling	Feature Reduction
[21]	-	-	-	-
[23]	-	-	-	✓
[25]	-	-	-	✓
[24]	-	-	-	-
[27]	✓	✓	-	-
[28]	-	✓	-	✓
[29]	-	-	-	✓
[31]	-	-	-	✓
[22]	-	✓	✓	-
[30]	✓	✓	✓
[13]	-	✓	✓	-
[33]	-	✓	-	✓
[34]	-	✓	✓	-
[37]	✓	-	-	-
[41]	✓	-	✓	-

Table 3. Dataset Statistical Information.

Measurement	Value
Size	4.4 MB
Normal samples	14,272 (87.5%)
Attack samples	2046 (12.5%)
Total number of samples	16,318
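
The class counts in Table 3 are enough to derive the kind of per-class weights that a weighted cross-entropy loss relies on. The sketch below assumes the common inverse-frequency ("balanced") heuristic, w_c = N / (K · n_c), where N is the total sample count, K the number of classes, and n_c the count of class c. The weighting formula actually used in the study is not given in this table, so both the heuristic and the function name `balanced_class_weights` are illustrative assumptions.

```python
# Illustrative sketch: inverse-frequency class weights from the Table 3 counts.
# The "balanced" heuristic w_c = N / (K * n_c) is an assumption, not
# necessarily the weighting used in the study.

def balanced_class_weights(counts):
    """Return {label: weight} with weight = total / (num_classes * count)."""
    total = sum(counts.values())
    num_classes = len(counts)
    return {label: total / (num_classes * n) for label, n in counts.items()}


counts = {"normal": 14272, "attack": 2046}  # from Table 3
weights = balanced_class_weights(counts)
imbalance_ratio = counts["normal"] / counts["attack"]

for label, w in weights.items():
    print(f"{label}: weight = {w:.4f}")
print(f"imbalance ratio (normal:attack) = {imbalance_ratio:.2f}:1")
```

Under this heuristic the ratio of the two weights equals the imbalance ratio, so each attack sample contributes roughly seven times as much to the loss as a normal sample, and the two classes carry equal total weight.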

Table 4. Summary of neural network architectures, detailing layer dimensions, activation functions, dropout usage, fusion mechanisms, and shortcut connections.

Model	Layer Dimensions	Activation Function	Dropout and Fusion Details
ANN	Input: 34 Hidden: [40, 40, 20, 10, 10, 10, 10] Output: 2	ReLU	None
ANN_v1	Input: 34 Hidden: [256, 256, 128, 64, 64, 64, 64] Output: 2	ReLU	None
ANN_v2	Network Branch: [256, 256] Biometric Branch: [256, 256] Shared Layers: [128, 64, 64] Output: 2	ReLU	Dropout (0.4) after shared layers; Fusion: element-wise addition scaled by 0.5
ANN_v3	Network Branch: [256, 256] Biometric Branch: [256, 256] Shared Layers: [256, 128, 64] Output: 2	ReLU	Dropout (0.4) after shared layers; Fusion: concatenation of branch outputs
ANN_v4	Input: 34 Hidden: [256, 256, 128, 64, 64, 64, 64] Output: 2	ReLU	Shortcut connections between layers 1–2, 4–5, 5–6, 6–7; No dropout
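
The layer dimensions in Table 4 make it possible to estimate how much larger the Enhanced Channel ANN is than the Standard ANN. The sketch below counts trainable parameters under the assumption that every layer is fully connected with a bias term and that no other trainable components are present; the helper name `dense_param_count` is hypothetical.

```python
# Illustrative sketch: trainable-parameter counts implied by Table 4 for the
# two plain feedforward models. Assumes every layer is fully connected with a
# bias term and that nothing else (e.g. normalization) is trainable.

def dense_param_count(layer_sizes):
    """Sum of (fan_in * fan_out + fan_out) over consecutive layer pairs."""
    return sum(
        fan_in * fan_out + fan_out
        for fan_in, fan_out in zip(layer_sizes, layer_sizes[1:])
    )


# Input width, hidden widths, and output width as listed in Table 4.
ann = [34, 40, 40, 20, 10, 10, 10, 10, 2]
ann_v1 = [34, 256, 256, 128, 64, 64, 64, 64, 2]

ann_params = dense_param_count(ann)
ann_v1_params = dense_param_count(ann_v1)

print(f"ANN:    {ann_params} parameters")
print(f"ANN_v1: {ann_v1_params} parameters")
print(f"ratio:  {ann_v1_params / ann_params:.1f}x")
```

If the shortcuts in ANN_v4 are plain identity additions, which the matching layer widths at positions 1–2 and 4–7 would permit, ANN_v4 would have the same parameter count as ANN_v1 under these assumptions.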

Table 5. Hyperparameter Settings.

Hyperparameter	Value
Learning rate	$1 \times 10^{-3}$
Batch size	64
Weight decay	$5 \times 10^{-4}$
Optimizer	AdamW
Loss function	Cross Entropy (standard or weighted)
Evaluation interval	50 epochs
Maximum epochs	200–500 (architecture dependent)

Table 6. Performance Results of Different Neural Network Architectures and Class Balancing Methods.

Model	Class Balancing Method	AUC	Accuracy (ACC)	Precision (PR)	Recall (RC)	F1-Score (F1)
ANN	SMOTE	0.8491	0.8753	0.7427	0.8491	0.7783
ANN_v1	SMOTE	0.8544	0.8983	0.7750	0.8544	0.8062
ANN_v2	SMOTE	0.8766	0.8955	0.7721	0.8766	0.8096
ANN_v3	SMOTE	0.8740	0.9032	0.7851	0.8740	0.8182
ANN_v4	SMOTE	0.8554	0.9035	0.7839	0.8554	0.8129
AE+ANN_v4	SMOTE	0.7485	0.9308	0.9383	0.7485	0.8085
ANN	Hybrid	0.8518	0.9179	0.8132	0.8518	0.8308
ANN_v1	Hybrid	0.8577	0.9213	0.8201	0.8577	0.8373
ANN_v2	Hybrid	0.8750	0.9323	0.8437	0.8750	0.8582
ANN_v3	Hybrid	0.8671	0.9203	0.8163	0.8671	0.8387
ANN_v4	Hybrid	0.8739	0.9151	0.8047	0.8739	0.8335
AE+ANN_v4	Hybrid	0.7925	0.9059	0.7942	0.7925	0.7934
ANN	Weighted cross-entropy Loss	0.8588	0.9145	0.8049	0.8588	0.8283
ANN_v1	Weighted cross-entropy Loss	0.8559	0.9197	0.8168	0.8559	0.8345
ANN_v2	Weighted cross-entropy Loss	0.8786	0.9403	0.8650	0.8786	0.8716
ANN_v3	Weighted cross-entropy Loss	0.8467	0.8161	0.6938	0.8467	0.7216
ANN_v4	Weighted cross-entropy Loss	0.8534	0.8382	0.7096	0.8534	0.7431
AE+ANN_v4	Weighted cross-entropy Loss	0.7463	0.9200	0.8705	0.7463	0.7909

Values in bold denote the best-performing results.
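
In every row of Table 6 the AUC and RC columns coincide. One reading that would produce this, offered here as an assumption and not as a description of the authors' code, is that recall is macro-averaged over the two classes and AUC is computed from hard class predictions instead of continuous scores. In that case the ROC curve has a single operating point and its area equals (TPR + TNR)/2. The sketch below checks this identity on a hypothetical confusion matrix; the counts are invented for illustration and are not results from the study.

```python
# Illustrative sketch: for binary hard predictions, trapezoidal ROC AUC
# equals macro-averaged recall. The confusion-matrix counts are hypothetical.

def macro_recall(tp, fn, fp, tn):
    """Mean of per-class recall: (TPR + TNR) / 2."""
    tpr = tp / (tp + fn)  # recall of the positive (attack) class
    tnr = tn / (tn + fp)  # recall of the negative (normal) class
    return (tpr + tnr) / 2


def auc_from_hard_labels(tp, fn, fp, tn):
    """Trapezoidal area under the ROC polyline (0,0) -> (FPR,TPR) -> (1,1)."""
    tpr = tp / (tp + fn)
    fpr = fp / (fp + tn)
    points = [(0.0, 0.0), (fpr, tpr), (1.0, 1.0)]
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


tp, fn, fp, tn = 320, 89, 110, 2745  # hypothetical test-set counts
rc = macro_recall(tp, fn, fp, tn)
auc = auc_from_hard_labels(tp, fn, fp, tn)
print(f"macro recall = {rc:.4f}, hard-label AUC = {auc:.4f}")
```

An AUC computed from predicted probabilities would generally differ from this value, which matters when comparing these figures with AUCs reported elsewhere.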

Table 7. Computational Performance Metrics of Neural Network Architectures.

Model	Total Training Time (s)	Training Time per Epoch (s)	Inference Time (s)
ANN	234.992	0.47	0.0557
ANN_v1	471.676	0.94	0.0680
ANN_v2	730.803	1.56	0.0921
ANN_v3	950.360	1.90	0.0631
ANN_v4	164.865	0.82	0.0331
AE+ANN_v4	1110.085	2.77	0.0329
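
Dividing the total training time by the time per epoch gives the number of epochs each model implicitly ran, which can be compared with the 200–500 range listed in Table 5. The sketch below does that arithmetic on the Table 7 figures; because the per-epoch times are rounded to two decimals in the table, the quotients are approximate.

```python
# Illustrative arithmetic on Table 7: implied epochs = total time / time per epoch.
# Per-epoch times are rounded in the table, so results are approximate.

timings = {
    # model: (total training time in s, training time per epoch in s)
    "ANN": (234.992, 0.47),
    "ANN_v1": (471.676, 0.94),
    "ANN_v2": (730.803, 1.56),
    "ANN_v3": (950.360, 1.90),
    "ANN_v4": (164.865, 0.82),
    "AE+ANN_v4": (1110.085, 2.77),
}

implied_epochs = {
    model: total / per_epoch for model, (total, per_epoch) in timings.items()
}

for model, epochs in implied_epochs.items():
    print(f"{model:10s} ~{epochs:.0f} epochs")
```

The quotients fall between roughly 200 and 500, in line with the architecture-dependent epoch budget in Table 5. ANN_v4 has the shortest total training time because it ran the fewest epochs, not because its individual epochs are the cheapest.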

Table 8. Comparative Analysis of IoMT Intrusion Detection Approaches on WUSTL-EHMS-2020 Dataset.

Study	Approach	Acc	F1	AUC	PR	RC
Cherif [13]	ELM (256) + SMOTE	0.8444	0.7223	0.7789	0.6949	0.7789
Cherif [13]	ELM (256) + Weighted cross-entropy Loss	0.9305	0.8037	0.7404	0.9518	0.7404
Hady et al. [22]	SVM with SMOTE	0.9246	Not reported	0.8237	Not reported	Not reported
Hady et al. [22]	ANN with SMOTE	0.9040	Not reported	0.9342	Not reported	Not reported
Alani et al. [30]	Ensemble with random over-sampling	0.9980	0.9980	Not reported	0.9980	0.9980
Current study	ANN_v2 + Weighted cross-entropy Loss	0.9403	0.8716	0.8786	0.8650	0.8786
Current study	AE+ANN_v4 + SMOTE	0.9308	0.8085	0.7485	0.9383	0.7485

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Ammar, H.; Cherif, A. Optimizing IoMT Security: Performance Trade-Offs Between Neural Network Architectural Design, Dimensionality Reduction, and Class Imbalance Handling. IoT 2025, 6, 74. https://doi.org/10.3390/iot6040074

