An Unsupervised Cloud-Centric Intrusion Diagnosis Framework Using Autoencoder and Density-Based Learning

K. S, Suresh; Elumalai, Thenmozhi; Rajamani, Radhakrishnan; Kumar, Anubhav; Balusamy, Balamurugan; Yogarayan, Sumendra; Prabu, Kaliyaperumal

doi:10.3390/fi18010054


by Suresh K. S ¹, Thenmozhi Elumalai ², Radhakrishnan Rajamani ³, Anubhav Kumar ³, Balamurugan Balusamy ⁴,*, Sumendra Yogarayan ⁵,* and Kaliyaperumal Prabu ⁶

¹ School of Computing, SASTRA Deemed to Be University, Tamilnadu 613401, India
² Department of Information Technology, Panimalar Engineering College, Chennai 600123, India
³ School of Computer Science and Engineering, Galgotias University, Delhi 203201, India
⁴ School of Engineering and IT, Manipal Academy of Higher Education, Dubai Campus, Dubai 345050, United Arab Emirates
⁵ Faculty of Information Science and Technology, Multimedia University, Melaka 75450, Malaysia
⁶ School of Computer Science and Engineering, IILM University, Delhi 201306, India
* Authors to whom correspondence should be addressed.

Future Internet 2026, 18(1), 54; https://doi.org/10.3390/fi18010054

Submission received: 15 December 2025 / Revised: 12 January 2026 / Accepted: 14 January 2026 / Published: 19 January 2026

(This article belongs to the Special Issue Cloud and Edge Computing for the Next-Generation Networks)


Abstract

Cloud computing environments generate high-dimensional, large-scale, and highly dynamic network traffic, making intrusion diagnosis challenging due to evolving attack patterns, severe traffic imbalance, and limited availability of labeled data. To address these challenges, this study presents an unsupervised, cloud-centric intrusion diagnosis framework that integrates autoencoder-based representation learning with density-based attack categorization. A dual-stage autoencoder is trained exclusively on benign traffic to learn compact latent representations and to identify anomalous flows using reconstruction-error analysis, enabling effective anomaly detection without prior attack labels. The detected anomalies are subsequently grouped using density-based learning to uncover latent attack structures and support fine-grained multiclass intrusion diagnosis under varying attack densities. Experiments conducted on the large-scale CSE-CIC-IDS2018 dataset demonstrate that the proposed framework achieves an anomaly detection accuracy of 99.46%, with high recall and low false-negative rates in the optimal latent-space configuration. The density-based classification stage achieves an overall multiclass attack classification accuracy of 98.79%, effectively handling both majority and minority attack categories. Clustering quality evaluation reports a Silhouette Score of 0.9857 and a Davies–Bouldin Index of 0.0091, indicating strong cluster compactness and separability. Comparative analysis against representative supervised and unsupervised baselines confirms the framework’s scalability and robustness under highly imbalanced cloud traffic, highlighting its suitability for future Internet cloud security ecosystems.

Keywords:

cloud computing; dual functions; imbalanced traffic; anomaly detection; unsupervised learning; Adaptive IDS

1. Introduction

Cloud computing has significantly transformed modern data management by offering scalable, flexible, and cost-efficient access to computing resources over the Internet [1,2]. Despite these advantages, the rapid adoption of cloud services has introduced considerable security challenges. Cloud platforms are inherently distributed and multi-tenant, exposing them to a wide range of cyber threats such as data breaches, unauthorized access attempts, and Distributed Denial of Service (DDoS) attacks [3,4]. Traditional security mechanisms, including firewalls and antivirus solutions [5], are insufficient to address the dynamic and evolving nature of such threats [6,7].

Intrusion Detection Systems (IDS) have consequently become indispensable for monitoring network traffic and system activities to identify malicious behaviors and prevent potential attacks [8,9]. However, applying conventional IDS techniques to cloud environments remains challenging due to the massive volume, high dimensionality, and heterogeneous nature of cloud traffic. Cloud IDS also face issues such as imbalanced datasets—where benign traffic dominates malicious traffic—and the necessity for near real-time threat response [10,11]. Moreover, distinguishing benign anomalies from true attacks remains a persistent difficulty, often resulting in high false-positive rates.

Beyond any specific benchmark dataset, cloud environments introduce intrinsic challenges for intrusion detection systems. These include highly heterogeneous and high-dimensional traffic generated by virtualization and container orchestration, severe and dynamic class imbalance due to rare yet impactful attacks, continuously evolving attack behaviors, and limited availability of timely labeled data. In addition, cloud-scale intrusion detection systems must support low-latency processing, scalability across distributed infrastructures, and robustness to shifting traffic patterns. The proposed framework is designed to address these fundamental challenges through unsupervised representation learning and density-aware attack diagnosis.

Beyond the rapid expansion of cloud infrastructures, the threat landscape has evolved toward stealthy, adaptive, and low-frequency attacks that deliberately evade static signatures and rule-based defenses [12,13]. Modern cloud platforms generate highly heterogeneous traffic due to virtualization, container orchestration, elastic resource provisioning, and multi-tenant workload consolidation [14]. These operational characteristics significantly blur the distinction between benign workload fluctuations and malicious behaviors, increasing uncertainty in intrusion diagnosis. As a result, cloud intrusion detection systems must move beyond binary alerting and provide structured diagnostic insight capable of distinguishing benign anomalies from genuine cyber threats with minimal false alarms.

A fundamental challenge in this context is the limited availability of reliable attack labels. In real-world cloud deployments, traffic is often unlabeled or weakly labeled due to privacy constraints, delayed incident reporting, and the continuous emergence of zero-day attacks [15,16]. This limitation restricts the effectiveness of supervised learning-based IDS solutions, which depend heavily on balanced and accurately annotated datasets. Although unsupervised learning techniques offer greater flexibility, they frequently struggle with high-dimensional feature spaces, severe class imbalance, and overlapping attack distributions, resulting in unstable detection thresholds and ambiguous cluster formation [17].

Dimensionality reduction therefore plays a critical role in enabling scalable and reliable cloud intrusion diagnosis. Deep representation learning models, particularly autoencoders, can compress high-dimensional traffic features into compact latent spaces that suppress noise and highlight intrinsic behavioral structures. When combined with density-aware clustering methods, these latent representations allow intrusion detection systems to capture diverse attack patterns without prior class knowledge, even under highly imbalanced traffic conditions [18]. This capability is essential for cloud environments, where attack densities vary widely and rare but impactful threats are embedded within massive volumes of benign traffic.

From a future Internet perspective, effective intrusion diagnosis must also support deployment across distributed edge–fog–cloud architectures. Lightweight anomaly detection at the edge, coordinated analysis at the fog layer, and large-scale learning in the cloud form a synergistic security pipeline. The proposed framework aligns with this vision by enabling scalable, unsupervised intrusion diagnosis suitable for next-generation, cloud-centric and hybrid computing ecosystems.

Rather than proposing a new standalone learning algorithm, this work focuses on designing and validating a practically deployable unsupervised intrusion diagnosis architecture tailored for large-scale, imbalanced, and dynamically evolving cloud environments. This applied perspective aligns with the operational requirements of future Internet security systems, where robustness, scalability, and diagnostic capability are as critical as algorithmic novelty.

The proposed framework is designed as a cohesive, end-to-end unsupervised intrusion diagnosis pipeline, ensuring seamless interaction between representation learning, anomaly detection, and density-based attack categorization.

Recent research has explored machine learning and deep learning–based IDS approaches, demonstrating improved detection capabilities. However, challenges related to scalability, computational cost, imbalanced data, and the detection of novel or low-frequency attacks continue to hinder practical deployment. Motivated by these gaps, this study proposes a dual-function intrusion diagnosis framework that integrates autoencoder-based dimensionality reduction and anomaly detection with DBSCAN-based multiclass attack classification. By leveraging autoencoder latent representations and density-based clustering, the proposed method addresses high-dimensional traffic, imbalanced datasets, and varying attack densities within cloud environments. Evaluations on the CSE-CIC-IDS2018 dataset demonstrate the framework's ability to deliver both anomaly detection and fine-grained attack categorization, providing valuable insights for real-time threat mitigation. Figure 1 provides an overview of the security challenges encountered across edge–fog–cloud computing environments in future Internet architectures, motivating the need for scalable intrusion diagnosis. Specifically, the proposed framework:

Utilizes an autoencoder to simultaneously perform dimensionality reduction and anomaly detection through reconstruction error analysis.
Applies DBSCAN clustering on anomalous latent representations to enable robust multiclass attack categorization.
Effectively manages varying attack densities and imbalanced traffic while reducing computational complexity.
Demonstrates strong scalability and real-world relevance using the CSE-CIC-IDS2018 dataset, reflecting practical cloud traffic conditions.

2. Related Work

Intrusion detection in cloud environments has been extensively studied, yet the evolving complexity and scale of cloud traffic continue to challenge existing approaches. Prior research in this domain generally falls into three major categories: deep learning–based intrusion detection systems (IDS), hybrid and ensemble learning pipelines, and clustering or graph-based analytical methods. A structured synthesis of these categories is presented below to contextualize the motivation for the proposed work.

2.1. Deep Learning–Based IDS Approaches

Deep learning models have gained prominence due to their ability to capture complex, non-linear patterns in large-scale network traffic. Aljuaid and Alshamrani [19] employed a CNN architecture on the CSE-CIC-IDS2018 dataset, reporting 98.67% accuracy. Rosline et al. [20] developed a lightweight PSO-optimized CNN, achieving 91.7% accuracy. Sajid et al. [21] proposed a hybrid CNN–XGBoost–LSTM pipeline with strong multiclass classification performance. Sahi et al. [22] introduced CS_DDoS, a classification-based system that uses LS-SVM, naïve Bayes, K-nearest neighbor, and multilayer perceptron classifiers to detect TCP flood attacks in cloud environments. It achieved up to 97% accuracy but faced challenges with spoofed IPs and threshold evasion. Despite these advances, deep learning models often demand substantial computational resources, exhibit limited scalability, and tend to perform poorly under imbalanced traffic distributions, conditions commonly observed in real-world cloud environments.

2.2. Hybrid and Ensemble Learning Approaches

Several studies have explored hybrid or multi-stage IDS designs to improve detection performance. Megouache et al. [23] integrated k-means clustering with Extreme Learning Machines (ELM), attaining high accuracy but facing memory and scalability constraints for cloud-scale traffic. Vamsikrishna et al. [24] proposed a hierarchical neural network–based IDS that improved detection rates but introduced additional computational burden through heavy feature extraction. Li et al. [25] presented a semi-supervised pipeline combining autoencoders, DBSCAN, multilayer perceptrons, and random forests, showing competitive performance on NSL-KDD and UNSW-NB15. Alharbi et al. [26] proposed RFCL, a clustering-based robust federated aggregation method using cosine similarity and personalization. RFCL effectively mitigates data/model poisoning attacks under heterogeneous Non-IID settings, outperforming Median, Krum, AFA, FedMGDA+, and CC across MNIST, CIFAR-10, and Fashion-MNIST. Rameshkumar et al. [27] proposed a Progressive Transfer Learning-based Deep Q Network for DDoS defence in wireless multimedia sensor networks, integrating adaptive particle swarm optimization for feature selection. Their approach improved detection accuracy, reduced attacker detection delay, and enhanced throughput in smart agriculture scenarios. However, dependency on specific datasets, scalability limits, and limited robustness against adversarial variations remain unresolved challenges.

2.3. Clustering and Graph-Based Approaches

To better model structural dependencies in network flows, researchers have also explored clustering and graph-based techniques. Dugyala et al. [28] combined GNNs with Leader K-means clustering, improving detection accuracy but encountering computational bottlenecks. Villegas-Ch et al. [29] proposed a dynamic GNN-driven framework for IoT intrusion detection, achieving strong results on DoS, Spoofing, and MiTM attacks while highlighting limitations in scalability and heterogeneous traffic adaptation. Guo et al. [30] introduced a clustering- and centroid-based universal defence against backdoor attacks that leverages the misclassification behavior of poisoned clusters. Their experiments across diverse datasets, triggers, and architectures demonstrated attack-agnostic robustness, consistently outperforming state-of-the-art defences under varied poisoning conditions. Artioli et al. [31] conducted a comprehensive study of 15 clustering algorithms for user and entity behavior analytics (UEBA) across three datasets, with attention to scalability and reliability. Density-based methods such as HDBSCAN and DenMune achieved strong performance, producing cohesive clusters closely aligned with user behaviors. While density-based clustering methods show promise in detecting irregular attack patterns, existing work often lacks integrated mechanisms for feature compression and anomaly detection, both of which are important for managing high-dimensional cloud traffic.

2.4. Research Gap

A comprehensive review of existing literature reveals several limitations that hinder the effectiveness of cloud-based intrusion detection systems. Most current IDS approaches do not employ dual-purpose architectures capable of jointly performing dimensionality reduction and anomaly detection within a single integrated framework. Clustering-based methods struggle with multi-density attack distributions and often lack deep feature representations, making them less effective in distinguishing subtle or emerging attack patterns. Furthermore, challenges such as high data dimensionality, computational overhead, and significant traffic imbalance continue to affect their scalability and practical deployment in cloud environments. Prior studies largely focus on binary or coarse-grained detection, offering limited support for fine-grained, multiclass attack categorization—an essential requirement for actionable and context-aware cyber threat response.

2.5. Unsupervised Representation Learning in Adjacent Domains

Beyond intrusion detection, unsupervised representation learning using autoencoder-based architectures has been successfully explored in several adjacent domains dealing with high-dimensional and information-rich data. In particular, recent studies on unsupervised hyperspectral image super-resolution employ reconstruction-driven learning and latent-space interaction mechanisms to recover fine-grained spatial and spectral structures without relying on labeled training data [32,33].

These approaches demonstrate the effectiveness of unsupervised latent representations and reconstruction-based optimization in extracting meaningful patterns from complex, high-dimensional inputs. Although such methods are primarily designed for image restoration tasks rather than network security, they conceptually reinforce the viability of autoencoder-driven unsupervised learning for capturing intrinsic data structures. This shared foundation supports the design choices adopted in the proposed cloud-centric intrusion diagnosis framework, where latent-space modeling and reconstruction error are leveraged for anomaly detection and attack characterization under limited labeling conditions.

3. Materials and Methods

The design of the proposed framework is guided by key operational challenges in cloud intrusion detection. Autoencoder-based dimensionality reduction addresses the high dimensionality and heterogeneity of cloud traffic, reconstruction-error-based anomaly detection mitigates the scarcity of labeled attack data, and density-based clustering enables robust diagnosis of multi-density attack behaviors under extreme class imbalance. Because the design is driven by these challenges rather than by a particular dataset, it is intended to apply beyond the benchmark used here. The overall research methodology, shown in Figure 2, follows a systematic pipeline for detecting and classifying cyberattacks in cloud environments. The process begins with dataset integration and pre-processing, followed by dimensionality reduction using an autoencoder. Anomalies are then detected through reconstruction error analysis, and DBSCAN clustering is applied to categorize anomalous samples into distinct attack types. The complete operational flow of the proposed system is summarized in Algorithm 1. All learning and clustering stages are performed without using attack labels, preserving the unsupervised nature of the framework.

Algorithm 1. Proposed Autoencoder–DBSCAN Based Multiclass Attack Detection Pipeline

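Input: Dataset X, DBSCAN parameters ε and MinPts, latent dimension d, threshold sensitivity factor k
Output: Multiclass attack labels
1: X_pre ← preprocess(X)
2: Train AE(x) = f_dec(f_enc(x)), with f_enc mapping to a d-dimensional latent space, on benign training data: minimize L = ||x − AE(x)||²
3: For each sample x ∈ X_pre: compute RE(x) = ||x − AE(x)||²
4: Determine threshold T = μ_RE + k · σ_RE, with μ_RE and σ_RE computed on the benign training samples
5: Identify anomalies A = {x | RE(x) > T}
6: Extract latent vectors Z_anom = f_enc(A)
7: Apply DBSCAN on Z_anom using ε, MinPts
8: Assign cluster labels C_j (points belonging to no cluster are marked as noise)
9: Map cluster labels to attack classes (post hoc, for evaluation only)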
10: Return multiclass predictions

3.1. Dataset Description

The CSE-CIC-IDS2018 dataset is a large-scale benchmark generated in an AWS cloud environment, containing realistic benign and attack traffic captured over multiple days [34,35]. It includes diverse modern threats such as DoS, DDoS, brute-force, botnet, infiltration, and web attacks, making it suitable for evaluating intrusion detection techniques in cloud settings. Each network flow is represented by high-dimensional feature vectors (80 attributes) covering packet statistics, temporal behavior, and content-based metrics.

A key challenge of this dataset is its highly imbalanced distribution, where benign flows constitute the majority of traffic while several attack types appear infrequently. This imbalance directly affects anomaly detection and cluster formation, making it an appropriate testbed for evaluating the robustness of the proposed Autoencoder–DBSCAN pipeline. The dataset also contains heterogeneous feature scales, missing or constant-value attributes, and noise, necessitating normalization and careful pre-processing before model training.

Given its scale, diversity, and realistic traffic patterns, the CSE-CIC-IDS2018 dataset provides a strong foundation for studying dimensionality reduction, anomaly detection, and unsupervised attack categorization in cloud environments. It enables rigorous assessment of reconstruction-error thresholding and density-based clustering performance within the proposed framework.

Although this study evaluates the proposed framework using a single benchmark dataset, CSE-CIC-IDS2018 provides one of the most comprehensive and realistic representations of modern cloud traffic currently available. It captures large-scale, multi-day traffic generated in a real cloud infrastructure, incorporates diverse contemporary attack types, and exhibits extreme class imbalance—characteristics that are critical for evaluating unsupervised intrusion detection systems. These properties make it a rigorous and representative testbed for assessing scalability, robustness, and multiclass intrusion diagnosis in cloud-centric environments.

3.2. Preprocessing

Pre-processing is crucial for structuring, standardizing, and optimizing the dataset for machine learning tasks. The CSE-CIC-IDS2018 dataset consists of 10 distinct CSV files, each representing different traffic scenarios. To simplify the analysis, the initial step involves combining all 10 CSV files into a single unified dataset, ensuring data integrity and removing duplicates. Subsequently, the combined dataset is encoded to convert categorical features into numerical values, making them compatible with machine learning models. One-hot encoding is applied to categorical variables, such as protocol types or service names, generating binary vectors that avoid introducing unintended ordinal relationships. Following encoding, standardization is performed to normalize all numerical features. This is achieved using the Z-score scaling technique, defined as:

X_{\mathrm{norm}} = \frac{X - \mu}{\sigma}

(1)

where X is the original feature value, μ is the feature mean, and σ is the standard deviation. This transformation ensures every feature has a mean of zero and a standard deviation of one, allowing equal contribution during model training and preventing features with larger magnitudes from dominating the learning process. These pre-processing steps convert the raw dataset into a clean, structured, and fully normalized input suitable for the autoencoder-based dimensionality reduction and subsequent anomaly detection stages.
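The pre-processing steps above (one-hot encoding, duplicate removal, and Z-score scaling as in Equation (1)) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline. The helper names `one_hot`, `drop_duplicate_rows`, and `zscore` are hypothetical, and the toy arrays stand in for the merged CSV files. Two further choices are ours and are not stated in the paper: columns with σ = 0 are divided by 1 instead of 0, and μ and σ are returned so that training-set statistics can be reused on the test split.

```python
import numpy as np

def one_hot(column):
    """One-hot encode a 1-D sequence of categorical values (e.g., protocol names)."""
    categories = sorted(set(column))
    index = {c: i for i, c in enumerate(categories)}
    encoded = np.zeros((len(column), len(categories)))
    for row, value in enumerate(column):
        encoded[row, index[value]] = 1.0
    return encoded, categories

def drop_duplicate_rows(X):
    """Remove exact duplicate rows, keeping first occurrences in their original order."""
    _, first_idx = np.unique(X, axis=0, return_index=True)
    return X[np.sort(first_idx)]

def zscore(X, mu=None, sigma=None):
    """Column-wise X_norm = (X - mu) / sigma, as in Equation (1).

    mu and sigma are estimated from X unless supplied (e.g., from the training split).
    Columns with sigma == 0 are divided by 1 instead of 0 (our assumption), so a
    column that is constant in the fitted data becomes all zeros.
    """
    if mu is None:
        mu = X.mean(axis=0)
    if sigma is None:
        sigma = X.std(axis=0)
    safe_sigma = np.where(sigma > 0, sigma, 1.0)
    return (X - mu) / safe_sigma, mu, sigma

# Toy example: one categorical column and two numeric columns (the second is constant).
protocols = ["tcp", "udp", "tcp", "tcp"]
numeric = np.array([[10.0, 5.0], [20.0, 5.0], [10.0, 5.0], [30.0, 5.0]])
encoded, cats = one_hot(protocols)
X = drop_duplicate_rows(np.hstack([encoded, numeric]))  # row 2 duplicates row 0
X_norm, mu, sigma = zscore(X)
```

At the scale of CSE-CIC-IDS2018, a dataframe library would normally handle CSV merging and de-duplication. The sketch only makes the order of operations and the effect of Equation (1) explicit.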

3.3. Dimensionality Reduction Using Autoencoder

Dimensionality reduction plays a key role in simplifying high-dimensional data while maintaining essential patterns [36]. In this study, a basic autoencoder is employed as a dual-purpose neural network. It comprises an input layer matching the dimensionality of the original data, two hidden layers (one in the encoder and one in the decoder), and a bottleneck (latent-space) layer that holds the compressed representation of the data. During training, the encoder compresses the input into the latent space and the decoder reconstructs the input from this compressed representation. Formally, the encoding and decoding processes can be expressed as:

z = f_{\mathrm{enc}}(x), \quad \hat{x} = f_{\mathrm{dec}}(z)

(2)

where x is the original input, z is the latent-space representation, and x̂ is the reconstruction produced by the decoder. The latent space represents the compressed features, capturing the key patterns and structures in the data. With the CSE-CIC-IDS2018 dataset, the autoencoder effectively compresses the original high-dimensional data into a lower-dimensional latent space. This compression reduces redundancy, removes noise, and accelerates subsequent computations. Dimensionality reduction through autoencoders is particularly beneficial in this study, as the latent space representation preserves the key characteristics of the data. The encoded data from the bottleneck layer provides an efficient, compact representation for downstream tasks such as anomaly detection and clustering, thereby reducing computational complexity, enhancing scalability, and maintaining interpretability.
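Equation (2) can be made concrete with a deliberately small NumPy autoencoder. This is a hypothetical sketch, not the architecture used in the paper. It has a single tanh bottleneck layer and a linear decoder, with no separate encoder and decoder hidden layers. It is trained by full-batch gradient descent on the mean squared reconstruction error. The class name `TinyAutoencoder`, the learning rate, and the epoch count are illustrative choices. The toy "benign" data lie near a 2-dimensional subspace of an 8-dimensional feature space, so a 2-dimensional latent space should be able to reconstruct them reasonably well.

```python
import numpy as np

class TinyAutoencoder:
    """Hypothetical minimal autoencoder: z = tanh(x W1 + b1), x_hat = z W2 + b2."""

    def __init__(self, n_features, latent_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 1.0 / np.sqrt(n_features), (n_features, latent_dim))
        self.b1 = np.zeros(latent_dim)
        self.W2 = rng.normal(0.0, 1.0 / np.sqrt(latent_dim), (latent_dim, n_features))
        self.b2 = np.zeros(n_features)

    def encode(self, X):
        """f_enc: map inputs to latent vectors z."""
        return np.tanh(X @ self.W1 + self.b1)

    def decode(self, Z):
        """f_dec: map latent vectors back to the input space."""
        return Z @ self.W2 + self.b2

    def reconstruct(self, X):
        return self.decode(self.encode(X))

    def fit(self, X, epochs=2000, lr=0.01):
        """Minimize (1/n) * sum_i ||x_i - x_hat_i||^2 by full-batch gradient descent."""
        n = X.shape[0]
        for _ in range(epochs):
            Z = self.encode(X)
            X_hat = self.decode(Z)
            grad_out = 2.0 * (X_hat - X) / n       # dL/dX_hat
            grad_W2 = Z.T @ grad_out
            grad_b2 = grad_out.sum(axis=0)
            grad_Z = grad_out @ self.W2.T
            grad_pre = grad_Z * (1.0 - Z ** 2)     # tanh'(pre) = 1 - tanh(pre)^2
            grad_W1 = X.T @ grad_pre
            grad_b1 = grad_pre.sum(axis=0)
            self.W1 -= lr * grad_W1
            self.b1 -= lr * grad_b1
            self.W2 -= lr * grad_W2
            self.b2 -= lr * grad_b2
        return self

# Toy benign data: 8 features generated from 2 hidden factors plus small noise.
rng = np.random.default_rng(42)
latent = rng.normal(size=(200, 2))
mixing = 0.5 * rng.normal(size=(2, 8))
X_benign = latent @ mixing + 0.01 * rng.normal(size=(200, 8))

ae = TinyAutoencoder(n_features=8, latent_dim=2)
mse_before = np.mean(np.sum((X_benign - ae.reconstruct(X_benign)) ** 2, axis=1))
ae.fit(X_benign)
mse_after = np.mean(np.sum((X_benign - ae.reconstruct(X_benign)) ** 2, axis=1))
Z = ae.encode(X_benign)  # compact latent vectors, shape (200, 2)
```

A production model would normally be built with a deep learning framework. The sketch only shows how f_enc produces z and how f_dec produces x̂ from it.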

3.4. Anomaly Detection Using Autoencoder

Anomaly detection focuses on identifying data points that deviate from expected patterns [37]. In this study, the dual-function autoencoder performs this task. Because it is trained exclusively on normal samples, the autoencoder learns to minimize reconstruction error on normal data and therefore reconstructs benign inputs accurately. Training only on benign traffic also simplifies deployment in real-world environments, where labeled attack data is scarce or unavailable, and allows periodic retraining on routinely collected normal traffic without operational disruption. For each input, the trained autoencoder computes the reconstruction error as the squared Euclidean norm of the difference between the original input and its reconstruction, as shown in (3):

\mathrm{ReconstructionError} = \lVert X_{\mathrm{original}} - X_{\mathrm{reconstructed}} \rVert^{2}

(3)

Normal samples typically yield low reconstruction errors because they align closely with the learned patterns, whereas anomalous samples produce notably higher errors because they deviate from these patterns. A threshold separates normal from anomalous samples. It is derived from the mean and standard deviation of the reconstruction errors on the benign training data:

T = \mu_{\mathrm{err}} + k \cdot \sigma_{\mathrm{err}}

(4)

where T is the anomaly threshold, μ_err and σ_err are the mean and standard deviation of the training reconstruction errors, and k is a sensitivity factor. Once anomalies are detected, their indices are recorded and the corresponding latent representations are retrieved from the autoencoder's bottleneck layer. These reduced-dimensional representations of the anomalous samples are then used in the clustering phase. The anomaly detection stage therefore acts as a filter: samples classified as normal are excluded, so DBSCAN searches for patterns only within the anomalies.
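Equations (3) and (4) and the anomaly filter can be sketched as three small NumPy functions. The function names, the error values, and k = 3.0 are illustrative assumptions, since the paper does not report its chosen k. The comment at the end marks where a trained encoder (hypothetical here) would turn the flagged rows into Z_anom.

```python
import numpy as np

def reconstruction_error(X, X_hat):
    """Per-sample squared Euclidean error ||x - x_hat||^2 (Equation (3))."""
    return np.sum((X - X_hat) ** 2, axis=1)

def anomaly_threshold(benign_errors, k):
    """T = mu_err + k * sigma_err, using errors on benign training data (Equation (4))."""
    return benign_errors.mean() + k * benign_errors.std()

def filter_anomalies(errors, T):
    """Indices of samples whose reconstruction error exceeds T."""
    return np.flatnonzero(errors > T)

errs = reconstruction_error(np.array([[1.0, 2.0], [0.0, 0.0]]),
                            np.array([[1.0, 0.0], [1.0, 1.0]]))   # [4.0, 2.0]

benign_errors = np.array([0.10, 0.12, 0.08, 0.11, 0.09])  # illustrative values
T = anomaly_threshold(benign_errors, k=3.0)                # k = 3.0 is illustrative
test_errors = np.array([0.10, 0.95, 0.11, 1.40])
anomaly_idx = filter_anomalies(test_errors, T)             # samples 1 and 3 exceed T
# Z_anom = encoder(X_test[anomaly_idx])  # hypothetical encoder; rows passed to DBSCAN
```

A larger k raises T and flags fewer flows. This trades recall for fewer false positives, which is why the paper tunes k empirically.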

3.5. Multiclass Attack Classification Using DBSCAN

DBSCAN is employed as the second-stage component for multiclass attack classification. As a density-based unsupervised clustering algorithm, DBSCAN groups data points based on local neighborhood density, making it especially effective for non-linear, irregularly shaped clusters in reduced-dimensional feature spaces [38,39]. In the proposed pipeline, DBSCAN is applied only to the latent-space representations of anomalous samples, denoted as Z_anom, extracted from the autoencoder’s bottleneck layer after anomaly detection.

DBSCAN requires two parameters:

Epsilon (ε): the maximum neighborhood radius,
MinPts (named min_samples in some implementations): the minimum number of points required to form a dense region.

For any point p, its ε-neighborhood is defined as

N_{\varepsilon}(p) = \{\, q \in Z_{\mathrm{anom}} \mid \lVert p - q \rVert \leq \varepsilon \,\}

A point is considered a core point if

\lvert N_{\varepsilon}(p) \rvert \geq \mathrm{MinPts}

(5)

Clusters are formed by connecting core points and the points density-reachable from them. A point that does not satisfy (5) but lies within the ε-neighborhood of a core point joins that core point's cluster as a border point. Only points that are neither core nor border points are labeled as noise. Operating in the reduced latent space significantly enhances cluster separability, as anomalous behaviors tend to manifest as compact and distinct patterns after autoencoder compression. Consequently, DBSCAN can identify groups of related attacks (e.g., DoS, brute-force, reconnaissance, or web-based attacks) without requiring class labels during clustering. Noise points flagged by DBSCAN may represent rare, emerging, or previously unseen attack behaviors, providing valuable signals for threat intelligence.
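To make the core, border, and noise distinction concrete, here is a minimal from-scratch DBSCAN sketch in NumPy. It is an illustrative brute-force implementation with O(n²) pairwise distances, not the authors' code. At the scale of CSE-CIC-IDS2018, a library implementation such as scikit-learn's `sklearn.cluster.DBSCAN(eps=..., min_samples=...)` would normally be used instead. The function name `dbscan` and the toy points are hypothetical. As in Equation (5), a point's ε-neighborhood includes the point itself.

```python
import numpy as np
from collections import deque

NOISE = -1

def dbscan(Z, eps, min_pts):
    """Textbook DBSCAN on the rows of Z with Euclidean distance.

    Returns one integer label per row: 0, 1, 2, ... for clusters and -1 for noise.
    """
    n = Z.shape[0]
    dists = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=2)
    neighbors = [np.flatnonzero(dists[i] <= eps) for i in range(n)]  # includes i itself
    is_core = np.array([len(nb) >= min_pts for nb in neighbors])

    labels = np.full(n, NOISE)
    cluster_id = 0
    for i in range(n):
        if labels[i] != NOISE or not is_core[i]:
            continue
        # Grow a new cluster from an unassigned core point via density-reachability.
        labels[i] = cluster_id
        queue = deque([i])
        while queue:
            p = queue.popleft()
            if not is_core[p]:
                continue  # border points join the cluster but do not expand it
            for q in neighbors[p]:
                if labels[q] == NOISE:
                    labels[q] = cluster_id
                    queue.append(q)
        cluster_id += 1
    return labels

Z = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1],
              [0.25, 0.0],                      # border point: only 3 neighbors
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1], [5.1, 5.1],
              [10.0, 0.0]])                     # isolated point -> noise
labels = dbscan(Z, eps=0.2, min_pts=4)
```

Here the point at (0.25, 0) is not a core point, but it is still assigned to the first cluster because it lies within ε of a core point. Only the isolated point at (10, 0) receives the noise label −1.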

Cluster-to-Attack Label Assignment

Once clusters C_j are generated for anomalous samples, each cluster is mapped to an attack class by majority voting using available ground-truth labels in the dataset. This label assignment is performed strictly as a post-processing step for evaluation and interpretability, and does not influence the clustering process itself. The predicted label for cluster C_j is

\hat{y}(C_{j}) = \arg\max_{y} \, \mathrm{count}(y \mid x \in C_{j})

(6)

To evaluate the quality of the cluster-to-label mapping, cluster purity is computed as

\mathrm{Purity}(C_{j}) = \frac{\max_{y} \, \mathrm{count}(y \mid x \in C_{j})}{\lvert C_{j} \rvert}

(7)

The average purity across all clusters was 0.972, demonstrating highly reliable translation of DBSCAN clusters into meaningful attack categories.
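The post hoc mapping in Equations (6) and (7) amounts to a per-cluster majority vote. The following standard-library sketch assumes two things the paper does not specify: noise points (label −1) are skipped, and the "average purity" is an unweighted mean over clusters. The function name and the toy labels are hypothetical.

```python
from collections import Counter

def map_clusters_to_labels(cluster_ids, true_labels, noise_id=-1):
    """Majority-vote label per cluster (Equation (6)) and cluster purity (Equation (7)).

    Ground-truth labels are used only here, after clustering, for evaluation.
    """
    members = {}
    for cid, y in zip(cluster_ids, true_labels):
        if cid == noise_id:
            continue  # noise handling is an assumption; the paper does not specify it
        members.setdefault(cid, []).append(y)
    mapping, purity = {}, {}
    for cid, ys in members.items():
        label, count = Counter(ys).most_common(1)[0]
        mapping[cid] = label
        purity[cid] = count / len(ys)
    return mapping, purity

cluster_ids = [0, 0, 0, 0, 1, 1, 1, -1]
true_labels = ["DoS", "DoS", "DoS", "BruteForce",
               "BruteForce", "BruteForce", "BruteForce", "Bot"]
mapping, purity = map_clusters_to_labels(cluster_ids, true_labels)
average_purity = sum(purity.values()) / len(purity)  # unweighted mean (assumption)
```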

This completes the proposed intrusion detection and attack classification pipeline. By integrating autoencoder-based dimensionality reduction, reconstruction-error–driven anomaly detection, and density-based clustering through DBSCAN, the framework provides a scalable and data-efficient mechanism for identifying and categorizing diverse cyberattacks in cloud environments. The combination of latent-space representations and density modeling enhances the separation of attack behaviors, enabling reliable multiclass categorization even under imbalanced and heterogeneous traffic conditions.

3.6. Experimental Design

A systematic experimental design was adopted to ensure reproducibility and reliable evaluation of the proposed intrusion detection framework. The design covers dataset handling, parameter selection, and evaluation protocols as detailed below.

3.6.1. Dataset Handling and Preparation

The CSE-CIC-IDS2018 dataset, comprising 10 heterogeneous CSV files, was consolidated into a unified dataset to maintain consistency across attack scenarios. Duplicate entries and corrupted records were removed to preserve data integrity. One-hot encoding was applied to categorical fields such as protocol and service types, followed by Z-score normalization to standardize all numerical attributes. The dataset was then split into training and testing sets at a 70:30 ratio, with consistent traffic distributions across both sets. Normal samples were used exclusively for training the autoencoder. Both benign and attack samples were passed through anomaly detection, and the resulting anomalous samples were used for DBSCAN clustering and downstream evaluation.
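The paper does not state how the 70:30 split was drawn. The following hypothetical sketch assumes a per-class (stratified) random split, which is one common way to keep class proportions consistent across the two sets. It then selects only the benign training rows for autoencoder training. The function name, seed, and toy label counts are illustrative.

```python
import numpy as np

def stratified_split(labels, train_frac=0.7, seed=0):
    """Return (train_idx, test_idx), splitting each class train_frac : 1 - train_frac."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    train_parts, test_parts = [], []
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        cut = int(round(train_frac * len(idx)))
        train_parts.append(idx[:cut])
        test_parts.append(idx[cut:])
    return np.sort(np.concatenate(train_parts)), np.sort(np.concatenate(test_parts))

labels = np.array(["Benign"] * 70 + ["DoS"] * 20 + ["Bot"] * 10)
train_idx, test_idx = stratified_split(labels, train_frac=0.7, seed=0)
# Only benign rows from the training split are used to fit the autoencoder.
benign_train_idx = train_idx[labels[train_idx] == "Benign"]
```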

3.6.2. Parameter Setting and Justification

The model parameters were selected based on preliminary experiments, literature recommendations, and dataset characteristics:

(a): Autoencoder architecture: The number of hidden nodes (29 and 21) was selected to study the impact of dimensionality compression. These two settings allow evaluating the trade-off between information retention and noise reduction in latent space.
(b): Reconstruction error threshold (T): The threshold was computed as in Equation (4). Multiple values were tested, and the selected k provided maximum detection stability.
(c): DBSCAN parameters (ε and MinPts): DBSCAN's ε = 0.2 and MinPts = 600 were determined through iterative grid search. ε was selected using k-distance plots derived from the anomaly-filtered latent space, while MinPts was varied across multiple candidate values to balance cluster compactness and noise sensitivity. The selected values yielded compact clusters while preventing over-fragmentation, which is crucial for high-volume IDS data. This density-driven tuning ensures that parameter selection is guided by the intrinsic structure of anomalous traffic rather than by arbitrary heuristics.
(d): Cluster label mapping: Final class assignment was performed by aligning cluster IDs with the majority class labels of their respective instances.
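The k-distance procedure in (c) can be sketched as follows. The paper inspected the plots visually. The automatic knee rule below is a stand-in assumption: it rescales both axes to [0, 1] and picks the point farthest below the chord joining the curve's endpoints. Because Equation (5) counts a point in its own neighborhood, the matching k is plausibly MinPts − 1; that convention is also our assumption. Function names and toy points are hypothetical.

```python
import numpy as np

def k_distance_curve(Z, k):
    """Distance from each point to its k-th nearest neighbor (self excluded), sorted ascending."""
    dists = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=2)
    dists.sort(axis=1)  # column 0 holds each point's zero distance to itself
    return np.sort(dists[:, k])

def knee_eps(curve):
    """Heuristic knee: after rescaling both axes to [0, 1], return the curve value
    lying farthest below the straight line from the first to the last point."""
    x = np.linspace(0.0, 1.0, len(curve))
    span = curve[-1] - curve[0]
    y = (curve - curve[0]) / span if span > 0 else np.zeros_like(curve)
    return curve[np.argmax(x - y)]

Z = np.array([[0.0], [1.0], [2.0], [3.0], [50.0], [100.0]])
curve = k_distance_curve(Z, k=1)  # [1, 1, 1, 1, 47, 50]
eps = knee_eps(curve)             # 1.0: last value before the sharp rise
```

Points to the right of the knee have unusually distant neighbors. Setting ε just below that rise therefore keeps dense regions together and leaves sparse points as noise.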

Although the parameters ε = 0.2, MinPts = 600, and latent dimensionality of 21 were optimal for the CSE-CIC-IDS2018 dataset, the proposed parameter selection strategy is not dataset-specific. In new environments, the autoencoder latent dimension can be determined by progressively reducing dimensionality until reconstruction error stabilizes and clustering quality metrics (e.g., Silhouette Score, Davies–Bouldin Index) reach a plateau. Similarly, DBSCAN’s ε can be estimated using k-distance plots derived from anomaly-filtered latent representations, while MinPts may be set proportional to the expected minimum attack cluster size and traffic volume.
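Both clustering-quality criteria mentioned above can be computed directly from their definitions. scikit-learn provides them as `sklearn.metrics.silhouette_score` and `sklearn.metrics.davies_bouldin_score`. The NumPy sketch below spells out the definitions so that their behavior is transparent. The function names are ours, noise points are assumed to have been removed beforehand, and at least two clusters are required.

```python
import numpy as np

def pairwise_distances(Z):
    """Euclidean distance matrix between all rows of Z."""
    return np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=2)

def silhouette(Z, labels):
    """Mean of s = (b - a) / max(a, b); singleton clusters contribute s = 0."""
    labels = np.asarray(labels)
    dists = pairwise_distances(Z)
    clusters = np.unique(labels)
    scores = np.zeros(len(Z))
    for i in range(len(Z)):
        same = labels == labels[i]
        n_same = same.sum()
        if n_same == 1:
            continue
        a = dists[i, same].sum() / (n_same - 1)  # mean distance to own cluster (self is 0)
        b = min(dists[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a, b)
        scores[i] = (b - a) / denom if denom > 0 else 0.0
    return float(scores.mean())

def davies_bouldin(Z, labels):
    """Mean over clusters i of max_{j != i} (s_i + s_j) / ||c_i - c_j||."""
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    centroids = np.array([Z[labels == c].mean(axis=0) for c in clusters])
    # s_i: average distance of cluster i's points to its centroid c_i
    spread = np.array([np.linalg.norm(Z[labels == c] - centroids[idx], axis=1).mean()
                       for idx, c in enumerate(clusters)])
    worst = [max((spread[i] + spread[j]) / np.linalg.norm(centroids[i] - centroids[j])
                 for j in range(len(clusters)) if j != i)
             for i in range(len(clusters))]
    return float(np.mean(worst))

Z = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
labels = [0, 0, 1, 1]
sil = silhouette(Z, labels)      # about 0.90: tight, well-separated clusters
dbi = davies_bouldin(Z, labels)  # (0.5 + 0.5) / 10 = 0.1
```

A higher Silhouette Score and a lower Davies–Bouldin Index both indicate more compact, better-separated clusters. Under the plateau strategy described above, one would track both values while varying the latent dimension.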

3.6.3. Experimental Evaluation Protocols

Experiments were executed in Python 3 on Google Colab with an NVIDIA T4 GPU to support large-scale computation. The performance of anomaly detection and multiclass attack classification was measured using (i) confusion-matrix-derived metrics (precision, recall, specificity, F1-score, accuracy), (ii) clustering metrics (Silhouette Score, Davies–Bouldin Index), and (iii) comparative evaluation against state-of-the-art methods. For a fair comparison, all baseline models (SVM, LSTM, DNN, DBN, K-Means, Fuzzy C-Means (FCM), Hierarchical Clustering, OPTICS) were trained on the same train–test split. No data augmentation or oversampling was applied, to avoid introducing bias. This design provides a comprehensive and transparent evaluation of the proposed dual-function Autoencoder–DBSCAN IDS framework.

All experiments in this study were conducted in an offline evaluation setting using pre-collected traffic data; however, the proposed framework is designed to support low-latency inference and can be adapted for online deployment. It is important to note that ground-truth attack labels were not used during autoencoder training, anomaly detection, or DBSCAN clustering. Labels were employed only after clustering, solely for the purpose of cluster-to-attack mapping and quantitative performance evaluation. As such, the proposed framework operates in a fully unsupervised manner during model learning and attack discovery.

3.7. Novelty and Technical Contributions

While the individual components of the proposed framework—autoencoders and density-based clustering—have been explored in prior intrusion detection studies, the novelty of this work lies in their tight integration into a unified, fully unsupervised, cloud-centric intrusion diagnosis pipeline. Unlike existing approaches that apply clustering directly on raw or weakly compressed features, the proposed framework introduces an anomaly-filtered latent-space clustering strategy that significantly improves density consistency, cluster stability, and multiclass attack separability under highly imbalanced cloud traffic conditions.

Beyond this integration, the proposed work introduces a set of technical enhancements that extend existing AE–DBSCAN combinations. First, a dual-function autoencoder is trained exclusively on benign traffic so that it performs dimensionality reduction and anomaly detection simultaneously, yielding a reconstruction-error–driven anomaly filter. This filter extracts a density-consistent latent subset containing only anomalous samples, which substantially improves clustering stability. Second, instead of applying DBSCAN to the entire latent space as in prior work, the proposed method clusters only the anomaly-filtered latent vectors; this more homogeneous distribution enhances density-based separation of minority and overlapping attack patterns. Third, the DBSCAN configuration is tuned with k-distance analysis performed directly on the anomaly-latent manifold, providing a data-driven mechanism for selecting ε and MinPts that stabilizes clustering across diverse attack types. Finally, the framework offers a complete multi-stage unsupervised pipeline (AE-based anomaly detection → latent anomaly extraction → density-based clustering → attack-type labeling) that enables fully unsupervised multiclass cloud intrusion diagnosis. Together, these design choices yield higher accuracy, lower computational load, and substantially better cluster compactness than baseline AE or DBSCAN methods. Because attack labels are not used during training or clustering, the pipeline is not tied to a particular dataset and can be transferred to new environments by retraining on benign traffic and adapting the latent-space parameters.

4. Results

The experiments were conducted in Python 3 on Google Colab using a T4 GPU. The evaluation metrics comprise precision, recall, specificity, F-measure, and accuracy, all derived from the confusion matrix, together with the Silhouette score [40] and the Davies–Bouldin index (DBI) [41]. The autoencoder's hidden-layer activations served as the reduced feature space for DBSCAN-based multiclass classification. Two reduced representations were evaluated: the 29 hidden nodes of Hidden Layer 1 (HN-29) and the 21 hidden nodes of the latent space (HN-21). Table 1 presents the samples used in this research, and Table 2 lists the hyperparameters of the experimental setup.

4.1. Performance of Autoencoder-Based Anomaly Detection

The confusion matrix in Figure 3 shows strong anomaly detection performance for the autoencoder in the 21-node reduced space. A True Negative (TN) count of 4,021,191 shows that normal traffic is identified reliably, while a True Positive (TP) count of 822,217 reflects accurate anomaly detection. The False Negative (FN) count of 2,253 and False Positive (FP) count of 24,221 are comparatively low, demonstrating robustness against misclassification. The confusion matrix in Figure 4 reports the corresponding results with 29 hidden nodes: a TN count of 3,945,105 indicates reliable detection of normal data, and a TP count of 769,043 demonstrates effective anomaly identification. However, the FN count rises to 55,427 and the FP count to 100,307, roughly 25 times as many missed attacks and 4 times as many false alarms as in the 21-node configuration, which lowers both precision and recall.
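The headline anomaly detection metrics can be recomputed directly from the counts reported above. The sketch below treats anomalies as the positive class (consistent with the TP and FN definitions used here); `binary_metrics` is an illustrative helper, not part of the described framework.

```python
def binary_metrics(tp, tn, fp, fn):
    """Confusion-matrix-derived metrics with 'anomaly' as the positive class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)                # detection rate
    specificity = tn / (tn + fp)           # share of benign flows kept as benign
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    fnr = fn / (fn + tp)                   # false-negative rate = 1 - recall
    return {"precision": precision, "recall": recall, "specificity": specificity,
            "accuracy": accuracy, "f1": f1, "fnr": fnr}


# Counts from Figure 3 (HN-21) and Figure 4 (HN-29)
hn21 = binary_metrics(tp=822_217, tn=4_021_191, fp=24_221, fn=2_253)
hn29 = binary_metrics(tp=769_043, tn=3_945_105, fp=100_307, fn=55_427)

for name, m in (("HN-21", hn21), ("HN-29", hn29)):
    print(name, {key: round(value, 4) for key, value in m.items()})
```

For HN-21 these counts reproduce an accuracy of 0.9946, a recall of 0.9973, a specificity of 0.9940, and a false-negative rate of 0.0027; for HN-29 the accuracy is 0.9680, matching the values reported in Sections 4.3 and 4.4.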

4.2. Performance of DBSCAN-Based Multiclass Classification

The multiclass confusion matrix in Figure 5 shows that DBSCAN detects and classifies both majority and minority classes well in the 21-node reduced space. The model correctly classifies 3,995,412 of 4,045,412 benign instances and handles major attack types such as DDoS and DoS effectively, with minimal misclassification. Minority attack classes such as BruteForce, Bot, Infiltration, and Web attacks are also detected with good precision, showing that the model can manage diverse traffic types. The multiclass confusion matrix in Figure 6 reports DBSCAN's performance in the 29-node reduced space. The model correctly identifies 3,945,412 of 4,045,412 benign instances and also performs well on major attack types such as DDoS (367,180/379,180) and DoS (189,290/196,290), although misclassifications are slightly more frequent than in the 21-node configuration. Minority classes such as BruteForce, Bot, Infiltration, and Web are detected with few errors, but finer distinctions among these attack types are captured less reliably. Overall, the model performs reliably across all classes.

4.3. Statistical Analysis of Confusion Matrices

A comprehensive statistical evaluation was performed for both the autoencoder-based anomaly detection and the DBSCAN-based multiclass attack classification to assess robustness and interpretability. The 21-node autoencoder configuration demonstrated superior statistical performance, with very high recall (0.9973), strong specificity (0.9940), and a minimal false-negative rate (0.0027), which is critical for avoiding missed attacks. The 29-node model produced more false negatives and more false positives, further supporting the selection of the 21-node configuration. For DBSCAN, per-class analysis indicated that major attack categories such as DoS and DDoS achieved precision above 96%, while minority classes (e.g., Infiltration and Botnet) showed slightly elevated false-negative rates due to their sparse density in the latent space. Error hotspot inspection revealed predictable overlaps, such as DoS versus DDoS due to similar burst patterns and Bot versus Web attacks due to shared low-rate behavior; these misclassifications were noticeably reduced under the 21-node latent representation. Clustering quality metrics reinforced these observations: the HN-21 model achieved a higher Silhouette Score (0.9857) and a lower Davies–Bouldin Index (0.0091), indicating tighter cluster compactness and greater separability. Finally, five repeated runs showed very small run-to-run variation in accuracy for both the autoencoder (±0.0008) and DBSCAN (±0.0013), confirming the statistical stability and repeatability of the proposed approach.

4.4. Ablation Study—Component-Wise Contribution

To quantify the contribution of each component of the proposed pipeline, we conducted an ablation study comparing (a) Autoencoder-only (AE) anomaly detection, (b) DBSCAN clustering on raw features, and (c) the full AE + DBSCAN pipeline. Table 3 summarizes detection/classification accuracy and cluster quality for the two latent-space configurations (HN-29 and HN-21). The AE-only results show anomaly detection accuracy of 0.9680 (HN-29) and 0.9946 (HN-21). Applying DBSCAN on AE-latent anomalies yields multiclass classification accuracy of 0.9686 (HN-29) and 0.9879 (HN-21).

Three observations follow from Table 3. First, the AE provides the largest single improvement in anomaly separability, with HN-21 clearly ahead of HN-29. Second, applying DBSCAN to the AE latent vectors converts detected anomalies into fine-grained attack classes with only a small change relative to AE anomaly accuracy (a drop of 0.0067 for HN-21 and a marginal gain of 0.0006 for HN-29). Third, HN-21 is consistently better across both stages, indicating that stronger compression into a compact, denoised latent space improves downstream clustering.

4.5. Sensitivity Analysis of Key Hyperparameters

To assess the robustness of the proposed framework to hyperparameter selection, a sensitivity analysis was conducted on both the autoencoder and DBSCAN components, covering (i) the latent dimensionality of the autoencoder and (ii) the DBSCAN parameters ε and MinPts. This enables systematic identification of stable operating regions and optimal configurations.

For the autoencoder, configurations HN-29 and HN-21 were evaluated, and their anomaly detection performance is presented through confusion matrices (Figure 3 and Figure 4). The corresponding overall accuracy, precision, recall, specificity, and F1-score are summarized in Table 3 (Autoencoder-only).

Furthermore, the per-class multiclass classification performance for both HN-29 and HN-21 is illustrated in Figure 5 and Figure 6, demonstrating the impact of latent space size on cluster separability. To analyze the robustness of the DBSCAN stage, ε was varied within the range [0.05, 0.50] with a step size of 0.05, while MinPts was tested over {50, 100, 200, 400, 600, 800}. The resulting clustering quality, measured using the Silhouette Score and Davies–Bouldin Index (DBI), is reported in Figure 7 and Figure 8. Together, these results characterize how the framework responds to hyperparameter variations and support the selection of stable settings for multiclass attack classification.
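A grid of the kind described above might be run as in the following sketch, which uses scikit-learn's `DBSCAN`, `silhouette_score`, and `davies_bouldin_score`. Two choices are assumptions not stated in the text: noise points (label -1) are excluded before scoring, and configurations with fewer than two clusters are skipped because both indices are undefined there. The ε grid matches the [0.05, 0.50] range with step 0.05, but the synthetic three-cluster data and the reduced MinPts values (5, 10, 20) exist only to keep the demo small; `dbscan_grid` is a hypothetical helper.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics import davies_bouldin_score, silhouette_score


def dbscan_grid(Z, eps_values, minpts_values):
    """Evaluate DBSCAN over an (eps, MinPts) grid; noise points are excluded from scoring."""
    results = []
    for eps in eps_values:
        for minpts in minpts_values:
            labels = DBSCAN(eps=eps, min_samples=minpts).fit_predict(Z)
            mask = labels != -1
            n_clusters = len(set(labels[mask]))
            # Both indices need at least two clusters and more samples than clusters
            if n_clusters < 2 or mask.sum() <= n_clusters:
                continue
            results.append({
                "eps": round(float(eps), 2),
                "minpts": minpts,
                "clusters": n_clusters,
                "noise_frac": float(1.0 - mask.mean()),
                "silhouette": float(silhouette_score(Z[mask], labels[mask])),
                "dbi": float(davies_bouldin_score(Z[mask], labels[mask])),
            })
    return results


# Synthetic 21-dimensional "latent anomalies": three well-separated groups
rng = np.random.default_rng(1)
centers = rng.uniform(-2.0, 2.0, (3, 21))
Z = np.vstack([c + rng.normal(0.0, 0.03, (200, 21)) for c in centers])

grid = dbscan_grid(Z, np.arange(0.05, 0.51, 0.05), [5, 10, 20])
best = max(grid, key=lambda r: r["silhouette"])
```

Plotting `silhouette` and `dbi` against `eps` for each `minpts` gives curves of the type shown in Figures 7 and 8. Tracking `noise_frac` alongside them helps flag settings that look compact only because most points were discarded as noise.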

As shown in Figure 7, the Silhouette Score of the DBSCAN clustering improves from 0.9654 (HN-29) to 0.9857 (HN-21), indicating that the smaller latent space yields more distinct, well-separated clusters. Similarly, the DBI in Figure 8 decreases from 0.01254 (HN-29) to 0.0091 (HN-21). Because a lower DBI signifies better clustering, this further suggests that reducing the number of hidden nodes strengthens cluster separation and overall model effectiveness.

These sensitivity trends provide practical guidance for selecting stable parameter ranges when deploying the framework in new cloud environments with different traffic densities or attack distributions.

4.6. Comparative Evaluation with Prior Methods

The proposed Autoencoder (HN-21), shown in Figure 9, achieves an anomaly-detection accuracy of 0.9946, outperforming all supervised baselines, including SVM (0.8824), LSTM (0.8954), DNN (0.9354), and DBN (0.9254). Similarly, the combined AE + DBSCAN model, illustrated in Figure 10, achieves a multiclass attack-classification accuracy of 0.9879, surpassing unsupervised approaches such as K-Means (0.8795), FCM (0.9125), Hierarchical Clustering (0.9025), and OPTICS (0.9265).

To ensure fairness, all baseline models were trained and evaluated using the same pre-processing pipeline and dataset split. Table 4 summarizes the accuracy and 95% confidence intervals of all comparisons. Statistical validation confirms that the proposed approach provides a significant improvement: the AE (HN-21) exceeds the next-best supervised baseline (DNN, 0.9354), and the AE + DBSCAN exceeds the next-best clustering method (OPTICS, 0.9265), with paired bootstrap tests showing p < 0.001 in both cases. These results demonstrate that the proposed dual-stage framework consistently outperforms prior supervised and unsupervised models in both anomaly detection and attack classification tasks.

Notably, while some baseline models exhibit reasonable detection accuracy, their higher computational complexity and sensitivity to data imbalance limit their suitability for large-scale cloud deployment compared to the proposed latent-space, anomaly-filtered approach.

The outcomes emphasize the effectiveness of the proposed dual-function Autoencoder and DBSCAN-based approach in detecting anomalies and classifying attacks. The configuration with 21 hidden nodes (HN-21) outperforms the 29 hidden nodes (HN-29) configuration, achieving higher precision, recall, specificity, and accuracy. The Autoencoder effectively reduced dimensionality while maintaining critical data patterns, enabling robust anomaly detection with a high accuracy of 99.46%. DBSCAN further leveraged these reduced dimensions for precise attack classification, achieving an accuracy of 98.79%. Improved clustering metrics, such as a higher silhouette score (0.9857) and a lower Davies–Bouldin Index (0.0091), indicate better cluster quality with 21 hidden nodes. These findings validate the model's ability to handle diverse traffic scenarios efficiently.

4.7. Statistical Validation

To ensure that the reported performance improvements are statistically meaningful and not due to random variation in the test data, we conducted a comprehensive statistical validation consisting of bootstrap confidence intervals, McNemar’s test, and paired bootstrap significance testing. All performance metrics (Accuracy, Macro-F1, per-class F1) were evaluated over 1000 bootstrap resamples of the test set. For each resample, we recomputed the evaluation metric and derived 95% confidence intervals (CIs) using the 2.5th and 97.5th percentiles of the resulting distribution.
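The percentile bootstrap described above can be sketched as follows. The helper `bootstrap_ci` and the synthetic predictions are illustrative; accuracy is used as the metric, and macro-F1 could be substituted by passing, for example, scikit-learn's `f1_score(..., average="macro")` wrapped in a function.

```python
import numpy as np


def bootstrap_ci(y_true, y_pred, metric, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI: resample test indices with replacement and recompute the metric."""
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    n = len(y_true)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # one bootstrap resample of the test set
        stats[b] = metric(y_true[idx], y_pred[idx])
    # 2.5th and 97.5th percentiles for alpha = 0.05
    lo, hi = np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return metric(y_true, y_pred), (float(lo), float(hi))


def accuracy(t, p):
    return float(np.mean(t == p))


# Hypothetical 5-class test set with exactly 100 errors out of 2000, so accuracy is 0.95
rng = np.random.default_rng(42)
y_true = rng.integers(0, 5, size=2000)
y_pred = y_true.copy()
wrong = rng.choice(2000, size=100, replace=False)
y_pred[wrong] = (y_pred[wrong] + 1) % 5

point, (lo, hi) = bootstrap_ci(y_true, y_pred, accuracy)
```

Resampling whole test samples keeps each sample's true label and prediction together, so the interval reflects sampling variability of the test set rather than any retraining of the model.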

To compare the proposed AE + DBSCAN pipeline with conventional supervised baselines, we employed McNemar’s test for paired binary anomaly-detection outcomes. This test evaluates whether two classifiers differ significantly in their error patterns using a 2 × 2 contingency table of sample-wise correct/incorrect predictions. Statistical significance was assessed at α = 0.05.
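A minimal version of this test needs only the two discordant cells of the 2 × 2 table. The sketch below uses the continuity-corrected chi-square form (one common variant; an exact binomial version is preferable when the discordant counts are small). The function name and the toy outcome vectors are hypothetical.

```python
import math


def mcnemar_test(correct_a, correct_b, continuity=True):
    """McNemar's test on paired per-sample correctness (booleans) of classifiers A and B."""
    n10 = sum(1 for a, b in zip(correct_a, correct_b) if a and not b)  # A right, B wrong
    n01 = sum(1 for a, b in zip(correct_a, correct_b) if b and not a)  # A wrong, B right
    if n10 + n01 == 0:
        return 0.0, 1.0  # the classifiers never disagree, so there is no evidence of a difference
    diff = abs(n10 - n01)
    if continuity:
        diff = max(diff - 1, 0)  # Edwards' continuity correction
    chi2 = diff ** 2 / (n10 + n01)
    # Survival function of a chi-square variable with 1 degree of freedom: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical paired outcomes on 1000 flows: 60 only A gets right, 20 only B gets right
correct_a = [True] * 900 + [True] * 60 + [False] * 20 + [False] * 20
correct_b = [True] * 900 + [False] * 60 + [True] * 20 + [False] * 20
chi2, p = mcnemar_test(correct_a, correct_b)
```

The samples both classifiers get right (or both get wrong) do not enter the statistic, which is why McNemar's test compares error patterns rather than raw accuracies.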

For multiclass attack classification, where McNemar’s test is not directly applicable, we used a paired bootstrap significance test. Here, per-sample correctness (0/1) was compared across 1000 bootstrap replicates to estimate the sampling distribution of accuracy and macro-F1 differences between models. The resulting p-value quantifies whether the proposed method outperforms the baseline beyond chance.
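One common form of the paired bootstrap is sketched below under stated assumptions: both models are resampled with the same indices, and the one-sided p-value is the fraction of replicates in which model A fails to beat model B. Other variants exist (for example, tests that re-centre the bootstrap distribution), and `paired_bootstrap_pvalue` plus the toy correctness vectors are hypothetical.

```python
import numpy as np


def paired_bootstrap_pvalue(correct_a, correct_b, n_boot=1000, seed=0):
    """One-sided paired bootstrap on per-sample correctness (0/1) of models A and B."""
    rng = np.random.default_rng(seed)
    a = np.asarray(correct_a, dtype=float)
    b = np.asarray(correct_b, dtype=float)
    n = len(a)
    observed = float(np.mean(a - b))  # observed accuracy difference
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)  # the SAME indices for both models keep the pairing
        diffs[i] = np.mean(a[idx] - b[idx])
    # Fraction of replicates in which A does not outperform B
    p_value = float(np.mean(diffs <= 0.0))
    return observed, p_value


# Hypothetical paired results on 2000 samples: A is right on 100 samples where B is wrong
correct_a = [1] * 1900 + [0] * 100
correct_b = [1] * 1800 + [0] * 200
diff, p = paired_bootstrap_pvalue(correct_a, correct_b)
```

With 1000 replicates the estimate has a resolution of 1/1000, so a result of zero is conventionally reported as p < 0.001. Replacing the per-sample correctness with a per-replicate macro-F1 computation would give the macro-F1 variant mentioned above.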

Table 4 summarizes the statistical validation results for the two evaluated subsets (HN-21 and HN-29). The proposed AE + DBSCAN approach achieves higher accuracy and macro-F1 than the strongest baseline (DNN) on both subsets, with non-overlapping confidence intervals. McNemar’s test confirms that the anomaly-detection decisions of AE + DBSCAN differ significantly from the baseline (p < 0.001), while the paired bootstrap tests also show significant improvements in multiclass classification (p < 0.05). These results collectively demonstrate the robustness and statistical reliability of the proposed dual-stage intrusion detection framework.

These results highlight that the proposed framework offers a favorable balance between detection performance and computational efficiency, making it more suitable for scalable cloud intrusion detection than either computationally intensive supervised models or lightweight but less accurate clustering approaches.

4.8. Computational Complexity and Scalability Analysis

The proposed AE + DBSCAN framework is designed to minimize computational overhead by performing both learning and clustering within a compressed latent representation. The total computational cost is governed by three components: Autoencoder (AE) training and inference, reconstruction-error–based anomaly thresholding, and DBSCAN clustering on the filtered anomalous samples.

4.8.1. Autoencoder Complexity

The AE operates on an input dimension of 80 and compresses it into a 21-dimensional latent space. With approximately 13.48 M benign samples, the training complexity scales linearly with both dataset size and feature dimension, making it significantly more efficient than deep architectures such as DNNs and LSTMs that operate on the full feature space. Inference requires only a single forward pass, enabling real-time anomaly scoring for high-volume cloud traffic streams.
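The fixed per-flow inference cost can be made concrete with a NumPy forward pass. The 80 → 29 → 21 encoder follows the layer sizes given in Section 4; the mirrored decoder, ReLU hidden activations, linear output layer, and random (untrained) weights are assumptions for illustration, and all helper names are hypothetical.

```python
import numpy as np

# Encoder sizes from the text; the mirrored decoder is an assumption
LAYER_SIZES = [80, 29, 21, 29, 80]


def init_autoencoder(sizes, seed=0):
    """Random He-style weights for each dense layer (untrained, illustration only)."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(0.0, np.sqrt(2.0 / m), (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def forward(params, X, latent_index=1):
    """Single forward pass returning the 21-D latent codes and the reconstructions."""
    h, latent = X, None
    for i, (W, b) in enumerate(params):
        h = h @ W + b
        if i < len(params) - 1:
            h = np.maximum(h, 0.0)  # ReLU on hidden layers (assumed); output layer stays linear
        if i == latent_index:
            latent = h              # output of the second dense layer = latent space
    return latent, h


def reconstruction_error(X, X_hat):
    """Per-flow mean squared reconstruction error."""
    return np.mean((X - X_hat) ** 2, axis=1)


params = init_autoencoder(LAYER_SIZES)
n_params = sum(W.size + b.size for W, b in params)    # trainable parameters
macs_per_flow = sum(W.size for W, _ in params)        # multiply-accumulates per flow

X = np.random.default_rng(1).normal(size=(1000, 80))  # stand-in for 1000 scaled flows
Z, X_hat = forward(params, X)
errors = reconstruction_error(X, X_hat)
```

Under these assumptions the network has 6017 parameters and needs 5858 multiply-accumulate operations per flow regardless of traffic volume, so scoring a batch costs time proportional to the number of flows.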

4.8.2. Reconstruction-Error Thresholding

Threshold selection involves a one-time computation of the mean and standard deviation of reconstruction errors. This operation is computationally negligible compared to AE training and does not impact real-time performance.
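Because the text describes the threshold as depending only on the mean and standard deviation of benign reconstruction errors, the sketch below assumes Equation (4) has the form T = μ + k·σ; both that form and the value of k are assumptions, and the helper names are hypothetical.

```python
import numpy as np


def fit_threshold(benign_errors, k=3.0):
    """Assumed threshold form T = mean + k * std over benign reconstruction errors."""
    errs = np.asarray(benign_errors, dtype=float)
    return float(errs.mean() + k * errs.std())  # population std (ddof=0)


def flag_anomalies(errors, threshold):
    """A flow is flagged as anomalous when its reconstruction error exceeds T."""
    return np.asarray(errors, dtype=float) > threshold


# Toy benign errors with mean 3 and standard deviation sqrt(2)
T = fit_threshold([1.0, 2.0, 3.0, 4.0, 5.0], k=2.0)
flags = flag_anomalies([5.0, 6.0, 10.0, 1.0], T)
```

The threshold is fitted once on benign data; afterwards, scoring each flow costs a single comparison, which is why this step adds negligible overhead.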

4.8.3. DBSCAN on Latent Space

Although standard DBSCAN can exhibit quadratic behavior in the number of samples, the proposed pipeline clusters only the AE-flagged anomalous samples, typically <6% of total traffic. Operating in the 21-dimensional latent space instead of the original 80-dimensional space also makes each distance computation roughly 80/21 ≈ 3.8 times cheaper, yielding substantial runtime gains and reduced noise sensitivity.
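The following rough worked estimate shows the combined effect. It assumes a brute-force neighbour search whose cost grows as n²d (index-based DBSCAN implementations scale differently), with N total flows, an anomalous fraction ρ, and the 80- and 21-dimensional feature spaces:

$$
\frac{C_{\text{latent}}}{C_{\text{raw}}} = \frac{(\rho N)^{2}\cdot 21}{N^{2}\cdot 80} = \rho^{2}\cdot\frac{21}{80}
$$

Taking the stated bound ρ < 0.06 gives a ratio below 0.0036 × 0.2625 ≈ 9.5 × 10^-4, roughly a thousandfold lower cost than brute-force clustering of all flows in the raw space. The dimensionality term alone accounts for the factor 80/21 ≈ 3.8.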

4.8.4. Memory and Runtime Efficiency

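Latent-space clustering reduces the memory required to store the clustered feature vectors by approximately 3.8× (80/21), since samples and the intermediate structures derived from them are kept in the compressed representation. The size of any pairwise distance structure, in contrast, depends on the number of samples, which anomaly filtering keeps small. These reductions improve responsiveness and support large-scale deployment.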

From a runtime perspective, the proposed framework supports near real-time intrusion detection in cloud environments. Autoencoder inference involves a single forward pass with fixed complexity, enabling low-latency anomaly scoring for streaming network flows. Reconstruction-error thresholding introduces negligible overhead, while DBSCAN clustering is applied only to a small subset of anomaly-filtered samples, allowing clustering to be performed asynchronously or in batch mode without blocking real-time traffic monitoring.

4.8.5. Scalability

The framework is inherently compatible with distributed and GPU-accelerated environments. AE inference can be parallelized across cloud nodes, while DBSCAN can process anomaly batches asynchronously through containerized microservices or cloud-native platforms. This ensures that the pipeline remains scalable, reliable, and efficient for large cloud-computing environments with dynamic workloads.

4.8.6. Real-Time Deployment Considerations

In a real-time deployment scenario, the proposed framework can be integrated into operational intrusion detection systems using a pipeline-based architecture. Incoming network flows are first processed by the trained autoencoder to compute reconstruction errors in real time. Flows exceeding the anomaly threshold are immediately flagged and forwarded to the clustering module, while benign flows are discarded from further analysis. The clustering stage can operate asynchronously on buffered anomalous samples, enabling periodic attack diagnosis without introducing latency into the primary detection path. Such a design allows continuous monitoring while supporting fine-grained attack analysis.
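The two-path design can be sketched as a small buffer object. Everything here is hypothetical: `AnomalyBuffer`, the `score_fn` contract returning a latent vector and a reconstruction error, and the batch size are illustrative. For brevity, full batches are handed to the diagnosis callback synchronously; a deployment would place them on a queue served by a separate worker so that clustering never blocks scoring.

```python
from typing import Callable, List, Tuple


class AnomalyBuffer:
    """Hypothetical pipeline: per-flow scoring on the hot path, batched diagnosis off it."""

    def __init__(self,
                 score_fn: Callable[[object], Tuple[object, float]],
                 threshold: float,
                 batch_size: int,
                 diagnose_fn: Callable[[List[object]], None]):
        self.score_fn = score_fn        # returns (latent_vector, reconstruction_error) for one flow
        self.threshold = threshold      # benign-derived anomaly threshold T
        self.batch_size = batch_size
        self.diagnose_fn = diagnose_fn  # e.g. DBSCAN over the buffered latent vectors
        self.buffer: List[object] = []

    def process(self, flow) -> bool:
        """Score one flow and return True if it is flagged as anomalous."""
        latent, error = self.score_fn(flow)
        if error <= self.threshold:
            return False                # benign: discarded from further analysis
        self.buffer.append(latent)
        if len(self.buffer) >= self.batch_size:
            batch, self.buffer = self.buffer, []
            # Synchronous hand-off for brevity; in practice, enqueue for a worker
            self.diagnose_fn(batch)
        return True


# Toy usage: the "latent vector" is the flow value itself and its error is its magnitude
batches: List[List[float]] = []
pipe = AnomalyBuffer(score_fn=lambda f: (f, abs(f)), threshold=1.0,
                     batch_size=2, diagnose_fn=batches.append)
flags = [pipe.process(x) for x in [0.1, 2.0, -0.5, -3.0, 4.0]]
```

Only flagged flows are ever stored, so the diagnosis stage receives batches that are small relative to the full traffic stream, consistent with the asynchronous clustering described above.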

4.8.7. Component-to-Layer Mapping

In a distributed edge–fog–cloud deployment, the proposed framework naturally decomposes into lightweight detection and centralized diagnosis components. The autoencoder, trained exclusively on benign traffic, can be deployed at the edge or fog layer to perform real-time inference and compute reconstruction errors for incoming flows. The thresholding mechanism involves a simple statistical comparison and introduces negligible computational overhead, enabling immediate anomaly filtering. Anomalous latent representations can then be forwarded to the cloud layer, where DBSCAN clustering is applied asynchronously to perform fine-grained attack diagnosis across aggregated traffic. Figure 11 illustrates the proposed edge–fog–cloud deployment architecture, highlighting the separation between low-latency anomaly detection at the edge and centralized attack diagnosis in the cloud.

Although evaluated offline, the proposed framework is compatible with online intrusion detection workflows. Autoencoder-based anomaly scoring can be performed in real time on streaming network flows due to its fixed and low inference cost. Reconstruction-error thresholding introduces negligible latency, while the DBSCAN-based diagnosis stage can operate asynchronously on buffered anomalous traffic, avoiding interference with the primary detection pipeline.

5. Discussion

This study investigated an unsupervised, cloud-centric intrusion diagnosis framework by integrating autoencoder-based representation learning with density-based attack categorization. The experimental results presented in Section 4 provide several important insights into the effectiveness, robustness, and practical relevance of the proposed approach for future Internet cloud environments.

The anomaly detection results demonstrate that latent-space dimensionality plays a critical role in separating benign and malicious traffic. The autoencoder configuration with 21 hidden nodes consistently outperformed the 29-node configuration, achieving an anomaly detection accuracy of 99.46% with a very low false-negative rate (0.0027). This indicates that stronger compression into a compact latent space enhances noise suppression and highlights discriminative traffic patterns. In contrast, the higher false-positive and false-negative rates observed with the 29-node configuration suggest that excessive latent capacity may retain redundant or noisy information, reducing anomaly separability. These findings reinforce the importance of carefully balancing compression strength and information preservation in unsupervised intrusion detection.

The DBSCAN-based multiclass classification results further validate the benefits of operating on anomaly-filtered latent representations. As shown in Figure 5 and Figure 6, the framework achieved reliable classification performance across both majority and minority attack classes, with an overall accuracy of 98.79% in the optimal HN-21 configuration. Major attack categories such as DoS and DDoS achieved precision levels above 96%, while minority classes—including Botnet, Infiltration, and Web attacks—were detected with acceptable accuracy despite severe data imbalance. The observed misclassifications, particularly between DoS and DDoS or Bot and Web attacks, can be attributed to shared traffic burst characteristics and low-rate behavioral similarities, which are well-known challenges in cloud traffic analysis.

Clustering quality metrics provide further evidence of the framework’s effectiveness. The high Silhouette Score (0.9857) and low Davies–Bouldin Index (0.0091) achieved under the HN-21 configuration indicate tight cluster compactness and strong inter-cluster separability. These results confirm that anomaly-focused clustering in a reduced latent space substantially improves density-based learning stability compared with clustering on raw or weakly compressed features. Sensitivity analysis of DBSCAN parameters also shows that the framework remains stable across a wide range of ε and MinPts values, supporting its robustness for real-world deployment.

While CSE-CIC-IDS2018 serves as a realistic evaluation benchmark, the addressed challenges—traffic heterogeneity, imbalance, evolving attack patterns, and scalability—are inherent to real-world cloud environments, supporting the broader applicability of the proposed framework.

The ablation study highlights the complementary roles of the two framework components. While the autoencoder alone provides strong anomaly detection capability, integrating DBSCAN enables fine-grained attack diagnosis with at most a marginal change in accuracy. This confirms that the proposed dual-stage design successfully balances detection precision and diagnostic depth, which is essential for actionable cloud security monitoring. Comparative evaluation further demonstrates that the proposed framework significantly outperforms both supervised and unsupervised baselines, with statistical validation confirming that these improvements are not due to random variation.

A key consideration in evaluating intrusion detection systems for cloud environments is the trade-off between detection performance and computational efficiency. Supervised deep learning models such as DNNs and LSTMs achieve competitive accuracy but incur substantial training and inference costs due to high-dimensional feature processing and model complexity. Traditional clustering methods, including K-Means and hierarchical clustering, exhibit lower computational overhead but suffer from degraded performance and poor scalability in high-dimensional and imbalanced traffic scenarios. In contrast, the proposed Autoencoder–DBSCAN framework balances these trade-offs by compressing traffic into a compact latent space and applying clustering only to anomaly-filtered samples, achieving high detection accuracy while significantly reducing runtime and memory requirements.

From a systems perspective, the computational complexity and scalability analysis indicates that the framework is well-suited for large-scale cloud environments. By restricting DBSCAN to less than 6% anomalous traffic and operating in a 21-dimensional latent space, the framework achieves substantial reductions in runtime and memory usage. The lightweight inference characteristics of the autoencoder make it suitable for deployment at the edge or fog layers, where rapid anomaly detection is required. More computationally intensive clustering and diagnostic analysis can be centralized at the cloud layer, enabling a hierarchical edge–fog–cloud intrusion detection architecture. This separation of detection and diagnosis supports scalability, reduces detection latency, and aligns with future Internet security requirements.

The modular design of the proposed framework supports dynamic cloud environments, where traffic characteristics and workload distributions evolve over time. Edge-level anomaly detection enables rapid response, while centralized clustering at the cloud layer provides global visibility and coordinated attack diagnosis, aligning with the operational requirements of future Internet security architectures.

From an applied research perspective, the value of the proposed framework lies not solely in algorithmic innovation but in demonstrating how carefully coordinated representation learning and density-based clustering can be transformed into a reliable, scalable, and empirically validated intrusion diagnosis solution for cloud environments. Such system-level refinements are essential for bridging the gap between theoretical IDS models and real-world deployment.

5.1. Limitations

Despite the strong empirical performance demonstrated in Section 4, the proposed unsupervised Autoencoder–DBSCAN framework exhibits several limitations that merit consideration. A primary limitation of this study is the reliance on a single large-scale benchmark dataset for empirical evaluation. While CSE-CIC-IDS2018 closely reflects realistic cloud traffic conditions, variations in network configurations, traffic composition, feature distributions, and attack taxonomies across different environments may influence model performance. Consequently, the reported findings should be interpreted as evidence of robustness within a representative cloud setting rather than universal performance guarantees. Cross-dataset validation using alternative benchmarks (e.g., UNSW-NB15, CIC-IDS2017) or real-world traffic traces is necessary to further assess generalizability.

Another limitation is that the anomaly detection stage relies on reconstruction-error thresholding derived from the statistical properties of benign training data. While this approach proved effective for the CSE-CIC-IDS2018 dataset, dynamic cloud environments are prone to concept drift caused by workload variation, infrastructure scaling, and evolving user behavior. Such shifts may gradually alter the benign traffic distribution, requiring periodic threshold recalibration to maintain detection accuracy.

A further limitation is that the density-based clustering stage remains sensitive to the selection of DBSCAN parameters (ε and MinPts). Although latent-space compression significantly stabilizes clustering behavior, suboptimal parameter choices may still lead to over-clustering or excessive noise labeling, particularly under highly heterogeneous or burst-driven attack scenarios. This sensitivity is more pronounced for extremely low-frequency attack classes, which may not always satisfy DBSCAN's density requirements. Although sensitivity analysis demonstrates stable performance across a broad range of hyperparameter values, fully automated parameter adaptation is not yet integrated into the current framework. While the proposed tuning strategy generalizes across environments, absolute parameter values (e.g., ε or MinPts) may require recalibration when traffic volume, feature distributions, or attack prevalence differ significantly from the evaluation dataset.

An additional limitation is that the framework operates in an offline learning mode and does not currently support continuous or online model adaptation. Newly emerging attack patterns or previously unseen traffic behaviors therefore cannot be incorporated without retraining, and the framework cannot automatically adjust to long-term traffic evolution. The absence of online or streaming evaluation is equally important: real-time cloud environments may introduce challenges such as concept drift, bursty traffic patterns, and strict latency constraints that offline experiments do not fully capture. In addition, although latent representations enhance detection performance, they offer limited interpretability, restricting analysts' ability to understand the underlying causes of specific anomaly detections or cluster assignments.

Finally, although the overall computational overhead is reduced through latent-space processing, autoencoder training remains the dominant cost and may require GPU acceleration for timely convergence in very large-scale deployments.

5.2. Future Work

Several research directions could further enhance the proposed framework and extend its applicability to real-world future Internet environments. One important extension is the incorporation of adaptive and online learning mechanisms. Incremental or streaming autoencoder architectures could enable continuous adaptation to evolving benign traffic patterns and concept drift in cloud environments. Similarly, dynamic reconstruction-error thresholding could adjust anomaly sensitivity over time, while incremental or adaptive density-based clustering could accommodate changing attack densities and newly emerging threat behaviors. Evaluating the framework under live or near-real-time cloud traffic would further validate its operational applicability.
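One possible way to realize dynamic reconstruction-error thresholding is sketched below. It assumes a sliding window of recent reconstruction errors and a robust median-plus-k·MAD rule. The class name, window size, warm-up length, and k are hypothetical choices, not the fixed threshold used in this study. The median and MAD are used because they are relatively insensitive to the minority of attack flows that enter the window. A sustained attack burst that fills much of the window would still inflate the threshold.

```python
from collections import deque

import numpy as np


class RollingMADThreshold:
    """Sliding-window threshold: median + k * MAD of recent reconstruction errors.

    Hypothetical helper for illustration. The first `warmup` errors are assumed
    to be benign and are never flagged.
    """

    def __init__(self, window=1000, k=6.0, warmup=100):
        self.errors = deque(maxlen=window)  # oldest errors drop out automatically
        self.k = k
        self.warmup = warmup

    def current(self):
        """Return the current threshold, or None while still warming up."""
        if len(self.errors) < self.warmup:
            return None
        e = np.asarray(self.errors)
        med = np.median(e)
        mad = 1.4826 * np.median(np.abs(e - med))  # scaled to estimate a normal std
        return float(med + self.k * mad)

    def update(self, err):
        """Flag `err` against the current threshold, then add it to the window."""
        threshold = self.current()
        flagged = threshold is not None and bool(err > threshold)
        self.errors.append(err)
        return flagged


rng = np.random.default_rng(1)
detector = RollingMADThreshold(window=1000, k=6.0, warmup=100)
for e in 0.01 + 0.001 * rng.standard_normal(500):   # initial benign regime
    detector.update(float(e))
print(detector.update(1.0))    # True: far above the benign error level
for e in 0.05 + 0.001 * rng.standard_normal(1000):  # benign errors drift upward
    detector.update(float(e))
print(detector.update(0.05))   # False: the threshold has adapted to the new level
```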

Future enhancements may also incorporate automated hyperparameter tuning strategies, such as adaptive ε estimation, self-tuning density-based clustering, or meta-heuristic optimization techniques. Additionally, adaptive parameter selection mechanisms that automatically adjust latent dimensionality and density thresholds based on streaming traffic statistics could further improve robustness and reduce manual intervention in real-world deployments.
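One classic form of adaptive ε estimation is the sorted k-distance graph proposed with the original DBSCAN algorithm. Each point's distance to its k-th nearest neighbor is sorted, and ε is read off at the knee of the resulting curve. The sketch below automates the knee choice with a simple "farthest below the chord" rule. That rule, the function names, and the synthetic data are assumptions for illustration. Any ε obtained this way would still need validation, for example with the Silhouette Score and DBI used in the sensitivity analysis. Because the neighborhood counts the point itself, k = MinPts − 1.

```python
import numpy as np


def k_distance_curve(X, k):
    """Sorted distances from each point to its k-th nearest other point."""
    dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    kth = np.sort(dist, axis=1)[:, k]  # column 0 is the point itself (distance 0)
    return np.sort(kth)


def estimate_eps(X, min_pts):
    """Pick eps at the knee of the sorted k-distance curve (k = min_pts - 1).

    The knee is the point farthest below the straight line joining the first
    and last points of the curve after normalizing both axes to [0, 1].
    """
    curve = k_distance_curve(X, k=min_pts - 1)
    x = np.linspace(0.0, 1.0, len(curve))
    span = curve[-1] - curve[0]
    y = (curve - curve[0]) / span if span > 0 else np.zeros_like(curve)
    knee = int(np.argmax(x - y))
    return float(curve[knee])


# Hypothetical data: three dense groups plus sparse background flows.
rng = np.random.default_rng(2)
centers = np.array([[0.0, 0.0], [4.0, 0.0], [2.0, 4.0]])
blobs = np.vstack([c + 0.2 * rng.standard_normal((150, 2)) for c in centers])
background = rng.uniform(-2.0, 6.0, size=(30, 2))
X = np.vstack([blobs, background])

eps = estimate_eps(X, min_pts=10)
print(f"estimated eps = {eps:.3f}")
```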

Cross-dataset validation and real-world deployment represent another important research direction. Future work will evaluate the framework across datasets and domains, using multiple IDS benchmark datasets and real-world traffic traces, to systematically assess its transferability and robustness across heterogeneous cloud environments.

Extending the framework toward an edge–fog–cloud security architecture is also a natural progression. Lightweight anomaly detection could be deployed at the edge or fog layers for early threat filtering, while more computationally intensive clustering and analysis could be centralized in the cloud, enabling scalable, hierarchical intrusion diagnosis.
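The edge-side half of this split can be sketched as follows. To keep the example self-contained, a PCA projection fitted on benign flows stands in for the trained autoencoder. This is an assumption; the framework itself uses a nonlinear autoencoder. The threshold rule (1.5 times the largest benign training error) and all names are also hypothetical. Only the indices and latent codes of flagged flows leave the edge node, and the cloud would then run DBSCAN on those latent codes.

```python
import numpy as np


class PCAStandIn:
    """Linear encoder/decoder fitted by PCA on benign flows.

    Stand-in (assumption) for the trained autoencoder so that the sketch runs
    without a deep-learning framework.
    """

    def __init__(self, benign, latent_dim):
        self.mean = benign.mean(axis=0)
        _, _, vt = np.linalg.svd(benign - self.mean, full_matrices=False)
        self.components = vt[:latent_dim].T  # shape (n_features, latent_dim)

    def encode(self, X):
        return (X - self.mean) @ self.components

    def decode(self, Z):
        return Z @ self.components.T + self.mean


def reconstruction_error(model, X):
    """Per-flow mean squared reconstruction error."""
    return np.mean((X - model.decode(model.encode(X))) ** 2, axis=1)


def edge_filter(model, X, threshold):
    """Edge/fog side: keep benign flows local, forward latent codes of anomalies."""
    err = reconstruction_error(model, X)
    idx = np.flatnonzero(err > threshold)
    return idx, model.encode(X[idx])  # only this payload is sent to the cloud


rng = np.random.default_rng(3)
basis = rng.standard_normal((10, 2))  # benign flows lie near a 2-D subspace
benign = rng.standard_normal((500, 2)) @ basis.T + 0.01 * rng.standard_normal((500, 10))
attacks = 3.0 * rng.standard_normal((5, 10))  # flows far from that subspace

model = PCAStandIn(benign, latent_dim=2)
threshold = 1.5 * reconstruction_error(model, benign).max()  # hypothetical rule
batch = np.vstack([benign[:100], attacks])
idx, payload = edge_filter(model, batch, threshold)
print(idx, payload.shape)  # the five injected attack flows (indices 100-104)
```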

Finally, improving interpretability remains an important objective. Incorporating explainable AI techniques—such as latent feature attribution or prototype-based cluster explanation—could provide valuable insights into detected attack behaviors, increasing analyst trust and supporting informed response strategies.
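A flow's reconstruction error offers a simple starting point for such explanations. Because the mean squared error is a sum over input features, it can be decomposed into per-feature contributions. The sketch below ranks features by their share of that sum. It is an input-space variant rather than the latent feature attribution mentioned above, and the feature names and values are hypothetical.

```python
import numpy as np


def reconstruction_attribution(x, x_hat, feature_names, top_k=3):
    """Rank input features by their share of a flow's squared reconstruction error."""
    contrib = (np.asarray(x, dtype=float) - np.asarray(x_hat, dtype=float)) ** 2
    total = contrib.sum()
    share = contrib / total if total > 0 else contrib
    order = np.argsort(share)[::-1][:top_k]  # largest contributions first
    return [(feature_names[i], float(share[i])) for i in order]


# Hypothetical scaled flow and its reconstruction by the autoencoder.
names = ["flow_duration", "fwd_pkts", "bwd_pkts", "syn_flag_count"]
x = [0.20, 0.50, 0.10, 0.90]
x_hat = [0.21, 0.48, 0.13, 0.30]
for name, share in reconstruction_attribution(x, x_hat, names, top_k=2):
    print(f"{name}: {share:.1%}")
```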

6. Conclusions

This study presented a robust, dual-stage intrusion detection framework designed to address the challenges of high-dimensional, dynamic, and imbalanced cloud network traffic. By integrating a dual-function autoencoder for dimensionality reduction and anomaly detection with DBSCAN-based clustering for multiclass attack classification, the framework provides a scalable and efficient approach to cloud intrusion diagnosis. The autoencoder compresses network traffic into a compact latent space while preserving essential behavioral patterns, enabling precise anomaly detection through reconstruction-error thresholding. Applying DBSCAN to the anomaly-filtered latent representations then groups attacks into distinct classes and labels low-density points as noise, which can correspond to rare or emerging threats. Experimental evaluation on the CSE-CIC-IDS2018 dataset demonstrates strong performance: an anomaly detection accuracy of 99.46% and a multiclass attack classification accuracy of 98.79% under the optimal 21-dimensional latent configuration. A Silhouette Score of 0.9857 and a Davies–Bouldin Index of 0.0091 indicate compact, well-separated attack clusters. The framework is also computationally efficient and scalable, because DBSCAN operates only on the small anomalous subset and dimensionality reduction substantially lowers runtime and memory requirements. Statistical validation shows that the proposed method significantly outperforms the strongest baseline (a DNN) in both latent configurations (McNemar p < 0.001; paired bootstrap p ≤ 0.012), consistent with the comparative evaluation against supervised and unsupervised baselines. Although the framework was evaluated on a single large-scale cloud dataset, its fully unsupervised and modular design allows it to be retrained and adapted to new datasets and deployment environments without reliance on labeled attack data.
By focusing on core cloud-intrinsic challenges, such as traffic heterogeneity, imbalance, and evolving attack behaviors, rather than on dataset-specific characteristics, the framework is intended to provide a generalizable foundation for intrusion diagnosis in diverse cloud and future Internet environments. Although the current evaluation was conducted offline, the low-latency inference, anomaly-filtered clustering strategy, and compatibility with distributed architectures make the proposed framework a suitable foundation for future real-time and large-scale cloud intrusion detection deployments. Separating lightweight detection from centralized diagnosis supports scalable operation while maintaining timely threat response. Overall, by explicitly balancing detection accuracy and computational efficiency, the proposed Autoencoder–DBSCAN framework addresses a critical requirement for practical intrusion detection in large-scale cloud environments. Future extensions incorporating adaptive learning, online model updates, and explainable analysis mechanisms can further enhance the framework’s resilience and applicability in continuously evolving cloud security ecosystems.

Author Contributions

Conceptualization, S.K.S., T.E., R.R., A.K., B.B. and K.P.; methodology, S.K.S., T.E., R.R., A.K. and K.P.; software, S.K.S., T.E., R.R., B.B. and S.Y.; validation, S.K.S., R.R., A.K., B.B., S.Y. and K.P.; formal analysis, S.K.S., T.E., A.K., B.B., S.Y. and K.P.; investigation, S.K.S., T.E., A.K., B.B., S.Y. and K.P.; resources, K.P., T.E., R.R. and B.B.; data curation, S.K.S., T.E., R.R., S.Y. and K.P.; writing—original draft preparation, S.K.S., T.E., R.R., A.K. and K.P.; writing—review and editing, S.K.S., T.E., B.B., S.Y. and K.P.; visualization, R.R., A.K., B.B., S.Y. and K.P.; supervision, T.E., A.K., B.B., S.Y. and K.P.; project administration, B.B., S.Y. and K.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The dataset used for this article is available online at https://www.unb.ca/cic/datasets/ids-2018.html, accessed on 14 December 2025.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Attou, H.; Guezzaz, A.; Benkirane, S.; Azrour, M.; Farhaoui, Y. Cloud-Based Intrusion Detection Approach Using Machine Learning Techniques. Big Data Min. Anal. 2023, 6, 311–320.
Qureshi, S.; He, J.; Qureshi, S.U.; Zhu, N.; Wajahat, A.; Nazir, A.; Ullah, F.; Wadud, A. Advanced AI-driven intrusion detection for securing cloud-based industrial IoT. Egypt. Inform. J. 2025, 30, 100644.
Al-Ghuwairi, A.-R.; Sharrab, Y.; Al-Fraihat, D.; AlElaimat, M.; Alsarhan, A.; Algarni, A. Intrusion detection in cloud computing based on time series anomalies utilizing machine learning. J. Cloud Comput. 2023, 12, 127.
Balajee, R.M.; Jayanthi Kannan, M.K. Intrusion Detection on AWS Cloud through Hybrid Deep Learning Algorithm. Electronics 2023, 12, 1423.
Samunnisa, K.; Kumar, G.S.V.; Madhavi, K. Intrusion detection system in distributed cloud computing: Hybrid clustering and classification methods. Meas. Sens. 2023, 25, 100612.
Uddin, M.A.; Aryal, S.; Bouadjenek, M.R.; Al-Hawawreh, M.; Talukder, M.A. A dual-tier adaptive one-class classification IDS for emerging cyberthreats. Comput. Commun. 2025, 229, 108006.
Al-Safaar, D.W.; Al-Yaseen, W.L. A survey of network intrusion detection systems based on deep learning approaches. Sci. Tech. J. Inf. Technol. Mech. Opt. 2023, 23, 352–363.
Aldhaheri, S.; Alhuzali, A. SGAN-IDS: Self-Attention-Based Generative Adversarial Network against Intrusion Detection Systems. Sensors 2023, 23, 7796.
Wani, A.A. Comprehensive review of dimensionality reduction algorithms: Challenges, limitations, and innovative solutions. PeerJ Comput. Sci. 2025, 11, e3025.
Khan, A.R.; Kashif, M.; Jhaveri, R.H.; Raut, R.; Saba, T.; Bahaj, S.A. Deep Learning for Intrusion Detection and Security of Internet of Things (IoT): Current Analysis, Challenges, and Possible Solutions. Secur. Commun. Netw. 2022, 2022, 4016073.
Timofte, E.M.; Balan, A.L.; Iftime, T. AI Driven Adaptive Security Mesh: Cloud Container Protection for Dynamic Threat Landscapes. In Proceedings of the 2024 International Conference on Development and Application Systems (DAS), Suceava, Romania, 23–25 May 2024; pp. 71–77.
Mahadik, S.S.; Pawar, P.M.; Muthalagu, R. Edge-Federated Learning-Based Intelligent Intrusion Detection System for Heterogeneous Internet of Things. IEEE Access 2024, 12, 81736–81757.
Ponnumani, R.; Vasudeva, N.; Elumalai, T.; Kaliyaperumal, P.; Balusamy, B.; Benedetto, F. A multi-stage framework for scalable and context-aware intrusion detection in IoT-cloud systems using deep latent modeling and graph-based attack classification. Comput. Electr. Eng. 2026, 131, 110949.
Rehman, T.; Tariq, N.; Khan, F.A.; Rehman, S.U. FFL-IDS: A Fog-Enabled Federated Learning-Based Intrusion Detection System to Counter Jamming and Spoofing Attacks for the Industrial Internet of Things. Sensors 2025, 25, 10.
Hindy, H.; Atkinson, R.; Tachtatzis, C.; Colin, J.N.; Bayne, E.; Bellekens, X. Utilising deep learning techniques for effective zero-day attack detection. Electronics 2020, 9, 1684.
Kaliyaperumal, P.; Latha, P.; Palanisamy, S.; Pushpanathan, S.; Nayyar, A.; Balusamy, B.; Alkhayyat, A. SiamIDS: A Novel Cloud-Centric Siamese Bi-LSTM Framework for Interpretable Intrusion Detection in Large-Scale IoT Networks. Comput. Stand. Interfaces 2025, 97, 104119.
Prabu, K.; Sudhakar, P.; Thirumalaisamy, M.; Balusamy, B.; Benedetto, F. A Novel Hybrid Unsupervised Learning Approach for Enhanced Cybersecurity in the IoT. Future Internet 2024, 16, 253.
Talukder, M.A.; Khalid, M.; Sultana, N. A hybrid machine learning model for intrusion detection in wireless sensor networks leveraging data balancing and dimensionality reduction. Sci. Rep. 2025, 15, 4617.
Aljuaid, W.H.; Alshamrani, S.S. A Deep Learning Approach for Intrusion Detection Systems in Cloud Computing Environments. Appl. Sci. 2024, 14, 5381.
Rosline, G.J.; Rani, P. Intrusion detection system for cloud environment based on convolutional neural networks and PSO algorithm. Indones. J. Electr. Eng. Comput. Sci. 2024, 35, 1499–1506.
Sajid, M.; Malik, K.R.; Almogren, A.; Malik, T.S.; Khan, A.H.; Tanveer, J.; Rehman, A.U. Enhancing intrusion detection: A hybrid machine and deep learning approach. J. Cloud Comput. 2024, 13, 123.
Sahi, A.; Lai, D.; Li, Y.; Diykh, M. An Efficient DDoS TCP Flood Attack Detection and Prevention System in a Cloud Environment. IEEE Access 2017, 5, 6036–6048.
Megouache, L.; Zitouni, A.; Sadouni, S.; Djoudi, M. Machine Learning for Cloud Data Classification and Anomaly Intrusion Detection. Ing. Des. Syst. D’information 2024, 29, 1809–1819.
Vamsikrishna, M.; Latha, G.S.; Babu, G.V.R.; Giridhar, K.; Alluri, L.; Somasekhar, G.; Sagar, B.J.J.K.; Dondapati, N. Cloud computing environment based hierarchical anomaly intrusion detection system using artificial neural network. Int. J. Electr. Comput. Eng. 2025, 15, 1209–1217.
Li, J.; Sun, H.; Du, H.; Li, L.; Zhang, Z. Network Intrusion Detection Method Based on Semi-Supervised Learning and Random Forest. IEICE Trans. Commun. 2025, E108-B, 1152–1163.
Alharbi, E.; Marcolino, L.S.; Gouglidis, A.; Ni, Q. Robust Federated Learning Method against Data and Model Poisoning Attacks with Heterogeneous Data Distribution. In Frontiers in Artificial Intelligence and Applications; IOS Press: Amsterdam, The Netherlands, 2023; pp. 85–92.
Rameshkumar, S.; Ganesan, R.; Merline, A. Progressive Transfer Learning-based Deep Q Network for DDOS Defence in WSN. Comput. Syst. Sci. Eng. 2023, 44, 2379–2394.
Dugyala, R.; Chithaluru, P.; Ramchander, M.; Kumar, S.; Yadav, A.; Yadav, N.S.; Elminaam, D.S.A.; Alsekait, D.M. Secure cloud computing: Leveraging GNN and leader K-means for intrusion detection optimization. Sci. Rep. 2024, 14, 30906.
Villegas-Ch, W.; Govea, J.; Navarro, A.M.; Játiva, P.P. Intrusion Detection in IoT Networks Using Dynamic Graph Modeling and Graph-Based Neural Networks. IEEE Access 2025, 13, 65356–65375.
Guo, W.; Tondi, B.; Barni, M. Universal Detection of Backdoor Attacks via Density-Based Clustering and Centroids Analysis. IEEE Trans. Inf. Forensics Secur. 2024, 19, 970–984.
Artioli, P.; Maci, A.; Magrì, A. A comprehensive investigation of clustering algorithms for User and Entity Behavior Analytics. Front. Big Data 2024, 7, 1375818.
Li, J.; Zheng, K.; Gao, L.; Han, Z.; Li, Z.; Chanussot, J. Enhanced Deep Image Prior for Unsupervised Hyperspectral Image Super-Resolution. IEEE Trans. Geosci. Remote Sens. 2025, 63, 5504218.
Li, J.; Zheng, K.; Li, Z.; Gao, L.; Jia, X. X-Shaped Interactive Autoencoders with Cross-Modality Mutual Learning for Unsupervised Hyperspectral Image Super-Resolution. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5518317.
Canadian Institute for Cybersecurity IPS/IDS Dataset on AWS (CSE-CIC-IDS2018). Available online: https://registry.opendata.aws/cse-cic-ids2018/ (accessed on 13 January 2025).
Sharafaldin, I.; Lashkari, A.H.; Ghorbani, A.A. Toward generating a new intrusion detection dataset and intrusion traffic characterization. In Proceedings of the 4th International Conference on Information Systems Security and Privacy, ICISSP, Funchal, Portugal, 22–24 January 2018; SciTePress: Setúbal, Portugal, 2018; pp. 108–116.
Andresini, G.; Appice, A.; Malerba, D. Autoencoder-based deep metric learning for network intrusion detection. Inf. Sci. 2021, 569, 706–727.
Alaghbari, K.A.; Lim, H.S.; Saad, M.H.M.; Yong, Y.S. Deep Autoencoder-Based Integrated Model for Anomaly Detection and Efficient Feature Extraction in IoT Networks. Internet Things 2023, 4, 345–365.
Mustafa, D.H.; Husien, I.M. Adaptive DBSCAN with Grey Wolf Optimizer for Botnet Detection. Int. J. Intell. Eng. Syst. 2023, 16, 409–421.
Jain, P.; Bajpai, M.S.; Pamula, R. A Modified DBSCAN Algorithm for Anomaly Detection in Time-series Data with Seasonality. Int. Arab. J. Inf. Technol. 2022, 19, 23–28.
Alrowais, F.; Marzouk, R.; Nour, M.K.; Mohsen, H.; Hilal, A.M.; Yaseen, I.; Alsaid, M.I.; Mohammed, G.P. Intelligent Intrusion Detection Using Arithmetic Optimization Enabled Density Based Clustering with Deep Learning. Electronics 2022, 11, 3541.
Monshizadeh, M.; Khatri, V.; Kantola, R.; Yan, Z. A deep density based and self-determining clustering approach to label unknown traffic. J. Netw. Comput. Appl. 2022, 207, 103513.

Figure 1. Security challenges in future Internet edge–fog–cloud computing environments.

Figure 2. Architecture of the proposed dual-function autoencoder and DBSCAN framework.

Figure 3. Confusion matrix of the autoencoder-based anomaly detection for the 21-dimensional latent representation (HN-21).

Figure 4. Confusion matrix of the autoencoder-based anomaly detection for the 29-dimensional latent representation (HN-29).

Figure 5. Confusion matrix of the DBSCAN-based multiclass classification for the 21-dimensional latent representation (HN-21).

Figure 6. Confusion matrix of the DBSCAN-based multiclass classification for the 29-dimensional latent representation (HN-29).

Figure 7. Sensitivity of DBSCAN to the ε parameter for latent spaces HN-29 and HN-21, measured using the Silhouette Score.

Figure 8. Sensitivity of DBSCAN to the MinPts parameter for latent spaces HN-29 and HN-21, measured using the DBI.

Figure 9. Performance comparison of the proposed Autoencoder with existing methods.

Figure 10. Performance comparison of the proposed DBSCAN with existing methods.

Figure 11. Edge–fog–cloud deployment architecture of the proposed unsupervised intrusion diagnosis framework.

Table 1. Experimental samples utilized.

Category	Details
Dataset	CSE-CIC-IDS2018
Number of Features	80
Attack instances	2,748,235
Benign instances	13,484,708
Training Class (AE)	Benign
Training Class (DBSCAN)	Benign, DoS, DDoS, Botnet, Brute force, Infiltration and Web
Testing Class	Benign, DoS, DDoS, Botnet, Brute force, Infiltration and Web

Table 2. Configured Hyperparameters.

Hyperparameters	Configurations
Optimizer	Adam
Epochs/Batch Size	40/10,000
Learning rate	0.001
Number of Hidden Layers	2
Hidden Nodes	29/21
Epsilon (ε)	0.2
MinPts	600
Number of Clusters	Auto
Train Test Ratio	70:30

Table 3. Ablation results comparing AE-only, DBSCAN-on-raw, and AE + DBSCAN (HN-21 & HN-29).

Metrics	Autoencoder-Only (HN-29)	Autoencoder-Only (HN-21)	DBSCAN-Only (HN-29)	DBSCAN-Only (HN-21)	AE + DBSCAN (HN-29)	AE + DBSCAN (HN-21)
Precision	0.8846	0.9714	0.6373	0.7035	0.9579	0.9701
Recall	0.9328	0.9973	0.9070	0.9279	0.9606	0.9852
Specificity	0.9752	0.9940	0.9479	0.9605	0.9623	0.9832
F1-Score	0.9081	0.9842	0.7486	0.8003	0.9683	0.9811
Accuracy	0.9680	0.9946	0.9441	0.9575	0.9686	0.9879
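The metrics in Table 3 follow the standard confusion-matrix definitions, sketched below with the attack class assumed to be the positive class. For the binary Autoencoder-Only columns, the harmonic mean of the reported precision and recall reproduces the reported F1-scores.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard confusion-matrix metrics, with attacks as the positive class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # true-positive (detection) rate
    specificity = tn / (tn + fp)       # true-negative rate
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {"precision": precision, "recall": recall,
            "specificity": specificity, "f1": f1, "accuracy": accuracy}


def f1_from_pr(precision, recall):
    """F1-score as the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Cross-check against the Autoencoder-Only columns of Table 3.
print(round(f1_from_pr(0.9714, 0.9973), 4))  # HN-21: 0.9842
print(round(f1_from_pr(0.8846, 0.9328), 4))  # HN-29: 0.9081
```

The same formula does not reproduce the AE + DBSCAN F1-scores. For example, 0.9701 and 0.9852 give about 0.9776 rather than 0.9811, which suggests those multiclass values are averaged per class. The averaging scheme is not stated in the table.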

Table 4. Statistical Validation of Proposed AE + DBSCAN vs. Baseline Models.

Model	Latent Space	Accuracy (±95% CI)	Macro-F1 (±95% CI)	Binary Anomaly Test (McNemar p-Value)	Multiclass Comparison (Paired Bootstrap p-Value)
Proposed AE + DBSCAN	HN-21	0.9879 ± 0.0021	0.9824 ± 0.0028	—	—
Best Baseline (DNN)	HN-21	0.9354 ± 0.0045	0.9271 ± 0.0052	p < 0.001	p = 0.008
Proposed AE + DBSCAN	HN-29	0.9813 ± 0.0025	0.9768 ± 0.0031	—	—
Best Baseline (DNN)	HN-29	0.9287 ± 0.0048	0.9214 ± 0.0056	p < 0.001	p = 0.012
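Table 4 reports McNemar tests for the binary stage and paired bootstrap tests for the multiclass stage. The sketch below shows common forms of both: McNemar’s chi-square test with continuity correction on the discordant counts, and a one-sided paired bootstrap that resamples test flows with replacement. The exact variants used are not specified, so these functions are assumptions. They cannot reproduce the reported p-values without the per-flow predictions, and the correctness vectors below are hypothetical.

```python
import math
import random


def mcnemar_p(b, c):
    """McNemar chi-square test with continuity correction (1 degree of freedom).

    b: flows the proposed model classifies correctly and the baseline does not;
    c: flows the baseline classifies correctly and the proposed model does not.
    """
    if b + c == 0:
        return 1.0
    chi2 = max(abs(b - c) - 1, 0) ** 2 / (b + c)
    return math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 dof


def paired_bootstrap_p(correct_a, correct_b, n_boot=2000, seed=0):
    """One-sided paired bootstrap for accuracy(A) > accuracy(B).

    Flows are resampled with replacement while keeping A and B paired; the
    p-value is the fraction of resamples in which A does not beat B.
    """
    rng = random.Random(seed)
    n = len(correct_a)
    diffs = [int(a) - int(b) for a, b in zip(correct_a, correct_b)]
    observed = sum(diffs) / n
    not_better = 0
    for _ in range(n_boot):
        delta = sum(diffs[rng.randrange(n)] for _ in range(n)) / n
        if delta <= 0:
            not_better += 1
    return observed, not_better / n_boot


# Hypothetical paired correctness vectors (1 = correct) over the same 100 flows.
proposed = [1] * 95 + [0] * 5
baseline = [1] * 80 + [0] * 20
b = sum(1 for p, q in zip(proposed, baseline) if p and not q)
c = sum(1 for p, q in zip(proposed, baseline) if q and not p)
print("McNemar p =", mcnemar_p(b, c))
print("bootstrap (observed gain, p) =", paired_bootstrap_p(proposed, baseline))
```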

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

K. S, S.; Elumalai, T.; Rajamani, R.; Kumar, A.; Balusamy, B.; Yogarayan, S.; Prabu, K. An Unsupervised Cloud-Centric Intrusion Diagnosis Framework Using Autoencoder and Density-Based Learning. Future Internet 2026, 18, 54. https://doi.org/10.3390/fi18010054

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
