A Novel Quantum Epigenetic Algorithm for Adaptive Cybersecurity Threat Detection

Al-E’mari, Salam; Sanjalawe, Yousef; Fraihat, Salam

doi:10.3390/ai6080165


by Salam Al-E’mari ¹,*, Yousef Sanjalawe ²,* and Salam Fraihat ³

¹ Department of Information Security, Faculty of Information Technology, University of Petra (UoP), Amman 11196, Jordan

² Department of Information Technology, King Abdullah II School for Information Technology, University of Jordan (JU), Amman 11942, Jordan

³ Artificial Intelligence Research Center (AIRC), College of Engineering and Information Technology, Ajman University, Ajman P.O. Box 346, United Arab Emirates

* Authors to whom correspondence should be addressed.

AI 2025, 6(8), 165; https://doi.org/10.3390/ai6080165

Submission received: 15 June 2025 / Revised: 16 July 2025 / Accepted: 21 July 2025 / Published: 22 July 2025


Abstract

The escalating sophistication of cyber threats underscores the critical need for intelligent and adaptive intrusion detection systems (IDSs) to identify known and novel attack vectors in real time. Feature selection is a key enabler of performance in machine learning-based IDSs, as it reduces the input dimensionality, enhances the detection accuracy, and lowers the computational latency. This paper introduces a novel optimization framework called Quantum Epigenetic Algorithm (QEA), which synergistically combines quantum-inspired probabilistic representation with biologically motivated epigenetic gene regulation to perform efficient and adaptive feature selection. The algorithm balances global exploration and local exploitation by leveraging quantum superposition for diverse candidate generation while dynamically adjusting gene expression through an epigenetic activation mechanism. A multi-objective fitness function guides the search process by optimizing the detection accuracy, false positive rate, inference latency, and model compactness. The QEA was evaluated across four benchmark datasets—UNSW-NB15, CIC-IDS2017, CSE-CIC-IDS2018, and TON_IoT—and consistently outperformed baseline methods, including Genetic Algorithm (GA), Particle Swarm Optimization (PSO), and Quantum Genetic Algorithm (QGA). Notably, QEA achieved the highest classification accuracy (up to 97.12%), the lowest false positive rates (as low as 1.68%), and selected significantly fewer features (e.g., 18 on TON_IoT) while maintaining near real-time latency. These results demonstrate the robustness, efficiency, and scalability of QEA for real-time intrusion detection in dynamic and resource-constrained cybersecurity environments.

Keywords: quantum-inspired optimization; epigenetic algorithms; feature selection; intrusion detection systems; cybersecurity

1. Introduction

The rapid expansion of digital infrastructure has introduced new vulnerabilities that adversaries continue to exploit with increasing sophistication. Cybersecurity threat landscapes are no longer defined by static signatures but by dynamic, evolving attack behaviors that often evade traditional rule-based intrusion detection systems (IDSs). As a result, machine learning (ML)-based IDSs have emerged as essential tools for detecting known and zero-day attacks through pattern recognition and anomaly detection [1,2]. Despite their promise, such systems are frequently hindered by high-dimensional network traffic data, leading to inefficiencies in model training, increased false alarm rates, and suboptimal real-time performance [3].

Feature selection is a critical preprocessing technique that mitigates these issues by isolating the most relevant attributes for classification, thereby reducing the computational cost and improving the detection accuracy. However, selecting optimal feature subsets in complex, high-dimensional spaces is a combinatorial problem that often requires robust global search strategies. Classical optimization methods, including filter-based and wrapper-based feature selection, are generally limited by local optima and scalability constraints [4]. To overcome these limitations, evolutionary and swarm-based algorithms, such as Genetic Algorithms (GA) and Particle Swarm Optimization (PSO), have been widely explored [5,6]. Yet, these approaches are still prone to premature convergence and lack adaptability in dynamic environments.

Traditional IDSs, particularly those based on signature matching, often fail to identify zero-day or adaptive attacks that do not match known patterns. As a result, there has been a significant shift toward the use of machine learning-based IDSs, which aim to identify anomalies and suspicious behaviors without relying on predefined signatures.

However, despite their potential, ML-based IDSs face several persistent challenges. A significant issue arises from the high dimensionality of network traffic data, which can degrade the model performance, increase the training time, and result in high false positive rates. Selecting the most informative features from these large datasets is a complex task, particularly when the data are noisy, imbalanced, or rapidly changing. Many existing feature selection methods—whether statistical, filter-based, or traditional metaheuristics—struggle with scalability and adaptability in such environments.

Moreover, the threat landscape itself is highly dynamic. Attack strategies are continuously evolving, and static models quickly become outdated. Yet, most ML-based IDSs are designed to operate under fixed assumptions, limiting their effectiveness in detecting previously unseen or shifting attack patterns. Existing solutions also tend to prioritize either accuracy or efficiency, rarely achieving a practical balance between detection performance, latency, and resource usage, especially in real-time or resource-constrained settings.

Recent research has explored bio-inspired and quantum-inspired optimization techniques to address some of these limitations. While these approaches have demonstrated improvements in exploration capabilities and convergence rates, they are typically applied in isolation. There remains a critical gap in the development of unified, adaptive frameworks that integrate these paradigms to support intelligent, context-aware feature selection for real-world IDS deployment.

We hypothesize that the proposed QEA, which integrates quantum-inspired probabilistic representation with biologically motivated epigenetic regulation, offers a superior performance for feature selection in intrusion detection systems compared to classical approaches. The quantum component facilitates broad exploration of the solution space through superposition-based encoding, thereby reducing the likelihood of premature convergence. Concurrently, the epigenetic mechanism enables adaptive gene expression in response to environmental feedback, promoting resilience and responsiveness to evolving cyber threats. This synergy is expected to enhance detection accuracy, minimize false positives, and maintain low computational latency in high-dimensional, real-time cybersecurity contexts.

This paper introduces a QEA for adaptive cybersecurity threat detection, a hybrid optimization framework that fuses quantum-inspired representation with biologically motivated epigenetic control to address these challenges. The quantum component utilizes probabilistic superposition to encode feature subsets, allowing for the parallel exploration of multiple configurations [7]. At the same time, the epigenetic mechanism dynamically regulates gene expression based on fitness feedback, inspired by how organisms adapt phenotype expression in response to environmental changes without altering the genetic code [8]. This dual-layered approach not only mimics biological adaptability but also significantly enhances the algorithm’s capacity to navigate the trade-off between exploration and exploitation. As a result, the model remains agile when faced with evolving or previously unseen attack patterns, while still maintaining a fast convergence rate.

Conventional feature selection methods often face two significant challenges: they tend to converge prematurely on suboptimal feature subsets, and they struggle to adapt effectively to dynamic or evolving data. The proposed QEA addresses these issues by combining quantum-inspired exploration with adaptive epigenetic control. The quantum component enables broad and diverse sampling of potential feature subsets through probabilistic superposition, which helps avoid getting stuck in local optima. At the same time, the epigenetic mechanism adjusts gene expression in response to feedback from the fitness landscape, allowing the algorithm to downregulate less useful features without completely removing them. This flexible regulation helps the model remain responsive to new or changing attack patterns, ultimately leading to more accurate, efficient, and adaptable intrusion detection systems. In essence, the key contributions of this study can be outlined as follows:

We propose a novel QEA that integrates quantum coding and reversible gene expression for efficient feature selection in IDSs;
We design a custom multi-objective fitness function that balances accuracy, false positive rate, latency, and model compactness to meet real-time detection requirements;
We validate the effectiveness of the proposed QEA on four benchmark intrusion detection datasets: UNSW-NB15 (https://www.unb.ca/cic/datasets/cic-unsw-nb15.html, accessed on 21 March 2025), CIC-IDS2017 (https://www.unb.ca/cic/datasets/ids-2017.html, accessed on 21 March 2025), CSE-CIC-IDS2018 (https://data.mendeley.com/datasets/29hdbdzx2r/1, accessed on 11 March 2025), and TON_IoT (https://research.unsw.edu.au/projects/toniot-datasets, accessed on 18 March 2025), and benchmark it against a range of conventional and metaheuristic feature selection techniques.

Furthermore, the experimental results provide a thorough analysis of the performance and robustness of the proposed QEA across varying threat scenarios, establishing it as a promising tool for adaptive cybersecurity systems.

The remainder of this paper is structured as follows. Section 2 presents a comprehensive background on epigenetic algorithms, emphasizing their relevance to complex optimization problems. Section 3 surveys related work in feature selection, bio-inspired optimization techniques, and cybersecurity applications. Section 4 introduces the proposed QEA, detailing its conceptual foundations and methodological contributions. The experimental framework, including dataset preprocessing, fitness function formulation, and parameter tuning, is described in Section 5. Section 6 provides a thorough analysis of the experimental results, benchmarking QEA against conventional and quantum-inspired feature selection methods. Section 7 offers an interpretation of the results, highlighting practical implications, transparency, and deployment potential. Section 8 discusses the limitations of the proposed framework and avenues for enhancement. Finally, Section 9 concludes the paper and outlines potential directions for future research.

2. Background on Epigenetic Algorithm

The Epigenetic Algorithm (EA) is a bio-inspired optimization technique that draws its conceptual foundation from epigenetics, a field in molecular biology that studies how gene expression is regulated by mechanisms other than changes in the underlying DNA sequence. Unlike classical evolutionary algorithms such as the GA, which assume that genetic structures directly determine phenotypic traits, the EA introduces an additional control layer—epigenetic markers—that dynamically influence which genes are expressed during the fitness evaluation phase. This extra layer facilitates a more adaptive and robust search behavior in complex optimization landscapes, particularly when dealing with dynamic or noisy environments [9,10].

In the EA, each individual solution (chromosome) is represented by two components: a binary genotype $G = [g_{1}, g_{2}, \dots, g_{n}]$ and an epigenetic expression vector $E = [e_{1}, e_{2}, \dots, e_{n}]$ of the same length, where $g_{i} \in \{0, 1\}$ and $e_{i} \in \{0, 1\}$. The phenotype $P$ used for fitness evaluation is defined by the Hadamard product $P = G \circ E$, meaning that gene $g_{i}$ is considered active in the phenotype only if $e_{i} = 1$. This mechanism introduces a dynamic regulation scheme that can suppress gene expression that has historically contributed to poor fitness outcomes, even if it is present in the genotype. The expression vector $E$ is updated iteratively based on the success or failure of each gene’s contribution to the fitness function $F(P)$. Mathematically, the fitness function can be represented as [11]:

$$f_{i} = F(P_{i}) = F(G_{i} \circ E_{i}) \tag{1}$$

where $f_{i}$ denotes the fitness of the i-th individual. Epigenetic adaptation is governed by feedback mechanisms that strengthen or weaken gene expression based on environmental feedback or the performance of an objective function. One possible update rule for the expression vector is [12]:

$$e_{i}^{(t+1)} = \begin{cases} 1 & \text{if } \Delta f_{i} > \theta \\ 0 & \text{if } \Delta f_{i} < -\theta \\ e_{i}^{(t)} & \text{otherwise} \end{cases} \tag{2}$$

where $\Delta f_{i}$ is the change in fitness due to the activation of gene $i$, and $\theta$ is a sensitivity threshold that controls the adaptation rate. A comparison between the traditional GA and the EA is shown in Table 1, highlighting the key conceptual and operational differences between the two frameworks. Moreover, the general pseudocode of the EA is presented in Algorithm 1.
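The threshold rule in Equation (2) can be written as a small function. The following is a minimal sketch; the function name `update_expression` and the value `theta = 0.01` are illustrative assumptions, not values taken from the paper.

```python
def update_expression(e_current, delta_f, theta):
    """Apply the threshold rule of Equation (2) to one expression bit."""
    if delta_f > theta:
        return 1          # activation clearly helped: express the gene
    if delta_f < -theta:
        return 0          # activation clearly hurt: silence the gene
    return e_current      # change lies within [-theta, theta]: keep the old state


# Illustrative values only; theta = 0.01 is an assumed threshold.
theta = 0.01
print(update_expression(0, 0.05, theta))    # gene switched on
print(update_expression(1, -0.05, theta))   # gene switched off
print(update_expression(1, 0.005, theta))   # state retained
```

A larger `theta` widens the band in which the expression bit is left untouched, which slows adaptation but makes it less sensitive to noisy fitness estimates.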

A visual representation of the EA workflow is depicted in Figure 1. It illustrates the interaction between the genetic and epigenetic layers, including fitness evaluation and feedback-driven gene expression adaptation.

To illustrate the practical utility of the EA, consider its application to the feature selection problem in supervised machine learning. In this context, each gene $g_{i}$ represents the inclusion (1) or exclusion (0) of a specific feature from the dataset. The expression vector $E$ allows the algorithm to dynamically disable features that lead to model overfitting or poor classification accuracy, even if they are present in the genotype. This is particularly beneficial in high-dimensional spaces where traditional GA may converge slowly or become trapped in local optima due to fixed gene activation. For example, in a binary classification problem using a Support Vector Machine (SVM), the fitness function $F$ can be the classification accuracy or F1-score on a validation dataset, and the EA helps in identifying an optimal subset of features that maximizes this performance metric.

Algorithm 1 Epigenetic Algorithm for Optimization

Input: Population size $N$, maximum iterations $T$, fitness function $F$, epigenetic threshold $\theta$
Initialize population $\{G_{i}\}_{i=1}^{N}$ with random binary chromosomes
Initialize epigenetic vectors $\{E_{i}\}_{i=1}^{N}$ with ones
for $t = 1$ to $T$ do
    for $i = 1$ to $N$ do
        Compute phenotype $P_{i} = G_{i} \circ E_{i}$
        Evaluate fitness $f_{i} = F(P_{i})$
    end for
    Identify best solution $G^{*}$ with highest $f$
    for $i = 1$ to $N$ do
        Update $E_{i}$ based on $\Delta f$ and threshold $\theta$
        Apply crossover and mutation to generate new $G_{i}$
    end for
end for
Output: Best solution $G^{*}$
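Algorithm 1 leaves several details open, so the following Python sketch fills them with clearly labeled assumptions: $\Delta f$ for a gene is estimated as the fitness with the gene expressed minus the fitness with it silenced, variation uses one-point crossover with the current best individual followed by bit-flip mutation, and the function returns the best phenotype seen rather than a genotype. The toy fitness function and all parameter defaults are hypothetical.

```python
import random


def phenotype(g, e):
    """Element-wise (Hadamard) product of genotype and expression vector."""
    return [gi * ei for gi, ei in zip(g, e)]


def epigenetic_algorithm(fitness, n_genes, pop_size=20, max_iter=50,
                         theta=0.0, p_mut=0.05, seed=0):
    rng = random.Random(seed)
    G = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    E = [[1] * n_genes for _ in range(pop_size)]   # all genes expressed at start
    best_p, best_f = None, float("-inf")

    for _ in range(max_iter):
        P = [phenotype(g, e) for g, e in zip(G, E)]
        f = [fitness(p) for p in P]
        i_best = max(range(pop_size), key=f.__getitem__)
        if f[i_best] > best_f:
            best_p, best_f = P[i_best][:], f[i_best]

        new_G = []
        for i in range(pop_size):
            # Epigenetic update (assumed estimator of delta_f, see lead-in).
            p = P[i][:]
            for j in range(n_genes):
                if G[i][j] == 0:
                    continue                      # expression cannot affect a 0 gene
                p[j] = 1
                f_on = fitness(p)
                p[j] = 0
                f_off = fitness(p)
                delta = f_on - f_off
                if delta > theta:
                    E[i][j] = 1
                elif delta < -theta:
                    E[i][j] = 0
                p[j] = G[i][j] * E[i][j]          # restore the current expression
            # Variation (assumed operators): one-point crossover with the best
            # individual, then independent bit-flip mutation.
            cut = rng.randrange(1, n_genes)
            child = G[i][:cut] + G[i_best][cut:]
            child = [1 - c if rng.random() < p_mut else c for c in child]
            new_G.append(child)
        G = new_G
    return best_p, best_f


# Toy problem: reward agreement with a hidden target mask (hypothetical).
TARGET = [1, 0, 1, 1, 0, 0, 1, 0]


def toy_fitness(p):
    return sum(int(a == b) for a, b in zip(p, TARGET))


mask, score = epigenetic_algorithm(toy_fitness, n_genes=len(TARGET))
print(mask, score)
```

In a feature selection setting, `toy_fitness` would be replaced by a function that trains and validates a classifier on the features selected by the mask.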

An example dataset with 10 features might result in an initial chromosome $G = [1, 1, 0, 1, 0, 1, 1, 1, 0, 1]$ and an expression vector $E = [1, 0, 1, 1, 1, 1, 0, 1, 1, 1]$, leading to an effective phenotype $P = [1, 0, 0, 1, 0, 1, 0, 1, 0, 1]$. This phenotype is then evaluated using the chosen fitness function, and updates to $E$ are made accordingly to guide the search toward more informative subsets.
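The worked example above can be checked in a few lines of Python. The variable names `selected` and `silenced` are illustrative; indices are zero-based.

```python
G = [1, 1, 0, 1, 0, 1, 1, 1, 0, 1]   # genotype from the example
E = [1, 0, 1, 1, 1, 1, 0, 1, 1, 1]   # expression vector from the example

# Hadamard product: a feature is used only if both g_i and e_i are 1.
P = [g * e for g, e in zip(G, E)]
print(P)          # [1, 0, 0, 1, 0, 1, 0, 1, 0, 1]

# Features passed to the classifier, and genes present but switched off.
selected = [i for i, p in enumerate(P) if p == 1]
silenced = [i for i, (g, e) in enumerate(zip(G, E)) if g == 1 and e == 0]
print(selected)   # [0, 3, 5, 7, 9]
print(silenced)   # [1, 6]
```

The two silenced genes remain in the genotype, so a later update to $E$ can re-enable them without any mutation.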

The EA has shown promising results in various optimization domains such as neural network training [9], dynamic environments [13], and bioinformatics [14,15]. Its ability to incorporate short-term memory and adaptive gene regulation makes it a robust alternative to classical genetic approaches in non-stationary and feature-rich problem spaces.

3. Related Work

Feature selection plays a crucial role in enhancing the performance of machine learning-based intrusion detection systems, particularly in high-dimensional datasets of network traffic. Numerous ML-driven methods have been proposed to reduce the dimensionality while preserving the classification accuracy. These methods can broadly be categorized into filter, wrapper, and embedded approaches.

Filter-based methods, such as Information Gain, Chi-square, and ReliefF, rank features based on statistical relevance independent of the learning algorithm. While computationally efficient, they often overlook interactions between features and may not yield optimal subsets for complex classification tasks. Wrapper methods, including Sequential Forward Selection (SFS) [16] and Recursive Feature Elimination (RFE) [17], evaluate feature subsets by training a predictive model. These approaches typically achieve higher accuracy but are computationally intensive, making them unsuitable for large-scale or real-time environments.

Embedded methods integrate feature selection directly into the model-training process. Examples include LASSO regression [18], which utilizes L1 regularization to shrink coefficients, and tree-based models such as Random Forests or Gradient Boosted Trees, which provide feature importance scores. Though more efficient than wrappers, embedded methods may still struggle with redundancy and multicollinearity in large datasets.

Despite their success, traditional ML-based feature selection methods face several limitations. They are often sensitive to noise, prone to overfitting, and unable to explore the global feature space thoroughly [19]. Moreover, their performance depends heavily on the choice of classifier and hyperparameters, and they generally lack adaptability to dynamic or evolving data environments—a common characteristic of modern cyber threats.

To address these challenges, recent studies have explored bio-inspired and evolutionary algorithms, such as Particle Swarm Optimization (PSO) [20] and Ant Colony Optimization (ACO) [21], which offer improved search capabilities in high-dimensional spaces. However, these methods still encounter issues related to convergence speed, local optima, and the exploration–exploitation balance.

Quantum-inspired feature selection algorithms [22] have emerged as a promising alternative, combining probabilistic search and superposition-based encoding to improve convergence and global search behavior. These methods offer enhanced diversity in population-based optimization and introduce mechanisms such as quantum rotation gates to explore large search spaces efficiently. In the following subsections, we focus on quantum-enhanced and hybrid algorithms in the context of feature selection, discussing how they compare to classical metaheuristics and ML-based techniques.

Ezzarii et al. (2020) developed an EGA specifically for IDSs. Their model uses biologically motivated operators, called epimutation and epicrossover, to alter the activity of genes as evolution proceeds. Using a chosen list of epigenetic factors, the algorithm searches only over active genes, resulting in fewer errors and highly accurate findings (up to 98%). The model was tested on benchmark datasets and outperformed standard genetic algorithms in terms of both speed and reliability [23]. In an earlier study, Ezzarii et al. (2016) investigated the biological processes that govern epigenetic mechanisms in IDSs. They proposed turning specific genes on or off based on their significance in responding to a threat. Owing to this dynamic filtering, the algorithm required fewer iterations and was more likely to find accurate rule sets. The authors observed that the algorithm can respond to changes in attack patterns, a capability that plays an essential role in the modern battle against cybercrime [24].

Epigenetic logic is applied in conjunction with quantum paradigms, extending beyond mere intrusion detection. Saad et al. (2021) introduced a Quantum-Inspired Genetic Algorithm (QIGA) to handle resource-constrained scheduling problems. Although the algorithm was not originally built for cybersecurity, its architecture, based on quantum gates, superposition states, and quantum rotation, proved more effective and discovered a broader range of solutions when searching for unknowns. These findings suggest that quantum concepts can be integrated into bio-inspired cybersecurity models [25]. Mücke et al. (2023) proposed a QUBO-based framework for selecting significant features in quantum computing applications used for threat detection. Quantum annealing is central to their approach, enabling better feature reduction and outperforming traditional selection methods. The framework works with both conventional and quantum computers, which makes it worthwhile in mixed cybersecurity systems [26].

Fagbo et al. (2025) introduced a cyber threat detection system built on quantum technology from the ground up. Their architecture uses both Grover’s search and Shor’s factorization, enhancing the system’s ability to react quickly and make accurate predictions [27]. Furthermore, Dubey et al. (2025) examined the integration of AI and quantum computing, studying AI–quantum hybrid models for real-time prevention of cyber threats. By combining AI’s ability to recognize patterns with the multiple computing streams of a quantum device, they deliver a threat detection system that responds actively to threats. Their work also addresses hardware scalability and efficient algorithms [28]. In a parallel report, Ramya et al. (2024) described an approach that combines NLP with quantum optimization. They used Grover’s algorithm and BERT-based models to identify abnormal patterns and review threat intelligence reports. The integrated data processing made it possible to process data 50% faster and to recognize objects accurately 92% of the time, while also reducing misclassifications by 30% [29].

Quantum machine learning (QML) is also being studied as a tool for intelligent identification of cyber threats. Hossain et al. (2024) proposed a comprehensive QML framework based on Quantum Neural Networks (QNNs) and Quantum Support Vector Machines (QSVMs) for mitigating anomalies and adversarial threats. For secure communication and autonomous response, the system uses QKD and QRL. When tested, QML achieved 96% accuracy in spotting threats and reacted more quickly, which supports its use as a flexible and adaptable method for defending against cyber threats. Another study also mentioned limitations of using quantum technology for AI, specifically the cost of quantum hardware and the difficulty of interpreting the resulting models [30]. In addition, Azeez et al. (2024) developed a comprehensive real-time threat detection framework comprising QSVMs and QNNs. Their approach achieved an accuracy of up to 96.7% while halving the latency, making it suitable for critical infrastructure and financial systems [31].

Furthermore, researchers introduced a Quantum Deep Convolutional Neural Network (QDCNN) framework for detecting threats in cyber–physical systems in real-time. Although designed for autonomous vehicles, the architecture’s construction with both quantum and classical elements suggests that it may also be applicable to cybersecurity domains. There was notable progress and increased reliability in processing and handling noise. The system was more effective than older deep learning methods in high-dimensional pattern recognition and spotting anomalies. The study demonstrated that quantum-enhanced deep learning can support secure and fast-reacting applications [32]. Table 2 highlights the main existing related works.

While recent advancements in quantum-inspired computing and bio-inspired algorithms have shown significant promise in addressing complex optimization challenges, several critical limitations remain in their application to adaptive cybersecurity systems. A review of the existing literature reveals several open challenges and under-explored areas that motivate the development of a more robust and integrated solution. The key research gaps that this work aims to address are summarized below:

Lack of integrated quantum–epigenetic optimization: Existing works address epigenetic operators on classical hardware or quantum optimization without epigenetic regulation; none combine the two in a single adaptive framework.
Limited real-time, large-scale validation: Reported experiments typically use static benchmark datasets; streaming, high-throughput scenarios remain under-explored.
Explainability and compliance: Quantum-augmented models rarely provide interpretable decision logic or satisfy regulatory standards for security audits.
Standardized evaluation metrics: Heterogeneous datasets and metrics hinder objective comparison across studies; a unified benchmark remains lacking.

Addressing these gaps forms the core motivation for this study, which proposes a novel QEA designed to advance the state of the art in adaptive, interpretable, and resource-aware cybersecurity threat detection.

4. Proposed Quantum Epigenetic Algorithm for IDSs

The proposed QEA is a hybrid optimization method that combines quantum-inspired probabilistic representation with biologically motivated epigenetic regulation. This combination is particularly well-suited to the dynamic nature of cybersecurity environments, where the search for optimal configurations must remain both exploratory—to discover novel attack patterns—and exploitative—to converge rapidly for real-time deployment. The QEA is designed to address these dual requirements by employing quantum superposition for population initialization and diversity maintenance while utilizing reversible epigenetic masking to regulate the gene expression in response to environmental feedback. To enhance clarity, Figure 2 presents a step-by-step flowchart of the proposed Quantum Epigenetic Algorithm. The workflow begins with initializing the quantum population and encoding, followed by probabilistic feature generation, fitness evaluation using a classifier, and application of epigenetic gene masking. The quantum rotation gate updates guide the convergence, and the cycle continues until a termination criterion is met. The final output is the optimal subset of features that maximizes the classification performance.

In the context of intrusion detection, each individual in the population encodes either a binary feature-selection vector or a compact hyperparameter configuration for a downstream lightweight classifier. Each bit in the chromosome is represented by a quantum bit (qubit), initialized with a uniform distribution to ensure unbiased sampling of the solution space. During the measurement step, these qubits collapse into classical binary configurations, effectively allowing the algorithm to evaluate multiple hypotheses in parallel.

The proposed Quantum Epigenetic Algorithm (QEA) integrates quantum-inspired probabilistic search with a biologically motivated epigenetic memory mechanism. It aims to identify the most relevant feature subsets that enhance the accuracy and efficiency of intrusion detection systems. The overall process can be divided into two main phases: initialization and evolution, as detailed below.

4.1. Initialization Phase

In this phase, a population of quantum individuals is initialized. Each individual is represented by a string of quantum bits (qubits), where each qubit is encoded by a probability amplitude pair $(\alpha, \beta)$, such that $|\alpha|^{2} + |\beta|^{2} = 1$. These amplitudes reflect the probability of selecting a feature (1) or not (0) when the qubit is measured.

After initialization, quantum individuals are measured to generate a population of binary feature selection masks. Each mask is evaluated using a predefined fitness function (e.g., classification accuracy or F1-score) based on the selected features. A classifier—such as a decision tree, SVM, or random forest—is used to compute the fitness of each individual.
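The initialization and measurement steps can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: amplitudes are taken to be real, a measured bit equals 1 with probability $|\beta|^{2}$ (the usual convention in quantum-inspired evolutionary algorithms), and the names `init_population` and `measure` are hypothetical.

```python
import math
import random


def init_population(pop_size, n_features):
    """Every qubit starts at (1/sqrt(2), 1/sqrt(2)): equal odds of 0 and 1."""
    amp = 1.0 / math.sqrt(2.0)
    return [[(amp, amp) for _ in range(n_features)] for _ in range(pop_size)]


def measure(individual, rng):
    """Collapse each qubit to a classical bit; bit = 1 with probability beta**2."""
    return [1 if rng.random() < beta ** 2 else 0 for _, beta in individual]


rng = random.Random(42)                      # fixed seed for repeatability
population = init_population(pop_size=5, n_features=8)
masks = [measure(ind, rng) for ind in population]
for mask in masks:
    print(mask)                              # one binary feature mask per individual
```

Each printed mask would then be passed to the fitness function, for example by training the chosen classifier on the columns where the mask is 1.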

4.2. Evolution Phase

Once the initial population is evaluated, the algorithm proceeds with iterative optimization. In each generation, two main mechanisms drive the evolution process:

1. Epigenetic Gene Masking: Inspired by biological gene expression, this mechanism suppresses non-contributing or redundant features based on historical performance. Each individual’s gene expression is modulated based on a memory matrix that tracks beneficial gene activations over time.

2. Quantum Rotation Update: Quantum rotation gates are applied to update the probability amplitudes of each qubit. The rotation direction and angle depend on whether the corresponding gene contributed positively to the fitness of the best individual. This enables the population to converge toward more optimal feature subsets probabilistically.

The evolution process continues until a stopping criterion is met—either a maximum number of generations or stagnation in the best fitness score. The final output is the best-performing individual, representing the optimal feature subset for the intrusion detection task.

Table 3 summarizes the key parameters used in the QEA during both initialization and evolution phases. These parameters were selected based on empirical tuning and prior studies to strike a balance between exploration and exploitation while ensuring computational efficiency.

These parameters provide a flexible yet effective configuration for the QEA to explore high-dimensional search spaces. The epigenetic memory component enables the dynamic suppression of irrelevant features, while the probabilistic search, guided by quantum rotation, ensures convergence toward compact and high-performing feature subsets.

4.3. Epigenetic Regulation

Epigenetic regulation in the QEA introduces a memory-inspired mechanism that tracks the historical relevance of features across generations. Unlike conventional evolutionary strategies that rely solely on current fitness scores, this mechanism incorporates an epigenetic memory window, which maintains a short-term memory of gene usefulness over the past w generations.
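One hypothetical way to keep such a short-term memory is a fixed-length sliding window of per-feature scores. In the sketch below, the class name `EpigeneticMemory`, the choice of score (for instance, 1 if a feature was active in that generation's best individual), and the use of a plain mean are all assumptions made for illustration.

```python
from collections import deque


class EpigeneticMemory:
    """Sliding window of per-feature usefulness scores over the last w generations."""

    def __init__(self, n_features, window):
        self.n_features = n_features
        # deque(maxlen=w) discards the oldest generation automatically.
        self.history = deque(maxlen=window)

    def record(self, scores):
        """Store one generation's per-feature scores."""
        if len(scores) != self.n_features:
            raise ValueError("expected one score per feature")
        self.history.append(list(scores))

    def usefulness(self):
        """Mean score per feature over the window (0.0 before any record)."""
        if not self.history:
            return [0.0] * self.n_features
        n = len(self.history)
        return [sum(gen[j] for gen in self.history) / n
                for j in range(self.n_features)]


memory = EpigeneticMemory(n_features=3, window=2)
memory.record([1, 0, 1])
memory.record([1, 0, 0])
memory.record([1, 1, 0])     # the first record falls out of the window
print(memory.usefulness())   # [1.0, 0.5, 0.0]
```

Features whose windowed usefulness stays low are natural candidates for suppression, while the bounded window lets a feature recover once it becomes useful again.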

To ensure computational efficiency, the epigenetic update process avoids performing an exhaustive fitness evaluation for all possible gene activations and deactivations. Instead of evaluating each gene flip across all $d$ features per individual, we implement a localized and lightweight approximation strategy. Specifically, for each gene $j$ in individual $i$, a temporary binary mask is generated by flipping the activation state $E_{i,j}$ while keeping the rest of the genome unchanged. The fitness impact $\Delta f$ is then estimated using a cached or partial evaluation based on the classifier’s local output differences, rather than recomputing the full fitness from scratch. This approach significantly reduces the number of classifier evaluations required per generation, from $O(N \times d)$ to approximately $O(N)$, where $N$ is the population size and $d$ is the number of features. Moreover, to further accelerate the runtime, we apply the epigenetic update only to a small, randomly selected subset of genes per individual at each iteration, rather than evaluating all genes. This stochastic regulation mimics biological systems, balancing adaptability with computational cost.
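The stochastic subset update can be sketched as below. Note the main simplification: the paper estimates $\Delta f$ from cached or partial classifier outputs, whereas this sketch simply calls a full `fitness` function as a stand-in. The function name, the parameter `k` (number of genes examined per individual), and the toy fitness are assumptions.

```python
import random


def stochastic_epigenetic_update(genotype, expression, fitness, theta, k, rng):
    """Re-examine k randomly chosen genes; keep a flipped expression bit
    only if it improves fitness by more than theta."""
    expression = expression[:]                        # do not mutate the caller's list
    base = fitness([g * e for g, e in zip(genotype, expression)])
    for j in rng.sample(range(len(genotype)), k):     # random subset of gene indices
        trial = expression[:]
        trial[j] = 1 - trial[j]                       # temporary mask with E[j] flipped
        f_trial = fitness([g * e for g, e in zip(genotype, trial)])
        if f_trial - base > theta:
            expression, base = trial, f_trial
    return expression


# Toy fitness (hypothetical): features 0 and 2 help, features 1 and 3 hurt.
def toy_fitness(p):
    return p[0] + p[2] - p[1] - p[3]


rng = random.Random(7)
updated = stochastic_epigenetic_update([1, 1, 1, 1], [1, 1, 1, 1],
                                       toy_fitness, theta=0.0, k=4, rng=rng)
print(updated)   # [1, 0, 1, 0]
```

Accepting a flip only when the gain exceeds `theta` mirrors the threshold rule of Equation (2): a bit moves from 0 to 1 when activation helps, and from 1 to 0 when activation hurts. With `k` much smaller than the number of features, only a few genes are re-examined per individual in each generation.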

This regulation mechanism is particularly beneficial in high-dimensional feature spaces, where irrelevant or redundant features may mislead the optimization process. By preserving information about feature usefulness over time, the algorithm dynamically adjusts its exploration strategy, promoting stable convergence while avoiding local optima.

In cases where the algorithm stagnates—evidenced by no improvement in the best individual over a pre-defined number of generations (e.g., 10)—epigenetic memory can trigger reinitialization or influence the quantum update to refocus the search trajectory. This strategy improves robustness and adaptability in the search process.
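The stagnation check described above can be illustrated with a minimal sketch. The function name `is_stagnant`, the `patience` parameter, and the tolerance `tol` are hypothetical names introduced here for illustration; the paper only states that a pre-defined number of generations (e.g., 10) without improvement counts as stagnation.

```python
def is_stagnant(best_history, patience=10, tol=0.0):
    """Return True if the best-so-far fitness has not improved
    during the last `patience` generations.

    `best_history` holds the best fitness recorded after each generation.
    """
    if len(best_history) <= patience:
        return False  # not enough generations observed yet
    reference = best_history[-(patience + 1)]   # value before the window
    recent_best = max(best_history[-patience:])  # best value inside the window
    return recent_best <= reference + tol


# Hypothetical fitness traces: one that plateaus and one that keeps improving.
plateau = [0.10, 0.20] + [0.30] * 11
improving = [0.1 * i for i in range(12)]
print(is_stagnant(plateau), is_stagnant(improving))
```

When the check fires, the caller could reinitialize part of the population or adjust the rotation step; which response is used is left open here, as in the text.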

4.4. Quantum Rotation

Quantum rotation is the primary update mechanism used to evolve individuals in QEA. Each solution is encoded using a quantum chromosome represented by a series of qubits, where each qubit is defined by a pair of probability amplitudes

(α_{i}, β_{i})

, such that

| α_{i} |^{2} + {| β_{i} |}^{2} = 1

. These amplitudes reflect the likelihood of observing a classical bit value of 0 or 1 upon measurement.

During each generation, the algorithm evaluates observed individuals and compares them with the global best solution. Based on this evaluation, a quantum rotation gate is applied to update the amplitudes of each qubit. The rotation adjusts the current qubit state toward the desired value using a rotation angle

Δ θ

, which governs the speed and direction of convergence. The rotation operation is typically defined as follows:

\begin{bmatrix} α_{i}^{t + 1} \\ β_{i}^{t + 1} \end{bmatrix} = \begin{bmatrix} \cos (Δ θ) & - \sin (Δ θ) \\ \sin (Δ θ) & \cos (Δ θ) \end{bmatrix} \cdot \begin{bmatrix} α_{i}^{t} \\ β_{i}^{t} \end{bmatrix}

(3)

Here,

Δ θ

can be positive or negative, depending on whether the bit in the observed solution aligns with the corresponding bit in the global best. This allows for controlled exploration and exploitation, gradually steering the population toward optimal solutions without prematurely collapsing diversity.

Quantum rotation provides a smooth, probabilistic update mechanism that enhances the adaptability of the evolutionary process. When combined with epigenetic regulation, it supports a robust feature selection strategy that balances exploration of new solutions with the retention of historically successful patterns.
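As a small illustration of Equation (3), the sketch below applies the rotation matrix to a single pair of amplitudes and checks that the normalization constraint is preserved. The function name `rotate_qubit` and the numeric values are assumptions chosen for the example, not values from the paper.

```python
import math


def rotate_qubit(alpha, beta, delta_theta):
    """Apply the 2x2 rotation of Equation (3) to one qubit's amplitudes."""
    c, s = math.cos(delta_theta), math.sin(delta_theta)
    return c * alpha - s * beta, s * alpha + c * beta


# Start from an equal superposition: P(0) = P(1) = 0.5.
alpha, beta = 1 / math.sqrt(2), 1 / math.sqrt(2)

# A positive angle (illustrative value) shifts probability mass toward bit 1.
new_alpha, new_beta = rotate_qubit(alpha, beta, 0.05 * math.pi)

print("P(1) before:", beta ** 2)
print("P(1) after: ", new_beta ** 2)
print("norm after: ", new_alpha ** 2 + new_beta ** 2)
```

Because the matrix is orthogonal, the squared amplitudes still sum to one after every update, so the pair can always be read as a valid probability distribution over the two bit values.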

4.5. Fitness Function

The fitness of each solution is evaluated using a custom-designed multi-objective function tailored to the goals of intrusion detection, where the detection accuracy must be maximized, and both false positive rates and computational latency must be minimized. The proposed fitness function is defined as follows:

F (x) = λ_{1} \cdot Acc (x) - λ_{2} \cdot FPR (x) - λ_{3} \cdot T_{latency} (x) - λ_{4} \cdot \frac{| x |}{d},

(4)

where

x

is the binary feature-selection mask,

| x |

is the number of selected features, d is the total number of features,

Acc (x)

is the classification accuracy,

FPR (x)

is the false positive rate, and

T_{latency} (x)

is the latency or inference time of the corresponding model. The trade-off parameters

λ_{1}, λ_{2}, λ_{3}, λ_{4} \in R^{+}

are application-specific weights that prioritize different objectives. The final term acts as a regularizer to discourage the selection of overly large feature subsets, thereby promoting compact, interpretable models. Each term in the equation is weighted by a coefficient

λ_{i}

that reflects its relative importance. These weights are selected such that their sum is equal to 1:

\sum_{i = 1}^{4} λ_{i} = 1

(5)

This ensures that the fitness function forms a convex combination of objectives, allowing for balanced contributions from all components [33]. To support systematic configuration, the weights can be derived from raw priorities

w_{i}

using the normalized formula:

λ_{i} = \frac{w_{i}}{\sum_{j = 1}^{4} w_{j}}

(6)

For example, if the raw importance values are assigned as

w_{1} = 5

(accuracy),

w_{2} = 3

(false positives),

w_{3} = 1

(latency), and

w_{4} = 1

(compactness), the resulting normalized weights are as follows:

λ_{1} = \frac{5}{10} = 0.5, λ_{2} = \frac{3}{10} = 0.3, λ_{3} = \frac{1}{10} = 0.1, λ_{4} = \frac{1}{10} = 0.1

(7)

This multi-objective fitness function enables the QEA to explore a diverse set of candidate feature subsets that maximize classification quality while meeting the operational demands of latency and model interpretability. Moreover, the inclusion of the feature ratio term encourages sparsity in feature selection, which is particularly valuable in network-based intrusion detection systems where excessive dimensionality can lead to overfitting, slower detection times, and increased memory consumption [34].
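Equations (4), (6), and (7) can be worked through in a few lines. The sketch below is illustrative only: the function names and the accuracy, FPR, and latency values are hypothetical, and the latency is assumed to have already been scaled to the range [0, 1].

```python
def normalize_weights(raw):
    """Equation (6): convert raw priorities w_i into weights that sum to 1."""
    total = sum(raw)
    if total <= 0:
        raise ValueError("raw priorities must have a positive sum")
    return [w / total for w in raw]


def fitness(mask, acc, fpr, latency, weights):
    """Equation (4): weighted accuracy minus weighted penalties."""
    l1, l2, l3, l4 = weights
    feature_ratio = sum(mask) / len(mask)  # |x| / d
    return l1 * acc - l2 * fpr - l3 * latency - l4 * feature_ratio


# Raw priorities from the worked example in the text: 5, 3, 1, 1.
weights = normalize_weights([5, 3, 1, 1])

# Hypothetical candidate: 4 of 10 features selected.
mask = [1, 0, 1, 1, 0, 0, 0, 1, 0, 0]
score = fitness(mask, acc=0.96, fpr=0.02, latency=0.10, weights=weights)

print(weights)
print(round(score, 4))
```

With these inputs the four terms contribute 0.48, 0.006, 0.01, and 0.04, respectively, which shows how the accuracy term dominates under the 0.5/0.3/0.1/0.1 weighting.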

This fitness function integrates four objectives to guide the selection of optimal feature subsets within the QEA framework, and each component contributes uniquely to the overall evaluation:

Accuracy reflects the overall correctness of classification, ensuring the selected features maintain a high predictive performance.
FPR explicitly penalizes models that yield excessive false alarms, which is crucial in domains like cybersecurity and fraud detection. This term addresses class imbalance concerns not fully captured by accuracy alone.
F1-Score balances precision and recall, making it effective for evaluating minority class performance and providing a more nuanced view of classifier robustness.
Reduction Rate promotes model simplicity by encouraging the selection of fewer features, thereby reducing the risk of overfitting and computational cost.

To ensure a balanced contribution from all four terms, the weights

w_{1}, w_{2}, w_{3}, w_{4}

were determined through a systematic grid search strategy combined with five-fold cross-validation. The search space was constrained to satisfy

\sum w_{i} = 1

, and multiple candidate weight combinations were evaluated based on their average fitness score across validation folds. The combination yielding the highest average performance was selected. This approach ensures that the final weights are empirically supported and tailored to the specific characteristics of the dataset and classification task. This data-driven weighting method is inspired by similar approaches in multi-objective feature selection frameworks [35,36].

Following the fitness evaluation, the best-performing individual

s_{best}

is identified, and the quantum chromosomes of all individuals are updated using a deterministic rotation mechanism:

Δ θ_{i, j} = \begin{cases} + δ, & X_{i, j} = s_{best, j}, \\ - δ, & \text{otherwise}, \end{cases}

(8)

where

δ

is a small, fixed rotation angle, and

X_{i, j}

is the classical value of the j-th gene of the i-th individual. This rule shifts the probability mass towards gene configurations that yield higher utility, encouraging convergence toward fitter solutions.

To mitigate premature convergence and preserve adaptability, the QEA incorporates an epigenetic regulation mechanism modeled through a continuous activation mask

E_{i, j} \in [0, 1]

for each gene. The activation level is updated as follows:

E_{i, j}^{(t + 1)} = \begin{cases} \min (E_{i, j}^{(t)} + η, 1), & \text{if gene } j \text{ improves fitness}, \\ \max (E_{i, j}^{(t)} - η, 0), & \text{otherwise}, \end{cases}

(9)

where

η

is the epigenetic learning rate. This formulation ensures that beneficial genes are reinforced over time, while harmful genes are gradually suppressed without being permanently eliminated. During the quantum update step, genes are probabilistically activated according to a Bernoulli distribution with success probability

E_{i, j}^{(t)}

. This strategy allows for dynamic gene reactivation in future iterations, essential for adapting to environmental drift in evolving threat landscapes.
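The activation update of Equation (9) and the Bernoulli gating can be sketched as follows. The helper names `update_activation` and `sample_active_genes`, the learning rate of 0.1, and the starting activation levels are hypothetical and chosen only to show the clamping behavior at both ends of the interval.

```python
import random


def update_activation(e, improves, eta):
    """Equation (9): reinforce or suppress one gene, clamped to [0, 1]."""
    return min(e + eta, 1.0) if improves else max(e - eta, 0.0)


def sample_active_genes(activation, rng):
    """Gene j takes part in the quantum update with probability activation[j]."""
    return [j for j, e in enumerate(activation) if rng.random() < e]


rng = random.Random(42)
activation = [1.0, 0.5, 0.0]  # illustrative starting levels

activation[0] = update_activation(activation[0], improves=True, eta=0.1)   # capped at 1.0
activation[1] = update_activation(activation[1], improves=False, eta=0.1)  # lowered to 0.4
activation[2] = update_activation(activation[2], improves=False, eta=0.1)  # floored at 0.0

active = sample_active_genes(activation, rng)
print(activation, active)
```

A gene whose activation has dropped is updated less often rather than removed, so a later improvement signal can raise its level again, which is the reactivation behavior described in the text.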

The whole optimization pipeline is summarized in Algorithm 2, which integrates quantum initialization, epigenetic regulation, rotational update, fitness evaluation, and early-stopping based on a stagnation threshold.

This methodology enables the proposed QEA to strike a principled balance between efficient convergence and long-term adaptability. By leveraging quantum representations for probabilistic search and epigenetic mechanisms for dynamic control of gene activity, the algorithm becomes robust to concept drift, noise, and the need for real-time decision-making under uncertainty. Such capabilities are crucial in modern intrusion detection systems, where threat landscapes are becoming increasingly volatile and computational resources are limited.

Algorithm 2 QEA for IDS

Input: Dataset D, population size N, max iterations T, rotation angle $δ$, epigenetic learning rate $η$, fitness function F
Output: Optimal solution $S_{best}$
Stage 1: Initialization
for each individual $i \in \{1, \dots, N\}$ do
    Initialize quantum chromosome $Q[i]$ with random superposition states
    Set epigenetic mask $E[i][j] \leftarrow 1$ for all genes j
    Collapse $Q[i]$ to classical $X[i]$
    Evaluate fitness $f[i] = F(X[i])$
end for
$k \leftarrow \arg\max_{i} f[i]$, $S_{best} \leftarrow X[k]$, $f_{best} \leftarrow f[k]$
Stage 2: Evolution Loop
for each iteration $t = 1$ to T do
    for each individual i do
        Epigenetic Regulation:
        for each gene j do
            Flip $X[i][j]$ to test fitness impact $Δ f$
            if $Δ f < 0$ then
                $E[i][j] \leftarrow \max(E[i][j] - η, 0)$
            else
                $E[i][j] \leftarrow \min(E[i][j] + η, 1)$
            end if
        end for
        Quantum Rotation:
        for each gene j where Bernoulli$(E[i][j]) = 1$ do
            if $X[i][j] = S_{best}[j]$ then
                Apply $R_{y}(+ δ)$ to $Q[i][j]$
            else
                Apply $R_{y}(- δ)$ to $Q[i][j]$
            end if
        end for
        Collapse $Q[i]$ to classical $X[i]$
        Evaluate fitness $f[i] = F(X[i])$
        if $f[i] > f_{best}$ then
            $S_{best} \leftarrow X[i]$, $f_{best} \leftarrow f[i]$
        end if
    end for
end for
Return: $S_{best}$
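To make the control flow of Algorithm 2 concrete, the following is a minimal, self-contained sketch rather than the authors' implementation. Several choices are assumptions made for illustration: each qubit is stored as an angle θ with P(bit = 1) = sin²θ; the rotation sign is chosen so that the qubit moves toward the corresponding bit of the best solution; angles are clamped away from 0 and π/2 to keep some diversity; Δf is taken as the fitness after the flip minus the fitness before it; only a few randomly chosen genes are flip-tested per individual, as described in Section 4.3; and `toy_fitness` with its `USEFUL` indices is a hypothetical stand-in for the classifier-based fitness of Equation (4).

```python
import math
import random


def collapse(thetas, rng):
    """Measure each qubit: bit = 1 with probability sin(theta)^2."""
    return [1 if rng.random() < math.sin(t) ** 2 else 0 for t in thetas]


def qea(fitness, d, n_pop=10, n_iter=30, delta=0.05 * math.pi, eta=0.1,
        genes_per_step=3, seed=42):
    rng = random.Random(seed)
    lo, hi = 0.05, math.pi / 2 - 0.05  # assumed clamp to avoid full collapse
    # Stage 1: random superposition states and fully active epigenetic masks.
    Q = [[rng.uniform(0.1 * math.pi, 0.4 * math.pi) for _ in range(d)]
         for _ in range(n_pop)]
    E = [[1.0] * d for _ in range(n_pop)]
    X = [collapse(q, rng) for q in Q]
    f = [fitness(x) for x in X]
    k = max(range(n_pop), key=f.__getitem__)
    s_best, f_best = X[k][:], f[k]
    history = [f_best]
    # Stage 2: evolution loop.
    for _ in range(n_iter):
        for i in range(n_pop):
            # Epigenetic regulation on a small random subset of genes.
            for j in rng.sample(range(d), min(genes_per_step, d)):
                flipped = X[i][:]
                flipped[j] = 1 - flipped[j]
                delta_f = fitness(flipped) - f[i]  # assumed sign convention
                if delta_f < 0:
                    E[i][j] = max(E[i][j] - eta, 0.0)
                else:
                    E[i][j] = min(E[i][j] + eta, 1.0)
            # Quantum rotation, gated by a Bernoulli draw on E[i][j].
            for j in range(d):
                if rng.random() < E[i][j]:
                    step = delta if s_best[j] == 1 else -delta
                    Q[i][j] = min(max(Q[i][j] + step, lo), hi)
            X[i] = collapse(Q[i], rng)
            f[i] = fitness(X[i])
            if f[i] > f_best:
                s_best, f_best = X[i][:], f[i]
        history.append(f_best)
    return s_best, f_best, history


USEFUL = {0, 3, 7}  # hypothetical indices of informative features


def toy_fitness(mask):
    """Reward selecting the useful features, penalize large subsets."""
    hits = sum(mask[j] for j in USEFUL) / len(USEFUL)
    return 0.8 * hits - 0.2 * sum(mask) / len(mask)


best_mask, best_score, history = qea(toy_fitness, d=12)
print(best_mask, round(best_score, 4))
```

In a real setting, `toy_fitness` would be replaced by a function that trains and cross-validates a classifier on the selected columns, which is far more expensive; this is why the number of flip tests per individual is kept small in the sketch.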

5. Experimental Setup

A comprehensive experimental framework was established to evaluate the effectiveness of the proposed QEA for feature selection in network intrusion detection. This section details the dataset used for training and evaluation, the preprocessing techniques applied to ensure data quality and consistency, the design of the multi-objective fitness function, and the configuration of the QEA parameters. Additionally, implementation details, including the hardware, software libraries, and classifier setup, are described to facilitate reproducibility and ensure a fair comparison with baseline methods.

5.1. Dataset and Preprocessing

This study’s experimental evaluation used four benchmark intrusion detection datasets: UNSW-NB15, CIC-IDS2017, CSE-CIC-IDS2018, and TON_IoT. These datasets were selected for their diversity, representativeness of modern threat landscapes, and broad adoption in the research community.

The UNSW-NB15 dataset, curated by the Australian Centre for Cyber Security (ACCS) [37], addresses many of the limitations found in legacy datasets such as KDD99 and NSL-KDD [38]. It includes 2,540,044 labeled records with 49 features extracted using the IXIA PerfectStorm tool. The dataset captures normal and malicious traffic, with attacks categorized into nine groups, including Exploits, DoS, Generic, and Reconnaissance [39].

The CIC-IDS2017 dataset, developed by the Canadian Institute for Cybersecurity, simulates both benign and malicious traffic over a five-day period and encompasses various attack types, including Brute Force, Heartbleed, DDoS, Port Scan, and Botnet. The dataset comprises over 3 million instances with 80 features, aiming to simulate real-world traffic behavior.

The CSE-CIC-IDS2018 dataset expands on the CIC-IDS2017 dataset by introducing a broader range of attacks, including Infiltration, SQL Injection, and XSS. It offers over 16 million labeled records and 84 features captured over 10 days in a hybrid testbed, making it one of the most comprehensive IDS datasets.

The TON_IoT dataset, released by the Cyber Range Lab at UNSW Canberra, integrates telemetry data from IoT sensors, operating systems, and network traffic. It reflects realistic smart environment settings, including industrial IoT devices, and is particularly suited for evaluating intrusion detection in resource-constrained environments.

To ensure computational feasibility and enable consistent runtime comparisons across all algorithms, a representative subset of 100,000 records was extracted from each dataset using stratified random sampling. Stratification preserves the class distribution, so minority attack categories remain proportionally represented in the subset. The same subsets were used for all feature selection algorithms and classification tasks to ensure a fair comparison. A fixed random seed (

seed = 42

) was used throughout the data preparation process to maintain reproducibility. Additionally, all datasets were inspected for missing values and outliers. Notably, the UNSW-NB15 and CIC datasets are structured to contain minimal null entries [37].

The preprocessing phase was standardized across all datasets and comprised three steps:

Categorical encoding: Features such as protocol type, service, and state were transformed using one-hot encoding, preserving categorical semantics without introducing ordinal assumptions [40].
Normalization: All continuous-valued features were scaled to the range $[0, 1]$ using min–max normalization [41]:

$x^{'} = \frac{x - min (x)}{max (x) - min (x)}$

(10)

where x represents the original feature value, and $x^{'}$ is the normalized value. This normalization step mitigates the impact of scale variance among features, ensuring that distance-based classifiers, such as SVMs, are not disproportionately influenced by high-magnitude attributes [42].
Data splitting: Each preprocessed dataset was partitioned into 70% for training and 30% for testing using stratified random sampling to maintain class balance.
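The three preprocessing steps above can be sketched in plain Python. This is a simplified illustration on tiny hypothetical inputs; the paper's pipeline uses pandas and scikit-learn, and the helper names `one_hot`, `min_max`, and `stratified_split` are introduced here only for the example.

```python
import random
from collections import defaultdict


def one_hot(values):
    """Encode a categorical column as 0/1 indicator vectors."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return rows, categories


def min_max(column):
    """Equation (10): scale a numeric column to [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]  # constant column carries no information
    return [(x - lo) / (hi - lo) for x in column]


def stratified_split(labels, test_frac=0.3, seed=42):
    """Split indices so each class keeps roughly the same train/test ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    train, test = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_frac)
        test.extend(idxs[:n_test])
        train.extend(idxs[n_test:])
    return sorted(train), sorted(test)


# Hypothetical toy data: a protocol column, a duration column, binary labels.
encoded, cats = one_hot(["tcp", "udp", "tcp", "icmp"])
durations = min_max([2.0, 4.0, 6.0, 10.0])
labels = [0] * 70 + [1] * 30
train_idx, test_idx = stratified_split(labels)

print(cats, encoded[0], durations)
print(len(train_idx), len(test_idx))
```

One design point worth noting: computing the minimum and maximum on the training portion only, and reusing them for the test portion, avoids leaking test statistics into training. The sketch scales a single column in isolation and does not take a position on which order the paper used.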

Initially, the complete feature set of each dataset (49–84 features) was retained as shown in Table 4. The proposed QEA was then applied to identify a compact and discriminative subset of features for each dataset. This enabled a rigorous evaluation of the QEA’s ability to improve the classification performance while minimizing the computational complexity and inference time.

The proposed approach has been evaluated on these four benchmark intrusion detection datasets, each encompassing a wide variety of attack categories. These datasets are designed to capture both traditional and modern multi-vector cyberattacks, which makes them well-suited to assess the robustness and versatility of the QEA. Specifically, the UNSW-NB15 dataset contains attacks such as Denial-of-Service (DoS), Exploits, Fuzzers, Backdoors, Reconnaissance, and Shellcode. The CIC-IDS2017 and CSE-CIC-IDS2018 datasets feature a broader spectrum, including Brute Force, Port Scanning, Botnets, Distributed Denial-of-Service (DDoS), Infiltration, SQL Injection, and Cross-Site Scripting (XSS). The TON_IoT dataset introduces unique IoT-specific threats, including information theft, data poisoning, backdoor injection, and firmware manipulation.

This diversity of attack categories ensures that the QEA is tested against a wide range of known and evolving cyber threats, providing confidence in its adaptability to both conventional and emerging security challenges. By leveraging this variety of attack types, the proposed method can offer effective detection capabilities for different cybersecurity domains, including enterprise networks, IoT environments, and cloud-based systems.

5.2. Fitness Function

The multi-objective fitness function used in QEA is defined in Equation (4), where each term is weighted by a user-defined scalar that reflects its relative importance. Specifically,

λ_{1}

emphasizes the classification accuracy,

λ_{2}

penalizes the false positive rate (FPR),

λ_{3}

penalizes latency (average inference time), and

λ_{4}

discourages the selection of excessively large feature subsets. These weights guide the QEA search toward solutions that are not only accurate but also efficient and generalizable.

Table 5 shows the effect of different weight configurations. When accuracy is prioritized (e.g.,

λ_{1} = 0.6

), the algorithm selects larger subsets that yield high detection rates but may incur increased latency. In contrast, emphasizing compactness (e.g.,

λ_{4} = 0.4

) encourages smaller subsets, which are particularly beneficial for deployment in resource-constrained environments, such as edge devices or embedded systems. On the other hand, uniform weights provide a balanced approach to all objectives and can serve as a baseline.

To ensure that no single objective dominates the optimization process due to scale differences, the weights

λ_{1}, λ_{2}, λ_{3}, λ_{4}

were selected such that their sum equals one:

\sum_{i = 1}^{4} λ_{i} = 1

(11)

This normalization allows the fitness function to represent a weighted convex combination of its components, facilitating a balanced trade-off between detection performance and efficiency [43]. Moreover, since metrics like accuracy and false positive rate are naturally bounded in the range

[0, 1]

, it is essential to scale latency and feature compactness in a comparable range using min–max normalization before combining them. This standardization prevents any term from disproportionately influencing the fitness outcome due to differences in magnitude, thereby ensuring fair contribution from all objectives [44].
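The effect of this scaling can be seen in a short numerical sketch. The latency values below are hypothetical, and scaling across the candidates of one generation is an assumption made for illustration; the text does not specify over which set the minimum and maximum are taken.

```python
def scale_to_unit(values):
    """Min-max scale a list of measurements to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


lambda_3 = 0.1
latencies_ms = [1.2, 3.2, 2.2]  # hypothetical per-candidate latencies

# Without scaling, the latency penalty can rival the accuracy term (at most 0.5).
raw_penalty = [lambda_3 * t for t in latencies_ms]

# With scaling, the penalty is bounded by lambda_3 itself.
scaled_penalty = [lambda_3 * t for t in scale_to_unit(latencies_ms)]

print(raw_penalty)
print(scaled_penalty)
```

After scaling, each penalty term is bounded by its own weight, so the weights alone express the relative priority of the objectives.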

In our experiments, we selected

λ_{1} = 0.5

,

λ_{2} = 0.3

,

λ_{3} = 0.1

, and

λ_{4} = 0.1

to prioritize the classification accuracy while keeping the false positive rate and model complexity within acceptable operational limits. This configuration was found to offer a good balance for practical deployment in cybersecurity environments where reliable threat detection is essential, but efficiency must not be overlooked.

5.3. Feature Selection Based on QEA

The proposed QEA drove the feature selection process in this study. It is designed to identify the most discriminative and compact subset of features across four benchmark intrusion detection datasets: UNSW-NB15, CIC-IDS2017, CSE-CIC-IDS2018, and TON_IoT. The QEA serves as a metaheuristic optimization strategy that balances the classification performance with model compactness and latency, addressing the real-time requirements of IDS.

The feature subsets selected by the QEA were benchmarked using a Light Gradient Boosting Machine (LightGBM) classifier. LightGBM was selected due to its efficiency in handling high-dimensional structured data, fast training and inference speeds, and proven accuracy in large-scale classification tasks. Its histogram-based learning and leaf-wise growth strategy allows it to model complex, non-linear attack behaviors effectively [45,46].

The classification performance of each subset was assessed using stratified five-fold cross-validation. The results informed a multi-objective fitness function (Equation (4)) that simultaneously optimized accuracy, false positive rate, prediction latency, and feature compactness.

Empirical tuning was conducted to optimize QEA’s hyperparameters, drawing from preliminary grid search and relevant literature [9,47,48,49]. The final configuration used in all experiments is shown in Table 6 and yielded strong performance across all datasets, achieving an accuracy up to 97.12% while significantly reducing the number of features and maintaining low latency.

Through this adaptive configuration, the QEA was able to generalize across diverse and imbalanced datasets, demonstrating strong convergence behavior and yielding compact feature sets (e.g., 18 out of 44 features on TON_IoT and 21 out of 49 on UNSW-NB15) without compromising detection performance. The hybrid quantum–epigenetic optimization process enables dynamic learning and resilient feature suppression, making it well-suited for evolving cybersecurity environments.

It is important to note that while the QEA optimization process used an SVM classifier with RBF kernel to evaluate candidate feature subsets during fitness computation (due to its robustness in small-sample, high-dimensional spaces), the final evaluation of the classification performance across all feature selection methods was conducted using LightGBM to ensure a consistent and fair benchmarking framework.

5.4. Implementation Details

The proposed QEA was implemented in Python 3.10 using the NumPy and scikit-learn libraries. Quantum-inspired operations such as qubit state updates and measurement collapses were simulated using vectorized probability distributions, while epigenetic regulation was realized via continuous-valued activation masks applied at each generation. The multi-objective fitness function was computed using an embedded SVM classifier with an RBF kernel, and the classification performance was evaluated through five-fold cross-validation. Table 7 presents the system and implementation configuration.

The latency was measured using Python’s time module by timing the average prediction duration per test sample. The latency measurements reported in Section 6 refer exclusively to the inference time of the final trained classifier after feature selection has been completed. Specifically, the reported values reflect the time required to make a single prediction using the reduced feature set. The computational cost of the QEA feature selection or training phase is not included in the latency figures, as the purpose of this metric is to assess the real-time suitability of the deployed detection model. All experiments were executed on a workstation with the specifications listed in Table 7:

Table 7. System and implementation configuration.

Specification	Details
Processor	Intel Core i7-11700K @ 3.60GHz
RAM	32 GB DDR4
Operating System	Windows 11 Pro, 64-bit
Python Version	Python 3.10
Key Libraries	`numpy`, `scikit-learn`, `matplotlib`, `seaborn`, `pandas`, `qiskit` (for quantum simulation)
Development Environment	Spyder Editor
Fitness Evaluation	Five-fold Cross-Validation using SVM (RBF kernel)
QEA Implementation	Custom quantum-inspired and epigenetic logic simulated in Python using probabilistic amplitudes and Bernoulli masks
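The per-sample latency measurement described above can be approximated with the standard `time` module. The sketch below is an assumption about how such a measurement might be set up, not the authors' script: `mean_latency_per_sample` and `dummy_predict` are hypothetical names, and the dummy function stands in for the `predict` method of a trained classifier.

```python
import time


def mean_latency_per_sample(predict, samples, repeats=5):
    """Average wall-clock prediction time per sample, in seconds."""
    predict(samples[:1])  # warm-up call, excluded from the timing
    start = time.perf_counter()
    for _ in range(repeats):
        predict(samples)
    end = time.perf_counter()
    return (end - start) / (repeats * len(samples))


def dummy_predict(batch):
    """Hypothetical stand-in for a trained model's predict function."""
    return [1 if sum(row) > 1.0 else 0 for row in batch]


# Hypothetical test set with three features per sample.
samples = [[0.1 * i, 0.2, 0.3] for i in range(1000)]
latency = mean_latency_per_sample(dummy_predict, samples)
print(f"{latency * 1e6:.3f} microseconds per sample")
```

Repeating the prediction several times and dividing by the total number of samples smooths out timer resolution and scheduling noise, which matters when a single prediction takes only microseconds.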

To ensure a fair and consistent comparison, all baseline feature selection algorithms, EA, GA, PSO, QGA, and QPSO, were implemented using a unified experimental pipeline and evaluated with the same LightGBM classifier settings. Hyperparameters for each algorithm were tuned through preliminary experimentation and grid search to optimize performance on the validation set. Initially, a coarse grid was used to explore a wide range of values for key parameters, such as population size, mutation rates, rotation angles, and epigenetic thresholds. Promising regions were then refined using a finer grid. The evaluation criterion was the average F1-score obtained from stratified five-fold cross-validation on the training split. To ensure reproducibility, all experiments were conducted with fixed random seeds and consistent data folds across algorithms. The best-performing hyperparameter set for each algorithm was used in final testing.

All implementations followed consistent logic for population initialization, stopping criteria, and evaluation protocols. The complete source code and reproducibility scripts will be made available upon request or as part of a supplementary repository accompanying the publication. Table 8 summarizes the hyperparameter values used in the experiment.

6. Results and Discussion

To assess the effectiveness of the proposed QEA for feature selection in intrusion detection systems, a comparative analysis was conducted against five established metaheuristic baselines: PSO [5,20], GA [25], EA [9], QGA [50], and QPSO [51]. These algorithms were selected due to their prominence in the literature and their conceptual relevance as evolutionary, swarm-based, and quantum-inspired methods.

To ensure the integrity and fairness of the experimental setup, all algorithms were evaluated under consistent and controlled conditions. All final feature selection outputs were assessed using the LightGBM classifier for consistency. During QEA optimization, however, an SVM was used internally for fitness evaluation. All experiments used stratified five-fold cross-validation and the same data splits, avoiding data leakage and sampling bias. Model performance was assessed using the classification accuracy, false positive rate (FPR), prediction latency, F1-score, and number of selected features. The evaluation metrics are defined mathematically as follows:

Accuracy = \frac{TP + TN}{TP + TN + FP + FN}

(12)

Precision = \frac{TP}{TP + FP}

(13)

Recall = \frac{TP}{TP + FN}

(14)

F1-score = 2 \times \frac{Precision \times Recall}{Precision + Recall}

(15)

FPR = \frac{FP}{FP + TN}

(16)

Prediction Latency = t_{end} - t_{start}

(17)

In the above equations,

TP

(True Positives) is the number of attack instances correctly identified by the classifier,

TN

(True Negatives) is the number of benign instances correctly classified as non-attacks,

FP

(False Positives) is the number of benign instances incorrectly labeled as attacks, leading to false alarms, and

FN

(False Negatives) is the number of attack instances mistakenly classified as benign, potentially allowing threats to go undetected. The prediction latency is the difference between the time at which the prediction ends (

t_{end}

) and the time at which it begins (

t_{start}

). This standardized framework separates the effect of the feature selection method from the influence of the classifier and the evaluation protocol, enabling a sound comparison.
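Equations (12) to (16) can be computed directly from the four confusion-matrix counts. The sketch below uses hypothetical counts, not values from the paper's tables, and also returns the false negative rate (FNR), taken here in its standard form FN / (FN + TP).

```python
def ids_metrics(tp, tn, fp, fn):
    """Compute the evaluation metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "fpr": fp / (fp + tn),
        "fnr": fn / (fn + tp),
    }


# Hypothetical counts: 100 attack and 100 benign test samples.
m = ids_metrics(tp=90, tn=95, fp=5, fn=10)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

Note that recall and FNR always sum to one, which gives a quick consistency check when reading reported results.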

6.1. Confusion Matrix and Statistical Performance Analysis

To gain a deeper understanding of the classification performance of the proposed QEA, we present and analyze the confusion matrices generated for each benchmark dataset. These matrices provide detailed insights into how well the model distinguishes between normal and malicious traffic, particularly in terms of TP, FP, FN, and TN.

For the UNSW-NB15 dataset, the QEA-optimized classifier achieved a strong balance between detection and precision. The confusion matrix is shown below in Table 9:

From this, the model attained an accuracy of 97.81%, an FPR of 1.94%, and an FNR of 2.46%. The precision and recall were 97.90% and 97.54%, respectively, resulting in an F1-score of 97.72%. These results demonstrate that the QEA can effectively minimize both types of misclassifications while maintaining high reliability in detection.

On the CIC-IDS2017 dataset, which is known for its class imbalance and diverse attack types, the confusion matrix was as follows in Table 10:

The model achieved an accuracy of 98.15%, with an FPR of 1.56% and an FNR of 2.19%. The precision reached 98.23%, the recall was 97.81%, and the F1-score stood at 98.02%. These values confirm the QEA’s ability to maintain both detection sensitivity and stability across imbalanced data distributions.

The CSE-CIC-IDS2018 dataset is high-dimensional and posed a challenge due to its diverse range of features and potential attack vectors. Nevertheless, the QEA maintained strong performance. The confusion matrix is shown in Table 11:

The corresponding accuracy was 97.97%, with an FPR of 2.25% and an FNR of 1.80%. The classifier recorded a precision of 97.69%, recall of 98.20%, and an F1-score of 97.94%. These results highlight the model’s capability to handle large-scale, feature-rich datasets while preserving the classification quality.

The TON_IoT dataset, which includes telemetry data from IoT devices, is inherently noisy and heterogeneous. The QEA framework delivered consistent results, with the following confusion matrix presented in Table 12:

As presented in Table 13, the accuracy achieved was 97.09%, with an FPR of 2.97% and FNR of 2.85%. The model reached a precision of 96.94%, a recall of 97.15%, and an F1-score of 97.05%. These metrics support the suitability of QEA for deployment in edge environments, where low latency and generalization across noisy inputs are essential.

In summary, the confusion matrices and associated metrics provide strong evidence of QEA’s ability to accurately classify both normal and malicious traffic, while maintaining a balance between false positives and false negatives. The consistent statistical performance across diverse datasets—ranging from enterprise-level traffic to IoT telemetry—reinforces the model’s robustness, efficiency, and generalizability for real-world cybersecurity applications.

6.2. Accuracy

Figure 3 shows that the QEA achieved the highest accuracy on all four evaluated datasets, outperforming both conventional and quantum-inspired techniques across the intrusion detection scenarios considered.

On UNSW-NB15, the QEA achieved the highest accuracy (96.05%), surpassing its closest competitors, QGA (94.45%) and QPSO (92.70%). This margin suggests that the QEA identifies discriminative patterns for both known and less obvious attacks. On the CIC-IDS2017 dataset, the QEA achieved 96.38% accuracy, exceeding QGA (94.60%) and, by a wider margin, the traditional algorithms GA (91.96%) and EA (89.12%). The higher accuracy on this dataset, which features a broader variety of attacks, indicates that the QEA can effectively handle complex and imbalanced data distributions.

On the CSE-CIC-IDS2018 dataset, which has more features and attack types, the QEA achieved the best accuracy of 97.12%, outperforming QGA (95.50%) and QPSO (94.23%). This finding indicates that the QEA is effective on high-dimensional data, which is often challenging for conventional algorithms. Results on the TON_IoT dataset, which contains mixed data types and embedded telemetry, show that the QEA can also handle heterogeneous data. The QEA reached 94.47% accuracy, well above EA (88.20%) and higher than advanced techniques such as QGA (93.75%) and QPSO (92.60%). This outcome demonstrates the benefit of the QEA when resources are constrained, such as in smart devices and IoT systems.

Overall, QEA’s consistent dominance in classification accuracy across all datasets confirms its capability as a highly effective and generalizable feature selection mechanism for modern IDS deployments. The observed improvements are attributable to QEA’s hybrid nature, which synergizes quantum probabilistic encoding with epigenetic memory regulation, enabling more effective feature space exploration and preservation of relevant information throughout the optimization process.

6.3. F1-Scores

The comparative F1-score results across all datasets, as illustrated in Figure 4, further reinforce the effectiveness of the proposed QEA framework. As a balanced metric that accounts for the precision and recall, the F1-score is particularly well-suited for intrusion detection systems, where minimizing false positives and false negatives is critical.

On the UNSW-NB15 dataset, QEA achieved the highest F1-score of 95.76%, outperforming traditional methods such as GA (91.41%) and EA (88.33%) as well as advanced quantum-inspired approaches like QGA (94.04%) and QPSO (92.01%). This result suggests that QEA maintains a strong balance in detecting attack and benign traffic, with minimal classification bias. The performance advantage of QEA is even more evident on the CIC-IDS2017 dataset, where it achieved an F1-score of 96.01%, substantially higher than EA (88.75%) and notably above QGA (94.21%) and QPSO (92.17%). This dataset presents a more diverse mix of attack vectors and class imbalance, making it more challenging to obtain high F1-scores and highlighting QEA’s resilience under more complex detection conditions.

On the high-dimensional CSE-CIC-IDS2018 dataset, QEA maintained its dominance, reaching an F1-score of 96.83%. In comparison, QGA and QPSO recorded scores of 95.21% and 93.71%, respectively, while classical methods like GA and EA trailed at 92.43% and 89.87%. These outcomes confirm that QEA’s hybrid optimization strategy effectively selects informative features that enhance the classifier’s generalization ability across large and varied attack datasets. Finally, on the TON_IoT dataset, which introduces additional noise and heterogeneity due to its IoT-specific telemetry data, QEA still outperformed all baselines, with an F1-score of 94.05%. The closest competitor, QGA, reached 93.29%, while EA lagged behind at 87.34%. These results further emphasize QEA’s robustness and adaptability in conventional and edge computing environments.

Taken together, the F1-score comparisons across all datasets strongly indicate that QEA not only achieves a high classification accuracy but also does so with a high degree of consistency and reliability. Its superior performance can be attributed to its quantum–epigenetic synergy, which enables effective exploration of the search space while adaptively preserving relevant genetic information to guide the optimization process.

6.4. FPR

The comparison of FPR, shown in Figure 5, provides critical insight into the reliability of each feature selection method under real-world intrusion detection scenarios. In cybersecurity, particularly in automated intrusion detection systems (IDSs), minimizing false alarms is essential to maintaining operational trust, reducing alert fatigue, and preventing unnecessary manual investigation.
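As a reminder of how the metric is defined, the FPR is the fraction of benign samples that are incorrectly flagged as attacks, FP / (FP + TN). The short sketch below illustrates the calculation; the function name and counts are hypothetical and are not drawn from the paper's results.

```python
def false_positive_rate(fp: int, tn: int) -> float:
    """Return FPR = FP / (FP + TN), the share of benign traffic flagged as malicious."""
    benign_total = fp + tn
    if benign_total == 0:
        raise ValueError("FPR is undefined when there are no benign samples")
    return fp / benign_total


# Illustrative counts only: 20 false alarms among 1000 benign flows.
example_fpr = false_positive_rate(fp=20, tn=980)
```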

Across all datasets, the proposed QEA consistently achieved the lowest FPR values, outperforming traditional and quantum-inspired baselines. On the UNSW-NB15 dataset, QEA achieved an FPR of 1.91%, representing a significant reduction compared to GA (2.94%) and EA (3.49%). This pattern was mirrored on the CIC-IDS2017 dataset, where QEA achieved 1.74%, the best among all tested methods—demonstrating its robustness in handling class imbalance and ambiguous attack patterns.

The advantage of QEA becomes even more pronounced on the CSE-CIC-IDS2018 dataset, which contains more nuanced and modern attack types. Here, QEA achieved an FPR of 1.68%, notably lower than QGA (2.07%) and QPSO (2.36%), and far below that of EA, which exhibited the highest rate at 3.38%. Such results suggest that QEA improves the detection accuracy and avoids the over-classification of benign traffic as malicious, a common drawback in many machine learning-based IDS implementations. On the TON_IoT dataset, which introduces heterogeneity and noisy telemetry signals typical of IoT environments, QEA still delivered a competitive FPR of 2.02%. This outperforms QPSO (2.66%) and EA (3.80%), further confirming QEA’s adaptability across diverse and resource-constrained deployment scenarios.

These results validate the effectiveness of QEA’s hybrid optimization approach, which combines quantum-inspired rotation for global exploration with adaptive epigenetic memory to fine-tune feature selection. This synergy allows QEA to identify feature subsets that yield a high classification accuracy and low false alarm rates—an essential criterion for deploying IDS models in practical, real-time environments.

6.5. Latency

The latency performance, depicted in Figure 6, offers a critical dimension to evaluating the real-time applicability of feature selection algorithms in IDS. In practical deployment, especially in time-sensitive environments such as critical infrastructure and IoT, achieving low inference latency is essential to ensure prompt threat mitigation and system responsiveness.
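This section does not spell out the timing procedure, so the following is only a sketch of one common way to estimate per-sample inference latency: warm up the predictor, then average wall-clock time over repeated predictions. The helper `mean_latency_ms`, the toy predictor, and the sample data are all assumptions introduced for illustration.

```python
import time
from typing import Callable, Sequence


def mean_latency_ms(
    predict: Callable[[Sequence[float]], int],
    samples: Sequence[Sequence[float]],
    repeats: int = 5,
) -> float:
    """Average wall-clock time per prediction, in milliseconds."""
    if not samples or repeats < 1:
        raise ValueError("need at least one sample and one repeat")
    # Warm-up pass so one-off setup costs do not distort the measurement.
    for x in samples:
        predict(x)
    start = time.perf_counter()
    for _ in range(repeats):
        for x in samples:
            predict(x)
    elapsed_s = time.perf_counter() - start
    return 1000.0 * elapsed_s / (repeats * len(samples))


def toy_predict(x: Sequence[float]) -> int:
    """Hypothetical stand-in for a trained classifier."""
    return int(sum(x) > 1.0)


toy_samples = [[0.1 * i, 0.2] for i in range(100)]
latency_ms = mean_latency_ms(toy_predict, toy_samples)
```

Measured this way, latency depends on the number of input features passed to the classifier, which is one reason feature selection and inference time are linked.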

Among the six evaluated methods, PSO and GA consistently demonstrated the lowest latency, maintaining prediction times below 0.09 milliseconds across all datasets. PSO achieved the most favorable average latency (e.g., 0.06 ms on UNSW-NB15), which aligns with its lightweight evolutionary design and minimal memory overhead. GA performed similarly, reflecting its efficiency in converging on sub-optimal yet computationally inexpensive solutions.

By contrast, EA exhibited the highest latency among the classical methods, reaching up to 0.13 ms on TON_IoT. This can be attributed to the additional computational overhead associated with epigenetic regulation logic, which, although beneficial for accuracy and feature stability, introduces marginal inference delays. As expected, QGA incurred the highest latency overall, peaking at 0.89 ms on the CIC-IDS2017 dataset. Its reliance on quantum-inspired rotation and population-wide transformations leads to increased computational complexity. Despite this, QGA maintained strong detection performance, making it suitable for environments where latency is less constrained but accuracy is paramount.

QEA balanced the model complexity and responsiveness, maintaining the latency in the 0.55–0.61 ms range across all datasets. While higher than PSO and QPSO, QEA’s latency remained within acceptable limits for near-real-time applications. The increase in inference time is justified by its significant gains in accuracy and reduction in false positives. Notably, QEA outperformed QGA in both latency and classification metrics, demonstrating that hybrid bio-inspired models can deliver both speed and precision when effectively designed.

While PSO and GA offer the fastest response times, QEA presents a superior trade-off between the predictive performance and real-time feasibility. This makes it a strong candidate for deployment in environments where both the detection quality and timeliness are crucial.

6.6. Feature Selection

The number of features selected by each algorithm, presented in Figure 7, offers insights into the dimensionality reduction capabilities of the evaluated methods. Feature selection is pivotal in intrusion detection, as it reduces computational overhead and training time, improves generalization, and eliminates irrelevant or redundant information that may degrade the classifier performance.

Across all four datasets, the proposed QEA consistently selected fewer features than all baseline methods while achieving a superior classification performance. On the UNSW-NB15 dataset, QEA reduced the feature space to 21 attributes, a substantial decrease from PSO (32), GA (35), and QGA (37). This confirms QEA’s efficiency in isolating the most discriminative features while discarding noise. The benefits of QEA’s minimal feature selection are particularly evident on the CIC-IDS2017 and CSE-CIC-IDS2018 datasets. On CIC-IDS2017, QEA selected only 27 features, compared to 43 by GA and 42 by PSO, yet still outperformed all competitors in both accuracy and F1-score. A similar pattern was observed on CSE-CIC-IDS2018, where QEA chose 24 features, while most other methods selected 35–41 features. This reduction highlights QEA’s ability to identify compact and informative feature subsets in high-dimensional and heterogeneous data settings.

Even in the TON_IoT dataset, which is characterized by sparse and noisy telemetry data, QEA reduced the number of features to 18, the smallest among all methods evaluated. In contrast, classical and swarm-based methods retained significantly larger subsets (e.g., 30 for PSO and GA, 27 for QPSO), indicating less aggressive pruning strategies. The consistent feature compactness achieved by QEA can be attributed to its hybrid optimization mechanism. The quantum rotation component enhances global exploration and helps escape local minima, while the epigenetic adaptation dynamically suppresses less relevant genes without irreversible deletion. This synergy ensures that only the most relevant features are preserved, supporting the classification performance and model interpretability.
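To make the interplay between the two mechanisms more concrete, the sketch below shows a generic quantum-inspired encoding combined with a reversible activation mask. It is a simplified illustration under stated assumptions, not the authors' implementation: the angle-based encoding with P(bit = 1) = sin²(θ), the fixed rotation step `delta`, the example mask, and all function names are hypothetical choices.

```python
import math
import random
from typing import List


def observe(thetas: List[float], rng: random.Random) -> List[int]:
    """Collapse each qubit angle to a bit, with P(bit = 1) = sin^2(theta)."""
    return [1 if rng.random() < math.sin(t) ** 2 else 0 for t in thetas]


def rotate(thetas: List[float], bits: List[int], best_bits: List[int],
           delta: float = 0.05) -> List[float]:
    """Nudge each angle toward the corresponding bit of the best-known solution."""
    updated = []
    for theta, bit, best in zip(thetas, bits, best_bits):
        if bit != best:
            # Raising theta makes a 1 more likely; lowering it makes a 0 more likely.
            theta += delta if best == 1 else -delta
        updated.append(min(max(theta, 0.0), math.pi / 2))  # keep angle in [0, pi/2]
    return updated


def express(bits: List[int], mask: List[int]) -> List[int]:
    """Apply the activation mask: silenced genes stay encoded but are not used."""
    return [b & m for b, m in zip(bits, mask)]


rng = random.Random(0)
thetas = [math.pi / 4] * 6        # equal superposition: each feature selected with p = 0.5
mask = [1, 1, 0, 1, 0, 1]         # hypothetical mask; 0 = temporarily silenced feature
bits = observe(thetas, rng)
selected = express(bits, mask)    # features actually passed to the classifier
thetas = rotate(thetas, bits, best_bits=[1, 0, 1, 1, 0, 0])
```

Because the mask is applied after observation, a silenced feature keeps its angle and can be reactivated later simply by flipping its mask entry, which is the sense in which suppression happens without irreversible deletion.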

In conclusion, QEA’s feature selection behavior demonstrates that it achieves not only high accuracy and robustness but also excellent model compactness. This makes it an ideal candidate for deployment in real-time and resource-constrained cybersecurity environments. QEA achieved a desirable trade-off between dimensionality reduction and classification efficacy, often selecting fewer features than other quantum-enhanced methods while preserving high accuracy.

6.7. Convergence

Figure 8 illustrates the convergence behavior of all evaluated algorithms over 100 iterations, as measured by their fitness score minimization trajectories. The figure reveals that the proposed QEA demonstrates the most rapid and stable convergence, achieving the lowest final fitness score (0.2599). Strong performances are also achieved by QPSO and QGA, both of which leverage ideas from quantum computing in their search strategies. In comparison, EA improves its fitness much more slowly and plateaus far from the optimum, suggesting that it becomes trapped in local optima.
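Since the fitness score is minimized, lower values correspond to better feature subsets. The sketch below shows one simple way such a multi-objective score could be scalarized as a weighted sum of error rate, FPR, normalized latency, and the fraction of features retained. The weights, the latency cap, and the function signature are assumptions made for illustration; they are not the formulation or the values used in the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FitnessWeights:
    """Hypothetical weights for each objective (they sum to 1 here)."""
    error: float = 0.5
    fpr: float = 0.2
    latency: float = 0.1
    features: float = 0.2


def fitness(accuracy: float, fpr: float, latency_ms: float,
            n_selected: int, n_total: int,
            max_latency_ms: float = 1.0,
            w: FitnessWeights = FitnessWeights()) -> float:
    """Weighted-sum fitness to be minimized; every term is scaled to [0, 1]."""
    if n_total <= 0 or not 0 <= n_selected <= n_total:
        raise ValueError("invalid feature counts")
    latency_term = min(latency_ms / max_latency_ms, 1.0)  # cap at the latency budget
    feature_term = n_selected / n_total                    # fraction of features kept
    return (w.error * (1.0 - accuracy)
            + w.fpr * fpr
            + w.latency * latency_term
            + w.features * feature_term)


# Illustrative comparison: same detection quality, different subset sizes.
compact = fitness(accuracy=0.97, fpr=0.02, latency_ms=0.6, n_selected=24, n_total=80)
bulky = fitness(accuracy=0.97, fpr=0.02, latency_ms=0.6, n_selected=40, n_total=80)
```

With identical accuracy, FPR, and latency, the candidate that keeps fewer features receives the lower score, which is how a compactness term steers the search toward smaller subsets.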

QEA’s convergence profile shows that it searches effectively for the best solution by striking a good compromise between global exploration and local exploitation. This behavior results from its hybrid optimization mechanism, which involves probabilistic exploration of new regions through quantum rotation dynamics and the reinforcement of beneficial gene expressions while suppressing irrelevant or disruptive features via its epigenetic memory module. Adaptive modulation of the search process enhances convergence speed and prevents premature fixation on a suboptimal solution. Furthermore, QEA’s curve declines smoothly and steadily, in contrast to the erratic trajectories of GA and EA. Such fluctuations suggest instability in the optimization process or in how candidate solutions are exploited.

The convergence analysis, combined with accuracy, F1-score, FPR, latency, and feature compactness results, provides strong empirical evidence of the superiority of QEA. Its consistent optimization behavior across multiple datasets confirms its robustness, scalability, and applicability to real-world intrusion detection scenarios where rapid and reliable feature selection is critical.

6.8. Ablation Study

To assess the individual contribution of each component in the proposed QEA, we conducted an ablation study on the CSE-CIC-IDS2018 dataset. The goal of this analysis is to evaluate how the exclusion of either the epigenetic regulation or the quantum-inspired rotation affects overall detection performance and model compactness.

We examined three configurations:

Full QEA: The complete model with both quantum rotation and epigenetic masking.
QEA without Epigenetic Mechanism: The model retains quantum-inspired encoding but disables adaptive gene expression.
QEA without Quantum Rotation: The model applies epigenetic regulation but removes the quantum rotation update from the optimization loop.

Figure 9 shows the comparison in terms of accuracy and F1-score, while Figure 10 illustrates the number of selected features for each configuration.

As shown, the full QEA configuration achieved the best overall performance, with an accuracy of 97.97% and an F1-score of 97.94%. When the epigenetic mechanism was removed, the accuracy dropped to 95.42% and the F1-score to 95.10%. Similarly, removing the quantum component resulted in a further decrease to 94.88% accuracy and 94.65% F1-score. These results highlight the synergistic effect of combining quantum search with biologically adaptive feedback.

In terms of feature selection, the full QEA selected only 24 features, compared to 32 features without epigenetics and 29 features without the quantum component. This further emphasizes the dual importance of both mechanisms, not only in improving classification performance but also in producing compact and interpretable models.
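The size of each component's contribution can be read directly from the figures reported above. The short sketch below computes the change in accuracy, F1-score, and feature count for each ablated configuration relative to the full model; the dictionary keys and helper name are illustrative, while the numbers are those stated in the text.

```python
from typing import Dict

# Values reported in the ablation study on CSE-CIC-IDS2018.
ablation: Dict[str, Dict[str, float]] = {
    "full": {"accuracy": 97.97, "f1": 97.94, "features": 24},
    "no_epigenetic": {"accuracy": 95.42, "f1": 95.10, "features": 32},
    "no_quantum": {"accuracy": 94.88, "f1": 94.65, "features": 29},
}


def deltas_vs_full(results: Dict[str, Dict[str, float]]) -> Dict[str, Dict[str, float]]:
    """Difference of each ablated configuration from the full model, per metric."""
    full = results["full"]
    return {
        name: {metric: round(values[metric] - full[metric], 2) for metric in full}
        for name, values in results.items()
        if name != "full"
    }


deltas = deltas_vs_full(ablation)
# no_epigenetic: accuracy -2.55, F1 -2.84, features +8
# no_quantum:    accuracy -3.09, F1 -3.29, features +5
```

Removing the quantum rotation costs slightly more accuracy and F1-score, whereas removing the epigenetic mechanism inflates the feature subset the most.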

The ablation study confirms that both the quantum and epigenetic components of QEA play crucial roles in this context. Quantum rotation enhances global exploration and avoids premature convergence, while epigenetic regulation ensures adaptability and effective suppression of non-informative features. Their integration is essential to achieving state-of-the-art results in adaptive intrusion detection.

The proposed QEA was evaluated across four diverse datasets—UNSW-NB15, CIC-IDS2017, CSE-CIC-IDS2018, and TON_IoT—each representing different network topologies, attack vectors, and data dimensionalities. The consistent improvement in performance across these datasets demonstrates the framework’s strong generalization capabilities. Since the QEA operates in a data-agnostic manner and only requires a labeled dataset as input, it can be extended beyond intrusion detection to other domains involving high-dimensional classification tasks, such as financial fraud detection, healthcare diagnostics, or industrial control systems.

In terms of interpretability, the QEA framework enhances model transparency by reducing the number of features. By selecting a minimal and highly discriminative subset of features, it not only reduces the computational complexity but also improves the clarity of the decision-making process for downstream classifiers. This supports human-in-the-loop analysis, where cybersecurity professionals can better understand which features triggered an alert. However, we acknowledge that the internal optimization mechanisms (e.g., quantum rotation and epigenetic feedback) are abstract in nature. This study does not directly incorporate formal explainability tools such as SHAP or LIME, but the compactness of the feature sets enables easier post hoc analysis and visualization. To enhance user trust and regulatory compliance, future work may integrate such tools with QEA’s output to provide instance-level attribution, supporting explainable decision-making in cybersecurity applications that require human-in-the-loop oversight or regulatory transparency.

6.9. Statistical Significance Testing

While raw performance metrics, such as accuracy and F1-score, provide an initial indication of model effectiveness, it is essential to assess whether observed improvements are statistically meaningful. To this end, we conducted a series of paired t-tests comparing the proposed QEA to five established baseline feature selection methods: PSO, GA, EA, QGA, and QPSO. These tests were performed across all four benchmark datasets.

For each method, we computed the average accuracy and F1-score across five independent experimental runs using stratified five-fold cross-validation. Paired t-tests were then applied to compare QEA’s results with those of each baseline method for both metrics. The null hypothesis in each case was that there is no significant difference between the means of QEA and the corresponding baseline method.
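The mechanics of a paired t-test can be sketched with the standard library alone. The example below assumes that the paired observations are the five per-run mean accuracies of two methods (so there are four degrees of freedom) and compares the t-statistic with the two-sided 5% critical value for that case. The per-run numbers are hypothetical and are not results from the paper; in practice, a routine such as `scipy.stats.ttest_rel` would typically be used to obtain the p-value directly.

```python
import math
import statistics
from typing import Sequence

# Two-sided critical value of Student's t for alpha = 0.05 and 4 degrees of freedom.
T_CRIT_DF4 = 2.776


def paired_t_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """t = mean(d) / (stdev(d) / sqrt(n)) for the paired differences d = a - b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long samples with at least two pairs")
    diffs = [x - y for x, y in zip(a, b)]
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    return statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))


# Hypothetical per-run accuracies (%), for illustration only.
qea_runs = [97.0, 97.2, 96.9, 97.3, 97.1]
baseline_runs = [95.4, 95.9, 95.1, 95.8, 95.3]

t_stat = paired_t_statistic(qea_runs, baseline_runs)
reject_null = abs(t_stat) > T_CRIT_DF4  # True means p < 0.05 for five pairs
```

Pairing the runs matters because both methods are evaluated on the same splits, so the test works on the per-run differences rather than on two independent samples.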

As shown in Table 14, the p-values for all comparisons fall below the standard significance threshold of 0.05, confirming that the observed differences between QEA and the baseline models are statistically significant. The lowest p-values were recorded in comparisons with EA and GA, suggesting that QEA provides the largest relative improvement over classical evolutionary approaches. Even when compared to quantum-enhanced methods such as QGA and QPSO, QEA still demonstrates statistically meaningful superiority.

These results reinforce the robustness of QEA’s hybrid design. The consistent performance gains across both accuracy and F1-score are not only empirically visible but also statistically validated. This adds further credibility to the generalization of our findings and supports the adoption of QEA in real-world cybersecurity applications where reliable performance is critical.

In future work, we aim to complement these t-tests with non-parametric alternatives such as the Wilcoxon signed-rank test to further validate the robustness of our results, particularly under skewed or non-normal performance distributions.

7. Interpretation and Discussion

The results obtained in this study demonstrate the considerable promise of the proposed QEA as a practical hybrid optimization framework for adaptive cybersecurity threat detection. By combining quantum-inspired probabilistic search with biologically motivated epigenetic regulation, QEA successfully balances exploration and exploitation in the search for optimal feature subsets. Across all four benchmark datasets—UNSW-NB15, CIC-IDS2017, CSE-CIC-IDS2018, and TON_IoT—QEA consistently outperformed baseline methods, achieving the highest accuracy (up to 97.12%), the lowest false positive rates (as low as 1.68%), and selecting minimal yet highly discriminative feature sets (e.g., 18 features on TON_IoT). The algorithm’s ability to maintain low inference latency further reinforces its suitability for real-time intrusion detection systems.

However, despite these promising outcomes, several limitations must be acknowledged. First, the hybrid nature of QEA—especially the integration of quantum-inspired rotation and dynamic epigenetic masking—introduces additional computational complexity. While the algorithm remains efficient for medium-scale datasets, its scalability to vast datasets or high-speed streaming data remains an open challenge. Without acceleration techniques, such as parallel processing or hardware-level optimization, the runtime could become a bottleneck in real-time, high-throughput applications.

Second, although quantum computing principles inspire the QEA, the current implementation is purely classical. Simulating quantum behavior through probabilistic amplitudes offers benefits, but it does not fully leverage the computational power of actual quantum hardware. Real-world quantum deployment would require adapting the algorithm for noisy intermediate-scale quantum devices, which introduces technical hurdles such as gate fidelity, qubit coherence, and error correction.

Third, the evaluation was conducted on structured and publicly available benchmark datasets, which—despite their comprehensiveness—do not fully reflect the dynamic and noisy nature of real-world network traffic. Variables such as packet jitter, encrypted payloads, novel attack strategies, and data drift over time are not adequately captured in these datasets. Deploying QEA in operational environments would require extensive validation using live traffic and integration with streaming data pipelines.

Moreover, while QEA is designed to adapt to changing threat landscapes via epigenetic memory mechanisms, its behavior under continuous concept drift has not yet been empirically validated. In highly dynamic environments, such as critical infrastructure networks or smart city platforms, threats can evolve rapidly and unpredictably. Future work should explore online or incremental learning variants of QEA that can update feature selection policies in real-time.

Lastly, the interpretability of the QEA remains a consideration. Although the algorithm promotes compact models by selecting fewer features, the internal workings of the hybrid optimization process may be complicated for domain experts to interpret. As security-critical systems increasingly demand transparency and explainability, future iterations of QEA could benefit from incorporating interpretable AI techniques to provide more explicit justifications for selected features and decisions.

In summary, while the QEA offers a compelling approach for adaptive, accurate, and efficient feature selection in intrusion detection systems, practical deployment will depend on addressing these limitations. Scalability, real-time performance, interpretability, and integration with real-world infrastructure are key areas where further work is needed to transition this research from experimental validation to field-ready cybersecurity solutions.

8. Limitations of Proposed Framework

While the proposed QEA demonstrates promising results in feature selection for intrusion detection, it is important to acknowledge several limitations that may affect its broader applicability and real-world deployment.

First, although QEA achieves competitive accuracy and compactness, its computational cost is higher than that of simpler classical methods. The combined use of quantum-inspired probabilistic rotation and adaptive epigenetic regulation introduces additional overhead during the optimization process. For small- to medium-scale datasets, this trade-off is acceptable; however, for very large-scale or high-frequency streaming environments, the runtime may become a constraint unless parallel processing or hardware acceleration is employed.

Second, the current implementation of QEA is quantum-inspired but not executed on actual quantum hardware. The benefits of quantum principles, such as superposition and rotation, are simulated using classical computing resources. While this provides flexibility and accessibility, it also means that the algorithm does not currently exploit the full power or potential speedups of real quantum computing platforms. Deploying a native quantum version of QEA would require overcoming hardware limitations such as limited qubit availability, gate noise, and decoherence.

Third, the evaluation of QEA has been conducted using publicly available benchmark datasets, which, although widely accepted in the research community, may not fully reflect the complexity and unpredictability of real-world network traffic. Factors such as encrypted data streams, adversarial manipulation, and zero-day attack behaviors present in operational environments were not explicitly tested. As such, further validation on live or enterprise-scale datasets is necessary before the framework can be adopted in practice.

Another limitation lies in the framework’s ability to adapt over time. Although the epigenetic mechanism introduces a degree of learning from environmental feedback, the current QEA design assumes batch processing and does not yet support online or continual learning. In highly dynamic threat landscapes—such as those found in IoT networks or industrial control systems—this may hinder long-term performance as threat patterns evolve. Enhancing QEA with incremental or streaming learning capabilities is an important direction for future research.

Finally, interpretability remains a challenge. While QEA selects compact feature sets that enhance model transparency, the internal optimization process itself—driven by quantum-inspired and epigenetic dynamics—may not be easily understood by security analysts or compliance auditors. This could limit its adoption in sectors where transparency and explainability are required for operational trust or regulatory approval. Developing interpretable variants or integrating explanation models may help address this concern.

Although the QEA framework is designed with real-time intrusion detection in mind, it has not yet been validated on true streaming or live network traffic. All evaluations were conducted on static benchmark datasets. As such, its suitability for streaming scenarios remains an open question and will require integration with online learning mechanisms and temporal data handling in future work.

Furthermore, while QEA demonstrates a strong performance on datasets with up to 84 features, scalability to ultra-high-dimensional data (e.g., thousands of features) has not been fully evaluated. The algorithm’s reliance on probabilistic qubit updates and partial epigenetic feedback may help maintain efficiency; however, additional empirical studies on larger-scale feature spaces are necessary to confirm its robustness under such conditions.

In conclusion, while the proposed QEA demonstrates promising performance across multiple benchmark intrusion detection datasets, it is essential to acknowledge that the experimental results were obtained entirely within a simulated quantum-computing environment using Qiskit. Although such simulators are widely used for prototyping and algorithm development due to their accessibility and flexibility, they operate on classical hardware and do not fully capture the physical noise, decoherence, gate fidelity, and limited qubit connectivity inherent in current quantum hardware. As such, while the quantum-inspired mechanisms employed in QEA (e.g., superposition-based search and quantum rotation updates) were effectively simulated, their real-world execution on a quantum device may lead to different outcomes due to hardware-specific constraints. Therefore, the reported performance metrics (such as classification accuracy, false positive rates, and feature selection compactness) reflect the algorithm’s behavior under idealized conditions. Future work will aim to validate the algorithm on actual quantum devices, where practical limitations such as qubit decoherence, circuit depth, and quantum gate errors can be empirically assessed. This step is crucial for translating the current findings into practical deployments and ensuring robustness in real-world quantum environments.

9. Conclusions and Future Work

This paper presented QEA, a novel hybrid optimization framework for adaptive feature selection in cybersecurity threat detection. The proposed method effectively balances global exploration and local exploitation by integrating quantum-inspired probabilistic representation with biologically motivated epigenetic regulation. The algorithm’s dynamic activation mask mechanism enables reversible gene expression, enhancing resilience to premature convergence and improving adaptability to evolving attack patterns. Furthermore, a multi-objective fitness function was formulated to optimize the detection accuracy, FPR, inference latency, and number of selected features simultaneously. The experimental results were obtained on four benchmark datasets: UNSW-NB15, CIC-IDS2017, CSE-CIC-IDS2018, and TON_IoT. The results consistently demonstrated that QEA outperformed the established evolutionary, swarm-based, and quantum-inspired baselines, including PSO, GA, EA, QGA, and QPSO, in accuracy, F1-score, FPR, and feature compactness. Notably, QEA achieved the highest classification accuracy while selecting significantly fewer features and maintaining low inference latency, thus validating its effectiveness for real-time detection.

Future work may extend this framework by incorporating online learning capabilities to further adapt to concept drift in live networks or by exploring quantum-native implementations using actual quantum computing hardware. Additionally, applying QEA to other security domains, such as malware detection, could expand its applicability and reveal further advantages of this hybrid paradigm. Further, we aim to explore the integration of deep learning techniques, particularly Variational Autoencoders (VAEs), with our quantum–epigenetic framework. VAEs have shown strong potential in learning compact latent representations for anomaly detection, and combining them with adaptive feature selection may further enhance the detection accuracy in highly complex or nonlinear threat environments. This hybrid approach could offer a promising balance between representation learning and efficient, interpretable decision-making. In addition, integration with explainable AI techniques (e.g., SHAP or LIME) could enhance transparency.

Author Contributions

Conceptualization, S.A.-E. and Y.S.; methodology, S.A.-E. and Y.S.; software, S.A.-E.; validation, Y.S. and S.F.; formal analysis, S.F.; investigation, Y.S.; resources, S.A.-E.; data curation, S.A.-E.; writing—original draft preparation, S.A.-E. and Y.S.; writing—review and editing, S.F.; visualization, S.A.-E.; supervision, S.F.; project administration, Y.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding authors.

Conflicts of Interest

The authors declare no conflict of interest.

References

Guo, Y. A review of Machine Learning-based zero-day attack detection: Challenges and future directions. Comput. Commun. 2023, 198, 175–185. [Google Scholar] [CrossRef] [PubMed]
Ali, S.; Rehman, S.U.; Imran, A.; Adeem, G.; Iqbal, Z.; Kim, K.I. Comparative evaluation of ai-based techniques for zero-day attacks detection. Electronics 2022, 11, 3934. [Google Scholar] [CrossRef]
Lu, H.; Ma, Z.; Li, X.; Bi, S.; He, X.; Wang, K. TrafficHD: Efficient Hyperdimensional Computing for Real-Time Network Traffic Analytics. In Proceedings of the 61st ACM/IEEE Design Automation Conference, San Francisco, CA, USA, 23–27 June 2024; pp. 1–6. [Google Scholar]
Amiriebrahimabadi, M.; Mansouri, N. A comprehensive survey of feature selection techniques based on whale optimization algorithm. Multimed. Tools Appl. 2024, 83, 47775–47846. [Google Scholar] [CrossRef]
Freitas, D.; Lopes, L.G.; Morgado-Dias, F. Particle swarm optimisation: A historical review up to the current developments. Entropy 2020, 22, 362. [Google Scholar] [CrossRef] [PubMed]
Shami, T.M.; El-Saleh, A.A.; Alswaitti, M.; Al-Tashi, Q.; Summakieh, M.A.; Mirjalili, S. Particle swarm optimization: A comprehensive survey. IEEE Access 2022, 10, 10031–10061. [Google Scholar] [CrossRef]
Mandal, A.K.; Chakraborty, B. Quantum computing and quantum-inspired techniques for feature subset selection: A review. Knowl. Inf. Syst. 2025, 67, 2019–2061. [Google Scholar] [CrossRef]
Vogt, G. Environmental adaptation of genetically uniform organisms with the help of epigenetic mechanisms—An insightful perspective on ecoepigenetics. Epigenomes 2022, 7, 1. [Google Scholar] [CrossRef] [PubMed]
Ong, Y.S.; Keane, A.J. Meta-Lamarckian learning in memetic algorithms. IEEE Trans. Evol. Comput. 2004, 8, 99–110. [Google Scholar] [CrossRef]
Hart, E.; Ross, P. A heuristic combination method for solving job-shop scheduling problems. In Proceedings of the International Conference on Parallel Problem Solving from Nature, Amsterdam, The Netherlands, 27–30 September 1998; Springer: Berlin/Heidelberg, Germany, 1998; pp. 845–854. [Google Scholar]
La Cava, W.; Spector, L. Inheritable epigenetics in genetic programming. In Genetic Programming Theory and Practice XII; Springer: Berlin/Heidelberg, Germany, 2015; pp. 37–51. [Google Scholar]
Miyamoto, T.; Furusawa, C.; Kaneko, K. Pluripotency, differentiation, and reprogramming: A gene expression dynamics model with epigenetic feedback regulation. PLoS Comput. Biol. 2015, 11, e1004476. [Google Scholar] [CrossRef] [PubMed]
Hertz, A.; Kobler, D. A framework for the description of evolutionary algorithms. Eur. J. Oper. Res. 2000, 126, 1–12. [Google Scholar] [CrossRef]
Li, A.; Mueller, A.; English, B.; Arena, A.; Vera, D.; Kane, A.E.; Sinclair, D.A. Novel feature selection methods for construction of accurate epigenetic clocks. PLoS Comput. Biol. 2022, 18, e1009938. [Google Scholar] [CrossRef] [PubMed]
Azman, N.S.; Samah, A.A.; Lin, J.T.; Majid, H.A.; Shah, Z.A.; Wen, N.H.; Howe, C.W. Support vector machine–Recursive feature elimination for feature selection on multi-omics lung cancer data. Prog. Microbes Mol. Biol. 2023, 6, a0000327. [Google Scholar] [CrossRef]
Saha, P.; Patikar, S.; Neogy, S. A correlation-sequential forward selection based feature selection method for healthcare data analysis. In Proceedings of the 2020 IEEE International Conference on Computing, Power and Communication Technologies (GUCON), Greater Noida, India, 2–4 October 2020; IEEE: New York, NY, USA, 2020; pp. 69–72. [Google Scholar]
Sharma, A.; Singh, M. Batch reinforcement learning approach using recursive feature elimination for network intrusion detection. Eng. Appl. Artif. Intell. 2024, 136, 109013. [Google Scholar] [CrossRef]
Putro, I.H.; Ahmad, T. Feature Selection Using Pearson Correlation with Lasso Regression for Intrusion Detection System. In Proceedings of the 2024 12th International Symposium on Digital Forensics and Security (ISDFS), San Antonio, TX, USA, 29–30 April 2024; IEEE: New York, NY, USA, 2024; pp. 1–6. [Google Scholar]
Theng, D.; Bhoyar, K.K. Feature selection techniques for machine learning: A survey of more than two decades of research. Knowl. Inf. Syst. 2024, 66, 1575–1637. [Google Scholar] [CrossRef]
Gheni, H.Q.; Oleiwi, W.K.; Al-Barmani, Z.; Alabdali, M.A. Optimizing feature selection for intrusion detection: A hybrid approach using cuckoo search and particle swarm optimization. Int. J. Saf. Secur. Eng. 2024, 14, 1907–1912. [Google Scholar] [CrossRef]
Subramani, S.; Selvi, M. Intrusion detection system and fuzzy ant colony optimization based secured routing in wireless sensor networks. Soft Comput. 2024, 28, 10345–10367. [Google Scholar] [CrossRef]
Kuo, S.Y.; Shen, J.Y.; Liu, C.L.; Chou, Y.H. Hybrid Quantum-inspired Evolutionary Neural Networks for Intrusion Detection System. In Proceedings of the 2024 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Kuching, Malaysia, 6–10 October 2024; IEEE: New York, NY, USA, 2024; pp. 2801–2806. [Google Scholar]
Ezzarii, M.; El Ghazi, H.; El Ghazi, H.; El Bouanani, F. Epigenetic algorithm-based detection technique for network attacks. IEEE Access 2020, 8, 199482–199491. [Google Scholar] [CrossRef]
Ezzarii, M.; Elghazi, H.; El Ghazi, H.; Sadiki, T. Epigenetic algorithm for performing intrusion detection system. In Proceedings of the 2016 International Conference on Advanced Communication Systems and Information Security (ACOSIS), Marrakesh, Morocco, 17–19 October 2016; IEEE: New York, NY, USA, 2016; pp. 1–6. [Google Scholar]
Saad, H.M.; Chakrabortty, R.K.; Elsayed, S.; Ryan, M.J. Quantum-inspired genetic algorithm for resource-constrained project-scheduling. IEEE Access 2021, 9, 38488–38502. [Google Scholar] [CrossRef]
Mücke, S.; Heese, R.; Müller, S.; Wolter, M.; Piatkowski, N. Feature selection on quantum computers. Quantum Mach. Intell. 2023, 5, 11. [Google Scholar] [CrossRef]
Fagbo, O.O.; Adewusi, O.B.; Atakora, D.A.; Lawrence, T.S.; Olufemi, S.A.; Ezevillo, Z. Designing intelligent cyber threat detection systems through quantum computing. Int. J. Sci. Res. Arch. 2025, 14, 561–569. [Google Scholar] [CrossRef]
Dubey, V.; Shende, P.; Kumbhare, B.; Laxane, Y.B. Exploring AI Techniques for Quantum Threat Detection and Prevention. Indian J. Comput. Sci. Technol. 2025, 4, 8–12. [Google Scholar] [CrossRef]
Ramya, P.; Anitha, R.; Rajalakshmi, J.; Dineshkumar, R. Integrating Quantum Computing and NLP for Advanced Cyber Threat Detection. J. Cybersecur. Inf. Manag. 2024, 14, 186. [Google Scholar] [CrossRef]
Hossain, F.; Hasan, K.; Amin, A.; Mahmud, S. Quantum Machine Learning for Enhanced Cybersecurity: Proposing a Hypothetical Framework for Next-Generation Security Solutions. J. Technol. Inf. Commun. 2024, 4, 32222. [Google Scholar] [CrossRef]
Azeez, M.; Nenebi, C.T.; Hammed, V.; Asiam, L.K.; James, E. Developing intelligent cyber threat detection systems through quantum computing. Int. J. Sci. Res. Arch. 2024, 12, 1297–1307. [Google Scholar] [CrossRef]
Meghanath, A.; Das, S.; Behera, B.K.; Khan, M.A.; Al-Kuwari, S.; Farouk, A. QDCNN: Quantum Deep Learning for Enhancing Safety and Reliability in Autonomous Transportation Systems. IEEE Trans. Intell. Transp. Syst. 2025, 1–11. [Google Scholar] [CrossRef]
Sanjalawe, Y.K.; Al-E’mari, S.R. Abnormal transactions detection in the ethereum network using semi-supervised generative adversarial networks. IEEE Access 2023, 11, 98516–98531. [Google Scholar] [CrossRef]
Awotunde, J.B.; Chakraborty, C.; Adeniyi, A.E. Intrusion detection in industrial internet of things network-based on deep learning model with rule-based feature selection. Wirel. Commun. Mob. Comput. 2021, 2021, 7154587. [Google Scholar] [CrossRef]
Chandrashekar, G.; Sahin, F. A survey on feature selection methods. Comput. Electr. Eng. 2014, 40, 16–28. [Google Scholar] [CrossRef]
Liu, H.; Motoda, H. Feature Selection for Knowledge Discovery and Data Mining; Springer: Berlin/Heidelberg, Germany, 2016. [Google Scholar]
Moustafa, N.; Slay, J. UNSW-NB15: A comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set). In Proceedings of the 2015 Military Communications and Information Systems Conference (MilCIS), Canberra, Australia, 10–12 November 2015; IEEE: New York, NY, USA, 2015; pp. 1–6. [Google Scholar]
BP, A.P.; Sunitha, N. A Literature Review on Prominent Datasets Used to Test the Cyber Attack Detection Accuracy of Machine Learning Techniques. In Proceedings of the 2024 International Conference on Cybernation and Computation (CYBERCOM), Dehradun, India, 15–16 November 2024; IEEE: New York, NY, USA, 2024; pp. 98–101. [Google Scholar]
Bouke, M.A.; Abdullah, A.; ALshatebi, S.H.; Abdullah, M.T. E2IDS: An enhanced intelligent intrusion detection system based on decision tree algorithm. J. Appl. Artif. Intell. 2022, 3, 1–16. [Google Scholar] [CrossRef]
Gao, J.; Chai, S.; Zhang, B.; Xia, Y. Research on network intrusion detection based on incremental extreme learning machine and adaptive principal component analysis. Energies 2019, 12, 1223. [Google Scholar] [CrossRef]
Al-E’mari, S.; Sanjalawe, Y.; Fraihat, S. Detection of obfuscated tor traffic based on bidirectional generative adversarial networks and vision transform. Comput. Secur. 2023, 135, 103512. [Google Scholar] [CrossRef]
Wani, A.A. Advancing Material Stability Prediction: Leveraging Machine Learning and High-Dimensional Data for Improved Accuracy. Mater. Sci. Appl. 2025, 16, 79–105. [Google Scholar] [CrossRef]
Chen, L.; Xu, Y.; Xu, F.; Hu, Q.; Tang, Z. Balancing the trade-off between cost and reliability for wireless sensor networks: A multi-objective optimized deployment method. Appl. Intell. 2023, 53, 9148–9173. [Google Scholar] [CrossRef]
Liu, Y.; Cao, S. The analysis of aerobics intelligent fitness system for neurorobotics based on big data and machine learning. Heliyon 2024, 10, e33191. [Google Scholar] [CrossRef] [PubMed]
Tang, C.; Luktarhan, N.; Zhao, Y. An efficient intrusion detection method based on LightGBM and autoencoder. Symmetry 2020, 12, 1458. [Google Scholar] [CrossRef]
Salam, A.E.; Yousef, S.; Duha, A.; Eman, A.; Alyaa, A. Employing Mutual Information Feature Selection and LightGBM for Intrusion Detection in IoT. ICIC Express Lett. 2024, 18, 597. [Google Scholar]
Han, K.H.; Kim, J.H. Quantum-inspired evolutionary algorithm for a class of combinatorial optimization. IEEE Trans. Evol. Comput. 2002, 6, 580–593. [Google Scholar] [CrossRef]
Wang, Z.; Li, M.; Li, J. A multi-objective evolutionary algorithm for feature selection based on mutual information with a new redundancy measure. Inf. Sci. 2015, 307, 73–88. [Google Scholar] [CrossRef]
Prechelt, L. Early stopping-but when? In Neural Networks: Tricks of the Trade; Springer: Berlin/Heidelberg, Germany, 2002; pp. 55–69. [Google Scholar]
Azzam, M.; Zeaiter, J.; Awad, M. Towards a quantum based ga search for an optimal artificial neural networks architecture and feature selection to model nox emissions: A case study. In Proceedings of the 2020 IEEE Congress on Evolutionary Computation (CEC), Glasgow, UK, 19—24 July 2020; IEEE: New York, NY, USA, 2020; pp. 1–8. [Google Scholar]
Wu, Q.; Ma, Z.; Fan, J.; Xu, G.; Shen, Y. A feature selection method based on hybrid improved binary quantum particle swarm optimization. IEEE Access 2019, 7, 80588–80601. [Google Scholar] [CrossRef]

Figure 1. Flowchart of the EA workflow.

Figure 2. Workflow diagram of the proposed QEA for feature selection.

Figure 3. Accuracy comparison across all feature selection methods and datasets.

Figure 4. F1-score comparison across all feature selection methods and datasets.

Figure 5. FPR comparison across all feature selection methods and datasets.

Figure 6. Latency comparison in milliseconds per prediction sample.

Figure 7. Number of features selected by each method.

Figure 8. Convergence curves of feature selection methods over 100 iterations.

Figure 9. Ablation study—performance comparison (accuracy and F1-Score) on CSE-CIC-IDS2018.

Figure 10. Ablation study—model compactness (number of selected features).

Table 1. Comparison between the genetic algorithm (GA) and the epigenetic algorithm (EA).

Feature	GA	EA
Genes	Fixed throughout the process	Gene expression changes dynamically
Mutation	Random	Guided based on individual’s performance
Exploitation	Lower exploitation capability	Higher exploitation due to environmental adaptation
Exploration	High	Similar, but more focused and adaptive

Table 2. Comparative analysis of related works on quantum and epigenetic cybersecurity approaches.

Study	Method	Domain	Quantum?	Adaptive? *	Key Limitation
[23]	EGA	IDS	No	Yes	Lacks scalability; classical processing constraints
[24]	Epigenetically Guided Genetic Optimization	IDS	No	Yes	Static gene modulation; sensitive to dataset characteristics
[25]	QIGA	Resource Scheduling	Yes	Yes	Not tailored to cybersecurity contexts
[26]	QUBO-Based Feature Selection via Quantum Annealing	Feature Selection in Security Systems	Yes	No	Offline adaptation only; lacks dynamic learning
[27]	Quantum Search and Factoring Algorithms (Grover’s, Shor’s)	Cyber Threat Detection Framework	Yes	Yes	Integration and hardware complexity
[28]	AI–Quantum Hybrid Learning	Real-Time Threat Detection	Yes	Yes	Scalability constraints; high infrastructure demands
[29]	Quantum Optimization + NLP	Threat Intelligence Parsing	Yes	Yes	Dependency on high-quality labeled NLP datasets
[30]	QML with QNN, QSVM, QKD, QRL	Anomaly Detection and Response	Yes	Yes	Interpretability and quantum hardware readiness
[31]	QSVM + QNN	Real-Time Cybersecurity for Critical Infrastructure	Yes	Yes	Integration cost; specific to high-performance systems
[32]	QDCNN	Cyber–Physical Threat Detection	Yes	Yes	Application initially targeted at autonomous systems

* Adaptive refers to the system’s ability to modify behavior based on new data or threats through learning or evolutionary processes.

Table 3. Parameter settings used in the QEA for feature selection.

Parameter	Value	Description
Population size	30	Number of quantum individuals in each generation
Maximum generations	50	Maximum number of iterations allowed during evolution
Qubit representation	$(α, β)$	Probability amplitudes used to encode selection decisions for each feature
Rotation angle	$Δ θ = 0.05 π$	Step size used in the quantum rotation gate update rule
Fitness function	Accuracy/F1-score	Metric used to evaluate individuals during evolution
Epigenetic memory window	5 generations	Number of past generations used to evaluate gene usefulness
Termination criterion	No improvement in 10 generations	Early stopping condition based on stagnation of best fitness
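
Table 3 lists the qubit pair (α, β) and the rotation step Δθ = 0.05π but not the update rule itself. The Python sketch below shows one common quantum-inspired scheme, in the spirit of the quantum-inspired evolutionary algorithm of Han and Kim cited in the references. It assumes that feature i is selected with probability β_i² and that each qubit is rotated toward the bit held by the best individual found so far. The function names and the clamping of the angle to [0, π/2] are illustrative choices, not details taken from the paper.

```python
import math
import random

DELTA_THETA = 0.05 * math.pi  # rotation step listed in Table 3


def init_qubits(n_features):
    """Start every feature in an equal superposition: alpha = beta = 1/sqrt(2)."""
    amp = 1.0 / math.sqrt(2.0)
    return [(amp, amp) for _ in range(n_features)]


def observe(qubits, rng):
    """Collapse each qubit to a bit; feature i is selected with probability beta_i**2."""
    return [1 if rng.random() < beta * beta else 0 for _, beta in qubits]


def rotate(qubits, mask, best_mask, delta=DELTA_THETA):
    """Rotate each qubit toward the best individual's bit (assumed update rule)."""
    updated = []
    for (alpha, beta), bit, best_bit in zip(qubits, mask, best_mask):
        if bit == best_bit:
            updated.append((alpha, beta))  # already agrees with the best: no change
            continue
        # A positive step raises P(select) = beta**2; a negative step lowers it.
        step = delta if best_bit == 1 else -delta
        angle = math.atan2(beta, alpha) + step
        # Clamp to the first quadrant so both amplitudes stay non-negative.
        angle = min(max(angle, 0.0), math.pi / 2)
        updated.append((math.cos(angle), math.sin(angle)))
    return updated


rng = random.Random(0)
qubits = init_qubits(4)
mask = observe(qubits, rng)
rotated = rotate(qubits, [0, 1, 0, 1], [1, 1, 0, 0])
```

Working with the angle rather than multiplying by a rotation matrix keeps α² + β² = 1 by construction, which avoids a separate renormalization step.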

Table 4. Comparison of benchmark datasets.

Dataset	Total Records	Original Features	Attack Categories
UNSW-NB15	2,540,044	49	9
CIC-IDS2017	3,000,000+	80	7
CSE-CIC-IDS2018	16,000,000+	84	15
TON_IoT	366,000+	44	10

Table 5. Impact of fitness weight configurations on optimization behavior.

$λ_{1}$	$λ_{2}$	$λ_{3}$	$λ_{4}$	Optimization Focus	Expected Behavior
0.6	0.2	0.1	0.1	Accuracy-oriented	Prioritizes correct classification; may allow high latency or large feature sets.
0.4	0.4	0.1	0.1	Accuracy + False Positive-Balanced	Reduces false alarms while maximizing accuracy. Good for alert-sensitive systems.
0.3	0.3	0.2	0.2	Balanced trade-off	Equal emphasis on detection quality and efficiency (speed + compactness).
0.3	0.2	0.1	0.4	Compactness-focused	Strong penalty for selecting many features. Useful for lightweight or embedded IDS.
0.25	0.25	0.25	0.25	Uniform weights	Treats all objectives equally; useful baseline to test pure algorithmic behavior.
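
To make the effect of the weight rows in Table 5 concrete, the Python sketch below assumes a plain weighted-sum fitness in which accuracy is rewarded and the false positive rate, normalized latency, and the fraction of selected features are penalized. This functional form, the latency cap used for normalization, and the two candidate subsets are illustrative assumptions; they are not values or formulas taken from the paper.

```python
def qea_fitness(accuracy, fpr, latency_ms, n_selected, n_total,
                weights, latency_cap_ms=1.0):
    """Weighted-sum fitness (assumed form); higher is better.

    weights is (lambda1, lambda2, lambda3, lambda4) as in Table 5.
    latency_cap_ms is a hypothetical cap used to scale latency into [0, 1].
    """
    l1, l2, l3, l4 = weights
    latency_norm = min(latency_ms / latency_cap_ms, 1.0)
    feature_fraction = n_selected / n_total
    return l1 * accuracy - l2 * fpr - l3 * latency_norm - l4 * feature_fraction


ACCURACY_ORIENTED = (0.6, 0.2, 0.1, 0.1)    # first row of Table 5
COMPACTNESS_FOCUSED = (0.3, 0.2, 0.1, 0.4)  # fourth row of Table 5

# Two made-up candidates on a 44-feature dataset: one large and accurate,
# one compact and less accurate.
large = dict(accuracy=0.98, fpr=0.02, latency_ms=0.5, n_selected=40, n_total=44)
compact = dict(accuracy=0.85, fpr=0.02, latency_ms=0.5, n_selected=18, n_total=44)

for label, weights in [("accuracy-oriented", ACCURACY_ORIENTED),
                       ("compactness-focused", COMPACTNESS_FOCUSED)]:
    f_large = qea_fitness(weights=weights, **large)
    f_compact = qea_fitness(weights=weights, **compact)
    winner = "large" if f_large > f_compact else "compact"
    print(f"{label}: large={f_large:.4f} compact={f_compact:.4f} -> {winner}")
```

Under these assumptions the same pair of candidates is ranked differently by the two weight rows, which is the behavior the "Expected Behavior" column describes.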

Table 6. QEA hyperparameter configuration for feature selection.

Parameter	Value/Description
Population size (N)	30 individuals
Maximum iterations (T)	100 generations
Quantum rotation angle ( $δ$ )	0.05 radians
Epigenetic learning rate ( $η$ )	0.1
Fitness weights ( $λ_{1}$ – $λ_{4}$ )	0.5 (Accuracy), 0.3 (FPR), 0.1 (Latency), 0.1 (Compactness)
Early stopping criterion	No improvement in 10 consecutive iterations
Classifier used	SVM with RBF kernel
Cross-validation	Five-fold stratified CV on training data
Latency measurement	Mean inference time per sample (Python time module)
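
The iteration cap and early stopping criterion in Table 6 can be read as a standard patience loop. The Python sketch below is a minimal illustration under that reading; `step` is a hypothetical callback that runs one generation and returns the best fitness found in it, and the toy fitness curve exists only to exercise the loop.

```python
def run_with_early_stopping(step, max_iterations=100, patience=10):
    """Run up to max_iterations generations; stop after `patience`
    consecutive generations without a new best fitness."""
    best = float("-inf")
    stale = 0
    history = []
    for iteration in range(1, max_iterations + 1):
        fitness = step(iteration)
        history.append(fitness)
        if fitness > best:
            best, stale = fitness, 0   # improvement: reset the counter
        else:
            stale += 1                 # no improvement this generation
            if stale >= patience:
                break
    return best, len(history)


# Toy curve: fitness rises for five generations, then plateaus.
best_fitness, generations_run = run_with_early_stopping(lambda t: min(t, 5) / 10)
```

With this toy curve the last improvement occurs in generation 5, so the loop ends after ten further stagnant generations instead of running all 100.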

Table 8. Hyperparameters used for feature selection algorithms.

Algorithm	Hyperparameter	Typical Value/Description
QGA	Population size	30
	Max iterations	100
	Quantum rotation angle ( $δ$ )	0.01–0.05 radians
QPSO	Population size	30
	Max iterations	100
	$β$ (contraction-expansion coefficient)	0.5–1.0
GA	Population size	30
	Max iterations	100
	Crossover probability	0.8
	Mutation probability	0.01–0.05
PSO	Population size	30
	Max iterations	100
	Inertia weight (w)	0.5–0.9
	Acceleration coefficients ( $c_{1}$ , $c_{2}$ )	1.5–2.0
EA	Population size	30
	Max iterations	100
	Epigenetic threshold ( $θ$ )	0.05
	Expression vector update rule	Fitness-based binary activation/deactivation
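
Table 8 describes the EA expression update only as "fitness-based binary activation/deactivation" with threshold θ = 0.05. One possible reading, sketched below in Python, flips each gene's expression bit in turn and keeps the flip only when fitness improves by more than θ. The per-gene trial loop and the toy fitness function are assumptions made for illustration and may differ from the authors' rule.

```python
def update_expression(expression, fitness_fn, theta=0.05):
    """Greedy per-gene activation/deactivation (assumed reading of Table 8).

    expression: list of 0/1 flags, one per feature (1 = gene expressed).
    fitness_fn: callable mapping an expression list to a fitness score.
    """
    current = list(expression)
    current_fit = fitness_fn(current)
    for i in range(len(current)):
        trial = list(current)
        trial[i] = 1 - trial[i]        # activate a silent gene or silence an active one
        trial_fit = fitness_fn(trial)
        if trial_fit - current_fit > theta:
            current, current_fit = trial, trial_fit   # keep the beneficial flip
    return current, current_fit


# Hypothetical toy fitness: features 0 and 2 help, every other selected feature hurts.
USEFUL = {0, 2}


def toy_fitness(mask):
    gain = sum(0.3 for i, bit in enumerate(mask) if bit and i in USEFUL)
    cost = sum(0.1 for i, bit in enumerate(mask) if bit and i not in USEFUL)
    return gain - cost


new_expression, new_fitness = update_expression([0, 1, 1, 1], toy_fitness)
```

In this toy run the useful gene 0 is switched on and the unhelpful genes 1 and 3 are switched off, while gene 2 stays active because silencing it would lower fitness.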

Table 9. Confusion matrix for UNSW-NB15 dataset.

	Predicted: Normal	Predicted: Attack
Actual: Normal	13,212	262
Actual: Attack	308	12,218

Table 10. Confusion matrix for CIC-IDS2017 dataset.

	Predicted: Normal	Predicted: Attack
Actual: Normal	13,578	215
Actual: Attack	267	11,940

Table 11. Confusion matrix for CSE-CIC-IDS2018 dataset.

	Predicted: Normal	Predicted: Attack
Actual: Normal	12,902	297
Actual: Attack	231	12,570

Table 12. Confusion matrix for TON_IoT dataset.

	Predicted: Normal	Predicted: Attack
Actual: Normal	12,805	392
Actual: Attack	365	12,438

Table 13. Classification performance of the proposed QEA model across four benchmark datasets.

Dataset	Accuracy	FPR	FNR	Precision	Recall	F1-Score
UNSW-NB15	97.81%	1.94%	2.46%	97.90%	97.54%	97.72%
CIC-IDS2017	98.15%	1.56%	2.19%	98.23%	97.81%	98.02%
CSE-CIC-IDS2018	97.97%	2.25%	1.80%	97.69%	98.20%	97.94%
TON_IoT	97.09%	2.97%	2.85%	96.94%	97.15%	97.05%
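
The rates in Table 13 follow directly from the confusion matrices in Tables 9–12 when "Attack" is treated as the positive class. The Python sketch below recomputes them from the four cell counts using the standard definitions; the dictionary layout and function name are illustrative.

```python
# (TN, FP, FN, TP) per dataset, copied from Tables 9-12 with Attack as positive.
CONFUSION = {
    "UNSW-NB15": (13212, 262, 308, 12218),
    "CIC-IDS2017": (13578, 215, 267, 11940),
    "CSE-CIC-IDS2018": (12902, 297, 231, 12570),
    "TON_IoT": (12805, 392, 365, 12438),
}


def metrics(tn, fp, fn, tp):
    """Standard binary classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tn + fp + fn + tp),
        "fpr": fp / (fp + tn),        # false alarms among normal traffic
        "fnr": fn / (fn + tp),        # missed attacks among attack traffic
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


for name, cells in CONFUSION.items():
    row = metrics(*cells)
    print(name, " ".join(f"{key}={100 * value:.2f}%" for key, value in row.items()))
```

Each dataset's matrix sums to 26,000 test samples, and the recomputed percentages agree with Table 13 to within rounding.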

Table 14. Paired t-test p-values comparing QEA with baseline methods (accuracy and F1-Score).

Comparison	p-Value (Accuracy)	p-Value (F1-Score)
QEA vs. PSO	0.0012 *	0.0025 *
QEA vs. GA	0.0008 *	0.0019 *
QEA vs. EA	0.0005 *	0.0011 *
QEA vs. QGA	0.0124 *	0.0216 *
QEA vs. QPSO	0.0097 *	0.0178 *

* Statistically significant at the 95% confidence level (p < 0.05).
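
For readers unfamiliar with the test behind Table 14, the Python sketch below computes a paired t statistic from matched scores of two methods. The five per-fold accuracies are made-up placeholders, not results from the paper, and the number of paired runs the authors used is not stated in this section; five pairs are assumed here only to keep the example small.

```python
import math
import statistics


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for matched samples of equal length."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)          # sample standard deviation (n - 1)
    return mean_diff / (sd_diff / math.sqrt(n)), n - 1


# Hypothetical per-fold accuracies for two methods on the same five folds.
method_a = [0.981, 0.979, 0.983, 0.980, 0.982]
method_b = [0.970, 0.972, 0.969, 0.973, 0.971]

t_stat, dof = paired_t_statistic(method_a, method_b)

# Two-sided critical value of Student's t for 4 degrees of freedom at alpha = 0.05.
T_CRITICAL_DF4 = 2.776
significant = abs(t_stat) > T_CRITICAL_DF4
```

Comparing |t| with the critical value is equivalent to checking p < 0.05; obtaining the p-value itself requires the t distribution's CDF, which a statistics library would normally supply.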

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Al-E’mari, S.; Sanjalawe, Y.; Fraihat, S. A Novel Quantum Epigenetic Algorithm for Adaptive Cybersecurity Threat Detection. AI 2025, 6, 165. https://doi.org/10.3390/ai6080165

AMA Style

Al-E’mari S, Sanjalawe Y, Fraihat S. A Novel Quantum Epigenetic Algorithm for Adaptive Cybersecurity Threat Detection. AI. 2025; 6(8):165. https://doi.org/10.3390/ai6080165

Chicago/Turabian Style

Al-E’mari, Salam, Yousef Sanjalawe, and Salam Fraihat. 2025. "A Novel Quantum Epigenetic Algorithm for Adaptive Cybersecurity Threat Detection" AI 6, no. 8: 165. https://doi.org/10.3390/ai6080165

APA Style

Al-E’mari, S., Sanjalawe, Y., & Fraihat, S. (2025). A Novel Quantum Epigenetic Algorithm for Adaptive Cybersecurity Threat Detection. AI, 6(8), 165. https://doi.org/10.3390/ai6080165
