Article

Comparative Analysis of Nature-Inspired Metaheuristic Techniques for Optimizing Phishing Website Detection

Department of Computer Science, Institute of Finance Management, Dar Es Salaam 11101, Tanzania
Analytics 2024, 3(3), 344-367; https://doi.org/10.3390/analytics3030019
Submission received: 17 May 2024 / Revised: 19 June 2024 / Accepted: 15 July 2024 / Published: 6 August 2024

Abstract

The increasing number, frequency, and sophistication of phishing website-based attacks necessitate the development of robust solutions for detecting phishing websites to enhance the overall security of cyberspace. Drawing inspiration from natural processes, nature-inspired metaheuristic techniques have proven to be efficient in solving complex optimization problems in diverse domains. Following these successes, this research paper aims to investigate the effectiveness of metaheuristic techniques, particularly Genetic Algorithms (GAs), Differential Evolution (DE), and Particle Swarm Optimization (PSO), in optimizing the hyperparameters of machine learning (ML) algorithms for detecting phishing websites. Using multiple datasets, six ensemble classifiers were trained on each dataset and their hyperparameters were optimized using each metaheuristic technique. As a baseline for assessing performance improvement, the classifiers were also trained with the default hyperparameters. To validate the genuine impact of the techniques over the use of default hyperparameters, we conducted statistical tests on the accuracy scores of all the optimized classifiers. The results show that the GA is the most effective technique, improving the accuracy scores of all the classifiers, followed by DE, which improved four of the six classifiers. PSO was the least effective, improving only one classifier. It was also found that GA-optimized Gradient Boosting, LGBM, and XGBoost were the best classifiers across all the metrics in predicting phishing websites, achieving peak accuracy scores of 98.98%, 99.24%, and 99.47%, respectively.

1. Introduction

The consistent increase in phishing websites in the current digital landscape presents a significant risk to individuals, organizations, and overall cybersecurity. Phishing attacks, which involve fraudulent efforts to acquire sensitive information, such as usernames, passwords, and financial data, are growing in number, frequency, and sophistication, posing a formidable challenge to cybersecurity. Recent statistics highlight the severity of the threat, showing a staggering increase in phishing incidents and substantial financial losses suffered by victims globally. According to APWG [1] and APWG [2], the number of new unique phishing websites per month jumped to 619,060 in 2023 from 92,564 in 2016, an increase of 569%. Meanwhile, APWG [1] observed over 1.6 million phishing attacks in the first quarter of 2023, the highest number the APWG has ever recorded. Furthermore, the Federal Bureau of Investigation (FBI) reported that they received 11 times as many complaints from phishing victims in 2020 compared to 2016 [3]. Moreover, IBM [4] has reported that the average global cost of data breaches for organizations due to phishing is USD 4.76 million [5]. This represents a significant financial burden on individuals, businesses, and economies alike. In addition, other substantial impacts of phishing website attacks include damage to the reputation of businesses [6,7,8,9,10], theft of proprietary and confidential information [11,12,13], distribution of other cyberattacks [14,15,16,17], and interference with democracy [18,19,20,21].
In light of the ever-growing threat landscape and the severe consequences associated with phishing website attacks, it is critical to develop highly accurate solutions for detecting and mitigating phishing websites. ML has become a powerful tool in the realm of phishing website prediction, leveraging features and patterns derived from the website’s structure, content, and user actions to identify malicious intent. Nevertheless, the efficacy of ML models is significantly impacted by the optimization of hyperparameters, which govern the learning process and influence the model’s prediction accuracy and generalization capability. Traditional hyperparameter optimization techniques depend on manual tuning or grid search methods, which entail significant time and computational investment, and may not yield optimal results.
To address these challenges, nature-inspired metaheuristic techniques, including GAs, DE, and PSO, present viable alternatives by drawing inspiration from behaviors observed in biological and social systems, natural selection, and mutation. These techniques have shown remarkable achievements in optimizing complex functions and solving combinatorial optimization problems in various domains, including financial modelling, engineering design, and logistics, among others. For example, Da Silva [22] applied GAs to a logistic engineering problem to enhance the efficiency of a supply chain; Akbari, et al. [23] used DE to optimize the production of electricity with wind renewable energy; and Chen, et al. [24] showcased the utility of PSO in optimizing financial investment and portfolio strategies for investors.
By leveraging the intrinsic advantages of these techniques, this research aims to conduct a comparative analysis that assesses the effectiveness of GAs, DE, and PSO in optimizing the hyperparameters of ML models for phishing website detection. The goal is to shed light on their potential applications in enhancing the robustness of phishing website detection solutions. To achieve the goal, six ensemble classifiers are trained and evaluated on three publicly available datasets, with the hyperparameters of each one optimized using each optimization technique. To establish the baseline for the comparative analysis, each classifier is also trained and evaluated using default hyperparameters. Statistical significance tests are carried out to assess the impact of the techniques on the performance of the classifiers against the use of default hyperparameters, enabling us to identify the most effective technique in this context. To our knowledge, this is the first comparative study of these metaheuristic techniques for tuning ML models for this problem.
The rest of the paper is arranged as follows. Section 2 provides the background on nature-inspired optimization techniques, particularly GAs, DE, and PSO. Section 3 reviews the related work, while Section 4 describes the methodology adopted for this study. Section 5 explains the experiments and presents the results along with the analyses, discussions, and statistical significance tests. Section 6 concludes the paper by revisiting the methodology and the results and highlights future work.

2. Nature-Inspired Metaheuristic Optimization Techniques

In regard to optimization, the pursuit of efficient and effective solutions to complex problems has led researchers to examine the natural world for inspiration. Nature has long been a source of inspiration for problem-solving, providing sophisticated solutions to problems by showcasing adaptation, cooperation, and self-organization observed in biological, physical, and social systems [25]. Natural processes have inspired a variety of metaheuristic techniques that are used to solve optimization challenges in different fields by mimicking these processes. There are three categories of nature-inspired metaheuristic techniques, as follows:
  • Evolutionary algorithms: Evolutionary Algorithms (EAs) are based on biological evolution principles, imitating natural selection to gradually improve candidate solutions towards optimal or nearly ideal outcomes. GAs and DE are two prominent types of EAs that provide distinct methods for optimization [26];
  • Swarm intelligence algorithms: They draw inspiration from the collective behaviors seen in social insects, such as ants and bees, as well as animal groups, such as bird flocks and fish schools. These algorithms utilize the collaboration and interaction among members of a group to seek the best solutions. Examples, such as PSO and Ant Colony Optimization (ACO), are derived from various features of swarm behavior [27];
  • Physics-inspired metaheuristic algorithms: These replicate physical phenomena and fundamental concepts in pursuit of optimal solutions. The algorithms emulate processes, such as gravitational forces, electromagnetic fields, and thermodynamic fluctuations, to efficiently explore solution spaces. Examples of such algorithms are the Gravitational Search Algorithm (GSA) and Simulated Annealing (SA) [28].
The techniques have demonstrated success in optimizing complex functions, thereby addressing optimization challenges in various domains, such as supply chain management [22], engineering design [29], machine learning tasks [30], investment portfolio management [24], scheduling and resource allocation [31], and drug discovery [32]. In light of these successes, this study aims to deploy a selected number of these techniques, based on their long proven track record, to evaluate their effectiveness in optimizing ML models for the prediction of phishing websites. The selected techniques used in this study, namely GAs, DE, and PSO, are described below in detail.

2.1. Genetic Algorithms

Among EAs, GAs stand out for being a prominent and adaptable optimization technique. They are potent heuristic search and optimization techniques that draw inspiration from natural selection and genetics. Introduced by Holland in the 1970s, GAs are known for their efficiency in navigating large solution spaces and identifying nearly optimal solutions for complex optimization challenges in diverse fields [33].
The algorithms are based on the concept of evolution, which involves selection, crossover, and mutation, mirroring biological systems [34,35]. The algorithm works on a group of possible solutions, depicted as individuals or chromosomes, and iteratively evolves these solutions across generations to improve their fitness for the optimization problem. Through this process, GAs efficiently search for global solutions by balancing exploration and exploitation of the search space, making them a valuable tool in tasks such as problem-solving, optimization, and machine learning. Figure 1 presents a flow chart of the GA optimization process.
The optimization process of GAs can be described as follows [35]; a minimal code sketch of this loop is given after the list:
  • Initialization: To start, initialize a population P consisting of N individuals. This involves generating a series of random real values for P = {p1, p2, …., pN}, each representing a possible solution to the optimization problem;
  • Evaluation: Evaluate the fitness f(p) of each individual p in the population by assessing the quality of the solution based on the objective function;
  • Selection: Choose individuals from the population for breeding according to their fitness. Individuals with higher fitness are more likely to be chosen, similar to natural selection favouring individuals with greater reproductive success. Typical selection methods include rank-based selection, roulette wheel selection, and tournament selection;
  • Crossover: Combine the selected individuals to generate offspring solutions by recombination or crossover. This approach imitates genetic recombination in biological reproduction by combining traits from parent individuals to create offspring. Various crossover operators, including uniform crossover, multi-point crossover, and single-point crossover, can be used;
  • Mutation: This involves introducing random modifications to offspring solutions to preserve diversity and explore new areas of the solution space. Mutation aids in avoiding premature convergence and guarantees that the algorithm maintains effective exploration of the search space. Mutation operators usually alter specific bits or components of the solution representation by flipping or modifying them;
  • Replacement: Substituting individuals in the population with the offspring solutions, usually with elitist or tournament selection methods, to maintain high-quality solutions. Elitism ensures that the most optimal solutions within the current population are passed on to the next generation, thus avoiding the loss of valuable information;
  • Termination: Repeat the evaluation, selection, crossover, mutation, and replacement steps for a set number of generations or until a termination condition is satisfied, such as reaching a maximum iteration limit, attaining an acceptable solution, or a lack of improvement over consecutive generations.
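As an illustration of the loop above, the following is a minimal, self-contained GA sketch for maximizing a real-valued objective, such as the cross-validated accuracy of a classifier built from a candidate hyperparameter vector. It is not the implementation used in the experiments (which relied on the sklearn_genetic library); the population size, rates, tournament selection, uniform crossover, and elitist replacement shown here are illustrative choices.

```python
import numpy as np

def genetic_algorithm(f, bounds, pop_size=20, generations=50,
                      crossover_rate=0.9, mutation_rate=0.1, seed=0):
    """Maximize f over a box-constrained, real-valued search space.

    f      : objective, e.g. the cross-validated accuracy of a classifier
             built from the candidate hyperparameter vector
    bounds : sequence of (low, high) pairs, one per hyperparameter
    """
    bounds = np.asarray(bounds, dtype=float)
    low, high = bounds[:, 0], bounds[:, 1]
    dim = len(bounds)
    rng = np.random.default_rng(seed)

    # Initialization: a random population of candidate solutions
    pop = rng.uniform(low, high, size=(pop_size, dim))
    fitness = np.array([f(ind) for ind in pop])        # Evaluation

    for _ in range(generations):
        children = []
        while len(children) < pop_size:
            # Selection: binary tournaments, higher fitness wins
            parents = []
            for _ in range(2):
                i, j = rng.integers(pop_size, size=2)
                parents.append(pop[i] if fitness[i] >= fitness[j] else pop[j])
            # Crossover: uniform recombination of the two parents
            if rng.random() < crossover_rate:
                child = np.where(rng.random(dim) < 0.5, parents[0], parents[1])
            else:
                child = parents[0].copy()
            # Mutation: random perturbation of a few components
            mask = rng.random(dim) < mutation_rate
            child[mask] = rng.uniform(low[mask], high[mask])
            children.append(child)
        child_fitness = np.array([f(c) for c in children])

        # Replacement with elitism: keep the best pop_size of parents + children
        merged = np.vstack([pop, np.array(children)])
        merged_fitness = np.concatenate([fitness, child_fitness])
        keep = np.argsort(merged_fitness)[-pop_size:]
        pop, fitness = merged[keep], merged_fitness[keep]

    best = int(np.argmax(fitness))
    return pop[best], fitness[best]        # Termination after a fixed number of generations
```

In a hyperparameter-tuning setting, f would, for example, map a two-element vector of learning rate and subsample fraction to a classifier's mean cross-validated accuracy.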

2.2. Differential Evolution

DE is one of the most popular and versatile metaheuristic techniques within EAs. It was developed in 1997 by Storn and Price as a heuristic search algorithm influenced by natural evolution processes in biological systems [36]. DE is based on genetic inheritance, mutation, and selection principles, allowing it to efficiently explore complex solution spaces and discover nearly optimal solutions for difficult optimization problems. DE shares evolutionary concepts with GAs and other algorithms, but stands apart due to its differential mutation technique, which enhances exploration of the solution space and helps the search escape local optima [37]. Unlike GAs, which apply genetic operators to binary strings, DE operates on solution vectors in continuous space, making it suitable for optimization problems with real-valued parameters.
Below is the methodology that DE uses to optimize a given problem [36]; a minimal code sketch using SciPy's DE routine is given after the list:
i. Initialization: DE starts with the initialization of a population of candidate solutions randomly selected from the search space. Every candidate solution is a possible solution to the current optimization problem. The potential solutions for each generation G can be represented as:
x_{i,G},  i = 1, 2, ..., NP
ii. Mutation: DE is characterized by its unique mutation approach, in which new potential solutions (called offspring) are created by combining current solutions. Three distinct individuals (vectors), identified as x_{r1}, x_{r2}, and x_{r3}, are randomly selected from the population for each candidate solution x_i. The mutant vector v_i is calculated by perturbing x_{r1} with the scaled difference between x_{r2} and x_{r3}. The operation is represented as:
v_{i,G+1} = x_{r1,G} + F · (x_{r2,G} − x_{r3,G})
where F is a scaling factor that controls the amplification of the differential variation;
iii. Crossover: DE uses a crossover operation to merge information from the target vector x_{i,G} with the mutant vector v_{i,G+1} to create a trial vector u_{i,G+1}. The crossover process is binomial, with each element of the trial vector being taken from the mutant vector with a probability CR and otherwise from the target vector. This operation is mathematically expressed as:
u_{ij,G+1} = \begin{cases} v_{ij,G+1}, & \text{if } randb(j) \le CR \text{ or } j = rnbr(i) \\ x_{ij,G}, & \text{if } randb(j) > CR \text{ and } j \ne rnbr(i) \end{cases}
In this context, j represents the component index, D indicates the dimensionality of the problem, randb(j) generates a random number within the range of 0 to 1, and rnbr(i) is a randomly selected index from {1, 2, ..., D};
iv. Selection: A selection method is used to decide whether to keep the trial vector u_{i,G+1} or the original target vector x_{i,G} for the following generation. This choice is based on comparing the fitness values of u_{i,G+1} and x_{i,G}. If the trial vector shows better fitness than the target vector, it takes the place of the target vector in the population;
v. Termination: The mutation, crossover, and selection phases are repeated in a loop for a set number of generations, or until a specific termination condition is satisfied. Typical stopping conditions are either reaching a maximum number of iterations or attaining an acceptable solution.
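As a hedged illustration, the sketch below uses SciPy's differential_evolution (the scipy.optimize routine named in Section 5) to tune two real-valued hyperparameters of a Gradient Boosting classifier by maximizing cross-validated accuracy. The synthetic data, the choice of classifier and hyperparameters, the bounds, and the DE settings (F = 0.8, CR = 0.7, small population and iteration budget) are assumptions made for the example, not the configuration used in the study.

```python
from scipy.optimize import differential_evolution
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

# Stand-in data; in the study, X and y would come from a phishing dataset.
X, y = make_classification(n_samples=600, n_features=20, random_state=0)

def negative_cv_accuracy(params):
    """differential_evolution minimizes, so return the negated CV accuracy."""
    learning_rate, subsample = params
    clf = GradientBoostingClassifier(learning_rate=learning_rate,
                                     subsample=subsample, random_state=0)
    return -cross_val_score(clf, X, y, cv=3, scoring="accuracy").mean()

# bounds defines the continuous search space for the two hyperparameters;
# mutation corresponds to the scaling factor F and recombination to CR.
result = differential_evolution(negative_cv_accuracy,
                                bounds=[(0.01, 0.5), (0.5, 1.0)],
                                mutation=0.8, recombination=0.7,
                                popsize=10, maxiter=10, seed=0)
print("Best hyperparameters:", result.x, "CV accuracy:", -result.fun)
```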

2.3. Particle Swarm Optimization

PSO is a metaheuristic algorithm that is population-based, self-adaptive, and stochastic. It is inspired by the social behavior of swarms, such as bird flocks or fish schools [38]. The goal is to identify the best solutions for intricate problems by continuously modifying the positions and velocities of particles that represent potential solutions [39]. The algorithm keeps track of the global best positions and fitness values, and updates individual particle positions according to personal trajectories and the best solutions [40]. The adaptability of PSO is enhanced by its capability to combine with other algorithms and tackle real-world problems through parallel implementations. PSO’s iterative optimization method and its foundation in swarm intelligence make it a great tool for efficiently solving optimization challenges.
The PSO algorithm calculates the optimal function value for each particle by determining its best position. The algorithm calculates new velocities for each particle based on its present velocity, optimal position, and the optimal position of its neighbors. It periodically alters the positions, velocities, and neighbors of particles to keep them within certain bounds. The algorithm proceeds until it reaches a designated endpoint [41]. Optimization is the process of identifying the best values for specific system features to fulfill design goals with minimal expense.
PSO represents each solution as a “bird/particle” within a search space. The particles move collectively to find the best position. The particle holds both the position and velocity vectors in a problem set within an n-dimensional space. A connection exists between the particle’s position and its range of mobility, which is utilized to describe the movement patterns [42]. Particle j’s position is designated as X_j = (X_{j1}, X_{j2}, ..., X_{jn}) and its velocity is represented as V_j = (V_{j1}, V_{j2}, ..., V_{jn}). The equations below update the velocity and position of the particles [43].
V_{ij}^{k+1} = V_{ij}^{k} + C_p Γ_p (Xpbest_{ij}^{k} − X_{ij}^{k}) + C_g Γ_g (Xgbest_{ij}^{k} − X_{ij}^{k})
X_{ij}^{k+1} = X_{ij}^{k} + V_{ij}^{k+1},  i = 1, 2, ..., N and j = 1, 2, ..., P
where Xpbest_{ij} = personal best of the particle;
  • Xgbest = global best position of the group;
  • C_g, C_p = social and cognitive acceleration coefficients;
  • N = total number of variables;
  • Γ_p, Γ_g = uniformly distributed random numbers;
  • V_{ij}^{k+1} = velocity of the jth particle of the ith variable at the (k + 1)th iteration;
  • X_{ij}^{k+1} = position of the jth particle of the ith variable at the (k + 1)th iteration.
To enhance the PSO convergence rate, it is recommended to incorporate the inertia weight into the velocity update in the first equation. The calculated velocity of the particle is then adjusted by including the inertia weight, as shown below; a minimal code sketch of the full PSO loop follows.
V_{ij}^{k+1} = W · V_{ij}^{k} + C_p Γ_p (Xpbest_{ij}^{k} − X_{ij}^{k}) + C_g Γ_g (Xgbest_{ij}^{k} − X_{ij}^{k})
where W represents the inertia weight [44].
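The following is a minimal NumPy sketch of the inertia-weight PSO loop defined by the equations above, written for maximization. The swarm size, inertia weight, and acceleration coefficients are illustrative, and the objective f stands in for something like a cross-validated accuracy function over a hyperparameter vector; the experiments themselves used the pyswarm library rather than this hand-rolled version.

```python
import numpy as np

def pso_maximize(f, bounds, n_particles=20, iterations=50,
                 w=0.7, c_p=1.5, c_g=1.5, seed=0):
    """Inertia-weight PSO that maximizes f over a box-constrained space."""
    bounds = np.asarray(bounds, dtype=float)
    low, high = bounds[:, 0], bounds[:, 1]
    dim = len(bounds)
    rng = np.random.default_rng(seed)

    # Initialize particle positions and velocities
    x = rng.uniform(low, high, size=(n_particles, dim))
    v = np.zeros((n_particles, dim))
    fitness = np.array([f(p) for p in x])

    pbest, pbest_fit = x.copy(), fitness.copy()          # personal bests
    g = int(np.argmax(pbest_fit))
    gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]      # global best

    for _ in range(iterations):
        r_p = rng.random((n_particles, dim))              # Γ_p
        r_g = rng.random((n_particles, dim))              # Γ_g
        # Velocity update: inertia + cognitive + social components
        v = w * v + c_p * r_p * (pbest - x) + c_g * r_g * (gbest - x)
        # Position update, clipped to the search bounds
        x = np.clip(x + v, low, high)
        fitness = np.array([f(p) for p in x])
        # Update personal and global bests
        improved = fitness > pbest_fit
        pbest[improved], pbest_fit[improved] = x[improved], fitness[improved]
        g = int(np.argmax(pbest_fit))
        if pbest_fit[g] > gbest_fit:
            gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]

    return gbest, gbest_fit
```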

3. Related Work

Optimizing the performance of ML algorithms relies on a number of crucial steps, with hyperparameter tuning being one of them. Hyperparameters, apart from the model’s trained parameters, control the learning process and have a substantial influence on the model’s accuracy and generalization capability. Nature-inspired metaheuristic techniques are becoming more popular for this task, since they are good at exploring the large hyperparameter space and finding areas where the best configurations might be found. They also maintain a balance between exploration and exploitation to prevent premature convergence and find the best hyperparameter configurations globally [37,45].
Taking advantage of this, several researchers have deployed these techniques for the task in the context of phishing website prediction. Al-Sarem, et al. [46], for instance, compared the performance of six tree-based algorithms in classifying phishing websites, where the hyperparameters were optimized with GAs against the default hyperparameters. After evaluating the performance on three publicly available phishing website datasets, they observed that the Random Forest classifier yields the highest accuracy of 97.02% and 95.15% in datasets 1 and 3, respectively, while the LGBM classifier obtained the best accuracy of 98.65% in dataset 2, when both algorithms were optimized with GAs. A study by Stobbs, et al. [47] compared the performance of a Tree-structured Parzen Estimator (TPE), a Bayesian optimization-based algorithm, and a GA in tuning the hyperparameters of four traditional ML algorithms for classifying phishing websites. Using two datasets created by the authors, each with 28 features based on URL and website properties, they found that the Random Forest classifier performed the best, with an accuracy of 99.33%, when performing feature selection with PSO and hyperparameter tuning with TPE.
Almousa, et al. [48] proposed and assessed the performance of three deep learning-based models for the detection of phishing websites. Each model was trained and evaluated on three publicly available datasets and one that was created by the researchers. In each case, three scenarios were also applied in relation to hyperparameter tuning: (1) evaluation of the models with the default hyperparameters, (2) hyperparameter tuning using grid search, and (3) hyperparameter tuning using GAs. The results showed that the accuracy rate of each model was improved by up to 1% when grid search and GA methods were used, with the former slightly outperforming the latter. Pavan Kumar, et al. [49] applied a swarm intelligence-based BAT optimization algorithm to optimize a convolution neural network in predicting phishing websites. Using a single publicly available phishing dataset consisting of 30 URL and webpage-based features, the study used the algorithm to tune the CNN’s optimizer parameters, in which the best accuracy of 94.8% was obtained with Adam. Alqahtani, et al. [50] developed a deep autoencoder-based classifier for predicting phishing websites. A benchmark dataset of phishing and legitimate URLs was used to train and evaluate the model. Invasive Weed Optimization (IWO), an algorithm inspired by the unique properties of weed growth, was used to tune the hyperparameters of the model, optimizing the model to an optimal accuracy of 99.28%. Despite the great efforts made by these studies, they lack comparative analysis of several nature-inspired metaheuristic techniques to establish experimental evidence of the efficacy of these techniques in optimizing the hyperparameters of ML models in regard to this problem.

4. Methodology

In this section, the methodology used to evaluate the performance of GAs, DE, and PSO in optimizing the hyperparameters of ML models for predicting phishing websites is presented. To establish consistency in the performance of the algorithms, six ensemble classifiers, namely Random Forest, Gradient Boosting, LGBM, XGBoost, CatBoost, and ExtraTrees, were selected and used as classifiers for the training and evaluation processes. These tree-based classifiers were selected for two reasons: (1) significant percentages of outliers were observed in our datasets. Instead of deleting the outliers, which would have substantially decreased the dataset sizes, we opted to use classifiers that are insensitive to outliers. (2) The classifiers performed well in a similar study [51]. To produce a baseline level of performance for observing the impact of the metaheuristic techniques on the performances of the classifiers, we also train and evaluate the classifiers without hyperparameter optimization.
Figure 2 below summarizes the methodology, which consists of four steps: data retrieval, partitioning, and pre-processing; classifier training, hyperparameter optimization, and evaluation; statistical significance testing of the impact of the metaheuristic techniques; and performance ranking of the techniques. First, three benchmark datasets consisting of various types of features that distinguish phishing and legitimate websites were retrieved from their respective online repositories (see details in Section 5.1), before partitioning each one of them into training and testing datasets. Each training dataset is then pre-processed for feature selection and transformation and fed into each of the classifiers for modelling. The hyperparameters of each resulting classifier are then tuned using each of the three metaheuristic techniques, except in one scenario where the default hyperparameters are maintained. Next, each optimized classifier is evaluated with the testing data from each dataset and its performance is reported using multiple metrics. To validate the effectiveness of the metaheuristic techniques, we measured the statistical significance of the differences in the performance of all the optimized classifiers on the three datasets against the classifiers with default hyperparameters. From the test results, the metaheuristic techniques are ranked based on the number of classifiers they have genuinely impacted.

5. Experiments and Results

A total of 72 experiments were set up to train the classifiers and evaluate the metaheuristic techniques, i.e., six classifiers were trained and optimized, using the three metaheuristic techniques and with the default hyperparameters, on each of the three datasets. In order to evaluate and compare the performance across all the experiments, the performance measures accuracy, precision, recall, and F1 score were used. These measures are commonly used in evaluating the performance of ML models [52,53,54]. They are defined as follows:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F1 score = (2 × Precision × Recall) / (Precision + Recall)
The above measures are derived from the counts of true positives (TPs), false positives (FPs), true negatives (TNs), and false negatives (FNs). A positive instance of this problem is a phishing website.
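As a quick illustration of these four metrics, the snippet below computes them with scikit-learn on a handful of made-up labels, where 1 denotes a phishing website (the positive class) and 0 a legitimate one.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Illustrative labels only: 1 = phishing website (positive class), 0 = legitimate.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1 score :", f1_score(y_true, y_pred))
```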
Python version 3.11 was used to develop the code for all the experiments, along with libraries including sklearn, lightgbm, xgboost, catboost, sklearn_genetic (for implementing GAs), scipy.optimize (for implementing DE), and pyswarm (for implementing PSO). The experiments were run on a Windows 11 machine with 16 GB of memory and an Intel Core i7 processor.

5.1. Datasets

To train the classifiers and evaluate the metaheuristic techniques for predicting phishing websites, three publicly available datasets were used. The reason for using multiple datasets in this study is to assess the generalizability of the classifiers and the metaheuristic techniques, since different datasets may exhibit different characteristics, such as varying noise levels, class imbalances, or feature distributions, which significantly influence a model’s performance. This approach provides assurances that the performance of a model is not confined to the specific peculiarities of the dataset [55]. Additionally, employing multiple datasets aligns the evaluation with real-world scenarios. In practical applications, ML models are deployed in environments where the data characteristics may vary over time or across different contexts. Thus, testing the models and the metaheuristic techniques on diverse datasets mirrors real-world scenarios more accurately and provides a more reliable assessment of their performance in practical applications [56].
The three selected datasets are labelled as D1, D2, and D3 for datasets 1, 2, and 3, respectively. D1 has a total of 58,645 instances, of which 27,998 are labelled as legitimate webpages and 30,647 as phishing webpages. D2 has a total of 88,647 instances, of which 58,000 are labelled as legitimate webpages and 30,647 are phishing webpages. Both D1 and D2 consist of the same 111 features extracted from phishing and legitimate URLs. The features were extracted from the properties of a whole URL (20 features), the URL’s domain (21 features), the URL’s directory (18 features), the URL’s file name (18 features), the URL parameters (20 features), and from resolving URL and external services (16 features). Both datasets were created, published and documented by Vrbančič, et al. [57] and made available on GitHub (https://github.com/GregaVrbancic/Phishing-Dataset). D3 has 48 features extracted from 5000 phishing webpages and 5000 legitimate webpages, which were downloaded from January to May 2015 and from May to June 2017. The features were generated and extracted from the URL and webpage structures of each webpage. D3 is available on Kaggle (https://www.kaggle.com/datasets/shashwatwork/phishing-dataset-for-machine-learning/data). Figure 3a–c summarizes the distribution of phishing and legitimate webpages in the datasets.

5.2. Data Pre-Processing

This step plays a significant role in transforming raw data into a complete, consistent, and accurate form, so that ML algorithms can efficiently learn the data to build robust prediction models [58,59]. First, a small number of negative values observed in all the datasets were replaced with null values. We then dropped the features with more than 50% of null values (20 features in D1, 56 in D2, and none in D3). Using a variance threshold of 0.05%, low variance features were dropped in all the datasets (50 features in D1, 29 in D2, and 14 in D3). A substantial percentage of outliers was detected in the datasets (5.9% in D1, 7.1% in D2, and 8.2% in D3). Instead of dropping the outliers, we opted to use ensemble classifiers to counter their effect. To balance the sample size of the datasets for each class label, we randomly selected N instances for each class in each dataset (i.e., N = 25,000 in D1, N = 30,000 in D2, and N = 4500 in D3, resulting in 50,000 total instances in D1, 60,000 in D2, and 9000 in D3). To ensure consistency of scale across all the features, a standardization method was applied to transform all the feature values in each dataset to a mean of 0 and a standard deviation of 1. In order to reduce variations in the evaluation results to achieve stable scores when splitting the training and testing datasets, we applied a stratified 10-fold cross-validation technique to all the datasets when training and evaluating the classifiers. A minimal sketch of these steps is given below.
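The sketch below outlines these pre-processing steps with pandas and scikit-learn. The median imputation is an assumption added only so the variance filter runs end-to-end (the paper does not state how remaining nulls were handled), the variance threshold value is illustrative, and class balancing is omitted for brevity.

```python
import pandas as pd
from sklearn.feature_selection import VarianceThreshold
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import StratifiedKFold

def preprocess(df: pd.DataFrame, target: str = "phishing"):
    """Apply the cleaning, feature-filtering, and scaling steps described above."""
    X, y = df.drop(columns=[target]), df[target]
    # Replace negative values with nulls, then drop features with >50% nulls
    X = X.mask(X < 0)
    X = X.loc[:, X.isna().mean() <= 0.5]
    # Impute remaining nulls with the median (an assumption made only for this sketch)
    X = X.fillna(X.median())
    # Drop low-variance features (the threshold value here is illustrative)
    selector = VarianceThreshold(threshold=0.0005)
    X = X.loc[:, selector.fit(X).get_support()]
    # Standardize every feature to mean 0 and standard deviation 1
    X = pd.DataFrame(StandardScaler().fit_transform(X), columns=X.columns, index=X.index)
    return X, y

# Stratified 10-fold cross-validation for stable training/testing splits
skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
# for train_idx, test_idx in skf.split(X, y): train and evaluate each classifier per fold
```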

5.3. Experimental Results and Discussions

This section reports and discusses the performance of each classifier, in terms of four metrics, for each applied hyperparameter optimization technique and those without optimization. For each result, as displayed in the table subsections below, the classifiers are ranked from best to worst performers by their accuracy scores. First, the hyperparameters for the optimization process were identified. Table 1 below lists the selected hyperparameters of each classifier and the value range for the optimization process. For consistency of the optimization process across all the techniques, we only selected real-valued parameters, since DE algorithms operate only on solution vectors in continuous space. Also, the most effective parameters of each metaheuristic algorithm and their optimal values recommended by various studies, including Pedersen and Chipperfield [60], Clerc and Kennedy [61], Storn and Price [36], Das and Suganthan [62], and Al-Sarem, et al. [46], were identified and adopted in this study. These are summarized in Table 2 below.

5.3.1. Results of the Ensemble Classifiers with Default Hyperparameters

In the first set of experiments, all six classifiers were trained and evaluated on the D1, D2, and D3 datasets without applying any hyperparameter optimization techniques. Table 3 presents the results in terms of the performance metrics of the classifiers when applied to datasets D1, D2, and D3. Figure 4 presents a pictorial comparison of the accuracy scores of each classifier for the datasets. It can be observed that the performance of all the classifiers increased from D1 to D3, with optimal performance achieved with the latter. CatBoost sees the largest increase in accuracy (4.04%) from D1 to D3, whereas ExtraTrees sees the least increase (2.23%), in terms of the same metric. Increases are also observed in the other metrics for all the classifiers, with none showing a decrease in any of the metrics.
With default hyperparameters, XGBoost appears to be the best performer for all the datasets, with accuracy scores of 95.38% in D1, 96.12% in D2, and 97.33% in D3. It also achieved the highest scores in all the other metrics, except the recall score in D2, whereby it was only slightly outperformed by CatBoost, by 0.03%. On the other hand, Gradient Boosting is the worst performer in terms of datasets D1 and D2, with accuracy scores of 92.82% and 94.35%, respectively. In D3, it is the second worst performer, slightly outperforming ExtraTrees, the worst performer by a margin of 0.05%. Gradient Boosting also attained the lowest scores in D1 and D2 across all the other metrics, except the recall in D2, whereas ExtraTrees scores for the other metrics were the lowest in D3. XGBoost, LGBM, and Gradient Boosting maintained their rank positions for datasets D1 and D2, while the other classifiers slightly changed their positions. In D3, however, LGBM and ExtraTrees significantly changed their positions with a performance improvement observed in the former and performance degradation observed in the latter. These major differences between the datasets are likely due to the feature composition difference between D3 and both D1 and D2.

5.3.2. Results of the Ensemble Classifiers with PSO Optimization

In the second set of experiments, each classifier was trained and evaluated with PSO as the hyperparameter optimization technique. Table 4 presents their performance on the three datasets, while Figure 5 provides a visualized comparison of the accuracy scores for each classifier on the datasets. Similar to the first set of results, all the classifiers experienced significant gains in their performance across all metrics on D2 and D3 datasets, with all the classifiers attaining the worst and the best performance in D1 and D3, respectively. LGBM was found to be the best classifier in terms of D2 and D3, obtaining the highest accuracy scores of 96.89% and 98.56%, respectively. It also achieved the second-best accuracy of 95.25% in D1, behind Gradient Boosting by a margin of 0.03%. Random Forest and ExtraTrees were found to be the worst performers overall, both achieving the lowest accuracy scores in D1 and D3. XGBoost and CatBoost maintained their second top-tier position in D2 and D3, while they were in the mid-tier position for D1. LGBM attained the largest accuracy gain of 3.31% between D1 and D3, whereas Gradient Boosting obtained the lowest gain of 2.62%. Interestingly, the top three classifiers in all the datasets have a small margin of scores across all the metrics, which is different from the previous results.

5.3.3. Results of the Ensemble Classifiers with DE Optimization

Table 5 shows the results for the third set of experiments with the classifiers’ hyperparameters optimized using DE. Again, there is a significant increase in the scores in regard to all the metrics for each classifier from D1 to D3. Figure 6 visualizes the accuracy score for each classifier across the datasets. LGBM, Gradient Boosting, and XGBoost achieved the best performance across all the datasets. LGBM was found to attain the highest accuracy score of 96.85% and 98.99% in D1 and D3, respectively, but only achieved the third position in D2, with an accuracy of 96.97%. Again, Random Forest and ExtraTrees were found to yield the worst performance across all the datasets. In terms of accuracy gains between D1 and D3, CatBoost had the largest margin of 3.29%, while LGBM had the lowest margin of 2.14%.

5.3.4. Results of the Ensemble Classifiers with GA Optimization

In regard to GA optimization, LGBM, Gradient Boosting, and XGBoost are again the best performers in all the datasets, attaining the best accuracy scores of 97.97% (in D1), 98.69% (in D2), and 99.47% (in D3), respectively. Gradient Boosting is observed to make the largest gain in accuracy of 2.16%, while LGBM had the smallest gain of 1.27%. With the GA, Random Forest and ExtraTrees again yielded the worst performance for all three datasets. Similar to the previous sets of results, all the classifiers experienced significant gains in their performance across all metrics on the D2 and D3 datasets. The results are summarized below in Table 6 and Figure 7.

5.3.5. Results Analysis and Discussion

In each set of experiments, we observed a persistent rise in the performance of the classifiers across all metrics from D1 to D3. This was an expected trend for datasets D1 and D2, given the substantial difference in dataset sizes between the two datasets with the same features. This concurs with the general observation made by various studies, including the studies by Halevy, et al. [63] and Domingos [64] in which they empirically concluded that the performance of ML algorithms generally improves with larger dataset sizes due to the increased diversity and coverage of the data, which enable better generalization and learning of complex patterns. However, the performance in D3 is better than in D1 and D2, although D3 is the smallest dataset in terms of size, by far. This suggests that the selected best features in D3 are far better predictors than those in D1 and D2. Another key observation is the consistency of the performance of the classifiers across the datasets, with LGBM, Gradient Boosting, and XGBoost being the best performers overall. The first two were the top performers in eight sets of experiments, while the latter was the top performer in four sets. Strangely, the first two appeared to be the worst performers in two of the three sets of experiments with default hyperparameters. On the other hand, ExtraTrees and Random Forest were found to be the weakest performers, as they ranked in the lowest positions for the nine sets of experiments.
Regarding the performance comparison between the metaheuristic techniques, all the techniques appear to improve the scores of each classifier for most metrics in all datasets compared to the use of default hyperparameters. Figure 8a–c visualizes the improvements in accuracy across the three datasets. There are a few exceptions, however. For instance, the accuracy scores of XGBoost, Random Forest, and ExtraTrees with PSO in D1 were found to be slightly lower than without optimization, with margins of 0.16%, 0.24%, and 0.34%, respectively. The same technique also reduced the precision scores of XGBoost and LGBM by 0.33% and 0.48%, respectively, and the recall score of the former by 0.28%. PSO also lowered the f1 score of XGBoost by 0.3% and that of LGBM by 0.02%. In D2, PSO decreased the precision scores of XGBoost, ExtraTrees, Random Forest, and LGBM by 0.16%, 0.23%, 0.97%, and 0.59%, respectively. Meanwhile, the recall scores of XGBoost and CatBoost were decreased by 0.3% and 0.1%, respectively. Except for Gradient Boosting, the f1 scores of all the classifiers were reduced by slight margins, the largest being 0.62%. Apart from PSO, only DE decreased the performance of some of the classifiers. Specifically, it lowered the f1 scores on D2 of ExtraTrees and Random Forest by 0.1% and 0.2%, respectively. Generally, the GA was found to yield the highest scores in all the metrics, outperforming the other metaheuristic techniques. DE is in the second position, with PSO taking the last position.
Figure 9, Figure 10, Figure 11, Figure 12, Figure 13 and Figure 14 show the confusion matrices of the classifiers when using default hyperparameters and when optimized using GAs on dataset D1. The accuracy scores of all the classifiers in predicting legitimate and phishing websites were improved when using GA optimization, as reflected in their overall accuracy scores in Figure 8a. Generally, all the classifiers predicted the legitimate websites more accurately than the phishing ones. The highest increase in the accuracy score for predicting legitimate websites was obtained by Gradient Boosting, with a jump of 4.32%. For the prediction of phishing websites, the highest increase of 5.33% was obtained by LGBM. The highest accuracy for predicting a legitimate website with default hyperparameters was obtained by XGBoost, with a score of 95.91%, whereas with GA optimization, the same classifier achieved the highest score of 98.67%. XGBoost also achieved the highest accuracy score of 94.85% for predicting phishing websites with default hyperparameters, while with GA optimization LGBM achieved the highest accuracy of 97.39%.

5.3.6. Statistical Significance Tests

Statistically testing the significance of the differences between groups of values is a crucial step in the analysis of experimental data, as it helps to determine whether the observed differences are likely due to chance or reflect a genuine difference between the groups caused by the applied process, in this case, the hyperparameter optimization. The quantified evidence allows researchers to draw reliable conclusions while minimizing false positives and, thus, to confidently identify the process that should be adopted [65].
Here, we aim to provide statistical evidence of whether each metaheuristic technique has improved the performance (accuracy scores to be used for this analysis) of each classifier over the use of default hyperparameters. To achieve the objective, we applied a paired t-test [66] to test our null and alternative hypotheses (Ho and Ha, respectively) for each classifier using a significance level (α) of 0.05. The hypotheses are defined as follows:
Ho: The tested metaheuristic technique does not improve the performance;
Ha: The tested metaheuristic technique improves the performance.
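A minimal sketch of this test with SciPy is shown below. The per-fold accuracy scores are invented for illustration, and the one-sided alternative mirrors the stated hypotheses; the paper does not specify whether a one- or two-sided test was used.

```python
from scipy.stats import ttest_rel

# Illustrative per-fold accuracy scores (10-fold CV) for one classifier,
# with default hyperparameters versus metaheuristic-optimized hyperparameters.
acc_default   = [0.951, 0.948, 0.953, 0.950, 0.949, 0.952, 0.947, 0.950, 0.951, 0.949]
acc_optimized = [0.978, 0.975, 0.979, 0.977, 0.976, 0.980, 0.974, 0.977, 0.978, 0.976]

# One-sided paired t-test: Ha states that the optimized scores are greater
# (the 'alternative' argument requires a recent SciPy version).
t_stat, p_value = ttest_rel(acc_optimized, acc_default, alternative="greater")
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Reject Ho: the metaheuristic technique improves the accuracy.")
else:
    print("Fail to reject Ho: no statistically significant improvement.")
```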
Table 7, below, shows the results of the paired t-test in terms of the t-statistic and p-values of each metaheuristic technique in all three datasets over those produced by the default hyperparameters. The results show that the null hypothesis can be rejected in all cases for the GA, four cases for DE, and only one case for PSO (shown with bold text), since their p-values are lower than the significance level. This confirms that the GA has a genuine impact on the accuracy of all the classifiers, whereas DE has an impact on Random Forest, CatBoost, LGBM, and Gradient Boosting, with no effect on the rest of the classifiers. PSO, however, only impacted the accuracy of LGBM. This implies that only LGBM was impacted by all three techniques, while Random Forest, CatBoost, and Gradient Boosting were impacted only by DE and the GA. XGBoost and ExtraTrees, on the other hand, were only impacted by the GA.
In terms of the extent of the impact of the techniques, we observe that the GA has impacted Random Forest, ExtraTrees, and XGBoost the most given that the lowest p-values are achieved (below 0.01). The latter two were also the most impacted compared to others, across all the techniques. Random Forest is the largest beneficiary of DE optimization, by achieving the lowest p-value compared to others. CatBoost and LGBM appear to be the least impacted by both the GA and DE techniques, by attaining the largest p-values.
This test, therefore, concludes that out of the three techniques, the GA is more effective in optimizing the hyperparameters of both bagging and boosting tree-based classifiers, followed by DE and, then, PSO. This supports our observations based on the results presented in Section 5.3.5.

6. Conclusions

This paper has investigated and compared the effectiveness of nature-inspired metaheuristic techniques for optimizing the hyperparameters of ML models for predicting phishing websites. Three popular nature-inspired metaheuristic techniques, the GA, DE, and PSO, were selected for tuning the hyperparameters of six ensemble classifiers, namely Random Forest, Gradient Boosting, LGBM, XGBoost, CatBoost, and ExtraTrees. Three publicly available datasets, consisting of predictive features extracted from phishing and legitimate websites, were used for the evaluation. The methodology adopted was to train each classifier on each dataset and then tune each of the resulting classifiers using each optimization technique. To establish a baseline for assessing the impact of the techniques on the performance of the classifiers, each classifier was also trained and evaluated with default hyperparameters. Statistical testing of the accuracy scores of all the classifiers optimized with the three techniques on the three datasets was conducted to validate the genuine impact of the techniques over the use of default hyperparameters. The results have shown that the GA was the most effective technique, by significantly improving the accuracy of all the classifiers. It also improved XGBoost and ExtraTrees the most, by achieving the lowest p-values of 0.0019 and 0.0039. DE was the second most effective technique, significantly improving the accuracy of four out of the six classifiers, with Random Forest being the top beneficiary. It was found that PSO impacted only one classifier, which was LGBM. Overall, GA-optimized Gradient Boosting, LGBM, and XGBoost were found to be the best performers across all the metrics in predicting phishing websites.
It is important to note that these results were obtained using recommended values for the parameters of each metaheuristic technique established by studies in different contexts. It is unknown how the use of different values might have impacted the performance of the techniques in this context and, thus, their ranking. As part of our future work, we aim to evaluate the impact of various values of these parameters on the performance of the techniques using the same datasets. We also intend to extend our study to involve other EAs, swarm intelligence, and physics-inspired metaheuristic techniques in regard to this problem. Based on our experience, the optimization runtime differs significantly among these techniques. We plan to explore this aspect, along with the performance issue. Despite the success of metaheuristic techniques in optimizing the hyperparameters for ML models in various domains, other traditional hyperparameter optimization techniques, including grid search, random search, Bayesian optimization, and hyperband, have also yielded good results in many fields. It is vital for future research to incorporate them in similar comparative studies for a comprehensive evaluation of their performance against these modern techniques, in order to establish the most robust and efficient techniques in terms of this problem.

Funding

This research did not receive any funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The datasets used in this research are available at the online repositories cited in Section 5.1.

Conflicts of Interest

The author declares that there are no conflicts of interest.

References

  1. APWG. Phishing Activity Trends Report: 1st Quarter 2023. 2023. Available online: https://docs.apwg.org/reports/apwg_trends_report_q1_2023.pdf (accessed on 7 April 2024).
  2. APWG. Phishing Activity Trends Report: 4th Quarters 2016. 2016. Available online: http://docs.apwg.org/reports/apwg_trends_report_q4_2016.pdf (accessed on 23 December 2016).
  3. FBI. Internet Crime Report. 2020. Available online: https://www.ic3.gov/Media/PDF/AnnualReport/2020_IC3Report.pdf (accessed on 13 August 2021).
  4. IBM. Cost of a Data Breach Report 2023. 2023. Available online: https://www.ibm.com/downloads/cas/E3G5JMBP (accessed on 17 April 2024).
  5. FTC. Consumer Sentinel Network Data Book 2022. 2023. Available online: https://www.ftc.gov/system/files/ftc_gov/pdf/CSN-Data-Book-2022.pdf (accessed on 27 April 2024).
  6. IBM Security. Cost of a Data Breach Report 2019. 2019. Available online: https://www.ibm.com/downloads/cas/ZBZLY7KL (accessed on 3 April 2020).
  7. Gendre, A. How Much Does a Spear Phishing Attack Cost? 2015. Available online: https://www.vadesecure.com/en/spear-phishing-cost/ (accessed on 10 April 2020).
  8. Retruster. (n.d). The True Cost of a Phishing Attack. Available online: https://retruster.com/blog/phishing-attack-true-cost.html (accessed on 3 April 2020).
  9. Ponemon Institute. The Cost of Phishing and Value of Employee Training. 2015. Available online: https://info.wombatsecurity.com/hubfs/Ponemon_Institute_Cost_of_Phishing.pdf (accessed on 19 February 2017).
  10. Internet Society Global Internet Report 2016. 2016. Available online: https://www.internetsociety.org/globalinternetreport/2016/wp-content/uploads/2016/11/ISOC_GIR_2016-v1.pdf (accessed on 27 June 2017).
  11. SecureWorks. COBALT DICKENS Goes Back to School … Again. 2019. Available online: https://www.secureworks.com/blog/cobalt-dickens-goes-back-to-school-again (accessed on 7 August 2021).
  12. Verizon. 2018 Data Breach Investigations Report. 2018. Available online: https://enterprise.verizon.com/resources/reports/DBIR_2018_Report_execsummary.pdf (accessed on 9 May 2020).
  13. Lee, W.; Rotoloni, B. Emerging Cyber Threats Report 2016. 2016. Available online: https://www.digicert.com/dc/emerging-cyber-threats-in-2016/ (accessed on 4 July 2017).
  14. Allianz. (n.d). Cyber Attacks on Critical Infrastructure. Available online: https://www.agcs.allianz.com/news-and-insights/expert-risk-articles/cyber-attacks-on-critical-infrastructure.html (accessed on 27 March 2020).
  15. Ball, T. Top 5 Critical Infrastructure Cyber Attacks. 2017. Available online: https://www.anapaya.net/blog/top-5-critical-infrastructure-cyberattacks (accessed on 25 March 2020).
  16. Gendre, A. 4 Ways Hackers Use Phishing to Launch Ransomware Attacks. 2019. Available online: https://www.vadesecure.com/en/3-ways-hackers-use-phishing-to-launch-ransomware-attacks/ (accessed on 25 March 2020).
  17. Rodríguez, J. Most Common Attack Vector over Critical Infrastructures. 2019. Available online: https://www.cipsec.eu/content/most-common-attack-vector-over-critical-infrastructures (accessed on 6 March 2020).
  18. Pompon, R. Three Ways to Hack the U.S. Election. 2019. Available online: https://www.f5.com/labs/articles/threat-intelligence/three-ways-to-hack-the-u-s--election (accessed on 8 April 2020).
  19. Greenberg, A. Everything We Know about Russia’s Election-Hacking Playbook. 2017. Available online: https://www.wired.com/story/russia-election-hacking-playbook/ (accessed on 5 April 2020).
  20. Brattberg, E.; Maurer, T. Russian Election Interference: Europe’s Counter to Fake News and Cyber Attacks. 2018. Available online: https://carnegieendowment.org/2018/05/23/russian-election-interference-europe-s-counter-to-fake-news-and-cyber-attacks-pub-76435 (accessed on 8 April 2020).
  21. CNN. 2016 Presidential Campaign Hacking Fast Facts. 2020. Available online: https://edition.cnn.com/2016/12/26/us/2016-presidential-campaign-hacking-fast-facts/index.html (accessed on 18 April 2020).
  22. Da Silva, J.C. Application of Genetic Algorithm in a logistic engineering problem. DESAFIOS-Rev. Interdiscip. Da Univ. Fed. Do Tocantins 2022, 9, 93–112. [Google Scholar]
  23. Akbari, V.; Naghashzadegan, M.; Kouhikamali, R.; Afsharpanah, F.; Yaïci, W. Multi-Objective Optimization and Optimal Airfoil Blade Selection for a Small Horizontal-Axis Wind Turbine (HAWT) for Application in Regions with Various Wind Potential. Machines 2022, 10, 687. [Google Scholar] [CrossRef]
  24. Chen, Y.; Zhao, X.; Yuan, J. Swarm Intelligence Algorithms for Portfolio Optimization Problems: Overview and Recent Advances. Mob. Inf. Syst. 2022, 2022, 4241049. [Google Scholar] [CrossRef]
  25. Bäck, T. Evolutionary Computation 1: Basic Algorithms and Operators, 1st ed.; CRC Press: Boca Raton, FL, USA, 2018. [Google Scholar]
  26. Eiben, A.E.; Smith, J.E. Introduction to Evolutionary Computing; Springer: Heidelberg, Germany, 2015. [Google Scholar]
  27. Engelbrecht, A.P. Computational Intelligence: An Introduction, 2nd ed.; John Wiley & Sons: West Sussex, UK, 2007. [Google Scholar]
  28. Yang, X.-S.; Press, L. Nature-Inspired Metaheuristic Algorithms, 2nd ed.; Luniver Press: Cambridge, UK, 2010. [Google Scholar]
  29. Liu, J.; Xia, Y. A hybrid intelligent genetic algorithm for truss optimization based on deep neutral network. Swarm Evol. Comput. 2022, 73, 101120. [Google Scholar] [CrossRef]
  30. Hemanth, D.J.; Anitha, J. Modified Genetic Algorithm approaches for classification of abnormal Magnetic Resonance Brain tumour images. Appl. Soft Comput. 2019, 75, 21–28. [Google Scholar] [CrossRef]
  31. Huang, X.; Yang, L. A hybrid genetic algorithm for multi-objective flexible job shop scheduling problem considering transportation time. Int. J. Intell. Comput. Cybern. 2019, 12, 154–174. [Google Scholar] [CrossRef]
  32. Devi, R.V.; Sathya, S.S.; Coumar, M.S. Multi-objective Genetic Algorithm for De Novo Drug Design (MoGADdrug). Curr. Comput. Aided Drug Des. 2021, 17, 445–457. [Google Scholar] [CrossRef] [PubMed]
  33. Türkoğlu, B.; Eroğlu, H. Genetic Algorithm for Route Optimization. In Applied Genetic Algorithm and Its Variants: Case Studies and New Developments; Dey, N., Ed.; Springer Nature Singapore: Singapore, 2023; pp. 51–79. [Google Scholar]
  34. Vanneschi, L.; Silva, S. Genetic Algorithms. In Lectures on Intelligent Systems; Springer International Publishing: Cham, Switzerland, 2023; pp. 45–103. [Google Scholar]
  35. Höschel, K.; Lakshminarayanan, V. Genetic algorithms for lens design: A review. J. Opt. 2019, 48, 134–144. [Google Scholar] [CrossRef]
  36. Storn, R.; Price, K. Differential Evolution—A Simple and Efficient Heuristic for global Optimization over Continuous Spaces. J. Glob. Optim. 1997, 11, 341–359. [Google Scholar] [CrossRef]
  37. Storn, R.; Price, K.V.; Lampinen, J. Differential Evolution—A Practical Approach to Global Optimization; Springer: Berlin, Germany, 2005. [Google Scholar]
  38. Kumar, S.R. Particle Swarm Optimization. In Swarm Intelligence; Antonio, A.-F.M., Ed.; IntechOpen: Rijeka, Croatia, 2023; pp. 1–5. [Google Scholar]
  39. Vanneschi, L.; Silva, S. Particle Swarm Optimization. In Lectures on Intelligent Systems; Springer International Publishing: Cham, Switzerland, 2023; pp. 105–111. [Google Scholar]
  40. Tsai, C.-W.; Chiang, M.-C. Chapter Nine—Particle swarm optimization. In Handbook of Metaheuristic Algorithms; Tsai, C.-W., Chiang, M.-C., Eds.; Academic Press: Cambridge, MA, USA, 2023; pp. 163–184. [Google Scholar]
  41. Jain, N.K.; Nangia, U.; Jain, J. A Review of Particle Swarm Optimization. J. Inst. Eng. (India) Ser. B 2018, 99, 407–411. [Google Scholar] [CrossRef]
  42. Bonyadi, M.R.; Michalewicz, Z. Impacts of Coefficients on Movement Patterns in the Particle Swarm Optimization Algorithm. IEEE Trans. Evol. Comput. 2017, 21, 378–390. [Google Scholar] [CrossRef]
  43. Bai, Q. Analysis of particle swarm optimization algorithm. Comput. Inf. Sci. 2010, 3, 180. [Google Scholar] [CrossRef]
  44. You, Z.; Chen, W.; He, G.; Nan, X. Adaptive weight particle swarm optimization algorithm with constriction factor. In Proceedings of the 2010 International Conference of Information Science and Management Engineering, Xi’an, China, 7–8 August 2010; pp. 245–248. [Google Scholar]
  45. Kennedy, J.; Eberhart, R. Particle swarm optimization. In Proceedings of the ICNN’95—International Conference on Neural Networks, Perth, WA, Australia, 27 November–1 December 1995; Volume 4, pp. 1942–1948. [Google Scholar]
  46. Al-Sarem, M.; Saeed, F.; Al-Mekhlafi, Z.G.; Mohammed, B.A.; Al-Hadhrami, T.; Alshammari, M.T.; Alreshidi, A.; Alshammari, T.S. An Optimized Stacking Ensemble Model for Phishing Websites Detection. Electronics 2021, 10, 1285. [Google Scholar] [CrossRef]
  47. Stobbs, J.; Issac, B.; Jacob, S.M. Phishing Web Page Detection Using Optimised Machine Learning. In Proceedings of the 2020 IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), Guangzhou, China, 29 December 2020–1 January 2021; pp. 483–490. [Google Scholar]
  48. Almousa, M.; Zhang, T.; Sarrafzadeh, A.; Anwar, M. Phishing website detection: How effective are deep learning-based models and hyperparameter optimization? Secur. Priv. 2022, 5, e256. [Google Scholar] [CrossRef]
  49. Kumar, P.P.; Jaya, T.; Rajendran, V. SI-BBA—A novel phishing website detection based on Swarm intelligence with deep learning. Mater. Today Proc. 2023, 80, 3129–3139. [Google Scholar] [CrossRef]
  50. Alqahtani, H.; Alotaibi, S.S.; Alrayes, F.S.; Al-Turaiki, I.; Alissa, K.A.; Aziz, A.S.A.; Maray, M.; Al Duhayyim, M. Evolutionary Algorithm with Deep Auto Encoder Network Based Website Phishing Detection and Classification. Appl. Sci. 2022, 12, 7441. [Google Scholar] [CrossRef]
  51. Nagunwa, T. AI-driven approach for robust real-time detection of zero-day phishing websites. Int. J. Inf. Comput. Secur. 2024, 23, 79–118. [Google Scholar] [CrossRef]
  52. Brownlee, J. Classification Accuracy Is Not Enough: More Performance Measures You Can Use. 2014. Available online: https://machinelearningmastery.com/classification-accuracy-is-not-enough-more-performance-measures-you-can-use/ (accessed on 21 August 2018).
  53. Müller, A.; Guido, S. Introduction to Machine Learning with Python, 1st ed.; O’Reilly Media: Sebastopol, CA, USA, 2017. [Google Scholar]
  54. Brownlee, J. How to Use ROC Curves and Precision-Recall Curves for Classification in Python. 2018. Available online: https://machinelearningmastery.com/roc-curves-and-precision-recall-curves-for-classification-in-python/ (accessed on 5 January 2019).
  55. Hastie, T.; Tibshirani, R.; Friedman, J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed.; Springer: New York, NY, USA, 2009. [Google Scholar]
  56. Caruana, R.; Lou, Y.; Gehrke, J.; Koch, P.; Sturm, M.; Elhadad, N. Intelligible Models for HealthCare: Predicting Pneumonia Risk and Hospital 30-day Readmission. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Sydney, NSW, Australia, 10 August 2015. [Google Scholar]
  57. Vrbančič, G.; Fister, I.; Podgorelec, V. Datasets for phishing websites detection. Data Brief 2020, 33, 106438. [Google Scholar] [CrossRef] [PubMed]
  58. Gavrilova, Y.; Bolgurtseva, O. What Is Data Preprocessing in ML? 2020. Available online: https://serokell.io/blog/data-preprocessing (accessed on 6 May 2021).
  59. Goyal, K. Data Preprocessing in Machine Learning: 7 Easy Steps To Follow. 2020. Available online: https://www.upgrad.com/blog/data-preprocessing-in-machine-learning/ (accessed on 6 May 2021).
  60. Pedersen, M.E.H.; Chipperfield, A.J. Simplifying Particle Swarm Optimization. Appl. Soft Comput. 2010, 10, 618–628. [Google Scholar] [CrossRef]
  61. Clerc, M.; Kennedy, J. The particle swarm—Explosion, stability, and convergence in a multidimensional complex space. IEEE Trans. Evol. Comput. 2002, 6, 58–73. [Google Scholar] [CrossRef]
  62. Das, S.; Suganthan, P.N. Differential Evolution: A Survey of the State-of-the-Art. IEEE Trans. Evol. Comput. 2011, 15, 4–31. [Google Scholar] [CrossRef]
  63. Halevy, A.; Norvig, P.; Pereira, F. The Unreasonable Effectiveness of Data. IEEE Intell. Syst. 2009, 24, 8–12. [Google Scholar] [CrossRef]
  64. Domingos, P. A few useful things to know about machine learning. Commun. ACM 2012, 55, 78–87. [Google Scholar] [CrossRef]
  65. Gravetter, F.J.; Wallnau, L.B.; Forzano, L.-A.B.; Witnauer, J.E. Essentials of Statistics for the Behavioral Sciences; Cengage: Boston, MA, USA, 2021. [Google Scholar]
  66. Kim, T.K. T test as a parametric statistic. Korean J. Anesthesiol. 2015, 68, 540–546. [Google Scholar] [CrossRef]
Figure 1. A flow chart of the GA optimization process [35].
Figure 2. A diagrammatic representation of our methodology.
Figure 3. (a) Distribution of D1 by class labels; (b) distribution of D2 by class labels; (c) distribution of D3 by class labels.
Figure 4. Classifier accuracy comparison on D1, D2, and D3, with default hyperparameters.
Figure 5. Classifier accuracy comparison on D1, D2, and D3, with PSO hyperparameter optimization.
Figure 6. Classifier accuracy comparison on D1, D2, and D3, with DE hyperparameter optimization.
Figure 7. Classifier accuracy comparison on D1, D2, and D3, with GA hyperparameter optimization.
Figure 8. (a) Impact of optimization techniques on accuracy scores of classifiers on dataset D1. (b) Impact of optimization techniques on accuracy scores of classifiers on dataset D2. (c) Impact of optimization techniques on accuracy scores of classifiers on dataset D3.
Figure 9. (a) Confusion matrix of XGBoost without optimization. (b) Confusion matrix of XGBoost with GA optimization.
Figure 10. (a) Confusion matrix of ExtraTrees without optimization. (b) Confusion matrix of ExtraTrees with GA optimization.
Figure 11. (a) Confusion matrix of Random Forest without optimization. (b) Confusion matrix of Random Forest with GA optimization.
Figure 12. (a) Confusion matrix of CatBoost without optimization. (b) Confusion matrix of CatBoost with GA optimization.
Figure 13. (a) Confusion matrix of LGBM without optimization. (b) Confusion matrix of LGBM with GA optimization.
Figure 14. (a) Confusion matrix of Gradient Boosting without optimization. (b) Confusion matrix of Gradient Boosting with GA optimization.
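The confusion matrices in Figures 9–14 contrast each classifier's errors before and after GA optimization. As a minimal sketch of how such a matrix is typically produced with scikit-learn (on synthetic stand-in data, not the phishing datasets used in this study), consider the following:

```python
# Illustrative sketch only: producing a confusion matrix of the kind shown in
# Figures 9-14, on synthetic stand-in data rather than the study's datasets.
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic binary data standing in for legitimate (0) vs. phishing (1) websites.
X, y = make_classification(n_samples=1000, n_features=30, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

clf = ExtraTreesClassifier(random_state=0).fit(X_train, y_train)
cm = confusion_matrix(y_test, clf.predict(X_test))
print(cm)  # rows = true classes, columns = predicted classes
```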
Table 1. Selected hyperparameters of each classifier for optimization.
Classifier | Tuned Hyperparameter | Value Range
XGBoost | n_estimators | 150–800
 | learning_rate | 0.01–0.2
 | max_depth | 2–30
 | min_child_weight | 1–5
ExtraTrees | n_estimators | 150–800
 | max_depth | 2–30
 | min_samples_split | 2–10
 | min_samples_leaf | 2–10
Random Forest | n_estimators | 150–800
 | max_depth | 2–30
 | min_samples_split | 2–10
 | min_samples_leaf | 2–10
CatBoost | iterations | 50–150
 | learning_rate | 0.01–0.2
 | depth | 3–15
 | l2_leaf_reg | 1–5
LGBM | n_estimators | 150–800
 | learning_rate | 0.01–0.2
 | max_depth | 2–30
 | min_child_weight | 1–5
Gradient Boosting | n_estimators | 150–800
 | learning_rate | 0.01–0.2
 | max_depth | 2–30
 | min_samples_split | 2–10
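To make the search spaces in Table 1 concrete, the sketch below shows one way they could be encoded in Python for a metaheuristic to sample from. The dictionary layout and the decode helper are illustrative assumptions, not the exact implementation used in this study; only the XGBoost and Random Forest ranges are shown, and the other classifiers follow the same pattern.

```python
# Hypothetical encoding of part of the Table 1 search space; the names and the
# structure are illustrative only. Integer bounds imply integer hyperparameters.
search_space = {
    "XGBoost": {
        "n_estimators": (150, 800),
        "learning_rate": (0.01, 0.2),
        "max_depth": (2, 30),
        "min_child_weight": (1, 5),
    },
    "Random Forest": {
        "n_estimators": (150, 800),
        "max_depth": (2, 30),
        "min_samples_split": (2, 10),
        "min_samples_leaf": (2, 10),
    },
}

def decode(candidate, space):
    """Map a candidate vector with entries in [0, 1] onto concrete hyperparameters."""
    params = {}
    for (name, (low, high)), x in zip(space.items(), candidate):
        value = low + x * (high - low)
        params[name] = value if isinstance(low, float) else int(round(value))
    return params

# Example: the midpoint of the XGBoost search space.
print(decode([0.5, 0.5, 0.5, 0.5], search_space["XGBoost"]))
```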
Table 2. Selected parameters of each optimization technique.
Technique | Parameter Settings | Value
PSO | Swarm size | 40
 | Maximum iterations | 10
 | Inertia weight | 0.7
DE | Strategy selection | best1bin
 | Maximum iterations | 10
 | Population size | 50
 | Mutation | 0.5
 | Crossover rate | 0.7
GA | Population size | 50
 | Generations | 10
 | Mutation | 0.02
 | Crossover rate | 0.5
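As one hedged illustration of how the DE settings in Table 2 map onto an off-the-shelf optimizer, SciPy's differential_evolution exposes the strategy, mutation, and crossover (recombination) parameters directly. The objective below (cross-validated accuracy of an XGBoost classifier on synthetic stand-in data) is an assumption for demonstration purposes, not the code used to produce the results in this paper.

```python
# Minimal sketch, not the study's implementation: DE over the XGBoost search
# space from Table 1, using the DE settings from Table 2.
from scipy.optimize import differential_evolution
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

X, y = make_classification(n_samples=500, n_features=30, random_state=0)  # stand-in data

# Bounds: n_estimators, learning_rate, max_depth, min_child_weight (Table 1)
bounds = [(150, 800), (0.01, 0.2), (2, 30), (1, 5)]

def objective(v):
    model = XGBClassifier(
        n_estimators=int(v[0]),
        learning_rate=float(v[1]),
        max_depth=int(v[2]),
        min_child_weight=int(v[3]),
    )
    # Negated mean CV accuracy, because differential_evolution minimizes.
    return -cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()

result = differential_evolution(
    objective,
    bounds,
    strategy="best1bin",   # Table 2 strategy selection
    maxiter=10,            # Table 2 maximum iterations
    popsize=50,            # note: SciPy treats popsize as a per-parameter multiplier
    mutation=0.5,          # Table 2 mutation factor
    recombination=0.7,     # Table 2 crossover rate
    seed=42,
)
print(result.x, -result.fun)
```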
Table 3. (a) Performance of classifiers on dataset D1 with default hyperparameters. (b) Performance of classifiers on dataset D2 with default hyperparameters. (c) Performance of classifiers on dataset D3 with default hyperparameters.
(a)
Classifier | Accuracy (%) | Precision (%) | Recall (%) | F1 (%)
XGBoost | 95.38 | 95.52 | 95.52 | 95.54
ExtraTrees | 94.35 | 94.23 | 94.71 | 94.51
Random Forest | 94.33 | 94.26 | 94.82 | 94.53
CatBoost | 93.24 | 94.43 | 94.42 | 94.42
LGBM | 93.27 | 93.91 | 94.14 | 94.06
Gradient Boosting | 92.82 | 92.63 | 93.04 | 92.82
(b)
Classifier | Accuracy (%) | Precision (%) | Recall (%) | F1 (%)
XGBoost | 96.12 | 96.22 | 96.92 | 96.61
CatBoost | 95.78 | 95.05 | 96.95 | 96.52
ExtraTrees | 95.45 | 95.92 | 96.08 | 96.53
Random Forest | 95.37 | 95.84 | 96.13 | 96.45
LGBM | 95.16 | 95.82 | 95.58 | 96.11
Gradient Boosting | 94.35 | 93.08 | 95.81 | 94.49
(c)
Classifier | Accuracy (%) | Precision (%) | Recall (%) | F1 (%)
XGBoost | 97.33 | 97.15 | 97.53 | 97.84
CatBoost | 97.28 | 97.07 | 97.50 | 97.28
LGBM | 97.26 | 97.13 | 97.31 | 97.26
Random Forest | 96.90 | 96.84 | 97.03 | 97.39
Gradient Boosting | 96.63 | 96.43 | 96.83 | 96.73
ExtraTrees | 96.58 | 96.12 | 96.14 | 96.65
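The measures reported in Tables 3–6 (accuracy, precision, recall, and F1) are standard classification metrics computed on a held-out test split. A minimal sketch of how they are typically obtained with scikit-learn is shown below; the data, split ratio, and classifier choice are assumptions for illustration, not the study's exact pipeline.

```python
# Illustrative sketch only: computing the metrics reported in Tables 3-6 for a
# single classifier trained with default hyperparameters, on stand-in data.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=30, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

clf = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)
y_pred = clf.predict(X_test)

print(f"Accuracy:  {100 * accuracy_score(y_test, y_pred):.2f}%")
print(f"Precision: {100 * precision_score(y_test, y_pred):.2f}%")
print(f"Recall:    {100 * recall_score(y_test, y_pred):.2f}%")
print(f"F1:        {100 * f1_score(y_test, y_pred):.2f}%")
```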
Table 4. (a) Performance of classifiers on dataset D1 with PSO hyperparameter optimization. (b) Performance of classifiers on dataset D2 with PSO hyperparameter optimization. (c) Performance of classifiers on dataset D3 with PSO hyperparameter optimization.
(a)
Classifier | Accuracy (%) | Precision (%) | Recall (%) | F1 (%)
Gradient Boosting | 95.28 | 95.19 | 95.28 | 95.24
LGBM | 95.25 | 95.33 | 95.04 | 95.22
XGBoost | 95.22 | 95.17 | 95.27 | 95.23
CatBoost | 95.01 | 95.09 | 95.02 | 95.08
Random Forest | 94.09 | 93.43 | 94.65 | 94.04
ExtraTrees | 94.01 | 93.82 | 94.24 | 94.05
(b)
Classifier | Accuracy (%) | Precision (%) | Recall (%) | F1 (%)
LGBM | 96.89 | 96.06 | 96.62 | 96.38
CatBoost | 96.62 | 95.62 | 96.85 | 96.26
XGBoost | 96.15 | 95.69 | 96.66 | 96.17
Random Forest | 95.87 | 94.87 | 96.87 | 95.83
Gradient Boosting | 95.64 | 95.23 | 96.08 | 95.66
ExtraTrees | 95.62 | 94.45 | 96.75 | 95.69
(c)
Classifier | Accuracy (%) | Precision (%) | Recall (%) | F1 (%)
LGBM | 98.56 | 98.23 | 98.31 | 98.26
XGBoost | 98.20 | 98.17 | 98.58 | 98.34
CatBoost | 98.02 | 98.01 | 98.52 | 98.22
Gradient Boosting | 97.90 | 97.41 | 97.87 | 97.69
Random Forest | 97.10 | 96.89 | 97.01 | 97.89
ExtraTrees | 96.72 | 97.39 | 96.17 | 96.77
Table 5. (a) Performance of classifiers on dataset D1 with DE hyperparameter optimization. (b) Performance of classifiers on dataset D2 with DE hyperparameter optimization. (c) Performance of classifiers on dataset D3 with DE hyperparameter optimization.
(a)
Classifier | Accuracy (%) | Precision (%) | Recall (%) | F1 (%)
LGBM | 96.85 | 96.36 | 96.41 | 96.37
Gradient Boosting | 96.29 | 96.03 | 96.52 | 96.38
XGBoost | 95.78 | 95.06 | 95.23 | 95.29
CatBoost | 95.38 | 95.02 | 95.04 | 95.01
Random Forest | 94.89 | 94.66 | 94.65 | 94.12
ExtraTrees | 94.07 | 94.03 | 93.86 | 94.03
(b)
Classifier | Accuracy (%) | Precision (%) | Recall (%) | F1 (%)
Gradient Boosting | 97.68 | 97.11 | 96.96 | 97.67
XGBoost | 97.59 | 97.21 | 96.97 | 97.56
LGBM | 96.97 | 96.22 | 96.88 | 96.43
CatBoost | 96.54 | 95.83 | 96.89 | 96.25
ExtraTrees | 96.36 | 95.84 | 96.63 | 96.84
Random Forest | 96.14 | 95.25 | 96.85 | 96.58
(c)
Classifier | Accuracy (%) | Precision (%) | Recall (%) | F1 (%)
LGBM | 98.99 | 98.23 | 98.31 | 98.26
XGBoost | 98.97 | 98.15 | 98.53 | 98.34
Gradient Boosting | 98.92 | 98.43 | 97.83 | 98.63
CatBoost | 98.67 | 98.07 | 97.50 | 98.28
Random Forest | 97.56 | 96.84 | 97.03 | 97.89
ExtraTrees | 96.95 | 97.32 | 96.14 | 96.75
Table 6. (a) Performance of classifiers on dataset D1 with GA hyperparameter optimization. (b) Performance of classifiers on dataset D2 with GA hyperparameter optimization. (c) Performance of classifiers on dataset D3 with GA hyperparameter optimization.
(a)
Classifier | Accuracy (%) | Precision (%) | Recall (%) | F1 (%)
LGBM | 97.97 | 96.78 | 97.49 | 97.44
XGBoost | 97.41 | 97.25 | 96.68 | 96.45
Gradient Boosting | 96.82 | 96.07 | 96.62 | 96.36
CatBoost | 96.66 | 96.12 | 96.45 | 96.27
Random Forest | 96.24 | 96.23 | 95.89 | 96.88
ExtraTrees | 95.89 | 95.28 | 95.08 | 95.66
(b)
Classifier | Accuracy (%) | Precision (%) | Recall (%) | F1 (%)
Gradient Boosting | 98.69 | 98.48 | 97.67 | 98.87
LGBM | 98.02 | 98.23 | 97.87 | 98.62
XGBoost | 97.96 | 96.27 | 97.65 | 97.64
CatBoost | 97.45 | 97.92 | 97.07 | 97.47
Random Forest | 96.93 | 95.29 | 97.03 | 96.05
ExtraTrees | 96.82 | 95.45 | 96.82 | 95.91
(c)
Classifier | Accuracy (%) | Precision (%) | Recall (%) | F1 (%)
XGBoost | 99.47 | 99.23 | 98.53 | 99.02
LGBM | 99.24 | 99.15 | 98.31 | 98.96
Gradient Boosting | 98.98 | 98.43 | 97.83 | 98.63
CatBoost | 98.75 | 98.07 | 97.50 | 98.28
Random Forest | 98.26 | 96.84 | 97.03 | 97.89
ExtraTrees | 97.82 | 97.32 | 96.14 | 96.75
Table 7. Results of the paired t-test on the accuracy scores of each optimization technique in all three datasets over those produced by default hyperparameters.
Classifier | PSO t-Statistic | PSO p-Value | DE t-Statistic | DE p-Value | GA t-Statistic | GA p-Value
XGBoost | −0.7795 | 0.5173 | −3.0146 | 0.0947 | −22.8631 | 0.0019
ExtraTrees | 0.0605 | 0.9572 | −0.9690 | 0.4348 | −15.9262 | 0.0039
Random Forest | −0.7135 | 0.5495 | −10.9380 | 0.0083 | −10.0170 | 0.0098
CatBoost | −3.4051 | 0.0765 | −3.9851 | 0.0498 | −3.9305 | 0.0417
LGBM | −8.4098 | 0.0138 | −4.1308 | 0.0490 | −4.1684 | 0.0480
Gradient Boosting | −4.2538 | 0.0511 | −8.1408 | 0.0148 | −5.7982 | 0.0285
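Table 7 reports paired t-tests comparing each classifier's optimized accuracy scores against its default-hyperparameter scores across the three datasets. A minimal sketch of such a test with SciPy is given below, using the rounded LGBM accuracies from Tables 3 and 6 as example inputs; the published statistics were presumably computed on unrounded scores, so this example will not reproduce the table's values exactly.

```python
# Minimal sketch of the paired t-test underlying Table 7, via scipy.stats.ttest_rel.
from scipy.stats import ttest_rel

default_acc = [93.27, 95.16, 97.26]  # LGBM, default hyperparameters (D1, D2, D3)
ga_acc = [97.97, 98.02, 99.24]       # LGBM, GA-optimized hyperparameters (D1, D2, D3)

t_stat, p_value = ttest_rel(default_acc, ga_acc)
print(f"t = {t_stat:.4f}, p = {p_value:.4f}")  # a negative t indicates GA improved accuracy
```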
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
