RHSOFS: Feature Selection Using the Rock Hyrax Swarm Optimization Algorithm for Credit Card Fraud Detection System

In recent years, detecting fraudulent credit card transactions has been a difficult task due to high-dimensional and imbalanced datasets. Selecting a subset of important features from a high-dimensional dataset has proven to be the most prominent approach for solving high-dimensional dataset issues, and the selection of features is critical for improving classification performance, such as in the fraud transaction identification process. To contribute to the field, this paper proposes a novel feature selection (FS) approach based on a metaheuristic algorithm called Rock Hyrax Swarm Optimization Feature Selection (RHSOFS), inspired by the actions of rock hyrax swarms in nature, and implements supervised machine learning techniques to improve credit card fraud transaction identification approaches. This approach is used to select a subset of optimal relevant features from a high-dimensional dataset. In a comparative efficiency analysis, RHSOFS is compared with Differential Evolutionary Feature Selection (DEFS), Genetic Algorithm Feature Selection (GAFS), Particle Swarm Optimization Feature Selection (PSOFS), and Ant Colony Optimization Feature Selection (ACOFS). According to the experimental results, the proposed RHSOFS outperforms these existing approaches. Various statistical tests have been used to validate the statistical significance of the proposed model.


Introduction
Feature selection (also known as variable selection) is an important topic in the field of data mining. It is motivated by the requirement to choose the "optimal" selection of variables for prediction. The purpose of FS is to find the "best" subsets of features (or variables) for statistical analysis or for building a machine learning model [1,2]. Preprocessing is frequently required in FS to help the classification, prediction, or clustering stages better distinguish or represent the data, and different approaches can be followed for feature selection [3,4]. In data mining and machine learning applications, FS is a critical activity that eliminates unnecessary and redundant characteristics and improves learning performance [5,6]. As a preprocessing step for machine learning, FS decreases dimensionality, removes irrelevant input, improves learning accuracy, and improves result comprehension [7]. The difficulty for a learning algorithm is focusing its attention on a relevant subset of features while ignoring the rest. As a result, processing and analyzing such large amounts of data is quite difficult, and extracting valuable information from enormous amounts of data without an automated system is a hard task. FS is essential for detecting credit card fraud in large, multidimensional [8], and imbalanced datasets [9]. Many optimization algorithms have been used in the past decades to solve the FS problem by creating a subset of important features from high-dimensional datasets [10][11][12][13][14][15][16][17][18][19][20][21][22].
Filtering and wrapping are two types of FS approaches. If the FS technique is unaffected by the learning algorithm, then it is referred to as a filter approach; otherwise, it is referred to as a wrapper approach. The filter method is faster than the wrapper method in terms of processing time. On the other hand, the filter technique has the major disadvantage of being susceptible to inductive biases in the learning algorithms used to build the classifier. The wrapper approach has a higher processing overhead because it uses learning algorithms to evaluate a subset of features. However, in terms of accuracy, the wrapper strategy may outperform the filter method [1,2,10,11]. A preprocessing step is used in the filter technique [1,3,11] to select the best features. The filter approach's fundamental flaw is that it completely disregards the impact of the selected feature subset on the induction algorithm's performance. The wrapper methodology [1,3,11], introduced by Kohavi and John in 1997, is a simple and effective method for dealing with the problem of variable selection. In the wrapper approach, the feature subset selection algorithm is wrapped around the induction process: the induction process is treated as a black box (i.e., no knowledge of the algorithm is needed, just the interface), as shown in Figure 1. The feature subset selection algorithm employs the induction algorithm as part of the evaluation function to conduct a search for a good subset. The wrapper-based FS technique is used in this article for optimal FS. For each individual combination of features, FS employs a search strategy to identify the best-suited features.
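As a sketch of the wrapper idea above, the following toy example treats a decision-tree classifier as the black-box induction algorithm and exhaustively scores two-feature subsets by cross-validated accuracy. The function names, classifier choice, and synthetic data are illustrative assumptions, not the paper's setup.

```python
# Hedged sketch of wrapper-based feature-subset evaluation: the induction
# algorithm (here a decision tree) is a black box that scores each
# candidate subset via cross-validated accuracy.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

def evaluate_subset(X, y, subset):
    """Score a candidate feature subset with the black-box classifier."""
    if not subset:                       # empty subsets are invalid
        return 0.0
    clf = DecisionTreeClassifier(random_state=0)
    scores = cross_val_score(clf, X[:, subset], y, cv=5)
    return scores.mean()

X, y = make_classification(n_samples=300, n_features=10,
                           n_informative=4, random_state=0)
# Exhaustive wrapper search over all two-feature subsets.
best = max(([i, j] for i in range(10) for j in range(i + 1, 10)),
           key=lambda s: evaluate_subset(X, y, s))
print("best 2-feature subset:", best)
```

In practice the exhaustive enumeration is replaced by a search heuristic, which is exactly where the metaheuristics discussed below come in.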
In the classic search approach, the number of viable solutions increases exponentially with the number of features [1,3,10,11].
Grasshopper Optimization [12], the Differential Evolution algorithm (DE) [17], the Genetic Algorithm (GA) [18], Particle Swarm Optimization (PSO) [19], and Ant Colony Optimization (ACO) [20,21] have all been used to solve the FS problem using wrapper-based FS methods. The disadvantage of these FS methods is that they necessitate the tuning of various parameters for better performance. In this context, this paper proposes a new wrapper-based FS approach based on the Rock Hyrax Optimization (RHO) algorithm [23], which can detect credit card fraud [24][25][26][27][28][29][30] in massive and high-dimensional datasets and is considered very important for improving classification performance and fraud detection processes. This method also identifies FS models with small abnormalities in large datasets [31,32] with high precision and focuses on low computation that does not require extensive model-specific parameter settings. It finds the most necessary and pertinent features.
The main contributions of this study are that it presents a novel FS method based on the Rock Hyrax Swarm Optimization algorithm, as well as a detailed experimental comparison of numerous FS approaches such as DEFS, GAFS, PSOFS, ACOFS, and RHSOFS. Thus, it also presents optimum features for creating an effective credit card fraud detection system. Several performance measures have been applied to the FS approaches, and performance evaluation was carried out using extensive experiments on credit card datasets using classification algorithms such as NB, SVM [21,25,32], KNN, and DT, and the results were compared to show that the experimental data were significant. The key consequences of the presented technique are that it reduces the overfitting concerns that arise when datasets are imbalanced [27] and increases the model's generalizability. Intrusion detection, spam mail detection systems, important medical disease classification, sophisticated picture classification, and industrial automation systems are examples of applications that require large and complex data processing.
The remaining sections of the paper are organized as follows. Section 2 discusses the related work on feature selection algorithms and their impact on application research areas. Section 3 briefly discusses feature subset selection modeling and key problem formulation. Section 4 thoroughly describes the proposed RHSOFS methodology. Section 5 presents experimental and statistical result analysis of various FS methods, as well as a discussion of key issues, and Section 6 concludes the paper and outlines future work.


Literature Review
In the paper [1], the relationship between optimal feature subset selection and relevance was investigated, as well as the wrapper method to feature subset selection utilizing naïve Bayes and decision trees. This paper [2] discusses the fundamental difficulties in FS, such as feature relevance, redundancy, the characteristics and performance of different FS methods, and how to choose the best method for a given application. In the paper [3], the proper definitions of the objective function, as well as feature creation, were explored. It also looked at feature ranking and multivariate FS, as well as efficient search algorithms and ways for determining feature validity. The paper [4] focused on various typical methods of FS and extraction, as well as comparisons of each method. The paper [5] provided a detailed assessment of semi-supervised FS strategies, outlining the benefits and drawbacks of each method. The paper [7] focused on the filter model and created a new FS method that can successfully remove both unnecessary and redundant features while being less computationally expensive than existing algorithms. The paper [9] provided a wrapper-based FS strategy for selecting the most relevant features based on the artificial electric field optimization algorithm. A new method for selecting features in biomedical data is proposed in the paper [11]. A novel FS technique based on the real-valued grasshopper optimization algorithm was proposed in this paper [12]. The Jaya optimization algorithm [15] has been used to construct a unique and prominent wrapper-based FS model with an emphasis on a low-computing FS model that does not require sophisticated algorithm-specific parameter tuning.
The goal of the paper [16] was to find a way to reduce the time complexity of wrapper-based FSS with an embedded K-Nearest-Neighbor (KNN) classifier by building a classifier distance matrix and incrementally updating it to speed up the calculation of relevance criteria in evaluating the quality of candidate features. The paper [17] proposed a Differential Evolution (DE) optimization technique for FS, and the result of DE was compared with GA and PSO. The use of genetic algorithms to solve the feature subset selection problem using neural network classifiers was presented in the paper [18]. The implementation of FS in intrusion detection in wireless sensor networks is provided in the paper [19], which is based on PSO and PCA space. The outcome of the approach is compared to that of GA. The Ant Colony Optimization technique for FS is provided in the papers [20,21]. The paper [23] suggested a new swarm intelligence technique based on the behavior of rock hyrax swarms. The proposed algorithm can also balance the exploration and exploitation phases, making it suitable for a wide range of optimization problems. In the paper [24], an improved Credit Card Risk Identification (CCRI) technique for detecting fraud risk is described, which is based on feature selection algorithms such as Random Forest and the Support Vector Machine (SVM) classifier. The paper [25] describes an SVM-type FS strategy that uses artificial variables and mutual information to filter out noisy variables from high-dimensional metabolome data. In the paper [26], they presented a credit card fraud detection model and the necessity of using a feature selection approach. Some of the most prominent supervised and unsupervised machine learning algorithms were used to detect credit card thefts in a severely skewed dataset [30]. The analysis and comparison studies of different machine learning algorithms and boosting machine learning algorithms for fraud detection are discussed in papers [31,32].

Problem Definition
This section describes the problem formulation applied in this paper. To increase the performance of classification models, FS refers to a method for selecting an optimum subset of input features from the entire dataset. It simply picks out the elements that matter in the decision-making process. To lower the computing cost of the problem, it generates and selects the most effective subset of features by removing redundant and irrelevant features. This is an NP-hard problem [13], meaning it cannot be solved in polynomial time. The goal is to obtain the best subset of features to increase the performance of the classification process. The following are the four steps that make up optimal FS: (i) a subset of features is generated; (ii) fitness is evaluated and compared using these subsets of features; (iii) the termination conditions are checked, and if they have not been met, steps (i) and (ii) are repeated; (iv) the results are validated using the best subset of features.
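The four-step loop above can be sketched generically. The random subset generator and toy fitness function below are illustrative stand-ins, not the paper's actual operators.

```python
# Illustrative skeleton of the four-step FS loop described above (generate,
# evaluate, check termination, validate). The random search and fitness
# function are stand-ins for the paper's operators.
import random

def random_subset(n_features, rng):
    """Step (i): generate a candidate subset as a binary mask."""
    mask = [rng.randint(0, 1) for _ in range(n_features)]
    return mask if any(mask) else random_subset(n_features, rng)

def feature_select(fitness, n_features, max_iter=50, seed=0):
    rng = random.Random(seed)
    best_mask, best_fit = None, float("inf")
    for _ in range(max_iter):                 # step (iii): termination check
        mask = random_subset(n_features, rng) # step (i): generate a subset
        fit = fitness(mask)                   # step (ii): evaluate fitness
        if fit < best_fit:
            best_mask, best_fit = mask, fit
    return best_mask, best_fit                # step (iv): validate best subset

# Toy fitness: prefer subsets with about 3 selected features.
mask, fit = feature_select(lambda m: abs(sum(m) - 3), n_features=8)
print(mask, fit)
```

A metaheuristic such as RHSOFS replaces the blind random generation in step (i) with guided position updates, which is the subject of Section 4.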
The problem formulation for FS is performed by selecting d important features from a set of D features, as represented in Equation (1):

Minimize f(x), subject to |x| = d, d ≤ |D|, and x ≥ 0. (1)

Using the optimized subset of features, Equation (1) reduces error in each iteration stage, thus increasing classification accuracy in the proposed model.


Proposed Model
RHSO (Rock Hyrax Swarm Optimization) is a meta-heuristic based on the natural behavior of rock hyrax swarms. The RHSO algorithm simulates the collective behavior of rock hyraxes to find food and their unique way of looking at it. Rock hyraxes live in colonies or groups, with a dominant male keeping a close eye on the colony to ensure its protection. The algorithm seeks out the best solutions by incorporating both local heuristics and prior knowledge into the construction of the best subset of features to improve the classification process' performance [23].
The RHSOFS detailed functioning model is depicted in Figure 3, which separates the whole dataset into training and testing sets. The training data are entered into the optimization technique (i.e., f(x)) to find the most suitable optimum features. The classification algorithm is fed the optimum subset of features (i.e., f(x)), along with the training and test data, to evaluate the model's performance. Equation (1) can be used to represent the selection of the most optimal features. Equation (2) reduces the error in each iteration using the selected features, increasing the classification accuracy in the process. Population size, generation number, initial weighting factors, cognitive and social scaling factors, and probabilities of mutation and crossover are some of the regulatory parameters for population-based algorithms. To achieve the best results, these parameters must be fine-tuned, and the performance of optimization methods is determined by this fine-tuning; otherwise, these parameter values may trap the optimization algorithm in a local optimum, increasing the computational cost of the optimization problem. To address the aforementioned concerns, the RHSOFS technique is applied. This approach is applied to create the best subset of input features to increase the efficiency of the classification process. Below is a detailed description of the proposed RHSOFS approach.
First, generate, select, and examine a random binary (0, 1) population over the total number of input features for FS. For each member of the population, create a feature subset from the positions equal to 1. The extracted optimal input features are then fed into classification models such as NB, SVM, KNN, and DT to compute fitness. The goal of this research work is to find the best subset of input features that minimizes the model's fitness value while also improving its accuracy.
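A minimal sketch of this initialization and fitness computation, assuming a KNN classifier as one stand-in for the NB/SVM/KNN/DT models and synthetic data in place of the credit card dataset; names and sizes are illustrative.

```python
# Hedged sketch of the population initialization and fitness evaluation
# described above: a random binary population, feature subsets taken where
# bits equal 1, and fitness = mean misclassification error on test data
# (Equations (2) and (3) as described in the text).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=400, n_features=12, n_informative=5,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

def fitness(bits):
    """Mean error of the classifier on the features where bits == 1."""
    idx = np.flatnonzero(bits)
    if idx.size == 0:
        return 1.0                            # empty subset: worst fitness
    clf = KNeighborsClassifier().fit(X_tr[:, idx], y_tr)
    err = clf.predict(X_te[:, idx]) != y_te   # Equation (2): per-sample error
    return err.sum() / err.size               # Equation (3): mean error

P = 10                                         # population size
population = rng.integers(0, 2, size=(P, X.shape[1]))
fits = np.array([fitness(ind) for ind in population])
best = population[fits.argmin()]
print("best fitness:", fits.min())
```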
where r1 denotes a random number between [0, 1], x is the previous position of the leader, leader_pos denotes the old position of the leader, and j refers to each dimension. After the leader's position is updated, all members (or search agents) update their positions using Equation (5).
where circ denotes circular motion; it is calculated as follows to replicate the circle system in Equation (6), where r2 is the radius, a random number between [0, 1], and ang denotes the angle of a move, a random value between [0, 360], in Equations (7) and (8). The ang is also updated every generation, and this update is based on the lower and upper bounds of the variables, where lb and ub are the lower and upper bounds of the random number generator, respectively.
If the updated value grows larger than 360 or falls below 0, the angle (ang) is set to 360 or 0, respectively, to keep it within the desired range.
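A speculative sketch of this angle bookkeeping: the update rule itself is a stand-in, since Equations (7)-(10) are not reproduced here, and only the clipping to [0, 360] follows the text.

```python
# Sketch of per-generation angle maintenance: redraw within [lb, ub] and
# clip back into [0, 360] when the update pushes the angle out of range.
# The additive update is an assumption, not the paper's exact rule.
import random

def update_angle(ang, lb=0.0, ub=360.0, rng=random):
    """One generation's angle update, clipped to [0, 360]."""
    ang = ang + rng.uniform(lb, ub) - rng.uniform(lb, ub)  # assumed update
    return min(max(ang, 0.0), 360.0)   # set to 0 or 360 when out of range

rng = random.Random(1)
ang = rng.uniform(0, 360)
for _ in range(100):
    ang = update_angle(ang, rng=rng)
    assert 0.0 <= ang <= 360.0
```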
Algorithm 1 explains the RHSOFS pseudo-code. The RHSOFS begins by producing a binary population of P agents at random and examining all of the features. For each member of the population, a feature subset is created from the positions equal to 1. These chosen attributes are fed into classification models in order to calculate the fitness value. Equation (2) calculates err(x) as the difference between an actual and a predicted value of the model, where x = 1, 2, . . . , n and n is the number of testing observations. The model's fitness is calculated by dividing the sum of errors by the number of observations, as shown in Equation (3). The algorithm then attempts to update the position of the Leader according to Equation (4) and the position of each search agent according to Equation (5). Then, using Equation (3), each search agent's new fitness is determined. According to Equations (9) and (10), this algorithm progresses toward angle updating. The bestX individuals are those with the lowest fitness value. The algorithm then tries to update each search agent's position in accordance with Equation (5).
The new individuals are chosen only if their new fitness value is better than or equal to their prior fitness value, in which case their fitness value is replaced. For the next generation, only those with the lowest fitness value are chosen. Finally, the algorithm selects the most suitable candidates.

Algorithm 1: RHSOFS pseudo-code
1. Create an initial population of 0s and 1s of P agents randomly.
2. Set the dimension of the problem, D = P, where P is the number of agents.
3. Set Low to 1 and High to D, where Low and High refer to the low and high dimensions, respectively.
4. Generate the values of r1 and r2, where r1 is a random number in (0, 1) and r2 is a random radius in (0, 360).
…
Set Leader = the best agent.
…
for i = 1 to n do
    Update the position of each search agent according to Equation (5).
    …
Select the best member of the population: bestX = X(min(fitness)).
Update the angle according to Equations (9) and (10).
Return the best agent.
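The overall flow of the algorithm above can be sketched as follows. Since Equations (4)-(10) are not reproduced here, the leader-guided position update is modeled as a simple bit-flip stand-in; only the select-if-better and best-agent bookkeeping follow the pseudo-code.

```python
# Sketch of the RHSOFS loop under stated assumptions: binary population,
# leader = best agent, stand-in position update, keep-if-better selection.
import numpy as np

def rhsofs_sketch(fitness, n_features, pop_size=10, max_gen=100, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.integers(0, 2, size=(pop_size, n_features))  # binary population
    fits = np.array([fitness(x) for x in X])
    for _ in range(max_gen):
        leader = X[fits.argmin()].copy()                 # best agent leads
        for i in range(pop_size):
            cand = X[i].copy()
            j = rng.integers(n_features)                 # stand-in for Eq. (5):
            cand[j] = leader[j] if rng.random() < 0.5 else 1 - cand[j]
            f = fitness(cand)
            if f <= fits[i]:             # keep new individual only if better
                X[i], fits[i] = cand, f
    return X[fits.argmin()], fits.min()  # return the best agent

# Toy fitness: Hamming distance from a hidden target mask.
target = np.array([1, 0, 1, 1, 0, 0, 1, 0])
best, f = rhsofs_sketch(lambda x: int(np.abs(x - target).sum()), 8)
print(best, f)
```

The keep-if-better rule makes each agent's fitness non-increasing, so the returned best agent is at least as good as the best random initial agent.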

Experimental and Statistical Result Analysis and Discussion
To evaluate and examine the performance and effectiveness of the proposed FS approach, called RHSOFS, we have compared it with other useful approaches such as DEFS, GAFS, PSOFS, and ACOFS. Numerical experiments have been conducted on a real-world credit card fraud dataset using a range of data mining approaches to test the efficiency of the presented approach. Due to a shortage of real credit card fraud datasets, the stratified cross-validation approach has been used to create ten datasets with identical class distributions. The stratified cross-validation (SCV) method, which is related to the k-fold cross-validation method, provides training and test indices for dividing the entire dataset into train and test sets while keeping the percentage of samples for each class the same in each fold. The stratified cross-validation approach is used for classification problems and when the dataset is imbalanced, as imbalanced datasets can produce overfitted results. The SCV technique creates new datasets by preserving the target class ratio in each fold, the same as in the full dataset, rather than randomly splitting the entire dataset.
Steps followed to create the ten datasets using the stratified cross-validation approach:
1. Initially, the entire original dataset is randomly shuffled;
2. The randomly shuffled dataset is split into k folds (we have set k = 10);
3. For each fold, the training dataset samples are selected by using stratified sampling in order to maintain the class distribution of the original dataset, and the test set is formed from the remaining data. This process is repeated for each fold.
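The stratified split described in these steps can be sketched with scikit-learn's StratifiedKFold (a stand-in for the paper's Matlab implementation); each test fold preserves the full dataset's class ratio, which matters when positives are rare.

```python
# Stratified k-fold split on a toy imbalanced dataset (~5% positives):
# every test fold keeps the same 5% positive ratio as the full dataset.
import numpy as np
from sklearn.model_selection import StratifiedKFold

rng = np.random.default_rng(0)
y = np.array([0] * 950 + [1] * 50)     # toy imbalanced labels, 5% "fraud"
X = rng.normal(size=(1000, 4))

skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
for train_idx, test_idx in skf.split(X, y):
    # each 100-sample test fold contains exactly 5 positives (5%)
    assert len(test_idx) == 100
    assert y[test_idx].sum() == 5
```

A plain (non-stratified) k-fold split could easily produce folds with zero fraud cases, which would make per-fold recall undefined.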
The dataset contains transactions made by European cardholders over two days in September 2013, and it is available via the ULB machine learning group. This dataset was transformed using the PCA approach and has 28 principal components or features, spanning from V1 to V28; in total, 30 features are included in the evaluation. There are a total of 284,807 transactions in this dataset, with only 492 of them being fraud transactions, making it highly lopsided and skewed [32]. The tests are performed on a PC with the following specifications: a 1.60 GHz Intel Core i5-8250U processor and 8 GB of RAM. Matlab 2014b is used to implement these approaches. The size of the population (Pop) and the maximum number of generations (MaxGen) are the variables used to train and test the model in the experiment.
The values of Pop and MaxGen have been set to 10 and 100, respectively, for the optimum overall performance. This section examines the performance of the DEFS, GAFS, PSOFS, ACOFS, and RHSOFS techniques with respect to the number of selected features and the accuracy achieved by each technique. Because the procedures are stochastic, ten trials were carried out with a random sample population. The average classification accuracy for each dataset and FS approach is shown in Table 1. The results suggest that the proposed RHSOFS approach can achieve greater values than other approaches, such as DEFS, GAFS, PSOFS, and ACOFS, with the best optimal features. There are significantly fewer selected features compared to the original input features, as shown in Table 2. In comparison to other existing approaches (DEFS, GAFS, PSOFS, and ACOFS), which produce similar accuracy values with few variations, the new RHSOFS methodology produces significant accuracy results for almost all datasets using Equation (11). The average accuracy (%) over the 10 datasets using the NB, KNN, SVM, and DT classifiers has been studied and analyzed, as depicted in Figure 4, applied over the FS approaches DEFS, GAFS, PSOFS, ACOFS, and RHSOFS, respectively. The recall comparison, as shown in Figure 5, identifies the effectiveness of the proposed method by comparing the recall values of all the models over each dataset. The results show that the proposed RHSOFS approach has outperformed the other FS approaches.
The goal of FS approaches such as DEFS, GAFS, PSOFS, ACOFS, and RHSOFS is to locate the best features and cut down on execution time to build a reliable credit card fraud detection system. For most FS algorithms, the controlling elements, namely the size of the population and the number of iterations, are set to 10 and 100, respectively. Only approach-specific regulating parameters, such as probabilities of mutation, crossover, selection operators, initial weighting factors, and cognitive and social scaling factors, differ across the FS techniques. Tables 3 and 4 show performance indicators for all datasets, including precision, recall, f1-score, Matthews correlation coefficient (MCC), and specificity with and without the feature subset selection (WTFS). Finally, the proposed approach selects a significantly lower number of features than previous approaches, resulting in a significant improvement in classification accuracy.
The performance measures derived from the confusion matrix are classification accuracy (Equation (11)), precision (Equation (12)), recall (Equation (13)), f-measure (Equation (14)), MCC (Equation (15)), and specificity (Equation (16)). True positives, true negatives, false positives, and false negatives are denoted by TP, TN, FP, and FN, respectively.
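These are the standard confusion-matrix definitions, so the six measures of Equations (11)-(16) can be computed directly from the four counts. The following is a minimal sketch; the counts in the example are hypothetical, chosen only for illustration:

```python
def confusion_metrics(tp, tn, fp, fn):
    """Compute the measures of Equations (11)-(16) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)                 # Equation (11)
    precision = tp / (tp + fp)                                 # Equation (12)
    recall = tp / (tp + fn)                                    # Equation (13)
    f_measure = 2 * precision * recall / (precision + recall)  # Equation (14)
    mcc = (tp * tn - fp * fn) / (
        ((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)) ** 0.5
    )                                                          # Equation (15)
    specificity = tn / (tn + fp)                               # Equation (16)
    return accuracy, precision, recall, f_measure, mcc, specificity

# Hypothetical counts for an imbalanced fraud dataset
acc, prec, rec, f1, mcc, spec = confusion_metrics(tp=80, tn=900, fp=10, fn=10)
```

Note that for an imbalanced dataset, accuracy alone can be misleading (a classifier predicting "legitimate" everywhere would still score highly), which is why recall, MCC, and specificity are reported alongside it.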
The performance of the proposed RHSOFS model has been compared to that of other models using statistical analysis [14]. The Friedman test is a non-parametric statistical method for analyzing the results of models. Friedman proposes two hypotheses (H0 and H1), with H0 implying that there is no significant variance among all approaches and that all approaches are considered equivalent, whereas H1 implies the opposite. The Friedman test is one of the best techniques for determining statistical significance across all approaches, where each individual approach is ranked based on its accuracy. In this ranking, the smallest rank number denotes the highest (best) rank, while the largest rank number denotes the lowest rank. Table 5 shows the average rank of the models in relation to the four classifiers (NB, KNN, DT, and SVM), computed using Equation (17) by dividing the sum of each approach's ranks by the number of classifiers. Similarly, Equation (18) is used to determine the average rank of the models in relation to the datasets, as shown in Table 6. Using Equation (19), the Friedman test statistic has a chi-square (χ²_F) distribution with (P − 1) degrees of freedom, and Equation (20) yields a value of 5.442917 for the Friedman statistic F_F, where N is the number of datasets and P denotes the number of models employed in this experiment. In this study, the number of datasets is ten, the number of models is six, and the significance level α is 0.05 with (5, 45) degrees of freedom. The null hypothesis is rejected because the critical value of F_F, 2.42, is smaller than the computed Friedman statistic F_F = 5.442917.
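Assuming Equations (19) and (20) are the usual Friedman chi-square statistic and its Iman-Davenport correction (the standard pairing for this test, with the F statistic distributed with (P − 1, (P − 1)(N − 1)) = (5, 45) degrees of freedom for P = 6 and N = 10), the computation can be sketched as follows. The average ranks below are hypothetical placeholders, not the values from Tables 5 and 6:

```python
def friedman_chi2(avg_ranks, n_datasets):
    """Friedman chi-square statistic over P models and N datasets (Equation (19) form)."""
    p = len(avg_ranks)
    rank_sum_sq = sum(r ** 2 for r in avg_ranks)
    return (12 * n_datasets / (p * (p + 1))) * (rank_sum_sq - p * (p + 1) ** 2 / 4)

def iman_davenport_f(chi2, n_datasets, p_models):
    """Iman-Davenport F statistic (Equation (20) form),
    F-distributed with (P - 1, (P - 1)(N - 1)) degrees of freedom."""
    return ((n_datasets - 1) * chi2) / (n_datasets * (p_models - 1) - chi2)

# Hypothetical average ranks for six models over N = 10 datasets
# (average ranks of P models always sum to P(P + 1)/2 = 21 here)
avg_ranks = [4.5, 4.0, 3.8, 3.5, 3.2, 2.0]
chi2 = friedman_chi2(avg_ranks, n_datasets=10)
f_f = iman_davenport_f(chi2, n_datasets=10, p_models=6)
```

If the computed F_F exceeds the critical value of the F-distribution at the chosen α (2.42 for α = 0.05 with (5, 45) degrees of freedom), the null hypothesis of equivalent approaches is rejected.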
The number of algorithms employed in this experiment is P, N is the number of datasets, and z is the z-score value computed using Equation (21). AR_i and AR_j denote the average ranks of the ith and jth models, respectively. All the models are compared to the proposed model using the z-value, p-value, and α/(P − i), and the results are shown in Table 7. A feature set may contain insignificant, duplicated, or noisy data, which increases processing time and also degrades the model's performance. FS methods select a subset of significant and relevant original features, so that only the optimal features are processed by the model and all unwanted, redundant, and noisy features are eliminated. These relevant features boost the model's performance while reducing computation time. This paper compares the performance of the NB, KNN, SVM, and DT models using wrapper-based FS methods, namely DEFS, GAFS, PSOFS, ACOFS, and RHSOFS. FS processes can be modeled in two ways: first, by choosing a fixed number of optimal features and, second, by selecting a variable number of optimal features. Determining the fixed number of optimal features for all models in the fixed optimal FS technique is a difficult task. The dataset used in the experiment was already processed using the PCA method and contains 28 principal components, or features, ranging from V1 to V28, some of which may be redundant. Reducing the redundant features and replacing them with the next most important features is another difficult task. In contrast, the distinct FS techniques based on the various optimization algorithms select the smallest variable number of optimal features. In this method, each individual feature is assigned a value of 1 or 0. The goal of the presented RHSOFS approach is to remove irrelevant features by selecting the most relevant and appropriate features to improve the model's performance.
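This post-hoc comparison can be sketched as follows, assuming the common z-score form z = (AR_i − AR_j) / √(P(P + 1)/(6N)) for Equation (21) and the Holm step-down thresholds α/(P − i). The average ranks used below are hypothetical, not the values from Table 7:

```python
import math

def pairwise_z(ar_i, ar_j, p_models, n_datasets):
    """z-score for comparing two average ranks (common Equation (21) form)."""
    se = math.sqrt(p_models * (p_models + 1) / (6 * n_datasets))
    return (ar_i - ar_j) / se

def holm_test(control_rank, other_ranks, p_models, n_datasets, alpha=0.05):
    """Holm step-down procedure: compare each model against the control (proposed) model.

    Returns (z, p_value, adjusted_alpha, reject) tuples ordered by ascending p-value.
    """
    results = []
    for ar in other_ranks:
        z = pairwise_z(ar, control_rank, p_models, n_datasets)
        p_val = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
        results.append((z, p_val))
    results.sort(key=lambda t: t[1])
    out = []
    for i, (z, p_val) in enumerate(results, start=1):
        adj_alpha = alpha / (p_models - i)  # the alpha / (P - i) threshold of Table 7
        out.append((z, p_val, adj_alpha, p_val < adj_alpha))
    return out

# Hypothetical average ranks: the proposed model as control, five competitors
table = holm_test(control_rank=2.0, other_ranks=[4.5, 4.0, 3.8, 3.5, 3.2],
                  p_models=6, n_datasets=10)
```

A comparison is declared significant when its p-value falls below the step-down threshold α/(P − i); once one comparison fails, all later (larger-p) comparisons fail as well.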
The number of relevant features used is determined by the performance of the optimization methods. Furthermore, applying different optimization techniques does not guarantee that the same number of optimal features will be selected. Finally, the model is trained on the relevant features selected by the optimization procedures. The goal of this paper is to boost the model's performance by optimizing the number of relevant features. In a fixed optimal FS technique, a scale limit can be imposed based on the rank of the relevant features or on the number of features to be chosen, ensuring that a predefined number of features is selected. The comparison here is based on the best FS obtained using the different optimization algorithms.
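The 0/1 feature-mask evaluation described above can be sketched as a wrapper fitness function. This is an illustrative reconstruction, not the authors' implementation: a simple nearest-centroid classifier and synthetic data stand in for the actual classifiers (NB, KNN, SVM, DT) and the PCA-processed dataset:

```python
import numpy as np

def nearest_centroid_accuracy(X_train, y_train, X_test, y_test):
    """Tiny stand-in classifier: assign each test point to the nearest class centroid."""
    classes = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(X_test[:, None, :] - centroids[None, :, :], axis=2)
    preds = classes[np.argmin(dists, axis=1)]
    return float((preds == y_test).mean())

def fitness(mask, X_train, y_train, X_test, y_test):
    """Wrapper fitness: classifier accuracy on the feature subset encoded by a 0/1 mask.

    A 1 in `mask` keeps the corresponding feature; a 0 drops it.
    """
    selected = np.flatnonzero(mask)
    if selected.size == 0:
        return 0.0  # an empty subset cannot classify anything
    return nearest_centroid_accuracy(X_train[:, selected], y_train,
                                     X_test[:, selected], y_test)

# Toy usage: 28 features mirroring the PCA components V1-V28, random 0/1 mask
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 28))
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # synthetic labels
mask = rng.integers(0, 2, size=28)
score = fitness(mask, X[:100], y[:100], X[100:], y[100:])
```

An optimizer such as RHSOFS would evolve the binary mask, using this fitness value to steer the population toward subsets that maximize classification accuracy with as few selected features as possible.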

Conclusions
To determine an optimal subset of features, a novel FS technique based on the RHO algorithm, named RHSOFS, has been presented. The RHSOFS approach explores the most relevant features by discarding irrelevant, redundant, and noisy features. Four classifiers, namely NB, KNN, SVM, and DT, have been employed on ten datasets to evaluate the efficacy of the proposed RHSOFS approach.
The experimental data show that the proposed RHSOFS strategy effectively reduces duplicate features and outperforms existing approaches. All the models are compared in terms of classification accuracy and the number of features chosen. Insignificant, duplicated, or noisy data points should be removed; however, this can result in data loss, which is a drawback of the method. The use of AI techniques for the prediction of future behavior can certainly produce good results, although the cybercrime domain is more complex, since a model is trained on previously acquired knowledge and does not account for the evolving learning of the offender. Future work could build on approaches such as "Evolution Oriented Monitoring oriented to Security Properties for Cloud Applications" to provide applications with the ability to evolve securely by integrating acquired knowledge. It would also be interesting to study how to endow Trusted Computing-type trusted hardware with this type of intelligence so that it can provide hybrid hardware-software certification mechanisms in different scenarios, as proposed in "Software and Hardware Certification Techniques in a Combined Certification Model". The proposed strategy could even be applied to a wide range of complex applications with many features, such as mail fraud detection, intrusion detection, and fake insurance claim analysis.

Funding: This research is funded by VIT-AP University, Amravati-522237, Andhra Pradesh, India.

Conflicts of Interest: The authors declare no conflict of interest.