Red Fox Optimizer with Data-Science-Enabled Microarray Gene Expression Classification Model

Abstract: Microarray data examination is a relatively new technology that intends to determine the proper treatment for various diseases and a precise medical diagnosis by analyzing a massive number of genes in various experimental conditions. Conventional data classification techniques suffer from overfitting and from the high dimensionality of gene expression data; therefore, the feature (gene) selection approach plays a vital role in handling high-dimensional data. Data science concepts can be widely employed in several data classification problems to identify different class labels. In this aspect, we developed a novel red fox optimizer with deep-learning-enabled microarray gene expression classification (RFODL-MGEC) model. The presented RFODL-MGEC model aims to improve classification performance by selecting appropriate features. The RFODL-MGEC model uses a novel red fox optimizer (RFO)-based feature selection approach for deriving an optimal subset of features. Moreover, the RFODL-MGEC model involves a bidirectional cascaded deep neural network (BCDNN) for data classification. The parameters involved in the BCDNN technique were tuned using the chaos game optimization (CGO) algorithm. Comprehensive experiments on benchmark datasets indicated that the RFODL-MGEC model accomplished superior results for subtype classification. Therefore, the RFODL-MGEC model was found to be effective for the identification of various classes in high-dimensional and small-scale microarray data.


Introduction
DNA microarray technology makes it simpler to monitor a huge number of genes simultaneously [1]. Earlier works indicated that DNA microarray technology can be useful in the classification of cancer [2]. Several techniques and methods with satisfactory outcomes have been introduced to classify microarray gene expression [3]. In a microarray dataset, the gene expression values are organized in a matrix, where rows are samples and columns are genes (features). A gene expression value is a real number that defines the expression level of a gene following certain criteria [4]. Due to the limited number of samples and the enormous number of features in gene expression data, systematic machine learning (ML) techniques do not work well for cancer classification [5].
A microarray experiment produces many gene expression values for an individual sample. The ratio of the number of genes (features) to the number of patients (samples) is therefore heavily skewed, with far more genes than samples.

Related Works
In [11], a novel bacterial colony optimization with a multidimensional population, named the BCO-MDP technique, was proposed for feature selection (FS) to resolve classification issues. To address the combinatorial problem connected with FS, the population with several dimensionalities was represented as subsets of distinct feature sizes. Zeebaree et al. [12] examined a deep learning (DL) method based on a CNN for the classification of microarray data; in contrast with approaches such as support vector machine recursive feature elimination and improved random forest (mSVM-RFE-iRF and varSeIRF), the CNN did not achieve higher efficiency on every dataset. In [13], a two-stage sparse logistic regression (LR) was presented to attain an effective subset of genes with high classification ability by integrating a screening method as a filtering model and an adaptive lasso with a novel weight as an embedding process. In the first stage, an independence screening approach used as the filter retained the individual genes demonstrating the maximum individual correlation with the cancer class label. In the second stage, the adaptive lasso with the novel weight was executed to address the high correlations among the genes screened in the first stage.
Shukla et al. [14] proposed a novel hybrid framework named CMIMAGA, integrating conditional mutual information maximization (CMIM) and an adaptive genetic algorithm (AGA), which is utilized for determining important biomarkers in gene expression data. CMIM was executed as a filter to remove meaningless genes, while a wrapper approach, the AGA, was utilized for choosing the most discriminative genes.
In [15], elephant search algorithm (ESA)-based optimization was presented for selecting optimal gene expressions from a huge volume of microarray data. Firefly search (FFS) was utilized to benchmark the ESA's efficiency in the FS procedure. A stochastic gradient descent (SGD)-based DNN with a Softmax activation function was applied to the reduced features (genes) to classify samples at various instances based on their gene expression levels. Sayed et al. [16] examined an ensemble FS approach based on a t-test and a GA. After preprocessing the data using a t-test, a nested GA, called Nested-GA, was utilized to obtain the optimal subset of features using two distinct datasets; Nested-GA consists of two nested GAs (outer and inner) that run on two different types of datasets. Li et al. [17] established a more efficient implementation of linear SVMs, enhancing the recursive feature elimination approach and combining the selected informative genes. In addition, they presented a simple resampling approach for preprocessing the dataset that balances the data distribution of the distinct sample types and improves the classification performance.

The Proposed Model
This study proposes a novel RFODL-MGEC model for microarray gene expression classification. The presented RFODL-MGEC model primarily employed an RFO-FS approach for deriving the optimum subset of features. Next, the BCDNN model was utilized for data classification, and the parameters involved in the BCDNN technique were optimally tuned using a CGO algorithm. Figure 1 demonstrates the overall block diagram of our proposed RFODL-MGEC technique.

Data Preprocessing
The z-score normalization approach was applied in the initial phase; it computes the standard deviation and arithmetic mean of the provided gene data. The normalization approach performs effectively when prior knowledge of the average score and score variation of the matcher is available. The normalized scores were obtained using the following:

z = (x − µ)/σ

where σ implies the standard deviation and µ indicates the arithmetic mean of the provided data. In this study, the normalization of the smoothed data was carried out via z-score normalization.
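The z-score step described above can be sketched in a few lines of NumPy (a minimal illustrative sketch, not the authors' code; the function name z_score_normalize is ours):

```python
import numpy as np

def z_score_normalize(X):
    """Column-wise z-score normalization of a gene expression matrix.

    X: (samples x genes) array; each gene column is shifted by its
    arithmetic mean (mu) and scaled by its standard deviation (sigma).
    """
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma[sigma == 0] = 1.0  # guard against constant genes
    return (X - mu) / sigma
```

After this transform, each gene column has zero mean and unit variance, which puts all genes on a comparable scale before feature selection.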

Design of RFO-Based Feature Selection Approach
During feature selection, the RFO-FS model was executed and the optimum set of features was chosen. The RFO approach is a recently proposed metaheuristic based on the hunting behavior of red foxes. Initially, red foxes seek food across territories [18]; this is modelled as the exploration term for global search. Next, they move over the territory to get close to their prey before attacking; this stage is modelled as the exploitation term for local search. The process is initiated with a fixed number of random candidates; each one determines a point x = (x_0, x_1, ..., x_{n−1}), where n defines the number of coordinates. To discriminate every fox x_i in iteration t, where i indicates the fox number in the population, we introduce the notation (x_i^j)^t, in which j describes the coordinate in the solution-space dimension. The criterion function f ∈ R^n of n variables depends on the dimension of the search space, and (x)^(i) is the optimum solution when the value of f((x)^(i)) represents a global optimum on [a, b]. The candidates are first sorted according to the fitness condition, and for the best candidate (x_best)^t, the square of the Euclidean distance to each candidate is estimated as:

d((x_i)^t, (x_best)^t) = sqrt(||(x_i)^t − (x_best)^t||²)

and each candidate moves towards the optimal individual as:

(x_i)^t = (x_i)^t + α · sign((x_best)^t − (x_i)^t)

where α defines an arbitrary number in (0, d((x_i)^t, (x_best)^t)). In the RFO approach, movements and observations delude the prey when hunting in the local search phase. To simulate the probability of a fox approaching the prey, an arbitrary number γ ∈ [0, 1] is set in each iteration for each candidate.
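The global-search move just described, in which a candidate steps toward the best fox by a random fraction of the distance between them, can be sketched as follows (an illustrative implementation under our own naming, not the authors' code):

```python
import numpy as np

def rfo_global_move(x_i, x_best, rng):
    """Move candidate x_i toward the best fox (exploration phase).

    alpha is drawn uniformly from (0, d), where d is the Euclidean
    distance between x_i and the best candidate; the step is applied
    coordinate-wise via the sign of the difference.
    """
    d = np.linalg.norm(x_i - x_best)
    alpha = rng.uniform(0.0, d) if d > 0 else 0.0
    return x_i + alpha * np.sign(x_best - x_i)
```

Note that a candidate already at the best position stays put, since the distance (and hence the step size) is zero.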
move closer        if γ > 3/4
stay and hide      if γ ≤ 3/4    (4)

Figure 2 depicts the steps involved in RFO. The vision radius depends on a, an arbitrary number within 0 and 0.2, and φ0, an arbitrary number within 0 and 2π that defines the fox observation angle:

r = a · sin(φ0)/φ0 if φ0 ≠ 0;  r = β if φ0 = 0    (5)

where β represents an arbitrary number within 0 and 1. The approaching movement of the fox was modelled as follows:

x_0^new = a · r · cos(φ1) + x_0^actual
x_1^new = a · r · sin(φ1) + a · r · cos(φ2) + x_1^actual
...
x_{n−1}^new = a · r · sin(φ1) + a · r · sin(φ2) + · · · + a · r · sin(φ_{n−1}) + x_{n−1}^actual    (6)

where φ1, ..., φ_{n−1} are arbitrary angles within 0 and 2π. Five percent of the worst candidates are discarded and replaced with upgraded candidates. In the same way, the two optimal individuals, (x(1))^t and (x(2))^t, form an alpha couple in iteration t, whose habitat centre can be mathematically expressed as:

habitat(centre)^t = ((x(1))^t + (x(2))^t)/2    (7)

Moreover, the diameter of the habitat using the Euclidean distance can be accomplished by Equation (8):

habitat(diameter)^t = sqrt(||(x(1))^t − (x(2))^t||²)    (8)

An arbitrary number, θ, was considered in the following:

new nomadic candidate              if θ > 0.45
reproduction of the alpha couple   if θ ≤ 0.45    (9)

In this case, θ ∈ [0, 1]. In addition, the new candidate reproduced by the alpha couple is obtained as:

(x_reproduced)^t = θ · ((x(1))^t + (x(2))^t)/2
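The local hunting stage, with its stay-or-approach decision, vision radius, and angle-based position update, can be sketched as below. This is a simplified, hypothetical implementation in our own naming; the exact update in the RFO literature may differ in minor details:

```python
import numpy as np

def rfo_local_move(x, rng):
    """One fox's local hunting move (exploitation phase), a sketch.

    With gamma <= 3/4 the fox stays and hides; otherwise it creeps
    toward the prey using a scaling a ~ U(0, 0.2), an observation
    angle phi0, and per-dimension angles phi_1..phi_{n-1}.
    Assumes n >= 2 dimensions.
    """
    if rng.uniform() <= 0.75:              # stay and hide
        return x.copy()
    n = x.size
    a = rng.uniform(0.0, 0.2)
    phi0 = rng.uniform(0.0, 2.0 * np.pi)
    # vision radius: a*sin(phi0)/phi0, or a random beta when phi0 == 0
    r = a * np.sin(phi0) / phi0 if phi0 != 0.0 else rng.uniform()
    phi = rng.uniform(0.0, 2.0 * np.pi, size=n - 1)  # phi_1..phi_{n-1}
    new = np.empty_like(x)
    new[0] = a * r * np.cos(phi[0]) + x[0]
    for k in range(1, n - 1):
        new[k] = a * r * (np.sin(phi[:k]).sum() + np.cos(phi[k])) + x[k]
    new[-1] = a * r * np.sin(phi).sum() + x[-1]
    return new
```

Because a < 0.2 and |r| ≤ a, each local move is a small perturbation around the current position, which is what makes this the exploitation (fine search) phase.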

Process Involved in BCDNN-Based Classification
The BCDNN model was developed for microarray gene expression classification [19]. The DNN is separated into an encoder, a decoder, a translator, and a simulator. Let T represent the amplitude and phase response obtained from the finite-difference time-domain (FDTD) methodology, and T̂ represent the forecast from the simulator. Once the module is trained, the simulator predicts T̂ from an input image of a meta-atom structure far more rapidly than its numerical counterpart. For backward calculations, T with dimensions of 82 × 1 is converted to an image with dimensions of 40 × 40, which means there are fewer input parameters than output parameters for the regression process. This enormous divergence makes it problematic for a network to generalize and converge well, particularly once the input spectra have strong variation near the resonant frequency. The authors of the aforementioned study attempted to avoid this problem by including a generative adversarial network or a bilinear tensor layer. Initially, every meta-atom is characterized by a lower-dimensional eigenvector with dimensions of 82 × 1 through a pretrained autoencoder. The size of each tensor throughout the network is indicated below each block. Dissimilar layers of the CNN are interconnected by convolution operations: the kernel multiplies the values of the tensor within the kernel region and then sums them into a new value in the output tensor. Two FC layers (dimensions given below) were attached to the CNN to estimate a spectral tensor. A leaky ReLU with α = 0.2 was employed for all the convolution layers, and tanh was employed for all the FC layers. The convolution layer maps the input tensor x_k to the output tensor x_{k+1}:

x_{k+1} = LeakyReLU(CONV_{k1}(x_k))

where LeakyReLU(·) represents the leaky rectified linear unit activation and CONV denotes the convolution operator (including bias terms). The k1 subscript signifies the number of channels; in the simulator, k1 = 32, 32, 64, 64, 128, 128.
Strides of two are employed in the second, fourth, and sixth convolutions to replace max-pooling layers. A dropout layer with a rate of 0.1 is applied after all the FC layers except the output layer to prevent the network from overfitting. The mean absolute error (MAE) was adopted for calculating the weights and gradients:

MAE = (1/N) · Σ_{i=1}^{N} |T̂_i − T_i|

where N indicates the number of entries of T̂ (the predicted spectrum). As a cost function, the MAE is insensitive to outliers; however, it is unconducive to convergence. To guarantee module stability, the learning rate declines with the number of iterations.
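The two numerical ingredients named above, the leaky ReLU activation with α = 0.2 and the MAE cost, are simple to state in code (a generic NumPy sketch, not the BCDNN implementation itself):

```python
import numpy as np

def leaky_relu(x, alpha=0.2):
    """Leaky ReLU: identity for x >= 0, slope alpha for x < 0."""
    x = np.asarray(x, dtype=float)
    return np.where(x >= 0, x, alpha * x)

def mae(t_pred, t_true):
    """Mean absolute error over the N entries of the predicted spectrum."""
    t_pred = np.asarray(t_pred, dtype=float)
    t_true = np.asarray(t_true, dtype=float)
    return np.abs(t_pred - t_true).mean()
```

In a deep-learning framework these would be supplied by the library, but the definitions make the loss behavior (linear penalty, hence outlier-insensitive) explicit.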

Parameter Optimization Using CGO Algorithm
In order to optimally tune the parameters involved in the BCDNN method, the CGO approach was employed [20]. The CGO approach was designed based on principles of chaos theory; key methods of fractals and chaos games were utilized to formulate its mathematical model. The CGO approach considers a number of solution candidates (S) that represent seeds inside a Sierpinski triangle. The mathematical formulation of this feature is as follows:

S = {S_1, S_2, ..., S_n},  S_i = (s_i^1, s_i^2, ..., s_i^d)

where n signifies the count of eligible seeds (candidate solutions) inside the Sierpinski triangle (the search space) and d defines the dimension of each seed. The initial positions of these eligible seeds are determined randomly within the search space as:

s_i^j(0) = s_{i,min}^j + R · (s_{i,max}^j − s_{i,min}^j)

where R implies an arbitrary number in the interval of zero and one. For each candidate X_i, four seeds are then generated. The process for the first seed is represented by:

Seed_i^1 = X_i + α_i × (β_i × GB − γ_i × MG_i)

where GB is the global best, MG_i is the mean of a random group of candidates, and β_i and γ_i define arbitrary integers of zero or one representing the possibility of rolling a die. The schematic presentation of the described process for the second seed is defined as:

Seed_i^2 = GB + α_i × (β_i × X_i − γ_i × MG_i)

The schematic representations of the third and fourth seeds are described as:

Seed_i^3 = MG_i + α_i × (β_i × X_i − γ_i × GB)
Seed_i^4 = X_i with s_i^k = s_i^k + R

in which k signifies an arbitrarily chosen dimension index. In the CGO approach, different constructions are presented for α_i, which controls the limit of the seeds' movement.
In these constructions, rand implies a uniformly distributed random number in the interval of zero and one, and Ψ and Ω are arbitrary integers of zero or one. To select better parameters for the BCDNN technique, the CGO method is applied with a fitness function whose minimization yields higher performance. During this process, the error rate serves as the fitness function, and the solution with the lowest error is regarded as the optimum one. It can be defined as:

fitness = (Number of misclassified samples / Total number of samples) × 100    (20)
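The error-rate fitness used to rank candidate BCDNN parameter sets can be written directly (a minimal sketch; the function name fitness is ours):

```python
def fitness(y_true, y_pred):
    """Classification error rate in percent (Equation (20)).

    Lower fitness indicates a better candidate parameter set,
    so the optimizer minimizes this value.
    """
    assert len(y_true) == len(y_pred) and len(y_true) > 0
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true) * 100.0
```

For example, one misclassification among four samples yields a fitness of 25.0.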

Experimental Validation
The performance of the RFODL-MGEC model was validated using three benchmark datasets [21], namely, the prostate cancer, colon tumor, and ovarian cancer datasets. The details of the datasets are provided in Table 1. The proposed model selected sets of 6145, 984, and 8424 features for the prostate, colon, and ovarian cancer datasets, respectively.

Figure 5 exemplifies the training and validation loss of the RFODL-MGEC model on the prostate cancer dataset. Figure 5 shows that the RFODL-MGEC model offered reduced training and validation loss for the classification of the test data.

Table 3 demonstrates a brief classification performance report of the RFODL-MGEC model on the colon tumor dataset. The experimental results indicated that the RFODL-MGEC model achieved effective results on the test dataset. For instance, on the entire dataset, the RFODL-MGEC model obtained an average accuracy, precision, recall, and F-score of 95.16%, 94.37%, 95.23%, and 94.77%, respectively. With 70% of the data used for training, the RFODL-MGEC method attained an average accuracy, precision, recall, and F-score of 95.35%, 93.75%, 96.55%, and 94.88%, respectively. Additionally, with the 30% testing split, the RFODL-MGEC algorithm obtained an average accuracy, precision, recall, and F-score of 94.74%, 95.83%, 93.75%, and 94.49%, respectively.

Figure 9 illustrates the confusion matrices generated by the RFODL-MGEC algorithm on the ovarian cancer dataset. For the entire dataset, the RFODL-MGEC technique categorized 159 images as ovarian and 87 images as normal. With the 70% training split, it categorized 102 images as ovarian and 69 images as normal; with the 30% testing split, it categorized 57 images as ovarian and 18 images as normal.

Table 4 shows a brief classification performance report of the RFODL-MGEC technique on the ovarian cancer dataset. The experimental results again indicated effective performance on the test dataset. On the entire dataset, the RFODL-MGEC system obtained an average accuracy, precision, recall, and F-score of 97.23%, 97.11%, 96.88%, and 96.99%, respectively. With the 70% training split, it obtained an average accuracy, precision, recall, and F-score of 96.61%, 96.49%, 96.49%, and 96.49%, respectively. Eventually, with the 30% testing split, it obtained an average accuracy, precision, recall, and F-score of 98.68%, 99.14%, 97.37%, and 98.21%, respectively.

Discussion
A detailed comparative examination of the RFODL-MGEC model with recent approaches [15] for prostate cancer is provided in Table 5 and Figure 12. The experimental outcomes indicated that the FFSDL and ESADL models reached lower classification outcomes than the other approaches. At the same time, the SVM and RF models accomplished slightly enhanced classification outcomes compared with the FFSDL and ESADL models. Along with that, the ABC-SVM and PSO-SVM models accomplished close classification performances, with an accuracy of 96.06% and 93.71%, respectively. The proposed RFODL-MGEC model resulted in the maximum classification efficiency, with an accuracy, precision, and recall of 96.77%, 96.88%, and 96.88%, respectively.

A brief comparative examination of the RFODL-MGEC approach with recent approaches for colon tumors is given in Table 6 and Figure 13. The experimental outcomes indicated that the FFSDL and ESADL approaches reached lower classification outcomes than the other approaches. Likewise, the SVM and RF approaches accomplished somewhat enhanced classification outcomes compared with the FFSDL and ESADL approaches, and the ABC-SVM and PSO-SVM models accomplished close classification performances, with an accuracy of 93.94% and 93.80%, respectively. Finally, the RFODL-MGEC model resulted in higher classification efficiency, with an accuracy, precision, and recall of 94.74%, 95.83%, and 93.75%, respectively.

A detailed comparative examination of the RFODL-MGEC algorithm with recent approaches for ovarian cancer is given in Table 7 and Figure 14. The experimental outcomes indicated that the FFSDL and ESADL methods reached lower classification outcomes than the other approaches, while the SVM and RF models accomplished somewhat enhanced classification outcomes compared with the FFSDL and ESADL models. These were followed by the ABC-SVM and PSO-SVM techniques, which accomplished close classification performances, with an accuracy of 95.42% and 95.81%, respectively. Finally, the RFODL-MGEC methodology resulted in the maximum classification efficiency, with an accuracy, precision, and recall of 98.68%, 99.11%, and 97.37%, respectively.

Finally, a computation time (CT) examination of the RFODL-MGEC technique against recent models for the three distinct datasets is provided in Table 8. The experimental results indicated that the RFODL-MGEC technique showed a lower CT compared with the other methods, requiring 1.231, 0.432, and 1.542 s for the prostate cancer, colon tumor, and ovarian cancer datasets, respectively. After examining the aforementioned tables and figures, we noted that the RFODL-MGEC model was able to maximize classification performance compared with the other methods.

Conclusions
In this study, a novel RFODL-MGEC model was established for microarray gene expression classification. The presented RFODL-MGEC model primarily employed the RFO-FS technique for deriving an optimal subset of features. Next, the BCDNN model was utilized for data classification, and the parameters involved in the BCDNN technique were optimally tuned using the CGO algorithm. Comprehensive experiments on benchmark datasets showed that the RFODL-MGEC model accomplished superior results for subtype classification. Therefore, the RFODL-MGEC model was found to be effective for the identification of different classes in high-dimensional and small-scale microarray data. Future directions involve the use of data clustering and feature reduction approaches to enhance classification performance. The proposed model should also be tested on large-scale datasets.

Data Availability Statement: Data sharing is not applicable to this article, as no datasets were generated during this study.