Detection of Diabetes through Microarray Genes with Enhancement of Classifiers Performance

Diabetes mellitus is a life-threatening, non-communicable, chronic disease with a significant global impact. The timely detection of diabetes in patients is necessary for effective treatment. The primary objective of this study is to propose a novel approach for identifying type II diabetes mellitus using microarray gene data. Specifically, our research focuses on enhancing the performance of methods for detecting diabetes. Four Dimensionality Reduction techniques, Detrend Fluctuation Analysis (DFA), the Chi-square probability density function (Chi2pdf), the Firefly algorithm, and Cuckoo Search, are used to reduce the high-dimensional data. Metaheuristic algorithms, namely Particle Swarm Optimization (PSO) and Harmonic Search (HS), are used for feature selection. Seven classifiers, Non-Linear Regression (NLR), Linear Regression (LR), Logistic Regression (LoR), Gaussian Mixture Model (GMM), Bayesian Linear Discriminant Classifier (BLDC), Softmax Discriminant Classifier (SDC), and Support Vector Machine with Radial Basis Function kernel (SVM-RBF), are utilized to classify the diabetic and non-diabetic classes. The classifiers' performances are analyzed through parameters such as accuracy, recall, precision, F1 score, error rate, Matthews Correlation Coefficient (MCC), Jaccard metric, and kappa. The SVM-RBF classifier with the Chi2pdf Dimensionality Reduction technique and the PSO feature selection method attained the highest accuracy of 91% with a kappa of 0.7961, outperforming all of the other classifiers.


Introduction
Some statistics related to diabetes worldwide are as follows. The global prevalence of diabetes among adults (20-79 years old) was 10.5% in 2021 [1]. The prevalence of diabetes is higher in low- and middle-income countries than in high-income countries [2]. The region with the highest prevalence is the Middle East and North Africa, where 13.9% of adults have diabetes. Diabetes was the ninth leading cause of death worldwide in 2019, with 4.2 million deaths attributed to the disease or its complications. In most cases, diabetes is caused by factors such as consuming food at irregular intervals and a lack of physical activity [3]. When a healthy human consumes a normal meal during the day, their blood glucose level rises to around 120-140 mg/dL [4].
India has a high prevalence of diabetes and is called the world's diabetes capital. According to the International Diabetes Federation [5], in 2021, India had an estimated 87 million adults aged between 20 and 79 years with diabetes. This number is projected to increase to 151 million by 2045. The prevalence of diabetes in India varies across regions, with the southern and northern states having higher prevalence rates compared to the eastern and northeastern states. The states with the highest prevalence rates are Kerala, Tamil Nadu, and Punjab [6]. Type 2 diabetes accounts for more than 90% of all cases of diabetes in India [7]. Type 1 diabetes is less common and accounts for less than 10% of all diabetes cases. Diabetes is also associated with complications such as heart disease and kidney disease.
Howlader, Koushik Chandra et al. [24] conducted a study which used machine learning to identify features associated with T2D in Pima Indians. The best classifiers were Generalized Boosted Regression modeling, Sparse Distance Weighted Discrimination, a Generalized Additive Model using LOESS, and Boosted Generalized Additive Models. The study found that Generalized Boosted Regression modeling had the highest accuracy (90.91%), with a Kappa statistic of 78.77% and a specificity of 85.19%. Sisodia, Deepti et al. [25] conducted a study comparing the performance of three machine learning algorithms for detecting diabetes: Decision Tree, Support Vector Machine (SVM), and Naive Bayes. The study used the Pima Indians Diabetes Database (PIDD) and evaluated the algorithms on various measures, including accuracy, precision, F-measure, and recall. The study found that Naive Bayes had the highest accuracy (76.30%), followed by Decision Tree (74.67%) and SVM (72.00%). Mathur, Prashant et al. [26] conducted a study on Indian diabetes statistics. In India, 9.3% of adults have diabetes, and 24.5% have impaired fasting blood glucose.
Of those with diabetes, only 45.8% are aware of their condition, 36.1% are on treatment, and 15.7% have it under control. This is lower than the awareness, treatment, and control rates in other countries. For example, in the United States, 75% of adults with diabetes are aware of their condition, 64% are on treatment, and 54% have it under control. Kazerouni, Faranak et al. [27] conducted a performance evaluation of various algorithms in which the AUC, sensitivity, and specificity were considered and ROC curves were plotted. The KNN algorithm had a mean AUC of 91% with a standard deviation of 0.09, while the mean sensitivity and specificity were 96% and 85%, respectively. The SVM algorithm achieved a mean AUC of 95% with a standard deviation of 0.05 after stratified 10-fold cross-validation, along with a mean sensitivity and specificity of 95% and 86%.
Ramdaniah et al. [28] used microarray genes for the identification of diabetic classes from the GSE18732 dataset. A total of 46 diabetic classes and 72 non-diabetic classes were used in this study. The machine learning techniques used were the Naive Bayes and SVM Sigmoid kernel methods, with accuracies of 88.89% and 83.33%, respectively. Many researchers have used the Pima Indian diabetes dataset to classify and analyze diabetic and non-diabetic classes and to compute various performance metrics such as accuracy, sensitivity, specificity, MCC, etc., although very few findings are available on microarray gene-based datasets for the identification of diabetic and non-diabetic classes.
The methodology of the research conducted in this study is depicted in Figure 1. Four Dimensionality Reduction techniques, DFA, Chi2pdf, the Firefly algorithm, and Cuckoo search, are used. To further analyze the data, classification is conducted both without and with feature selection. For feature selection, two optimization algorithms are used, PSO and HS. Moreover, seven classifiers (NLR, LR, LoR, GMM, BLDC, SDC, and SVM-RBF) are used to classify the classes as normal and diabetic. Section 1 introduces the paper. The literature review is discussed in Section 2. The materials and methods of the datasets are explained in Section 3. Feature extraction through Dimensionality Reduction techniques is explained in Section 4. Section 5 deals with the feature selection methods used in the research. The classifiers' properties are explained in Section 6. Training and testing of the classifiers are discussed in Section 7. The results are discussed in Section 8. The paper is concluded in Section 9.

Material and Methods
Microarray gene expression analysis plays a vital role in understanding the molecular mechanisms and identifying gene expression patterns associated with various diseases, including diabetes. Microarray analysis allows researchers to compare gene expression levels between healthy individuals and those with diabetes; differentially expressed genes may be directly involved in disease development, progression, or complications, and such data are readily available from many public repositories. The "Expression data from human pancreatic islets" were taken from a Nordic islet transplantation program and consist of 57 non-diabetic and 20 diabetic cadaver donors, yielding a total of 28,735 gene data sets (https://www.ncbi.nlm.nih.gov/bioproject/PRJNA178122, accessed on 20 August 2021). The data are organized as 22,960 genes per patient, and the peak intensity with the average value was selected among the total samples. A base-10 logarithmic transformation was applied, and each individual sample was standardized to a mean of 0 and a variance of 1.
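The preprocessing described above can be sketched as follows; the matrix size, value range, and variable names are illustrative assumptions rather than the actual GEO data:

```python
import numpy as np

# Toy stand-in for the expression matrix: rows = donors, columns = genes.
rng = np.random.default_rng(0)
expr = rng.uniform(1.0, 1000.0, size=(77, 100))  # 77 donors, 100 genes (toy size)

# Base-10 logarithmic transformation, then per-sample standardization
# to zero mean and unit variance, as described in the text.
log_expr = np.log10(expr)
mean = log_expr.mean(axis=1, keepdims=True)
std = log_expr.std(axis=1, keepdims=True)
z = (log_expr - mean) / std
```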
The statistical analysis methods used in this article, described in the following sections, are as follows: four Dimensionality Reduction techniques for the reduction of the high-dimensional data, two meta-heuristic algorithms for feature selection with verification of the p-value through a t-test, and seven classifiers for classifying the diabetic and non-diabetic classes through statistical parameters such as accuracy, recall, precision, F1 score, MCC, error rate, Jaccard metric, and kappa.

Data Set
When using biological functions to detect diabetes and the features of its secondary criteria in probability functions based on p-values, a false positive error in the selection of significant genes must also be detected. The data available in many portals for human genes consist of 28,735 genes with 50 non-diabetic and 20 diabetic samples, considered for


Dimensionality Reduction
Dimensionality Reduction techniques are important for the analysis of microarray gene data for the type II diabetic class. Microarray experiments often generate a vast amount of gene expression data, resulting in a high-dimensional feature space. However, not all genes contribute equally to the classification task, and the presence of noise and irrelevant features can hinder the accuracy and interpretability of the results. Dimensionality Reduction methods play a pivotal role in addressing these challenges by extracting the most informative features that are relevant to type II diabetes classification. These techniques aim to reduce the dimensionality of the data while preserving the discriminatory information, enabling efficient computation and improved performance of subsequent classification algorithms. By eliminating redundant and irrelevant features, Dimensionality Reduction can enhance classification performance.
In this paper, four DR methods, DFA, Chi2pdf, the Firefly algorithm, and the Cuckoo search algorithm are utilized to reduce the dimension of a data set. The DR methods are discussed in the following section of the paper.

A. Detrend Fluctuation Analysis (DFA)
DFA is the first DR method utilized in this paper. It comprises the principles of inspecting the correlation of stationary and non-stationary functions, and its use for short-range and long-range relationships was described by Berthouze L et al. [29]. In typical applications, the DFA scaling exponent is used to segregate the input data as rational or irrational. To estimate the functions of the output class data, DFA is useful for discriminating between healthy and unhealthy subjects.
The algorithm is based on the root-mean-square fluctuation of the integrated and detrended time series. The input data are first integrated:

y(k) = Σ_{i=1}^{k} [X(i) − X̄], k = 1, 2, …, N

Here, X(i) denotes the ith sample of the input data and X̄ is the mean value of the overall signal. The integrated series is divided into windows of scale n, and a local trend is fitted in each window. The fluctuation at scale n is then

F(n) = sqrt( (1/N) Σ_{k=1}^{N} [y(k) − b_n(k)]² )

where b_n(k) is the trend fitted at the kth point within the predetermined window of scale n.
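A minimal sketch of the fluctuation computation, assuming a first-order (linear) detrend in non-overlapping windows; the window scheme and fit order are assumptions, as the text does not specify them:

```python
import numpy as np

def dfa_fluctuation(x, n):
    """Detrended fluctuation F(n): integrate the series, split it into
    non-overlapping windows of length n, detrend each window with a
    linear fit, and take the RMS of the residuals."""
    x = np.asarray(x, dtype=float)
    y = np.cumsum(x - x.mean())          # integrated series y(k)
    n_windows = len(y) // n
    residuals = []
    for w in range(n_windows):
        seg = y[w * n:(w + 1) * n]
        k = np.arange(n)
        coeff = np.polyfit(k, seg, 1)    # local trend b_n(k)
        residuals.append(seg - np.polyval(coeff, k))
    return np.sqrt(np.mean(np.concatenate(residuals) ** 2))
```

For white-noise input, F(n) grows roughly as n^0.5, which is how the scaling exponent is estimated in practice.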

B. Chi Square Probability Density Function
Siswantining et al. [30] proposed a slightly different approach to Chi-square statistics, based on a goodness-of-fit test and a test of independence. A sample obtained from the data, referred to as the number of cases, is segregated by the incidence of occurrence in each group. In this Chi-square method, if the hypothesis is accurate, the expected number of cases in each category forms the statement of the null hypothesis. The test is based on the ratio of the experimental data to the predicted values in each group and is defined as:

χ² = Σ_i (E_i − P_i)² / P_i

where E_i refers to the experimental number of cases in category i and P_i refers to the predicted number of cases in category i. To compute the Chi-square statistic, the difference between the experimental and predicted cases is calculated, squared, and divided by the predicted value, and these terms are summed over all categories of the distribution. Whether the null hypothesis holds depends on the data distribution. For the Chi-square method, the null and alternative hypotheses are defined as follows: if E_i − P_i is small for each category, the expected and predicted values are close to each other, and the null hypothesis is true. When the expected data are not associated with the predicted values under the null hypothesis, a large difference appears between E_i and P_i. Thus, a small Chi-square statistic supports the null hypothesis, and a large value indicates that it is false. The degrees of freedom depend on the number of categories used to calculate the statistic. By means of this hypothesis testing, the Chi-square method reduces the dimensionality of the data.
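As a toy numeric illustration of the statistic defined above (the counts are invented for the example):

```python
# Hypothetical observed (E_i) and predicted (P_i) counts per category.
observed = [18, 22, 30, 30]
expected = [25, 25, 25, 25]

# Chi-square statistic: sum over categories of (E_i - P_i)^2 / P_i.
chi2 = sum((e - p) ** 2 / p for e, p in zip(observed, expected))
# chi2 == 4.32; compare it against a critical value for
# df = len(observed) - 1 to decide on the null hypothesis.
```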

C. Firefly algorithm As Dimensionality Reduction
Yang, Xin-She (2010) [31] and Yang (2013) [32] proposed a reliable metaheuristic model for real-life problem-solving tasks such as event scheduling, system classification, dynamic problem optimization, and economic load dispatch. The Firefly algorithm works by using the characteristic behaviour of the idealized flashing light of fireflies, which attracts other fireflies.
Three rules are identified in the Firefly algorithm:
1. Every firefly is considered unisex, so a firefly is attracted to any other regardless of sex;
2. Attractiveness is proportional to brightness: a firefly moves toward another that is slightly brighter. If no brighter firefly is visible, it moves randomly in the search space;
3. As the distance increases, the brightness or light intensity of a firefly decreases, because the medium (air) absorbs light. The brightness of a firefly k as seen by another firefly is given by:

β_k(r) = β_k(0) e^{−α r²}

where β_k(0) represents the brightness of firefly k at zero distance (r = 0), α is the light absorption coefficient of the medium, and r is the Euclidean distance between fireflies i and k:

r = ‖x_i − x_k‖

where x_i and x_k are the positions of fireflies i and k, respectively. If firefly k is the brighter one, its degree of attractiveness directs the movement of firefly i, based on Yang and He, 2013 [32]:

x_i = x_i + β_k(0) e^{−α r²} (x_k − x_i) + γ · rnd     (8)

where γ refers to the random parameter and rnd is a random number generated from a uniform distribution over the range [−1, +1]. The middle term in this equation accounts for the movement of firefly i towards firefly k, and the last term moves the solution away from a local optimum when such a situation occurs.
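The attractiveness and movement rules above can be sketched as a single update step; the function name and parameter defaults are illustrative assumptions:

```python
import numpy as np

def firefly_step(x_i, x_k, beta0=1.0, alpha=1.0, gamma=0.1, rng=None):
    """Move firefly i toward a brighter firefly k.

    Attractiveness decays with distance, beta = beta0 * exp(-alpha * r**2),
    and a small uniform random term in [-1, 1] keeps the search stochastic.
    """
    if rng is None:
        rng = np.random.default_rng()
    r = np.linalg.norm(x_i - x_k)               # Euclidean distance
    beta = beta0 * np.exp(-alpha * r ** 2)      # perceived attractiveness
    return x_i + beta * (x_k - x_i) + gamma * rng.uniform(-1, 1, size=x_i.shape)
```

With the random term switched off (gamma = 0), each step strictly shortens the distance to the brighter firefly, since 0 < beta ≤ beta0.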

D. Cuckoo Search Algorithm as Dimensionality Reduction
Yang, X. S. and Deb, S. (2009) [33] proposed another metaheuristic model that gives finite solutions and is used for solving real-world problems such as event scheduling, dynamic problem optimization, classification, and economic load dispatch. The algorithm is inspired by intriguing breeding behaviour, particularly the obligate brood parasitism of certain cuckoo birds. The main idealized aspect of the Cuckoo search algorithm is this breeding characteristic, and the algorithm is applicable to many real-time optimization problems.
Each egg in a host nest represents a candidate solution, and a cuckoo egg represents a new solution. The main aim is to replace less fit solutions with better ones (the cuckoos). Each egg corresponds to one solution, each nest can hold multiple eggs, and the best eggs signify a set of good solutions. Three idealized rules are assumed:
1. Each cuckoo lays one egg at a time and places it in a randomly selected host nest;
2. To create the next generation, the best host nests with good-quality eggs are carried over;
3. A fixed number of host nests is available, and a host discovers a cuckoo egg with probability P_a ∈ (0, 1), where P_a is the cuckoo egg discovery probability. In that case, the host can either demolish the cuckoo egg or abandon the nest and construct a new one in another location.
Moreover, Yang and Deb detailed an appropriate searching technique based on Lévy flights, whose performance is better than a simple random walk (RW). They modified the conventional method to construct the proposed method using classification techniques.
The Lévy flight method describes the random-walk characteristics of a bird's position; the next position P^{(t+1)} is obtained as

P^{(t+1)} = P^{(t)} + β ⊕ Lévy(λ)

where ⊕ denotes entry-wise multiplication and β the step size. Commonly, β > 0 is related to the depth of variation of the problem of interest, and for almost all classification problems its value is fixed at 1. The equation above describes a stochastic random walk: the next position depends on the current position and the transition probability, as in a Markov chain.
In the classification problem, the value of β is tuned to 0.2, which denotes infinite variance with an infinite mean. The step lengths follow a heavy-tailed power-law distribution, so the random walk followed by the cuckoo's consecutive steps favours occasional long jumps. To speed up the classification process, the best solution is found using the Lévy walk method.
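A common way to generate the heavy-tailed Lévy steps is Mantegna's algorithm, sketched below; this is a standard implementation choice rather than necessarily the exact scheme used in the paper, and the exponent default of 1.5 is a conventional value:

```python
import math
import numpy as np

def levy_step(dim, beta=1.5, rng=None):
    """One Lévy-flight step via Mantegna's algorithm: the ratio of two
    Gaussian draws yields heavy-tailed step lengths."""
    if rng is None:
        rng = np.random.default_rng()
    sigma_u = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
               / (math.gamma((1 + beta) / 2) * beta
                  * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.normal(0.0, sigma_u, size=dim)
    v = rng.normal(0.0, 1.0, size=dim)
    return u / np.abs(v) ** (1 / beta)

def new_position(x, best, step_size=0.01, rng=None):
    """Cuckoo-style update: move relative to the current best solution,
    scaled by a Lévy step (entry-wise product)."""
    if rng is None:
        rng = np.random.default_rng()
    return x + step_size * levy_step(len(x), rng=rng) * (x - best)
```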

Statistical Analysis
The dimensionally reduced microarray genes, obtained through the four DR methods, are then analysed by statistical parameters such as mean, variance, skewness, kurtosis, the Pearson correlation coefficient (PCC), and CCA to identify whether the outcomes represent the underlying microarray gene properties in the reduced subspace. Table 2 shows the statistical feature analysis for the four types of dimensionally reduced diabetic and non-diabetic pancreas microarray genes. As shown in Table 2, the DFA and Cuckoo search-based DR methods depict higher values of mean and variance among the classes. The Chi2pdf and Firefly algorithm display low and overlapping values of mean and variance among the classes. The negative skewness depicted only by the Chi2pdf DR method indicates the presence of skewed components embedded in the classes. The Firefly algorithm indicates unusually flat kurtosis, and the Cuckoo search DR method indicates negative kurtosis. This, in turn, leads to the observation that the DR methods do not modify the underlying microarray gene characteristics.
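The statistical features listed above can be computed with standard routines; the feature matrix here is random placeholder data, not the paper's reduced genes:

```python
import numpy as np
from scipy import stats

# Placeholder reduced-feature matrix for one class (samples x features).
rng = np.random.default_rng(1)
features = rng.normal(size=(20, 10))

summary = {
    "mean": features.mean(),
    "variance": features.var(),
    "skewness": stats.skew(features, axis=None),
    "kurtosis": stats.kurtosis(features, axis=None),
}
# Pearson correlation between two feature columns.
pcc = np.corrcoef(features[:, 0], features[:, 1])[0, 1]
```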
The PCC values indicate a high correlation within each class of the attained outputs. This subsequently exhibits that the statistical parameters are associated with non-Gaussian and non-linear outputs. Canonical Correlation Analysis (CCA) visualizes the correlation of the DR method outcomes between the diabetic and non-diabetic cases; the low CCA values in Table 2 indicate that the DR outcomes are weakly correlated between the two classes. Further, the reduced data attained from the four DR techniques are analyzed by means of histograms, normal probability plots, and scatter plots to visualize the presence of non-linearity and the non-Gaussian nature of the datasets.
Figure 2 shows the histogram of the Detrend Fluctuation Analysis (DFA) features for the diabetic gene class. The histogram displays near quasi-Gaussian qualities and the presence of non-linearity in the DR method outputs. In the figure legend, patients 1 to 10 are represented as x(:,1) to x(:,10).
Figure 4 exhibits the normal probability plot of the Chi-square DR features for the diabetic gene class; it displays the total cluster of the Chi-square DR outputs and the presence of non-linearly correlated variables among the classes. Figure 5 depicts the corresponding normal probability plot for the non-diabetic gene class, which shows the same clustering behaviour; this is due to the low variance and negatively skewed variables of the DR method outcomes. In both figures, data 1-5 are references, data 6-10 are upper bound values, and data 11-15 are feature selection points.
Figure 6 shows the normal probability plot of the Firefly algorithm DR features for the diabetic gene class, and Figure 7 shows a corresponding plot. The plots display discrete clusters for the Firefly DR outputs, indicating the presence of non-Gaussian and non-linear variables within the classes; this is due to the low variance and flat kurtosis of the DR method outcomes. In the figures, data 1-5 are references, data 6-10 are upper bound values, and data 11-15 are feature selection points.
Figure 8 shows the scatter plot of the Cuckoo search algorithm DR features for the diabetic and non-diabetic gene classes. The scatter plot displays the total scattering of the variables of both classes across the entire subspace and indicates the presence of non-Gaussian, non-linear, and higher values for all of the statistical parameters.
From the above graphs, it can be observed that the DR methods alone are insufficient to classify the datasets into the appropriate classes. Therefore, feature selection methods such as PSO and Harmonic Search are used to enhance the classifier performance.

Feature Selection
In the field of optimization, finding the optimal solution for complex problems is a significant challenge. Traditional optimization algorithms often struggle to handle high-dimensional search spaces or non-linear relationships between variables. To address these challenges, two popular meta-heuristic algorithms, Particle Swarm Optimization (PSO) and Harmonic Search (HS), are incorporated as feature selection methods in this paper.

Particle Swarm Optimization (PSO)
Particle Swarm Optimization (PSO), Rajaguru H et al. [35], is one of the simplest and most effective of all search algorithms. It uses some basic parameters for its initial search and a population called particles. In an h-dimensional space, any of the particles may give the best possible solution for processing and analysis. Every particle is traced and positioned so that the optimized values can be achieved.
The position of particle j at iteration k is traced as P_j^k = (P_{j1}^k, P_{j2}^k, …, P_{jh}^k), and its velocity as Ve_j^k = (Ve_{j1}^k, Ve_{j2}^k, …, Ve_{jh}^k). The velocity of each particle is updated as

Ve_j^{k+1} = w · Ve_j^k + c_1 r_1 (pbest_j − P_j^k) + c_2 r_2 (gbest − P_j^k)

where r_1 and r_2 are random variables in the range from 0 to 1, and c_1 and c_2 are acceleration coefficients that control the movement (motion) of the particles. The updated position of each particle is defined as

P_j^{k+1} = P_j^k + Ve_j^{k+1}

If a particle attains the best position, it progresses to the next particle. The best position of each particle is denoted p-best, and the best position among all particles is denoted g-best.
The inertia weight is commonly expressed as w = w_max − ((w_max − w_min)/iter_max) · iter. Steps for implementation:
Step 1: Initialize the process;
Step 2: For each particle, denote the dimension of the space as h;
Step 3: Initialize the particle position p_j and velocity Ve_j;
Step 4: Evaluate the fitness function;
Step 5: Initialize pbest_j with a copy of p_j;
Step 6: Initialize gbest with the p_j that has the best fitness;
Step 7: Repeat the steps until the stopping criteria are satisfied.
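The update rules and steps above can be sketched as a minimal PSO minimizer; the initialization bounds, swarm size, and stopping rule are simplifying assumptions:

```python
import numpy as np

def pso(fitness, dim, n_particles=20, iters=100, c1=2.0, c2=2.0,
        w_max=0.9, w_min=0.4, rng=None):
    """Minimal PSO minimizer following the velocity/position updates above."""
    if rng is None:
        rng = np.random.default_rng()
    p = rng.uniform(-1, 1, (n_particles, dim))        # positions
    v = np.zeros((n_particles, dim))                  # velocities
    pbest = p.copy()
    pbest_val = np.array([fitness(x) for x in p])
    gbest = pbest[pbest_val.argmin()].copy()
    for t in range(iters):
        w = w_max - (w_max - w_min) * t / iters       # decreasing inertia weight
        r1, r2 = rng.random((2, n_particles, dim))
        v = w * v + c1 * r1 * (pbest - p) + c2 * r2 * (gbest - p)
        p = p + v
        vals = np.array([fitness(x) for x in p])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = p[improved], vals[improved]
        gbest = pbest[pbest_val.argmin()].copy()
    return gbest, pbest_val.min()
```

For example, minimizing the sphere function f(x) = Σ x_i² drives gbest toward the origin.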

Harmonic Search (HS)
Harmony Search (HS) is a meta-heuristic algorithm that draws inspiration from the evolution of music and the quest for achieving perfect harmony. Bharanidharan, N et al. [36] introduced HS as an algorithm that emulates the improvisational techniques employed by musicians. The HS algorithm involves a series of steps to be implemented.
Step 1: Initialization. The optimization problem is generally formulated as minimizing or maximizing the objective function f(y), subject to y_i ∈ Y, where i = 1, 2, …, N. In this formulation, y represents the set of decision variables, N denotes the number of decision variables, and Y represents the set of all possible values for each decision variable (i.e., y_iLo ≤ y_i ≤ y_iUp, where y_iLo and y_iUp are the lower and upper bounds for each decision variable). Along with defining the problem, the subsequent step involves initializing the parameters of the Harmonic Search (HS) algorithm.
Step 2: Memory Initialization. The Harmony Memory (HM) is a matrix that stores all of the decision variables. In the context of the general optimization problem, the initial HM is created by generating random values from a uniform distribution, bounded by y_iLo and y_iUp, for each decision variable.
Step 3: New Harmony Improvisation. During the process of solution improvisation, a new harmony is created by adhering to the standard HS rules of harmony memory consideration, pitch adjustment, and random selection.
Step 4: Harmony Memory Update. The fitness function for both the old and new harmony vectors is calculated. If the fitness function of the new harmony vector is lower than that of the old harmony vector, the old harmony vector is replaced with the new one. Otherwise, the old harmony vector is retained.
Step 5: Stopping Criteria. Steps 3 and 4 are repeated until the maximum number of iterations is reached.
The effectiveness of the feature selection method outputs is analysed through the significance of the p-value from the t-test. Table 3 shows the p-value significance of the PSO and Harmonic Search feature selection method outputs after they are applied to the four DR techniques. The t-test is a statistical test used to determine if there is a significant difference between the means of two groups. It is commonly used in hypothesis testing when comparing the means of two independent samples to assess whether the difference observed in the samples is likely to reflect a true difference in the population.
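Steps 1-5 above can be sketched as a minimal Harmony Search minimizer; the HMCR, PAR, bandwidth, and memory-size values are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def harmony_search(fitness, lower, upper, hms=10, hmcr=0.9, par=0.3,
                   bw=0.05, iters=500, rng=None):
    """Minimal Harmony Search minimizer following Steps 1-5."""
    if rng is None:
        rng = np.random.default_rng()
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    dim = len(lower)
    hm = rng.uniform(lower, upper, (hms, dim))          # Step 2: harmony memory
    cost = np.array([fitness(x) for x in hm])
    for _ in range(iters):                              # Steps 3-5
        new = np.empty(dim)
        for i in range(dim):
            if rng.random() < hmcr:                     # memory consideration
                new[i] = hm[rng.integers(hms), i]
                if rng.random() < par:                  # pitch adjustment
                    new[i] += bw * rng.uniform(-1, 1) * (upper[i] - lower[i])
            else:                                       # random selection
                new[i] = rng.uniform(lower[i], upper[i])
        new = np.clip(new, lower, upper)
        c = fitness(new)
        worst = cost.argmax()
        if c < cost[worst]:                             # Step 4: update memory
            hm[worst], cost[worst] = new, c
    best = cost.argmin()
    return hm[best], cost[best]
```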
The t-test and significance of the p-values is used to test for the null hypothesis of the feature selection method, which is explicated below.
(i) Formulate the hypotheses and select the significance level α:
H0. The clustering/feature selection procedure is random or inconsistent.
H1. The clustering/feature selection procedure is non-random or consistent, within the control level of α.
(ii) Compute the p-value of the t-test;
(iii) Check for significance: if the p-value < 0.01, the null hypothesis is rejected; otherwise, it is retained.
In the context of this study, with two groups such as diabetic and non-diabetic individuals, a t-test can be used to compare the mean values of certain variables (e.g., gene expression levels) between the groups.
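A minimal sketch of such a two-sample comparison, using Welch's t-test on synthetic expression values; the group sizes, means, and threshold here are illustrative:

```python
import numpy as np
from scipy import stats

# Hypothetical values of one feature (e.g., a gene's expression)
# in diabetic vs. non-diabetic samples.
rng = np.random.default_rng(42)
diabetic = rng.normal(loc=1.0, scale=0.5, size=20)
non_diabetic = rng.normal(loc=0.3, scale=0.5, size=50)

# Welch's t-test (unequal variances assumed).
t_stat, p_value = stats.ttest_ind(diabetic, non_diabetic, equal_var=False)
significant = p_value < 0.01  # threshold used in the study
```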
As tabulated in Table 3, the PSO feature selection method does not show any significant p-values among the classes for all four DR methods. In the case of Harmonic Search feature selection, a certain p-value significance is shown for the DFA and Firefly DR techniques for the diabetic class, while all of the other DR methods exhibit non-significant p-values. These p-values quantify the presence of outliers and of nonlinear and non-Gaussian variables among the classes after the feature selection methods are applied. The classification methods are explained in the following section.
Non-Linear Regression
The behaviour of the system is denoted as a mathematical expression for easy representation and analysis, so as to obtain the accurate best-fit line between the classifier values. For a linear system with variables (a, b), the equation takes the linear form y = ax + b; in the non-linear case, a and b are a non-linear parameter and a random variable, respectively. Obtaining the least sum of squares is one of the primary objectives of non-linear regression.
The non-linear model requires more attention than the linear model because of its complex nature, and researchers have devised many methods to reduce its complexity, such as the Levenberg-Marquardt and Gauss-Newton methods. The residual sum of squares must be minimized with respect to the non-linear parameters. The Taylor series method, the steepest descent method, and Levenberg-Marquardt's method, Zhang et al. [37], can be applied to non-linear equations in an iterative manner.
The authors assume the model z_i = f(x_i, θ) + ε_i, where x_i and z_i are the independent and dependent variables of the ith observation, θ = (θ_1, θ_2, ..., θ_m) are the parameters, and ε_i is an error term that follows N(0, σ²). The residual sum of squares is given by S(θ) = Σ_{i=1}^{N} [z_i − f(x_i, θ)]². Let θ_k = (θ_1k, θ_2k, ..., θ_pk) be the starting values; successive estimates are obtained using θ_{k+1} = θ_k − (H + τI)^{-1} g, where g = ∂S(θ)/∂θ evaluated at θ = θ_k, H = ∂²S(θ)/∂θ∂θᵀ, τ is a multiplier, and I is the identity matrix. From a previous experiment, the estimated parameters can be identified through the choice of the initial parameters and theoretical considerations for all other similar systems. The goodness of fit of the model is approximated using the Mean Square Error (MSE), MSE = S(θ̂)/N, where N is the total number of experimental values in the model. The classification of the normal patient samples and diabetic patient samples in the dataset is assessed by running the run test and the normality test.
The main objective of non-linear regression is to minimize the MSE so as to obtain the best-fit function through the data points. The steps of the algorithmic method are:
1. Choose initial values for the parameters;
2. Compute the curve produced by the initial values;
3. Iteratively update the parameters to minimize the MSE, modifying them so that the curve moves closer to the data;
4. If the MSE value has not changed compared to the previous iteration, stop.
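The iterative procedure above can be sketched with SciPy's `curve_fit`, which applies the Levenberg-Marquardt method by default; the exponential model and data below are hypothetical:

```python
import numpy as np
from scipy.optimize import curve_fit

def model(x, a, b):
    # Hypothetical non-linear model; both a and b enter non-linearly
    return a * np.exp(b * x)

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 50)
z = model(x, 2.0, 1.5) + rng.normal(scale=0.05, size=x.size)

# curve_fit iterates from the initial guess p0, minimizing the residual
# sum of squares (Levenberg-Marquardt for unconstrained problems)
theta, _ = curve_fit(model, x, z, p0=[1.0, 1.0])
mse = np.mean((z - model(x, *theta)) ** 2)
```

The fitting stops automatically once the change in the residual sum of squares falls below the solver's tolerance, mirroring step 4 above.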

Linear regression
To analyze the gene expression data, linear regression is well suited for obtaining the best-fit curve, as the expression level varies only to a small extent at this gene level. By comparing the training data set with the gene expression for each data class, the most informative genes obtained from the feature selection process above are used, and the diversified levels of data are achieved. In this linear regression model, the dependent variable y is taken in association with the independent variable x [38]. The model is established to forecast values of y from the x variable such that the regression fit is maximized over the population of the y variable. The hypothesis function of a single variable is given by g_θ(x) = θ_0 + θ_1 x, where the θ_i are the parameters. θ_0 and θ_1 are selected such that g_θ(x) is near y over the training data set (x, y), and the cost function is given by J(θ) = (1/2m) Σ_{i=1}^{m} (g_θ(x^(i)) − y^(i))², where m is the total number of samples in the training dataset. The linear regression model with n variables is given by g_θ(x) = θ_0 + θ_1 x_1 + θ_2 x_2 + ... + θ_n x_n, with the same form of cost function, where θ is the set {θ_0, θ_1, θ_2, ..., θ_n}. The algorithm for linear regression is:
1. The feature parameters selected by the DFA, Chi2pdf, Firefly, and Cuckoo Search algorithms are input to the classifiers;
2. Fit a line g_θ(x) = θ_0 + θ_1 x that splits the data in a linear manner;
3. Define the cost function as the total squared error between the predictions and the observed data;
4. Find the solutions by computing the derivatives with respect to θ_0 and θ_1 and equating them to zero;
5. Repeat steps 2, 3, and 4 to obtain the coefficients that give the minimum squared error.
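The iterative minimization of the cost J(θ) can be sketched in NumPy with plain gradient descent; the data and learning rate below are illustrative, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0.0, 1.0, 100)
y = 0.5 + 2.0 * x + rng.normal(scale=0.1, size=x.size)  # synthetic line + noise

theta0, theta1 = 0.0, 0.0          # initial parameters
alpha, m = 0.1, x.size             # learning rate and sample count
for _ in range(5000):
    g = theta0 + theta1 * x        # hypothesis g_theta(x) = theta0 + theta1 * x
    # Gradients of the cost J(theta) = (1/2m) * sum((g - y)^2)
    theta0 -= alpha * np.sum(g - y) / m
    theta1 -= alpha * np.sum((g - y) * x) / m

mse = np.mean((theta0 + theta1 * x - y) ** 2)
```

Repeating the update until the squared error stops decreasing corresponds to steps 2-5 of the algorithm above.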

Logistic Regression
The logit function has been utilized effectively for classification problems such as diabetes, cancer, and epilepsy. The author considers y as an array of disease status, coded from 0 to 1 to represent normal through diabetic patients. Let the gene expression vector be x = (x_1, x_2, ..., x_m), where x_j is the jth gene expression level. A model-based estimate of π(x), the probability of y = 1 given x, is used to construct the dataset and can be useful as an entirely new type of gene selection for diabetic patients. To identify, via maximum likelihood, the q informative genes from the Dimensionality Reduction techniques for Logistic Regression, let x_j* (j = 1, 2, ..., q) represent the selected gene expressions, let the binary disease status array be y_i (i = 1, 2, ..., n), and let the gene expression vectors be x_i = (x_i1, ..., x_ip). The Logistic Regression model is denoted by logit π(x_i) = υ_0 + Σ_{j=1}^{q} υ_j x_ij*. The fitted coefficients maximize the penalized log-likelihood l(υ) − τ‖υ‖², where τ is the parameter that limits the shrinkage of υ towards 0, and π_i = π(x_i), as specified by the model in the article by Hamid et al. [39,40]. ‖υ‖₂ is the Euclidean length of υ = (υ_1, υ_2, ..., υ_p). The selection of q and τ is based on the parametric bootstrap and constrains the accurate calculation of the prediction error. First, υ is set to zero for the computation of the cost function; it is then varied over the parameters to minimize the cost function. The sigmoid function maps values into the range 0 to 1 for the purpose of attenuation. The threshold cut-off value between diabetic and normal patients is fixed at 0.5: any probability below 0.5 is taken as a normal patient, and any probability above the threshold is considered a diabetic patient.
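The 0.5 probability threshold described above can be sketched with scikit-learn's L2-penalized logistic regression (the penalty plays the role of the shrinkage term); the three-gene dataset below is synthetic:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
# Hypothetical expression levels for q = 3 informative genes
X0 = rng.normal(0.0, 1.0, size=(40, 3))   # normal patients (y = 0)
X1 = rng.normal(1.5, 1.0, size=(40, 3))   # diabetic patients (y = 1)
X = np.vstack([X0, X1])
y = np.array([0] * 40 + [1] * 40)

# L2 penalty corresponds to the shrinkage controlled by tau in the text
clf = LogisticRegression(penalty="l2", C=1.0, max_iter=1000).fit(X, y)
prob = clf.predict_proba(X)[:, 1]          # pi(x) = P(y = 1 | x)
pred = (prob >= 0.5).astype(int)           # diabetic if probability >= 0.5
acc = (pred == y).mean()
```

Any probability below the 0.5 cut-off is labeled as a normal patient, exactly as in the decision rule above.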
In the following three methods, threshold values are used to separate the dataset.

Gaussian Mixture Model (GMM)
The Gaussian Mixture Model is one of the popular unsupervised machine learning models used for pattern recognition and signal classification, and it depends on grouping related objects. Using clustering techniques, similar data are grouped so that unrated items can easily be predicted and assigned in proportion to the same category. Clustering techniques are broadly divided into hard and soft approaches; GMM [41] falls under the soft clustering category. The GMM allows the mixture distribution to be used for further data analysis: every GMM comprises g Gaussian distributions, and the data are assumed to be generated from this mixture. In the probability density function of the GMM, the component distributions are added linearly to model the generated data. For a random vector a in an n-dimensional sample space χ, if a obeys the Gaussian distribution, the probability density function is expressed as p(a) = (2π)^{−n/2} |Σ|^{−1/2} exp(−(1/2)(a − µ)ᵀ Σ^{-1} (a − µ)), where µ is the n-dimensional mean vector and Σ is the n × n covariance matrix. The Gaussian distribution is determined by its covariance Σ and mean vector µ. Several components are mixed in the Gaussian mixture, each with its own place in the distribution, and the mixture distribution is expressed as p_M(a) = Σ_{j=1}^{g} α_j p(a | µ_j, Σ_j), where µ_j and Σ_j are the parameters of the jth Gaussian component and α_j is the corresponding mixing coefficient.
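The mixture described above can be sketched with scikit-learn's `GaussianMixture`; the two-component 2-D data below are synthetic:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(4)
# Two hypothetical Gaussian components in a 2-D feature space
a = np.vstack([rng.normal(0.0, 0.5, size=(60, 2)),
               rng.normal(3.0, 0.5, size=(60, 2))])

# Fit a g = 2 component mixture; EM estimates mu_j, Sigma_j and alpha_j
gmm = GaussianMixture(n_components=2, random_state=0).fit(a)
labels = gmm.predict(a)        # soft clustering hardened to component labels
weights = gmm.weights_         # mixing coefficients alpha_j (sum to 1)
```

The fitted `weights_`, `means_`, and `covariances_` attributes correspond to the α_j, µ_j, and Σ_j of the mixture equation.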

Bayesian Linear Discriminant Classifier (BLDC)
The main use of this type of classifier is to regularize high-dimensional signals, reduce noisy signals, and limit the computational cost. Before proceeding to Bayesian linear discriminant analysis, Zhou et al. [42], it is assumed that the target a is related to the vector b under additive white Gaussian noise c, so the model is expressed as a = xᵀb + c. The weight vector is x, and its likelihood function is p(G|β, x) = (β/(2π))^{C/2} exp(−(β/2)‖Bᵀx − a‖²), where the pair {B, a} is denoted as G, the matrix B contains the training vectors, a denotes the filtered signal, β denotes the inverse variance of the noise, and the sample size is denoted by C. The prior distribution of x is a zero-mean Gaussian whose regularization is governed by the hyperparameter α, which is estimated from the data, with l the vector index; the weight x thus follows a Gaussian distribution with zero mean, and a small value is contained in ε. According to Bayes' rule, the posterior distribution of x can be computed as p(x|β, α, G) = P(G|β, x)P(x|α) / ∫ P(G|β, x)P(x|α) dx. (29) For the posterior distribution, the mean vector υ and the covariance matrix X satisfy the norms in Equations (30) and (31); the posterior distribution is Gaussian.
For an input prediction vector b̂, the probability distribution of the regression output is p(â|β, α, b̂, G) = ∫ p(â|β, b̂, x) p(x|β, α, G) dx. This predictive distribution is again Gaussian, with mean µ = υᵀb̂ and variance δ² = 1/β + b̂ᵀXb̂.
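A closely related model, Bayesian ridge regression, is available in scikit-learn and can serve as a sketch of the Bayesian linear framework above (a stand-in, not the authors' exact classifier); it estimates the noise precision β and prior precision α from the data and returns a Gaussian predictive mean and standard deviation:

```python
import numpy as np
from sklearn.linear_model import BayesianRidge

rng = np.random.default_rng(5)
B = rng.normal(size=(80, 5))                      # hypothetical training vectors
w = np.array([1.0, -0.5, 0.0, 2.0, 0.3])          # hypothetical true weights
a = B @ w + rng.normal(scale=0.1, size=80)        # targets with Gaussian noise

reg = BayesianRidge().fit(B, a)
# Predictive mean and std: mean = v^T b_hat, var = 1/beta + b_hat^T X b_hat
mean, std = reg.predict(B[:1], return_std=True)
```

The returned standard deviation combines the noise term 1/β with the posterior covariance contribution, matching the predictive variance formula above.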

Softmax Discriminant Classifier (SDC)
SDC [43] is included in this analysis to determine and identify the group from which a specific test sample is drawn. It weighs the distances between the training samples and the test sample within a specific class or group of data. The training set is denoted as Z = [Z_1, Z_2, ..., Z_q], drawn from q distinct classes, where Z_q = [Z_q1, Z_q2, ..., Z_qd_q] ∈ R^{c×d_q} contains the d_q samples of the qth class and Σ_{i=1}^{q} d_i = d. Assume K ∈ R^{c×1} is the test sample given to the classifier. If a negligible reconstruction error can be obtained from the test sample, the class q is assigned. The class samples and the test sample are transformed with non-linearly enhancing values, by which the SDC criterion is satisfied. Here, h_i(K) represents the softmax-weighted distance between the ith class and the test sample, and λ > 0 sets the penalty cost. Hence, if K is identified as belonging to the ith class, then v and υ_ij share the same characteristic function, so ‖v − υ_ij‖² approaches zero and, hence, the maximum of Z_i w is attained asymptotically.
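A minimal NumPy sketch of the idea follows; the specific score h_i(K) = log Σ_j exp(−λ‖K − z_ij‖) is an assumption about the softmax-of-distances form, and the data are synthetic:

```python
import numpy as np

def sdc_predict(K, class_samples, lam=1.0):
    """Assign test sample K to the class with the largest softmax-weighted
    distance score h_i(K) = log(sum_j exp(-lam * ||K - z_ij||))  (assumed form)."""
    scores = []
    for Z in class_samples:                          # Z: (c, d_q) samples of one class
        dists = np.linalg.norm(Z - K[:, None], axis=0)
        scores.append(np.log(np.sum(np.exp(-lam * dists))))
    return int(np.argmax(scores))

rng = np.random.default_rng(6)
Z1 = rng.normal(0.0, 0.3, size=(4, 20))   # class 0 training samples (c = 4)
Z2 = rng.normal(2.0, 0.3, size=(4, 20))   # class 1 training samples
test = np.full(4, 2.0)                    # test sample near class 1
pred = sdc_predict(test, [Z1, Z2])
```

Samples close to a class's training set yield small distances, hence a larger score, which is how the penalty λ rewards proximity.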

Support Vector Machine-Radial Basis Function (SVM-RBF)
The SVM classifier is one of the important machine learning techniques for classification problems, especially in its non-linear, Radial Basis Function (RBF) based form, as described by Yao, X. J., et al. [44].
The training time and computational complexity of an SVM depend on the number of support vectors required by the classifier. Increasing the number of support vectors raises the computational requirements, since the number of floating-point multiplications and additions grows accordingly.
The steps for the SVM are:
Step 1 — With the help of quadratic optimization, linearization and convergence can be used. The primal minimization problem is transformed into the dual optimization problem: maximize the dual Lagrangian L_D with respect to α_i, subject to Σ_{i=1}^{l} α_i y_i = 0, where α_i ≥ 0 for all i = 1, 2, 3, ..., l.
Step 2 — By solving the above quadratic programming problem for the optimal separating hyperplane, the points with non-zero Lagrange multipliers (α_i > 0) become the support vectors.
Step 3 — In the trained data, the optimal hyperplane is fixed by the support vectors, which lie closest to the decision boundary.
Step 4 — K-means clustering is applied to the dataset, grouping it into clusters according to the conditions of Steps 2 and 3. Center points are chosen randomly from the given dataset, and each center point acquires the points present around it.
Step 5 — If there are six center points from each corner, the SVM is trained on these data using kernel methods.

Radial Basis Function
The hyperplane and support vectors are used to separate both linearly separable and non-linearly separable data.
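An RBF-kernel SVM handling non-linearly separable data can be sketched with scikit-learn; the circular toy dataset below is illustrative:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(7)
# Non-linearly separable toy data: class determined by distance from the origin
X = rng.normal(size=(200, 2))
y = (np.linalg.norm(X, axis=1) > 1.0).astype(int)

# RBF kernel lifts the data so a separating hyperplane exists in feature space
clf = SVC(kernel="rbf", gamma="scale", C=1.0).fit(X, y)
acc = clf.score(X, y)
n_sv = clf.support_vectors_.shape[0]   # support vectors drive the cost
```

The number of support vectors (`n_sv`) directly determines the prediction-time cost discussed above.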

Training and Testing of Classifiers
The training data for the dataset are limited. Therefore, we performed k-fold cross-validation, a popular method for estimating the performance of a machine learning model. The process, as performed by Fushiki et al. [45], is as follows. The first step is to divide the dataset into k equally sized subsets (or "folds"). For each fold i, the model is trained on all the data except the ith fold and then tested on the ith fold. The process is repeated for all k folds so that each is used once for testing. At the end of the process, there are k performance estimates (one for each fold), and their average gives an overall estimate of the model's performance. Once the model has been trained and validated using k-fold cross-validation, it can be retrained on the full dataset to predict new, unseen data. The advantage of k-fold cross-validation is that it provides a more reliable estimate of a model's performance than a simple train-test split, as it uses all the available data. In this paper, the k-value is chosen as 10. This research used 2870 dimensionally reduced features per patient, with 20 diabetic and 50 non-diabetic patients and multi-trial training of the required classifiers. The use of cross-validation removes any dependence on the choice of pattern for the test set. The training process is controlled by monitoring the Mean Square Error (MSE), defined as MSE = (1/N) Σ_{j=1}^{N} (O_j − T_j)², where O_j is the observed value for sample j, T_j is the corresponding target value, and N is the total number of observations per epoch; in our case it is 2870. As the training progressed, the MSE value reached 1.0 × 10^−12 within 2000 iterations.
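The 10-fold procedure above can be sketched with scikit-learn (the feature values are synthetic; only the 20-diabetic/50-non-diabetic class sizes mirror the paper):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(8)
# Hypothetical reduced feature set: 70 patients (20 diabetic, 50 non-diabetic)
X = np.vstack([rng.normal(1.0, 0.5, size=(20, 10)),
               rng.normal(0.0, 0.5, size=(50, 10))])
y = np.array([1] * 20 + [0] * 50)

# Stratified 10-fold: each fold is used exactly once for testing
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(SVC(kernel="rbf"), X, y, cv=cv)
overall = scores.mean()     # average of the 10 fold estimates
```

Stratification keeps the diabetic/non-diabetic ratio roughly constant across folds, which matters with such an imbalanced cohort.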
The training MSE always varied between 10^−4 and 10^−8, while the testing MSE varied from 10^−4 to 10^−6. The SVM (RBF) classifier without a feature selection method settled at minimum training and testing MSEs of 1.26 × 10^−8 and 5.141 × 10^−6, respectively. A minimum testing MSE is one of the indicators of a better-performing classifier. As shown in Table 5, a higher testing MSE leads to a poorer classifier performance irrespective of the Dimensionality Reduction technique. Table 6 displays the training and testing MSE performance of the classifiers with the PSO feature selection method for the four Dimensionality Reduction techniques. The training MSE always varied between 10^−5 and 10^−8, while the testing MSE varied from 10^−4 to 10^−6. The SVM (RBF) classifier with the PSO feature selection method settled at minimum training and testing MSEs of 1.94 × 10^−9 and 1.885 × 10^−6, respectively. All of the classifiers slightly improved their testing MSE compared to their performance without feature selection methods; this is reflected in the enhanced accuracy of the classifiers irrespective of the type of Dimensionality Reduction technique. Table 7 depicts the training and testing MSE performance of the classifiers with the Harmonic Search feature selection method for the four Dimensionality Reduction techniques. The training MSE always varied between 10^−5 and 10^−8, while the testing MSE varied from 10^−4 to 10^−6. The SVM (RBF) classifier with the Harmonic Search feature selection method settled at minimum training and testing MSEs of 1.86 × 10^−8 and 1.7 × 10^−6, respectively. All of the classifiers improved their testing MSE compared to their performance without feature selection methods; this is reflected in the improvement of the accuracy, MCC, and Kappa parameters of the classifiers irrespective of the type of Dimensionality Reduction technique.

Selection of Target
The target value for the non-diabetic case (T_ND) is taken at the lower side of the zero-to-one (0 → 1) scale, and this mapping is made according to a constraint on µ_i, where µ_i is the mean value of the input feature vectors for the N non-diabetic features taken for classification. Similarly, the target value for the diabetic cases (T_Dia) is taken at the upper side of the zero-to-one (0 → 1) scale, based on a constraint on µ_j, where µ_j is the average value of the input feature vectors for the M diabetic cases taken for classification. Note that the target value T_Dia is greater than the average values µ_i and µ_j. The difference between the selected target values must be greater than or equal to 0.5: |T_Dia − T_ND| ≥ 0.5. Based on the above constraints, the targets T_ND and T_Dia for the non-diabetic and diabetic patient output classes are chosen as 0.1 and 0.85, respectively. After selecting the target values, the Mean Squared Error (MSE) is used to evaluate the performance of the machine learning classifiers.
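The target constraints can be checked directly; the nearest-target decision rule below is an assumption for illustration (the paper trains toward these targets via MSE), and the classifier outputs are hypothetical:

```python
import numpy as np

# Targets chosen in this work for the two output classes
T_ND, T_Dia = 0.1, 0.85
assert T_Dia - T_ND >= 0.5           # required separation between the targets

# Hypothetical decision rule (an assumption): assign each classifier output
# to the class whose target value is nearer on the 0 -> 1 scale
outputs = np.array([0.05, 0.2, 0.7, 0.9])
pred = np.where(np.abs(outputs - T_Dia) < np.abs(outputs - T_ND),
                "diabetic", "non-diabetic")
```

Outputs pulled toward 0.1 during training map to the non-diabetic class and those pulled toward 0.85 to the diabetic class.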

Results and Discussion
The research uses standard 10-fold testing and training, in which 10% of the input features are employed for testing, whereas 90% are employed for training. The choice of performance measures is significant in evaluating classifier performance. The confusion matrix is used to evaluate the performance of classifiers, especially in binary classification (i.e., classification into two classes, such as diabetic or non-diabetic from the pancreas microarray genes). It can be used to calculate performance metrics such as accuracy, F1 score, MCC, error rate, Jaccard metric, and Kappa, which are commonly used to evaluate the model's overall performance. Table 9 depicts the parameters associated with the classifiers for performance analysis.

Accuracy
The accuracy of a classifier is a measure of how well it correctly identifies the class labels of a dataset. It is calculated by dividing the number of correctly classified instances by the total number of instances in the dataset. The equation for accuracy, as given by Fawcett et al. [46], is Accuracy = (TP + TN)/(TP + TN + FP + FN).

Recall
Recall is a critical performance metric used to evaluate the classifier's ability to correctly identify positive instances, specifically diabetic individuals, out of all the actual positive instances present in the dataset. It measures the proportion of true positive predictions out of all the instances that are positive in the dataset.

Precision
Precision is used to evaluate the classifier's ability to accurately predict positive instances. It measures the proportion of true positive predictions out of all instances predicted as positive by the classifier. A high precision value indicates that the classifier has a low false positive rate, meaning it correctly identifies a high proportion of diabetic individuals without misclassifying non-diabetic individuals as diabetic.
F1 Score
The F1 score is a measure of a classifier's accuracy that combines precision and recall into a single metric. It is calculated as the harmonic mean of precision and recall, with values ranging from 0 to 1, where 1 indicates perfect precision and recall. The equation for the F1 score, as given by Saito et al. [47], is F1 = 2 × (Precision × Recall)/(Precision + Recall), where precision is the proportion of true positives among all instances classified as positive, and recall is the proportion of true positives among all instances that are positive. The F1 score is useful when the classes in the dataset are imbalanced, meaning there are more instances of one class than the other. In such cases, accuracy may not be a good metric to use, as a classifier that simply predicts the majority class would have a high accuracy but low precision and recall. The F1 score provides a more balanced measure of a classifier's performance.

Matthews Correlation Coefficient (MCC)
MCC is a measure of the quality of binary (two-class) classification models. It considers true and false positives and negatives and is particularly useful in situations where the classes are imbalanced.
The MCC is defined by the following equation, as given in Chicco et al. [48]: MCC = (TP × TN − FP × FN)/√((TP + FP)(TP + FN)(TN + FP)(TN + FN)). The MCC takes on values between −1 and 1, where a coefficient of 1 represents a perfect prediction, 0 represents a random prediction, and −1 represents a perfectly incorrect prediction.
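The confusion-matrix metrics described above can be computed as follows (not the authors' code; the counts are hypothetical, chosen only to resemble a 70-patient cohort):

```python
import numpy as np

def metrics(tp, tn, fp, fn):
    # Standard confusion-matrix formulas for the metrics described above
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    mcc = (tp * tn - fp * fn) / np.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return accuracy, recall, precision, f1, mcc

# Hypothetical counts: 20 diabetic, 50 non-diabetic patients
acc, rec, prec, f1, mcc = metrics(tp=18, tn=46, fp=4, fn=2)
```

With these illustrative counts, accuracy is 64/70 ≈ 0.914 while MCC ≈ 0.80, showing how MCC penalizes the errors more sharply under class imbalance.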

Error Rate
The error rate of a classifier, as mentioned in Duda et al. [49], is the proportion of instances that are misclassified. It can be calculated as Error rate = (FP + FN)/(TP + TN + FP + FN).

Jaccard Metric
The Jaccard metric, also known as the Tanimoto similarity coefficient, explicitly disregards the accurate classification of negative samples [50]: J = TP/(TP + FP + FN). Changes in data distributions can greatly impact the sensitivity of the Jaccard metric.

Kappa
The Kappa statistic, also known as Cohen's Kappa, is a measure of agreement between two raters, or between a rater and a classifier. In the context of classification, it is used to evaluate the performance of a classifier on a binary or multi-class classification task. The Kappa statistic measures the agreement between the predicted and true classes, accounting for the possibility of agreement by chance. Kvålseth et al. [51] defined Kappa as Kappa = (P_o − P_e)/(1 − P_e), where P_o is the observed proportion of agreement and P_e is the proportion of agreement expected by chance. The Kappa statistic takes on values between −1 and 1, where values greater than 0 indicate agreement better than chance, 0 indicates agreement by chance, and values less than 0 indicate agreement worse than chance.
The results are tabulated in the following tables. Table 9 demonstrates the performance analysis of the seven classifiers based on parameters such as accuracy, F1 score, MCC, error rate, Jaccard metric, and Kappa values for the four Dimensionality Reduction methods without feature selection methods. It can be identified from Table 9 that the SVM (RBF) classifier with the Cuckoo Search DR technique settles at a middling accuracy of 65.71% and an F1 score of 50%, with a moderate error rate of 34.28% and a Jaccard metric of 33.33%. The SVM (RBF) classifier also exhibits a low MCC of 0.2581 and a Kappa value of 0.25. The Logistic Regression classifier with the Firefly algorithm DR technique is placed at the lower ebb of accuracy at 51.42%, with a high error rate of 48.57%, an F1 score of 37.03%, and a Jaccard metric of 22.72%. The MCC and Kappa values of the Logistic Regression classifier are 0.01807 and 0.01652, respectively. Irrespective of the Dimensionality Reduction technique, all of the classifiers settled at an accuracy within the range of 50-65%. This is due to the inherent limitation of the Dimensionality Reduction techniques.
Therefore, it is recommended to incorporate feature selection methods to enhance classifier performance. Figure 9 depicts the performance analysis of the seven classifiers based on parameters such as accuracy, F1 score, error rate, and Jaccard metric values for the four Dimensionality Reduction methods without feature selection methods. These attained a middling accuracy of 65.71% and an F1 score of 50%, with a moderate error rate of 34.28% and a Jaccard metric of 33.33%. The Logistic Regression classifier with the Firefly algorithm DR technique was placed at the lower end of accuracy, with a value of 51.42%, a high error rate of 48.57%, an F1 score of 37.03%, and a Jaccard metric of 22.72%. Table 10 exhibits the performance analysis of the seven classifiers for the four Dimensionality Reduction methods with the PSO feature selection method, and Figure 10 displays the corresponding performance analysis. It can be identified from Figure 10 that the SVM (RBF) classifier with the Chi-square pdf DR technique settles at a high accuracy of 91.42% and an F1 score of 85.71%, with a low error rate of 8.57% and a Jaccard metric of 75%. The Logistic Regression classifier with the Firefly algorithm DR technique settled at the lower end of accuracy, with a value of 55.71%, a high error rate of 44.28%, an F1 score of 43.63%, and a Jaccard metric of 27.9%. The PSO feature selection method improves the classifier accuracy by around 10-35%, irrespective of the DR technique. Figure 11 exhibits the performance analysis of the seven classifiers for the four Dimensionality Reduction methods with the Harmonic Search feature selection method.
It is also observed from Figure 11 that the SVM (RBF) classifier with the Cuckoo Search DR technique settles at a high accuracy of 90% and an F1 score of 83.72%, with a low error rate of 10% and a Jaccard metric of 72%. The Linear Regression classifier with the Detrend Fluctuation Analysis (DFA) DR technique settles at a lower accuracy of 52.85%, with a high error rate of 47.14%, an F1 score of 37.75%, and a Jaccard metric of 23.25%. The Harmonic Search feature selection method improves the classifier accuracy by around 10-25%, irrespective of the DR technique, placing it next to the PSO feature selection method. Figure 12 displays the performance of the MCC and Kappa parameters across the classifiers for the four DR techniques, without and with the two feature selection methods. The MCC and Kappa are benchmark parameters that indicate the outcomes of the classifiers for different inputs. In this research, there are three categories of inputs: dimensionally reduced features without feature selection, and with the PSO and Harmonic Search feature selection methods. The classifiers' performances are observed through the attained MCC and Kappa values for these inputs. The average MCC and Kappa values across the classifiers are 0.2984 and 0.2849, respectively. A methodology was devised to characterize the performance of the classifiers with reference to Figure 12. The MCC values are divided into three ranges: 0.01-0.25, 0.251-0.54, and 0.55-0.8. The performance of the classifiers is very poor in region 1; there is a steep increase in the MCC vs. Kappa slope in region 2; and region 3 of the MCC values corresponds to the higher performance of the classifiers without any glitches.


Computational Complexity
The classifiers are analysed based on their computational complexity, which is determined as a function of the input size, O(n). The computational complexity is lowest when it equals O(1). As the number of inputs increases, the computational complexity increases; a complexity that does not depend on the input size is one of the desired properties for any algorithm. If the computational complexity increases log(n) times for any increase in n, it is denoted O(log n). In this paper, all of the classifiers are hybrid, and they classify the dimensionally reduced outputs together with the feature selection methods. Table 12 shows the computational complexity of the classifiers for the four Dimensionality Reduction techniques without feature selection methods. It is observed from Table 12 that almost all of the classifiers' computational complexities are nearly equal, and their performances are positioned at a low level of accuracy. The Linear Regression classifier has a low computational complexity of O(2n log 2n); at the same time, the Logistic Regression classifier with the Firefly algorithm DR technique has a higher complexity of O(2n³ log 2n), and both classifiers are at the same level of accuracy. The SVM (RBF) classifier with the Cuckoo Search DR technique has a high computational complexity of O(2n³ log 4n) with increased accuracy, MCC, and Kappa values. Table 13 displays the computational complexity of the classifiers for the four Dimensionality Reduction techniques with the PSO feature selection method. It is identified from Table 13 that almost all of the classifiers' computational complexities are nearly equal, and their performances are positioned at a high level of accuracy.
The Linear Regression classifier has a low computational complexity of O(2n³ log 2n); at the same time, the Logistic Regression classifier with the Firefly algorithm DR technique has a higher complexity of O(2n⁴ log 2n), and both classifiers are at the same level of accuracy. The SVM (RBF) classifier with the Chi-square pdf DR technique has a high computational complexity of O(2n⁴ log 4n), with the highest accuracy of 91%, and MCC and Kappa values of 0.794 and 0.7967, respectively. Table 14 shows the computational complexity of the classifiers for the four Dimensionality Reduction techniques with the Harmonic Search feature selection method. It can be seen in Table 14 that almost all of the classifiers' computational complexities are nearly equivalent, and their performances are positioned at a high level of accuracy. The Linear Regression classifier has a low computational complexity of O(2n² log 2n); at the same time, the BLDC and GMM classifiers with the Firefly algorithm DR technique have a higher complexity of O(2n⁵ log 2n), and both classifiers are at the same level of accuracy. The SVM (RBF) classifier with the Cuckoo Search DR technique has a high computational complexity of O(2n⁴ log 2n), with the highest accuracy of 90%, and MCC and Kappa values of 0.7655 and 0.767, respectively. Even though the GMM and BLDC classifiers have a high computational complexity, they have not achieved better performance metrics. Table 15 shows the comparison of previous works using different machine learning techniques to detect diabetes. As shown in Table 15, the majority of machine learning classifiers, such as SVM (RBF), Naïve Bayes, Logistic Regression, decision trees, non-linear regression, random forests, multilayer perceptrons, and deep neural networks, have been utilized to classify diabetes based on clinical databases. The accuracy of all these classifiers is in the range of 67-91%.
The current study is based on microarray genes to detect diabetes, and SVM (RBF) achieved an accuracy of 91%. Table 16 compares classifier performance across different datasets, exploring the efficacy of machine learning classifiers in detecting various diseases. It is also noted that the clinical diabetes dataset