Article

Enhancement of Classifier Performance Using Swarm Intelligence in Detection of Diabetes from Pancreatic Microarray Gene Data

by
Dinesh Chellappan
1 and
Harikumar Rajaguru
2,*
1
Department of Electrical and Electronics Engineering, KPR Institute of Engineering and Technology, Coimbatore 641 407, Tamil Nadu, India
2
Department of Electronics and Communication Engineering, Bannari Amman Institute of Technology, Sathyamangalam 638 401, Tamil Nadu, India
*
Author to whom correspondence should be addressed.
Biomimetics 2023, 8(6), 503; https://doi.org/10.3390/biomimetics8060503
Submission received: 29 August 2023 / Revised: 8 October 2023 / Accepted: 20 October 2023 / Published: 22 October 2023

Abstract:
In this study, we focused on using microarray gene data from pancreatic sources to detect diabetes mellitus. Dimensionality reduction (DR) techniques were used to reduce the high-dimensional microarray gene data. The DR methods used were the Bessel function, Discrete Cosine Transform (DCT), Least Squares Linear Regression (LSLR), and Artificial Algae Algorithm (AAA). Subsequently, we applied meta-heuristic algorithms, namely the Dragonfly Optimization Algorithm (DOA) and Elephant Herding Optimization (EHO) algorithm, for feature selection. Classifiers such as Nonlinear Regression (NLR), Linear Regression (LR), Gaussian Mixture Model (GMM), Expectation Maximization (EM), Bayesian Linear Discriminant Classifier (BLDC), Logistic Regression (LoR), Softmax Discriminant Classifier (SDC), and Support Vector Machine (SVM) with three types of kernels, Linear, Polynomial, and Radial Basis Function (RBF), were utilized to detect diabetes. The classifiers' performance was analyzed based on parameters such as accuracy, F1 score, MCC, error rate, FM metric, and Kappa. Without feature selection, the SVM (RBF) classifier achieved a high accuracy of 90% using the AAA DR method. With EHO feature selection, the SVM (RBF) classifier using the AAA DR method outperformed the other classifiers with an accuracy of 95.714%. This improvement in classification accuracy emphasizes the role of feature selection methods.

1. Introduction

According to the latest data from the International Diabetes Federation (IDF) Diabetes Atlas in 2021, diabetes affects around 10.5% of the global adult population aged between 20 and 79. Alarmingly, nearly half of these individuals remain unaware of their diabetes status. The projections indicate that by 2045, the number of adults living with diabetes worldwide will increase by 46% to reach approximately 783 million, which corresponds to around one in eight adults [1]. Type II DM accounts for over 90% of all diabetes cases and is influenced by several factors, including socio-economic, demographic, environmental, and genetic factors. The increase in type II DM is connected to urbanization, a growing elderly population because of higher life expectancy, reduced levels of physical activity, and a high overweight and obesity rate. To address the impact of diabetes, preventive measures, early diagnosis, and proper care for all types of diabetes are crucial. These interventions can help individuals with diabetes prevent or delay the complications associated with the condition.
According to estimates from 2019, approximately 77 million adults in India were affected by diabetes [2]. Unfortunately, the prevalence of type II DM in the country is rapidly escalating. By 2045, the number of adults living with diabetes in India could reach a staggering 134 million, with younger people under 40 being particularly affected. Several risk factors like genetic predisposition, sedentary lifestyles, unhealthy dietary habits, obesity, urbanization, and mounting stress levels increase the risk of type II diabetes. India’s southern, urban, and northern regions exhibit higher rates compared to the eastern and western regions [3]. Many cases are undiagnosed until complications arise. Diabetes continues to be the seventh leading cause of death in India, taking a toll on both human lives and the economy. It is estimated that diabetes costs the Indian economy approximately USD 100 billion annually [4].

Genesis of Diabetes Diagnosis Using Microarray Gene Technology

Creating precise and effective techniques for identifying type II diabetes mellitus holds the potential to facilitate early identification and intervention. By analyzing microarray gene data, it becomes possible to identify specific genetic markers or patterns associated with diabetes [5]. This provides opportunities for personalized medicine, where treatment plans can be tailored based on an individual’s genetic profile, leading to more targeted and effective interventions. Robust and reliable methods for detecting diabetes from microarray gene data can be developed and integrated into existing healthcare systems [6]. Novel dimensionality reduction techniques, classification algorithms, and feature selection methods can be explored, and other omics data can be integrated to further enhance the accuracy and reliability of diabetes detection methods. Such research could advance the state of the art in machine learning [7]. The proposed method could be used to detect other diseases that are characterized by changes in gene expression.
The structure of the article is as follows: in Section 1, an introduction to the research is discussed. Section 2 presents the literature review. Section 3 presents the methodology. In Section 4, the materials and methods are reviewed. Section 5 explains the dimensionality reduction techniques with and without a feature extraction process. In Section 6, the feature selection methods are discussed, and Section 7 focuses on the classifiers used. The results and discussion are presented in Section 8, and the conclusion is given in Section 9.

2. Literature Review

Type II DM is a chronic disease that affects people worldwide irrespective of age. The early detection and diagnosis of DM in patients is essential for effective treatment and management. However, traditional methods for detecting DM, such as blood glucose testing, are often inaccurate and time consuming [8]. In recent years, there has been growing interest in the use of microarray gene data to detect DM. Microarray gene data can provide a comprehensive overview of gene expression patterns in the pancreas, which can be used to identify patients who are at risk of DM [9]. Jakka et al. [10] conducted an experimental analysis using various machine learning classifiers, including KNN, DT, NB, SVM, LR, and RF. The classifiers were trained and evaluated on the Pima Indians Diabetes dataset, which consists of nine attributes and is available from the UCI Repository. Among the classifiers tested, Logistic Regression (LR) exhibited the best performance, achieving an accuracy of 77.6%. It outperformed the other algorithms in terms of accuracy, F1 score, ROC-AUC score, and misclassification rate. Radja et al. [11] carried out a study to evaluate the performance of various supervised classification algorithms for medical data analysis, specifically in disease diagnosis. The algorithms tested included NB, SVM, decision table, and J48. The evaluation utilized measurement variables such as Correctly Classified, Incorrectly Classified, Precision, and Recall. The predictive database of diabetes was used as the testing dataset. The SVM algorithm demonstrated the highest accuracy among the tested algorithms at 77.3%, making it an effective tool for disease diagnosis. Dinh et al. [12] analyzed the capabilities of machine learning models in identifying and predicting diabetes and cardiovascular diseases using survey data, including laboratory results. The NHANES dataset was utilized, and various supervised machine learning models such as LR, SVM, RF, and GB were evaluated. An ensemble model combining the strengths of different models was developed, and key variables contributing to disease detection were identified using information obtained from tree-based models. The ensemble model achieved an AUC-ROC score of 83.1% for cardiovascular disease detection and 86.2% for diabetes classification. When incorporating laboratory data, the accuracy increased to 83.9% for cardiovascular disease and 95.7% for diabetes. For pre-diabetic patients, the ensemble model achieved an AUC-ROC score of 73.7% without laboratory data, and XGBoost performed the best, with a score of 84.4% when using laboratory data. The key predictors for diabetes included waist size, age, self-reported weight, leg length, and sodium intake.
Yang et al. [13] conducted a study that aimed to develop prediction models for diabetes screening using an ensemble learning approach. The dataset was obtained from NHANES from 2011 to 2016. Three simple machine learning methods (LDA, SVM, and RF) were used, and the performance of the models was evaluated through fivefold cross-validation and external validation using the Delong test. The study included 8057 observations and 12 attributes. In the validation set, the ensemble model utilizing linear discriminant analysis showcased superior performance, achieving an AUC of 0.849, an accuracy of 0.730, a sensitivity of 0.819, and a specificity of 0.709. Muhammed et al. [14] conducted a study utilizing a diagnostic dataset of type 2 diabetes mellitus (DM) collected from Murtala Mohammed Specialist Hospital in Kano, Nigeria. Predictive supervised machine learning models were developed using LR, SVM, KNN, RF, NB, and GB algorithms. Among the developed models, the RF predictive learning-based model achieved the highest accuracy at 88.76%. Kim et al.’s [15] study aimed to assess the impact of nutritional intake on obesity, dyslipidemia, high blood pressure, and T2DM using deep learning techniques. The researchers developed a deep neural network (DNN) model and compared its performance with logistic regression and decision tree models. Data from the KNHANES were analyzed. The DNN model, consisting of three hidden layers with varying numbers of nodes, demonstrated superior prediction accuracy (ranging from 0.58654 to 0.80896) compared to the LoR and decision tree models. In conclusion, the study highlighted the advantage of using a DNN model over conventional machine learning models in predicting the impact of nutritional intake on obesity, dyslipidemia, high blood pressure, and T2DM.
Ramdaniah et al. [16] conducted a study utilizing microarray gene data from the GSE18732 dataset to distinguish between different classes of diabetes. The study consisted of 46 samples from diabetic classes and 72 samples from non-diabetic classes. Machine learning techniques, specifically Naïve Bayes and SVM with Sigmoid kernel, were employed for classification, achieving accuracy rates of 88.89% and 83.33%, respectively. The PIMA Indian diabetic dataset has been widely used by researchers to classify and analyze diabetic and non-diabetic patients. However, the use of microarray gene-based datasets for diabetic class identification has received less attention. As a result, a variety of performance metrics, such as accuracy, sensitivity, specificity, and MCC, have been investigated in the context of this microarray gene-based dataset.
The main characteristics and contributions of this paper are as follows:
  • The work suggests a novel approach for the early detection and diagnosis of diabetes using microarray gene expression data from pancreatic sources.
  • Four DR techniques are used to reduce the high dimensionality of the microarray gene data.
  • Two metaheuristic algorithms are used for feature selection to further reduce the dimensionality of the microarray gene data.
  • Ten classifiers in two categories, namely nonlinear models and learning-based classifiers, are used to detect diabetes mellitus. The performance of the classifiers is analyzed based on parameters like accuracy, F1 score, MCC, error rate, FM metric, and Kappa, both with and without feature selection techniques. The enhancement of classifier performance due to feature selection is exemplified through MCC and Kappa plots.

3. Methodology

Figure 1 shows the methodology of the research. The approach includes four DR techniques: the Bessel function (BF), Discrete Cosine Transform (DCT), Least Squares Linear Regression (LSLR), and Artificial Algae Algorithm (AAA). Following this, classification is carried out either with or without feature selection. For feature selection, two optimization algorithms are used: Elephant Herding Optimization (EHO) and the Dragonfly Optimization Algorithm (DOA). Moreover, ten classifiers are used, namely NLR, LR, GMM, EM, BLDC, LoR, SDC, SVM-L, SVM-Poly, and SVM-RBF, to classify the genes as non-diabetic and diabetic.

Role of Microarray Gene Data

Microarray gene data play a critical role in this research. The data can be used to identify patterns of gene expression that are associated with diabetes. The data are used to train and evaluate machine learning models and to identify the most relevant features for classification. The machine learning models are then used to predict whether a patient has diabetes or not. The models are trained on a dataset of microarray gene data [17] labeled with the patient’s diabetes status.

4. Materials and Methods

Microarray gene data are readily available from many search engines. We obtained human pancreatic islet data from the Nordic Islet Transplantation program (https://www.ncbi.nlm.nih.gov/bioproject/PRJNA178122). The data were accessed on 20 August 2021. The dataset included 28,735 genes from 57 non-diabetic and 20 diabetic patients. The data were preprocessed to select only the 22,960 genes with the highest peak intensity per patient. A base-10 logarithmic transformation was applied, and the individual samples were standardized to a mean of 0 and a variance of 1. The data were then used to train and evaluate a machine learning model for the detection of diabetes. The model was able to achieve an accuracy of 90%, which is a significant improvement over the baseline accuracy of 50%. The results of this study suggest that microarray gene data can be used to develop effective methods for the detection of diabetes. The data are readily available and can be easily processed to identify the most relevant features for classification.
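As an illustrative aid (not the study's original code), the preprocessing described above can be sketched as follows; the peak-intensity gene filtering step is omitted, and the raw intensities are synthetic stand-ins.

```python
import numpy as np

def preprocess(expr):
    """Illustrative preprocessing: base-10 log transform, then standardize
    each sample (row) to zero mean and unit variance, as described above."""
    logged = np.log10(expr + 1e-9)                  # small offset avoids log(0)
    mean = logged.mean(axis=1, keepdims=True)
    std = logged.std(axis=1, keepdims=True)
    return (logged - mean) / std

rng = np.random.default_rng(13)
raw = rng.uniform(1.0, 1e4, size=(77, 22960))       # 77 patients x 22,960 genes (toy values)
print(preprocess(raw).mean(axis=1).round(6)[:3])     # per-sample means are ~0
```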

Dataset

This study focused on utilizing microarray gene data to detect diabetes and explore the features associated with the condition based on p-values using probability functions. Additionally, we aimed to address the issue of false positive errors in the selection of significant genes. The data we used for our analysis are available through multiple portals and comprise a total of 28,735 human genes, as shown in Table 1. We specifically considered 50 non-diabetic and 20 diabetic samples (70 samples in total), selecting the genes with the greatest minimal intensity across these samples. To handle the high dimensionality of the dataset, we employed four dimensionality reduction techniques, namely BF, DCT, LSLR, and AAA. This allowed us to reduce the dimensions of the data while maintaining their informative content. The resulting dimensions were [2870 × 20] for the diabetic group and [2870 × 50] for the non-diabetic group. To further refine the dataset and improve classification accuracy, we applied feature selection techniques.
Specifically, we employed two techniques: EHO search and DOA. These techniques helped identify the most relevant features in the dataset, leading to a further reduction in dimensions to [287 × 20] for the diabetic group and [287 × 50] for the non-diabetic group. To evaluate the performance and accuracy of the classification, we employed ten classifiers, as already discussed.

5. Need for Dimensionality Reduction Techniques

Dimensionality reduction plays a crucial role in our research due to the high-dimensional nature of the microarray gene data. As the number of features increases, the complexity and computational costs of analyzing the data also increase significantly. Dimensionality reduction techniques allow us to reduce the number of features, making the subsequent analysis more efficient and manageable. Moreover, dimensionality reduction helps mitigate the curse of dimensionality [18]. In high-dimensional spaces, data points tend to become sparse, leading to difficulties in accurately representing the underlying structure of the data.

5.1. Dimensionality Reduction

To reduce the dimensionality of the dataset, BF, DCT, LSLR, and AAA were used.
  • Bessel Function as Dimensionality Reduction
This section provides an overview of the Bessel function and the relationships and properties associated with it [19]. Furthermore, we investigate several useful connections and characteristics of these functions. The Bessel function of the first kind, $J_n(x)$, has the following mathematical definition:
$$J_n(x) = \sum_{r=0}^{\infty} \frac{(-1)^r}{r!\,\Gamma(n+r+1)} \left(\frac{x}{2}\right)^{2r+n}$$
The Gamma function is represented as Γ(λ):
$$\Gamma(\lambda) = \int_0^{\infty} e^{-t}\, t^{\lambda - 1}\, dt$$
The series $J_n(x)$ converges for all values of x ranging from negative infinity to positive infinity. In fact, the Bessel function serves as a solution to a specific Sturm–Liouville equation [20]. This equation helps to analyze the Bessel function:
$$x^2 y''(x) + x\, y'(x) + \left(x^2 - n^2\right) y(x) = 0$$
for $x \in (-\infty, \infty)$ and $n \in \mathbb{R}$.
It is evident that the Bessel functions Jn(x) are linearly independent when n is an integer. Additionally, there exist several recursive relations for Bessel functions that can be utilized in their analysis [20]. These relations provide valuable insights into the properties and behavior of Bessel functions in various mathematical contexts.
$$\frac{d}{dx}\left[x^n J_n(x)\right] = x^n J_{n-1}(x)$$
$$J_n'(x) = J_{n-1}(x) - \frac{n}{x} J_n(x)$$
$$J_n'(x) = \frac{n}{x} J_n(x) - J_{n+1}(x)$$
Lemma 1.
A significant recursion relation that proves useful in the analysis of the Bessel function of the first kind is:
$$J_n'(x) = \frac{1}{2} J_{n-1}(x) - \frac{1}{2} J_{n+1}(x)$$
The Bessel functions can be handled using the following procedure. Consider the vector $\mathbf{J}_n = [J_0(x), J_1(x), J_2(x), \ldots, J_n(x)]^T$, where $J_0, J_1, J_2, \ldots, J_n$ denote the Bessel functions evaluated at x. To obtain the derivative operational matrix, the derivatives $J_0'(x), J_1'(x), \ldots, J_n'(x)$ are expressed in terms of the entries of $\mathbf{J}_n$ using the recursions above. By constructing a matrix D, known as the derivative operational matrix, we can write $\mathbf{J}_n' = D\,\mathbf{J}_n$, where D performs the differentiation operation on the vector of Bessel functions. This recursion relation allows for the efficient calculation and evaluation of Bessel functions, providing a valuable tool in various mathematical and scientific applications.
$$D = \begin{bmatrix} 0 & -1 & 0 & 0 & \cdots & 0 \\ 1/2 & 0 & -1/2 & 0 & \cdots & 0 \\ 0 & 1/2 & 0 & -1/2 & \cdots & 0 \\ 0 & 0 & 1/2 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \ddots & \ddots & \vdots \\ a_0 & a_1 & a_2 & a_3 & \cdots & a_n \end{bmatrix}$$
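For illustration (not part of the original study's code), the sketch below builds the derivative operational matrix D from the recursions above and checks it against SciPy's Bessel routines; the closure coefficients $a_0, \ldots, a_n$ in the last row are not specified in the text, so that row is left as zeros here.

```python
import numpy as np
from scipy.special import jv, jvp  # Bessel J_n(x) and its derivative

def derivative_operational_matrix(n):
    """Build an (n+1)x(n+1) matrix D so that J'(x) = D @ J(x) holds for the
    first n rows, using J0'(x) = -J1(x) and Lemma 1,
    Jk'(x) = (J_{k-1}(x) - J_{k+1}(x)) / 2. The last row (a_i coefficients)
    is unspecified in the text and left as zeros."""
    D = np.zeros((n + 1, n + 1))
    D[0, 1] = -1.0
    for k in range(1, n):
        D[k, k - 1] = 0.5
        D[k, k + 1] = -0.5
    return D

x, n = 2.5, 6
J = np.array([jv(k, x) for k in range(n + 1)])       # [J0(x), ..., Jn(x)]
approx = derivative_operational_matrix(n) @ J        # approximate derivatives
exact = np.array([jvp(k, x) for k in range(n + 1)])  # exact derivatives
print(np.allclose(approx[:-1], exact[:-1], atol=1e-10))  # True
```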
  • DCT—Discrete Cosine Transform
The Discrete Cosine Transform (DCT) is a DR technique that approximates the Karhunen–Loève transform. It aims to reduce the dimensions of the input data by retaining only the most significant coefficients, thereby simplifying further analysis. By applying the DCT method [21], the input vector and its components are orthogonalized, resulting in a reduction in complexity. This method extracts features by selecting coefficients, which is a crucial step with a significant impact on computational efficiency [22,23]. The DCT can be denoted as:
$$k(x) = \alpha(x) \sum_{u=0}^{s-1} a_u \cos\!\left[\frac{\pi (2u+1) x}{2s}\right]$$
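As an illustrative aid, the following sketch applies a type-II DCT to a single patient's gene-expression profile and keeps the leading coefficients; the retained-coefficient count follows the dataset description above, while the selection rule (keeping the lowest-frequency coefficients) is an assumption.

```python
import numpy as np
from scipy.fft import dct  # type-II Discrete Cosine Transform

def dct_reduce(gene_vector, keep=2870):
    """Transform a 1-D gene-expression profile and keep the first `keep`
    low-frequency DCT coefficients (illustrative selection rule)."""
    coeffs = dct(gene_vector, type=2, norm='ortho')
    return coeffs[:keep]

rng = np.random.default_rng(0)
patient = rng.standard_normal(22960)      # one patient's preprocessed intensities (toy)
reduced = dct_reduce(patient, keep=2870)
print(reduced.shape)                      # (2870,)
```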
  • Least Squares Linear Regression (LSLR) as Dimensionality Reduction
Another effective technique for reducing dimensionality is the LSLR. Hotelling [24] initially introduced this concept, utilizing principal component analysis (PCA) as a regression analysis tool. It uses principal component analysis to reduce the dimensionality of high-dimensional data before applying a linear regression model. The transformation is learned by minimizing the sum of squared errors between the predicted lower-dimensional representation and the actual high-dimensional data.
LSLR, as discussed in Hastie et al. [25], performs dimensionality reduction by identifying the best-fit line that represents the relationship between the independent variables (features) and the dependent target variable. The objective of LSLR is to minimize the sum of squared differences between the actual and predicted values of the target variable. Considering a set of N observations of the form (x1, y1), (x2, y2), …, (xN, yN), where xi represents the ith observation of the independent variables and yi corresponds to the ith observation of the target variable, the LSLR solution can be represented as a linear equation:
$$z = \alpha_0 + \alpha_1 x_1 + \alpha_2 x_2 + \cdots + \alpha_p x_p$$
In the context of LSLR, the linear model is characterized by the parameters $\alpha_1, \alpha_2, \alpha_3, \ldots, \alpha_p$, where p represents the number of independent variables. The minimization process is expressed through the following equation:
$$SSE = \sum_{j=1}^{m} \left[ z_j - \left(\alpha_0 + \alpha_1 x_{1j} + \alpha_2 x_{2j} + \cdots + \alpha_p x_{pj}\right) \right]^2$$
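A minimal sketch of the PCA-plus-least-squares idea described above is given below; the data shapes, component count, and labels are toy stand-ins, not the study's exact settings.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
X = rng.standard_normal((70, 2000))      # samples x genes (toy values)
y = rng.integers(0, 2, size=70)          # 0 = non-diabetic, 1 = diabetic (toy labels)

pca = PCA(n_components=10).fit(X)        # learn a low-dimensional subspace
Z = pca.transform(X)                     # reduced representation of the data
model = LinearRegression().fit(Z, y)     # least-squares fit on the reduced features
sse = np.sum((model.predict(Z) - y) ** 2)  # sum of squared errors (SSE above)
print(Z.shape, round(float(sse), 3))
```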
After applying dimensionality reduction techniques to the microarray gene data, the resulting outputs are further analyzed using various statistical parameters such as mean, kurtosis, variance, Pearson correlation coefficient (PCC), skewness, t-test, f-test, p-value, and canonical correlation analysis (CCA). These statistical measures are used to assess whether the outcomes accurately represent the intrinsic properties of the underlying microarray genes in the reduced subspace.
  • Artificial Algae Algorithm (AAA) as Dimensionality Reduction
The Artificial Algae Algorithm (AAA) [26] is a nature-inspired optimization algorithm that mimics the behavior and characteristics of real algae to solve complex problems. Each solution in the problem space is represented by an artificial alga, which captures the essence of algae's traits. Like real algae, artificial algae exhibit helical swimming patterns and can move towards a light source for photosynthesis. The AAA consists of three fundamental components: the evolutionary process, adaptation, and helical movement, as depicted in Figure 2. The algal colony acts as a cohesive unit, moving and responding to environmental conditions. By incorporating the principles of artificial algae into the algorithm, the AAA offers a novel approach to solving optimization problems.
$$Population = \begin{bmatrix} X_{11} & X_{12} & \cdots & X_{1D} \\ X_{21} & X_{22} & \cdots & X_{2D} \\ \vdots & \vdots & \ddots & \vdots \\ X_{n1} & X_{n2} & \cdots & X_{nD} \end{bmatrix}$$
where $X_{nD}$ is an algal cell in the Dth dimension of the nth algal colony.
During the evolutionary process [27] of the AAA, the growth and reproduction of algal colonies are influenced by the availability of nutrients and light. When an algal colony is exposed to sufficient light and nutrient conditions, it undergoes growth and replicates itself through a process similar to real mitotic division. In this process, two new algal cells are generated at time t. Conversely, if an algal colony does not receive enough light, it can survive for a certain period but eventually perishes. It is important to note that $\mu_{max}$ is assumed to be 1, as the maximum biomass conversion should be equivalent to the substrate consumption in unit time, following the conservation of mass principle. The size of the ith algal colony at time t + 1 is determined by the Monod equation, as expressed in the subsequent equation:
$$H_i^{t+1} = \mu_i^t\, H_i^t, \quad i = 1, 2, \ldots, N$$
where $H_i^t$ represents the size of the ith algal colony at time t and N is the total number of algal colonies.
In AAA, nutrient-rich algal colonies with optimal solutions thrive, and successful traits are transferred from larger colonies to smaller ones through cell replication during the evolutionary process.
$$\text{Maximum}^t = \max_i H_i^t, \quad i = 1, 2, \ldots, N$$
$$\text{Minimum}^t = \min_i H_i^t, \quad i = 1, 2, \ldots, N$$
$$\text{Minimum}_m^t = \text{Maximum}_m^t, \quad m = 1, 2, \ldots, D$$
In the AAA, algal colonies are ranked by size at time t. In each dimension, the smallest algal colony’s cell dies, while the largest colony’s cell replicates itself.
In the AAA algorithm, algal colonies that are unable to grow sufficiently in their environment attempt to adapt by becoming more similar to the largest colony. This process changes the starvation levels within the algorithm. Each artificial alga starts with a starvation value of zero, which increases over time if the algal cell does not receive enough light. The artificial alga with the highest starvation value is the focus of adaptation.
$$Star^t = \max_i B_i^t, \quad i = 1, 2, \ldots, N$$
$$Star^{t+1} = Star^t + \left(\text{Maximum}^t - Star^t\right) \times rand$$
Helical movement: The cells and colonies exhibit specific swimming behavior, striving to stay near the water surface where sufficient light for their survival is available. They move in a helical manner, propelled by their flagella, which face limitations from gravity and viscous drag. In the AAA, gravity’s influence is represented by a value of 0, while viscous drag is simulated as shear force, proportional to the size of the algal cell. The cell is modeled as a spherical shape, with its size determined by its volume, and the friction surface is equivalent to the surface area of a hemisphere.
$$\tau(x_i) = 2\pi r^2$$
$$\tau(x_i) = 2\pi \left( \sqrt[3]{\frac{3 H_i}{4\pi}} \right)^{2}$$
where the friction surface is represented as $\tau(x_i)$.
The helical movement of algal cells is determined by three randomly selected dimensions. One dimension corresponds to linear movement, as described by the first equation below. The other two dimensions correspond to angular movement, as described by the second and third equations below. The first equation is used for one-dimensional problems, allowing the algal cell or colony to move in a single direction. The second equation is used for two-dimensional problems, where the algal movement follows a sinusoidal pattern. The third equation is used for problems with three or more dimensions, where the algal movement takes on a helical trajectory. The step size of the movement is determined by the friction surface and the distance to the light source.
$$x_{im}^{t+1} = x_{im}^t + \left(x_{jm}^t - x_{im}^t\right)\left(\Delta - \tau^t(x_i)\right) p$$
$$x_{ik}^{t+1} = x_{ik}^t + \left(x_{jk}^t - x_{ik}^t\right)\left(\Delta - \tau^t(x_i)\right) \cos\alpha$$
$$x_{il}^{t+1} = x_{il}^t + \left(x_{jl}^t - x_{il}^t\right)\left(\Delta - \tau^t(x_i)\right) \sin\beta$$
where $x_{im}^{t+1}$, $x_{ik}^{t+1}$, and $x_{il}^{t+1}$ represent the x, y, and z coordinates of the ith algal cell at time t + 1.
The variables α and β are in the range [0, 2π], while p is within the interval [−1, 1]. Δ represents the shear force, and $\tau^t(x_i)$ denotes the surface area of the ith algal cell.
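The helical-movement equations can be sketched as follows; the shear force Δ, colony sizes, and the chosen source–target colony pair are illustrative placeholders rather than the study's tuned settings.

```python
import numpy as np

def helical_move(X, sizes, i, j, delta=2.0, rng=np.random.default_rng()):
    """One illustrative AAA helical-movement step for colony i toward colony j,
    over three randomly chosen dimensions (assumes a >= 3-dimensional problem)."""
    # friction surface of a hemisphere whose volume equals the colony size
    tau = 2.0 * np.pi * (3.0 * sizes[i] / (4.0 * np.pi)) ** (2.0 / 3.0)
    m, k, l = rng.choice(X.shape[1], size=3, replace=False)
    p = rng.uniform(-1.0, 1.0)
    alpha, beta = rng.uniform(0.0, 2.0 * np.pi, size=2)
    X_new = X.copy()
    X_new[i, m] += (X[j, m] - X[i, m]) * (delta - tau) * p
    X_new[i, k] += (X[j, k] - X[i, k]) * (delta - tau) * np.cos(alpha)
    X_new[i, l] += (X[j, l] - X[i, l]) * (delta - tau) * np.sin(beta)
    return X_new

X = np.random.default_rng(2).random((5, 10))   # 5 colonies, 10 dimensions (toy)
X = helical_move(X, sizes=np.ones(5), i=0, j=3)
print(X.shape)
```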

5.2. Statistical Analysis

The microarray gene data were reduced in dimension through four distinct dimensionality reduction (DR) techniques and comprehensive analysis using the statistical metrics of mean, variance, skewness, kurtosis, PCC, and CCA. This scrutiny aimed to ascertain whether the outcomes accurately portrayed the inherent properties of microarray genes within the reduced subspace. As shown in Table 2, the DR method based on AAA exhibited elevated mean and variance values across classes. In contrast, the remaining three DR methods—namely the Bessel function, Discrete Cosine Transform (DCT), and Least Squares Linear Regression (LSLR)—revealed modest and overlapping mean and variance values within classes. Among these methods, the LSLR DR approach showcased negative skewness, indicating the occurrence of skewed elements in the classes. Additionally, the DCT and LSLR DR methods demonstrated negative kurtosis, signifying their preservation of the underlying microarray gene traits. The PCC values revealed substantial correlations within the obtained outputs for a particular class. In the case of the Bessel function DR method, all four statistical parameters exhibited positive values at their minimum. This indicates an association with non-Gaussian and nonlinear distributions, a conclusion substantiated by the histograms, normal probability plots, and scatter plots of the DR method outputs. Canonical Correlation Analysis (CCA) provided insight into the correlation between DR outputs for diabetic and non-diabetic instances. Notably, the low CCA value in Table 2 suggests a limited correlation between the DR outputs of the two distinct classes.
Figure 3 shows a histogram of the Bessel function DR technique in the diabetic class. The histogram shows a skewed group of values, a gap, and the existence of nonlinearity in this method. Patients 1 to 10 are represented as x(:,1) to x(:,10).
Figure 4 exhibits a histogram of the BF DR techniques in the non-diabetic class, in which the marker of x(:,1) represents patient 1 and x(:,10) represents patient 10. Figure 4 shows a skewed group of values, a gap, and the existence of nonlinearity in this method.
In Figure 5, data points 1 to 5 signify the reference points, 6 to 10 highlight the upper bound, and 11 to 15 depict the clustered variable points. This representation signifies the generation of a normal probability plot for features obtained using DCT DR techniques within the diabetic gene class. As can be observed from Figure 5, the plot effectively showcases the complete cluster of DCT DR outputs, accentuating the existence of variables with the nature of nonlinearity across classes.
Figure 6 shows the normal probability plot for the DCT DR techniques for the non-diabetic gene class. The data points from 1 to 5 represent references, the upper bound values are represented from 6 to 10, and the cluster variable points are from 11 to 15. The plot shows that the total cluster of DCT DR outputs and nonlinearly correlated variables among the classes were observed due to the low values of mean and variance and the presence of negative kurtosis variables in the DR method.
Figure 7 presents data points 1 to 5 as references, 6 to 10 as upper bound values, and 11 to 15 as variable points. The normal probability plot distinctly exhibits clustered groups corresponding to LSLR DR outputs. This observation underscores the existence of non-Gaussian and nonlinearly varying variables among the classes. This phenomenon can be attributed to the low variance and negative kurtosis attributes of the outcomes generated by the DR method.
Figure 8 presents the normal probability plot for LSLR DR techniques in the non-diabetic class. The plot displays a discrete group of clusters for LSLR DR outputs. The data points 1 to 5 represent references, 6 to 10 represent upper bound values, and 11 to 15 represent variable points. The flat kurtosis variable and low variance in the DR methods indicate the presence of nonlinearity and a non-Gaussian nature.
Figure 9 presents a scatter plot of the AAA DR techniques for the non-diabetic and diabetic gene classes. As can be seen, there is total clustering and overlapping of the variables in both classes. The non-Gaussian and nonlinear nature can also be observed from this graph. Furthermore, the AAA algorithm has a heavy computational cost on the classifier design. To reduce the burden of the classifiers, a feature selection process comprising the Elephant Herd Optimization (EHO) and Dragonfly algorithms was initiated.

6. Feature Selection Methods

The reduced dimensionality dataset was used for the feature selection methods. The metaheuristic algorithms of Monarch Butterfly Optimization (MBO) [28], Slime Mold Algorithm (SMA) [29], Moth Search Algorithm (MSA) [30], Hunger Games Search (HGS) [31], Runge Kutta Method (RUN) [32], Colony Predation Algorithm (CPA) [33], weIghtedmeaNoFvectOrs (INFO) [34], Harris Hawks Optimization (HHO) [35], Rime Optimization Algorithm (RIME) [36], Elephant Herding Optimization (EHO) [37] algorithm, and Dragonfly Optimization Algorithm (DOA) [38] were considered for the FS.
MBO has two operators: migration and butterfly adjusting operator. The Lévy flight is used in the butterfly adjusting operator, which has infinite mean and variance. SMA is used for attaining global optimization. It has three stages: the first is to make a better solution approach based on the slime mold bound condition through the iterations attained from the tanh function; the second is wrap food, based on SMA, that imitates the updating position of the slime mold; and the third is an oscillator, based on step size, which is considered within bound. MSA was also used to find the global optimization. Moths have the propensity to follow Lévy flights. It exhibits similar characteristics to MBO such as being non-Gaussian and having infinite mean and infinite variance. HGS is a good population-based optimizer; however, when dealing with challenging optimization problems, the classic HGS sometimes shows premature convergence and stagnation shortcomings. Therefore, finding approaches that enhance solution diversity and exploitation capabilities is crucial. RUN is also an optimization technique. Although RUN has a solid mathematical theoretical foundation, there are still some performance defects when dealing with complex optimization problems. In the initialization phase, the focus is on constructing a population that evolves over several iterations. CPA has taken inspiration from the predatory habits of groups in nature. However, CPA suffers from poor exploratory ability and cannot always escape certain solutions. Two strategies are used in the pursuit process to increase the probability of successful predation: scattering prey and surrounding prey. Prey dispersal drives the prey in different directions and weakens the prey group. The weIghtedmeaNoFvectOrs (INFO) algorithm is also a population-based optimization algorithm operating based on the calculation of the weighted mean for a set of vectors. It has three techniques to update the vectors’ location: a local search, a vector-combining rule, and the weighted mean concept for a solid structure. The INFO algorithm’s reliance on weighted mean vectors may not capture nonlinear relationships between features and target variables effectively. It focuses on selecting individual features based on their weighted mean values, so may not effectively explore interactions or combinations of features. HHO is a computational intelligence tool, and its complexity may increase with the number of features in high-dimensional datasets. It may struggle to handle large feature spaces efficiently, leading to longer execution times. It replicates Harris hawk predator–prey dynamics. It is divided into three sections: exploring, transformation, and exploitation. It has a high convergence rate and a powerful global search capability, but it has an unsatisfactory optimization effect on high-dimensional or complex problems. RIME is also a good optimization algorithm for search space mechanisms and the typical idea is to compare the updated fitness value of an agent with the global optimum; if the updated value is better than the current global optimum, then the optimum fitness value is replaced, and the agent is recorded as the optimum. The advantage of such an operation is that it is simple and fast, but it does not help in the exploration and exploitation of the population and only serves as a record. 
However, algorithms like EHO and DOA are used as feature selection parameters for emulating the behavior observed in elephants and dragonflies for the better selection of features and offer effective approaches to address the abovementioned challenges in optimization techniques for FS.
  • Elephant Herding Optimization (EHO) algorithm
Wang et al. [37] introduced EHO as a metaheuristic algorithm inspired by the behavior of elephants in the African savanna. It has demonstrated effectiveness in solving optimization problems and has been successfully applied in various domains, including feature selection. In feature selection, the objective is to identify a subset of informative features from a larger set that are relevant to the target variable. EHO employs a herd of elephants to search for the optimal solution, with each elephant representing a potential solution. By combining global and local search strategies, the algorithm guides the elephants towards the best solution. The methodology of the EHO is depicted in Figure 10. EHO offers immense potential as a feature selection technique due to its ability to strike a balance between global and local searches, making it suitable for high-dimensional data. The initialization of the elephant herd involves assigning random positions to the elephants in the feature space, providing a comprehensive representation of the elephants’ positions and the overall movement of the herd.
$$y_i^{new} = y_i^{old} + \alpha \left(Y_{best} - y_i^{old}\right) \times r$$
The EHO algorithm [39] involves updating the positions of elephants within the herd. This update process considers both the old position ($y_i^{old}$) and the new position ($y_i^{new}$) of each elephant. A control parameter (α), which falls within the range of [0, 1], is used in conjunction with a randomly generated number (r ∈ [0, 1]) to determine the new position. Additionally, each elephant in the herd maintains a memory of its best position in the feature space. The best position is updated using the following equations, ensuring that the elephant's memory is updated accordingly.
$$Y_{best} = \beta\, Y_{centre}$$
$$Y_{centre} = \frac{1}{m} \sum_{i=1}^{m} y_i$$
The algorithm includes the concept of the best position (Ybest) for each elephant within the herd. This best position is determined by considering the control parameter (β), which falls within the range of [0, 1]. The control parameter plays a role in updating and adjusting the best position of the elephant, ensuring that it reflects the optimal solution obtained during the optimization process.
By considering both the best and worst solutions, the EHO algorithm ensures a more comprehensive exploration of the solution space, leading to improved optimization performance.
$$Y_{worst} = Y_{min} + \left(Y_{max} - Y_{min} + 1\right) \times rand$$
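A compact sketch of one EHO generation, combining the position, matriarch, and worst-elephant updates above, is shown below; the clan structure, fitness evaluation, and parameter values are simplified assumptions rather than the study's configuration.

```python
import numpy as np

def eho_update(Y, alpha=0.5, beta=0.1, rng=np.random.default_rng()):
    """One illustrative EHO generation for a single clan Y (rows = elephants),
    assuming the rows are already sorted from best to worst fitness."""
    Y_best = Y[0]                              # clan matriarch (best elephant)
    Y_centre = Y.mean(axis=0)
    r = rng.random(Y.shape)
    Y_new = Y + alpha * (Y_best - Y) * r       # move elephants toward the best
    Y_new[0] = beta * Y_centre                 # matriarch follows the clan centre
    # replace the worst elephant with a random position in the search range
    y_min, y_max = Y.min(axis=0), Y.max(axis=0)
    Y_new[-1] = y_min + (y_max - y_min + 1.0) * rng.random(Y.shape[1])
    return Y_new

Y = np.random.default_rng(3).random((10, 287))  # 10 elephants, 287 candidate features (toy)
print(eho_update(Y).shape)
```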
  • Dragonfly Optimization Algorithm (DOA)
The Dragonfly Algorithm (DA) is an optimization technique based on swarm intelligence, taking inspiration from the collective behaviors of dragonflies. Introduced by Mirjalili in 2016 [38], this algorithm mimics both static and dynamic swarming behaviors observed in nature. Figure 11 shows the flowchart of DOA. During the dynamic or exploitation phase, it forms large swarms and travels in a specific direction to confuse potential threats. In the static or exploration phase, the swarms form smaller groups, moving within a limited area to hunt and attract prey [40]. The DA is guided by five fundamental principles: separation, alignment, cohesiveness, attraction, and diversion. These principles dictate the behavior of individual dragonflies and their interactions within the swarm. In the equations that follow, K and Ki denote the current position and the ith position of a dragonfly, respectively, while N represents the total number of neighboring flies.
Separation: This implies that the static phase of the algorithm focuses on preventing dragonflies from colliding with each other in their vicinity. This calculation aims to ensure the avoidance of collisions among flies.
$$Se_j = -\sum_{i=1}^{N} \left(K - K_i\right)$$
where $Se_j$ represents the separation motion of the jth individual, aimed at maintaining separation from the other dragonflies.
Alignment: This denotes the synchronization of velocities among dragonflies belonging to the same group. It is represented as
$$Ag_j = \frac{\sum_{i=1}^{N} Ve_i}{N}$$
Here, $Ag_j$ represents the alignment of the jth individual, and $Ve_i$ is the velocity of the ith neighboring dragonfly.
Cohesiveness: This represents the inclination of individual flies to converge towards the center of swarms. The calculation is
$$Co_j = \frac{\sum_{i=1}^{N} K_i}{N} - K$$
Attraction: The quantification of the attraction towards the food source is characterized by
$$H_j = K^+ - K$$
Here, $H_j$ is the attraction to the food source, and $K^+$ represents the position of the food source.
Diversion: The diversion from the enemy is determined by the outward distance, which is calculated as
$$D_j = K^- + K$$
where $K^-$ represents the position of the enemy. The step vector (ΔK) and the current position vector (K) are used to update the locations of artificial dragonflies within the search space. The step vector is calculated from the five behavioral components as follows:
$$\Delta K_j^{t+1} = \left(s\, Se_j + a\, Ag_j + c\, Co_j + h\, H_j + d\, D_j\right) + \omega\, \Delta K_j^t$$
The behavior of the dragonfly algorithm is influenced by factors such as separation weight (s), alignment weight (a), cohesion weight (c), attraction weight (h), and enemy weight (d). The inertia weight is represented by “ω”, and “t” represents the iteration number.
Through the manipulation of these weights, the algorithm can attain both exploration and exploitation phases. The position of the ith dragonfly at t + 1 iterations is determined by the following equation:
$$K_j^{t+1} = K_j^t + \Delta K_j^{t+1}$$
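The five behavioral terms and the step-vector update can be sketched as follows; every dragonfly is treated as a neighbour of every other, and the weights are example values rather than the study's tuned settings.

```python
import numpy as np

def dragonfly_step(K, dK, food, enemy, w=0.9, s=0.1, a=0.1, c=0.7, h=1.0, d=1.0):
    """One illustrative DOA iteration for a swarm K (rows = dragonflies)."""
    N = K.shape[0]
    Se = -(N * K - K.sum(axis=0))          # separation: -sum_i (K - K_i)
    Ag = dK.mean(axis=0)                   # alignment: mean neighbour velocity
    Co = K.mean(axis=0) - K                # cohesion toward the swarm centre
    H = food - K                           # attraction to the food source
    D = enemy + K                          # distraction outward from the enemy
    dK_new = s * Se + a * Ag + c * Co + h * H + d * D + w * dK
    return K + dK_new, dK_new

rng = np.random.default_rng(4)
K = rng.random((8, 287))                   # 8 dragonflies, 287 candidate features (toy)
dK = np.zeros_like(K)
K, dK = dragonfly_step(K, dK, food=K[0], enemy=K[-1])
print(K.shape)
```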
The evaluation of this method’s outcomes is conducted by assessing the consequence of the p-value using the t-test. Table 3 demonstrates the significance of the p-values associated with the EHO and Dragonfly Algorithm methods across the four DR techniques. The data presented in Table 3 reveal that both the EHO and Dragonfly Algorithms’ feature selection methods do not exhibit significant p-values across classes for all four dimensionality reduction methods. This p-value serves as an initial indicator to quantify the existence of outliers, nonlinearity, and non-Gaussian nature among the classes after the implementation of feature selection techniques.
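For illustration, a feature-wise two-sample t-test of the kind described above can be computed as follows; the arrays are toy stand-ins for the EHO/DOA-selected features.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(14)
diabetic = rng.normal(0.4, 1.0, (20, 287))       # toy selected features, diabetic class
non_diabetic = rng.normal(0.0, 1.0, (50, 287))   # toy selected features, non-diabetic class
t_stat, p_vals = ttest_ind(diabetic, non_diabetic, axis=0, equal_var=False)
print(round(float(np.median(p_vals)), 4))        # median p-value across features
```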

7. Classification Techniques

  • NLR—Nonlinear Regression
The behavior of a system is expressed through mathematical equations to facilitate representation and analysis, ultimately aiming to determine an exact best-fit line between classifier values. Nonlinear regression introduces nonlinear and random variables (a, b) to capture the complexity of the system. The primary objective of nonlinear regression is to reduce the sum of squares. This involves measuring values from the dataset and computing the difference between the mean and each data point, squaring these differences, and summing them. The minimum value of the sum of squared differences indicates a better fit to the dataset.
Nonlinear models require more attention due to their inherent complexity, and researchers have devised various methods to mitigate this difficulty, such as the Levenberg–Marquardt and Gauss–Newton methods. Estimating parameters for nonlinear systems is achieved through least squares methods, aiming to minimize the residual sum of squares. Iterative techniques, including the Taylor series, steepest descent method, and Levenberg–Marquardt method (Zhang et al. [41]), can be employed for nonlinear equations. The Levenberg–Marquardt technique is commonly used for assessing the nonlinear least squares, offering advantages and producing reliable results through an iterative process.
The model is assumed to be of the form:
$$z_i = f(x_i, \theta) + \varepsilon_i, \quad i = 1, 2, 3, \ldots, n$$
Here, $x_i$ and $z_i$ represent the independent and dependent variables of the ith observation, $\theta = (\theta_1, \theta_2, \ldots, \theta_m)$ are the parameters, and $\varepsilon_i$ are the error terms that follow $N(0, \sigma^2)$.
$$Su(\theta) = \sum_{i=1}^{n} \left[ z_i - f(x_i, \theta) \right]^2$$
Let $\theta^{(k)} = \left(\theta_1^{(k)}, \theta_2^{(k)}, \ldots, \theta_p^{(k)}\right)$ be the starting values; the successive estimates are obtained using
$$\left( H + \tau I \right) \left( \theta^{(0)} - \theta^{(1)} \right) = g$$
where $g = \left.\dfrac{\partial Su(\theta)}{\partial \theta}\right|_{\theta=\theta^{(0)}}$ and $H = \left.\dfrac{\partial^2 Su(\theta)}{\partial \theta\, \partial \theta^T}\right|_{\theta=\theta^{(0)}}$, τ is a multiplier, and I is the identity matrix.
The integrity of the model is assessed using the MSE, which quantifies the discrepancy between the experimental and estimated values. The MSE is computed as the average squared difference between the actual and predicted values, where N denotes the total number of experimental values.
$$MSE = \frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2$$
The steps for nonlinear regression are the initialization of the initial parameters and the generation of curves based on these values. The goal is to iteratively modify the parameters to minimize the MSE and bring the curve closer to the desired value. The process continues until the MSE value no longer changes compared to the previous iteration, indicating convergence.
  • Linear Regression (LR)
In the investigation of gene expression data, linear regression is a suitable method for obtaining the best-fit curve, as the expression levels in the genes exhibit only minor variations. To identify the most informative genes, a feature selection process is performed by comparing the training dataset with the gene expression data within different levels of diversity. In this linear regression model, the independent variable, denoted as x, is associated with the dependent variable, y. The model aims [42] to predict values using the x variable, optimizing the regression fitness value based on the population in the y variable. The hypothesis function for a single variable is given by
$$g_\theta(x) = \theta_0 + \theta_1 x$$
where $\theta_i$ represents the parameters. The objective is to select values of $\theta_0$ and $\theta_1$ that ensure $g_\theta(x)$ closely approximates y in the training dataset (x, y).
$$R(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( g_\theta(x_i) - y_i \right)^2$$
Here, “m” symbolizes the total count of samples within the training dataset. For LR models with n variables, the hypothesis function becomes
$$g_\theta(x) = \theta_0 x_0 + \theta_1 x_1 + \cdots + \theta_n x_n$$
and the cost function is given by
$$R(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( g_\theta(x_i) - y_i \right)^2$$
where θ is the set of parameters $\{\theta_0, \theta_1, \theta_2, \ldots, \theta_n\}$. The gradient descent algorithm is employed to minimize the cost function, and the partial derivative of the cost function is computed as
$$\frac{\partial}{\partial \theta_j} R(\theta) = \frac{\partial}{\partial \theta_j}\, \frac{1}{2m} \sum_{i=1}^{m} \left( g_\theta(x_i) - y_i \right)^2$$
To update the parameter value θ j , the following equation is used:
$$\theta_j^{new} = \theta_j^{old} - \beta\, \frac{1}{m} \sum_{i=1}^{m} \left( g_\theta(x_i) - y_i \right) x_j^{(i)}$$
where β represents the learning rate, and θ j is continuously computed until convergence is reached. In this study, β is set to 0.01.
The algorithm for LR involves the following steps:
Feature selection parameters, obtained from algorithms such as the Bessel function, DCT, LSLR, and AAA, are used as input for the classifiers.
A line represented by $g_\theta(x) = \theta_0 + \theta_1 x$ is fitted to the data in a linear manner.
The cost function is formulated with the aim of minimizing the squared error existing between the observed data and the predictions.
The solutions are found by equating the derivatives with respect to $\theta_0$ and $\theta_1$ to zero.
To yield the coefficient of MSE, repeat steps 2, 3, and 4.
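A minimal sketch of the batch gradient-descent fit described in the steps above is given below, using the learning rate β = 0.01 quoted in the text; the feature matrix and labels are illustrative stand-ins.

```python
import numpy as np

def gradient_descent_lr(X, y, beta=0.01, iters=1000):
    """Illustrative batch gradient descent for the linear hypothesis above."""
    m, n = X.shape
    Xb = np.hstack([np.ones((m, 1)), X])   # prepend x0 = 1 for the intercept theta_0
    theta = np.zeros(n + 1)
    for _ in range(iters):
        grad = (Xb.T @ (Xb @ theta - y)) / m
        theta -= beta * grad               # theta_j <- theta_j - beta * dR/dtheta_j
    return theta

rng = np.random.default_rng(5)
X = rng.standard_normal((70, 5))           # toy reduced features
y = rng.integers(0, 2, 70).astype(float)   # toy labels
print(gradient_descent_lr(X, y).round(3))
```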
  • Gaussian Mixture Model (GMM)
GMM is a well-known unsupervised learning technique in machine learning used for many applications like pattern recognition and signal classifications. It involves integrating related objects based on clustering techniques. By classifying the data, GMM [43] facilitates the prediction and computation of unrated items within the same category. Hard and soft clustering techniques are used by GMM, and it utilizes the distribution for data analysis. Each GMM consists of multiple Gaussian distributions (referred to as “g”). The PDF of GMM combines these distributed components linearly, enabling easier analysis of the generated data. When generating random values as a vector “a” within an n-dimensional sample space χ, if “a” adheres to a Gaussian distribution, the expression for its probability distribution function is as follows:
$$p(a) = \frac{1}{(2\pi)^{n/2} \left|\Sigma\right|^{1/2}} \exp\!\left( -\frac{1}{2} (a - \mu)^T \Sigma^{-1} (a - \mu) \right)$$
Here, μ represents the mean vector in the n-dimensional space, and Σ is the covariance matrix of size $n \times n$. The determination of the covariance matrix and mean vector is essential for the Gaussian distribution. Multiple Gaussian components are mixed in the distribution function [44], and the mixture distribution is given by
$$P_Q(a) = \sum_{j=1}^{k} \alpha_j \, p\!\left(a \mid \mu_j, \Sigma_j\right)$$
In this equation, $\alpha_j$ represents the mixing coefficient corresponding to the jth Gaussian component, while $\mu_j$ and $\Sigma_j$ denote the mean vector and covariance matrix of that component, respectively.
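For illustration, a two-component Gaussian mixture of the form above can be fitted with scikit-learn as sketched below; the component count, covariance type, and toy data are assumptions rather than the study's configuration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(6)
X = np.vstack([rng.normal(0.0, 1.0, (50, 4)),    # toy "non-diabetic" cluster
               rng.normal(3.0, 1.0, (20, 4))])   # toy "diabetic" cluster
gmm = GaussianMixture(n_components=2, covariance_type='full', random_state=0).fit(X)
print(gmm.weights_.round(2))              # mixing coefficients (alpha_j above)
print(gmm.predict(X[:3]))                 # hard cluster assignments
print(gmm.predict_proba(X[:3]).round(2))  # soft (posterior) memberships
```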
  • Expectation Maximization (EM)
The EM algorithm [45] serves as a classifier in this context. Its primary objective is to estimate missing values within a dataset and subsequently predict those values to maximize the dataset's order based on the application's requirements. Consider two random variables, X and Y, involved in the prediction process and in determining the order of the data in rows. Variable X is observable and known in the dataset, while the latent variable Y is unobserved and needs to be estimated.
$$L(\theta; X, Y) = p(X, Y \mid \theta)$$
$$L(\theta; X) = \alpha\, p(X \mid \theta) \quad \text{for some constant } \alpha > 0$$
The maximum likelihood estimation is obtained as
$$L(\theta; X) = p(X \mid \theta) = \sum_{Y} p(X, Y \mid \theta)$$
To estimate the expected value of the log-likelihood function, we calculate
$$Q\!\left(\theta \mid \hat{\theta}^{(t)}\right) = E_{Y \mid X,\, \hat{\theta}^{(t)}}\!\left[ \log L(\theta; X, Y) \right]$$
The above quantity is maximized to compute the maximum value, resulting in
$$\hat{\theta}^{(t+1)} = \arg\max_{\theta}\, Q\!\left(\theta \mid \hat{\theta}^{(t)}\right)$$
The expectation and maximization steps are iteratively repeated until the sequence of estimates converges, as shown in Figure 12, which presents the flow diagram of the expectation maximization algorithm.
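A minimal sketch of the E and M steps for a one-dimensional, two-component Gaussian mixture is given below; the initial values and data are illustrative.

```python
import numpy as np
from scipy.stats import norm

def em_two_gaussians(x, iters=100):
    """Minimal EM sketch: the E-step computes responsibilities (the expectation
    Q above); the M-step re-estimates weights, means, and standard deviations."""
    w = np.array([0.5, 0.5])
    mu = np.array([x.min(), x.max()], dtype=float)
    sd = np.array([x.std(), x.std()], dtype=float)
    for _ in range(iters):
        dens = np.vstack([w[k] * norm.pdf(x, mu[k], sd[k]) for k in range(2)])  # E-step
        resp = dens / dens.sum(axis=0, keepdims=True)
        nk = resp.sum(axis=1)                                                    # M-step
        w = nk / x.size
        mu = (resp @ x) / nk
        sd = np.sqrt((resp * (x[None, :] - mu[:, None]) ** 2).sum(axis=1) / nk)
    return w, mu, sd

rng = np.random.default_rng(7)
x = np.concatenate([rng.normal(0, 1, 200), rng.normal(4, 1, 100)])
print([a.round(2) for a in em_two_gaussians(x)])
```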
  • Bayesian Linear Discriminant Classifier (BLDC)
BLDC [46] is commonly employed to regularize high-dimensional signals, reduce noise, and improve computational efficiency. Before conducting Bayesian linear discriminant analysis, it is assumed that a target, denoted as a, is obtained from a feature vector b through a weight vector x with additive white Gaussian noise, c.
This relationship can be expressed as $a = x^T b + c$. The vector x holds the weights, and its likelihood function is given by
$$p(G \mid \beta, x) = \left( \frac{\beta}{2\pi} \right)^{c/2} \exp\!\left( -\frac{\beta}{2} \left\| B^T x - m \right\|^2 \right)$$
where the pair (B, m) represents G. The prior distribution of x is expressed as
$$p(x \mid \alpha) = \left( \frac{\alpha}{2\pi} \right)^{l/2} \left( \frac{\varepsilon}{2\pi} \right)^{1/2} \exp\!\left( -\frac{1}{2}\, x^T H(\alpha)\, x \right)$$
The regularization square matrix is given by
$$H(\alpha) = \begin{bmatrix} \alpha & & 0 \\ & \ddots & \\ 0 & & \varepsilon \end{bmatrix}_{(l+1) \times (l+1)}$$
and α is a hyperparameter obtained from data forecasting, while l represents the assigned vector number. By applying Bayes’ rule, x can be calculated as
$$p(x \mid \beta, \alpha, G) = \frac{p(G \mid \beta, x)\, p(x \mid \alpha)}{\int p(G \mid \beta, x)\, p(x \mid \alpha)\, dx}$$
The mean vector υ and the covariance matrix X of the posterior distribution must adhere to the forms given in the following two equations. The posterior distribution is predominantly Gaussian.
$$\upsilon = \beta \left( \beta B B^T + H(\alpha) \right)^{-1} B\, a$$
$$X = \left( \beta B B^T + H(\alpha) \right)^{-1}$$
When predicting the input vector $\hat{b}$, the probability distribution for regression can be expressed as
$$p(\hat{a} \mid \beta, \alpha, \hat{b}, G) = \int p(\hat{a} \mid \beta, \hat{b}, x)\, p(x \mid \beta, \alpha, G)\, dx$$
Again, the nature of this prediction analysis is predominantly Gaussian, with the mean expressed as $\mu = \upsilon^T \hat{b}$ and the variance expressed as $\delta^2 = \frac{1}{\beta} + \hat{b}^T X \hat{b}$.
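The posterior mean and covariance equations above can be sketched as follows; the hyperparameters α, β, and ε are placeholder values (in practice they are estimated from the data), and the feature matrix and targets are toy stand-ins.

```python
import numpy as np

def blda_posterior(B, a, alpha=1.0, beta=1.0, eps=1e-6):
    """Illustrative BLDC posterior mean (upsilon) and covariance (X) from the
    equations above; H(alpha) is diagonal with a small eps for the bias term."""
    H = np.diag(np.full(B.shape[0], alpha))
    H[-1, -1] = eps                                  # bias row left (almost) unregularized
    X_cov = np.linalg.inv(beta * B @ B.T + H)        # posterior covariance
    upsilon = beta * X_cov @ B @ a                   # posterior mean weights
    return upsilon, X_cov

rng = np.random.default_rng(8)
beta = 1.0
B = np.vstack([rng.standard_normal((5, 70)), np.ones((1, 70))])  # features + bias row
a = rng.choice([-1.0, 1.0], size=70)                              # toy targets
upsilon, X_cov = blda_posterior(B, a, beta=beta)
b_new = np.append(rng.standard_normal(5), 1.0)
pred_mean = float(upsilon @ b_new)                  # mu = upsilon^T b_hat
pred_var = 1.0 / beta + float(b_new @ X_cov @ b_new)  # delta^2 = 1/beta + b_hat^T X b_hat
print(round(pred_mean, 3), round(pred_var, 3))
```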
  • Logistic Regression (LoR)
Logistic Regression (LoR) has proven to be effective in classifying diseases such as diabetes, types of cancer, and epilepsy. In this context, a function y representing the disease level is considered, taking the values 0 and 1 to indicate non-diabetic and diabetic patients, respectively. Gene expressions are represented by a vector $x = (x_1, x_2, \ldots, x_m)$, where each element $x_j$ corresponds to the expression level of the jth gene. Using a model-based approach for $\Pi(x)$, the aim is to identify informative genes for diabetic patients based on the likelihood of y being 1 given x. To achieve dimensionality reduction, logistic regression is utilized to select the most relevant q genes. The gene expression representation $x_j^*$ corresponds to the jth selected gene, with j ranging from 1 to q, while the binary disease status is denoted by $y_i$, where i ranges from 1 to n. The logistic regression model can be expressed as
$$\text{Logit}\!\left(\Pi(x)\right) = \upsilon_0 + \sum_{j=1}^{q} \upsilon_j x_j^*$$
The objective is to maximize the fitness and log-likelihood, which can be achieved by obtaining the following function
$$l(\upsilon_0, \upsilon) = \sum_{i=1}^{n} \left[ y_i \log \pi_i + (1 - y_i) \log (1 - \pi_i) \right] - \frac{1}{2\tau^2} \left\| \upsilon \right\|^2$$
where τ is a parameter that limits the shrinkage of υ towards 0, $\pi_i = \pi(x_i)$ as defined by the model [47,48], and $\|\upsilon\|$ denotes the Euclidean length of $\upsilon = (\upsilon_1, \upsilon_2, \ldots, \upsilon_p)$. The selection of q and τ is determined using the parametric bootstrap method, which imposes constraints on accurate error prediction. Initially, υ = 0 for the purpose of calculating the cost function. It is then varied with different parameters to minimize the cost function. The sigmoid function is applied to restrict values between 0 and 1, serving as an attenuation mechanism. A threshold cut-off value of 0.5 is used to classify patients as either diabetic or non-diabetic. Any probability below the threshold is considered indicative of a non-diabetic patient, while values above the threshold indicate a diabetic patient.
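For illustration, a ridge-penalized logistic regression with the 0.5 decision threshold described above can be sketched with scikit-learn; the parameter C here plays the role of the τ-controlled shrinkage, and the data are toy stand-ins.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(9)
X = rng.standard_normal((70, 287))          # toy selected features
y = np.array([0] * 50 + [1] * 20)           # 0 = non-diabetic, 1 = diabetic
clf = LogisticRegression(penalty='l2', C=1.0, max_iter=1000).fit(X, y)
probs = clf.predict_proba(X)[:, 1]          # sigmoid-restricted probabilities
pred = (probs >= 0.5).astype(int)           # threshold cut-off of 0.5
print(pred[:10], round(float(probs.mean()), 3))
```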
  • SDC—Softmax Discriminant Classifier
The SDC is used to verify and detect the group to which a particular test sample belongs [49]. It weighs the distance between the training samples and the test sample within a particular class or group of data. Z is represented as
$$Z = \left[ Z_1, Z_2, \ldots, Z_q \right] \in \mathbb{R}^{c \times d}$$
consisting of samples from q distinct classes, where $Z_q = \left[ Z_1^q, Z_2^q, \ldots, Z_{d_q}^q \right] \in \mathbb{R}^{c \times d_q}$ contains the samples of the qth class and the sample sizes satisfy $\sum_{i=1}^{q} d_i = d$. Given a test sample $K \in \mathbb{R}^{c \times 1}$, it is passed through the classifier to obtain the minimal reconstruction error, thereby assigning it to the class q. The transformation of class samples and test samples in SDC involves nonlinear enhancement values. This is achieved through the following equations:
$$h(K) = \arg\max_i \, Z_{w_i}$$
$$h(K) = \arg\max_i \, \log \sum_{j=1}^{d_i} \exp\!\left( -\lambda \left\| K - \upsilon_j^i \right\|^2 \right)$$
In these equations, $h(K)$ identifies the class of the test sample. When the test sample belongs to the ith class, $\left\| K - \upsilon_j^i \right\|^2$ approaches zero, resulting in the maximization of $Z_{w_i}$. This asymptotic behavior leads to the maximum likelihood of the test sample belonging to that particular class.
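The softmax discriminant rule above can be sketched as follows; the scale parameter λ and the toy class samples are assumptions rather than the study's settings.

```python
import numpy as np

def sdc_predict(test, class_samples, lam=1.0):
    """Illustrative softmax discriminant rule: score each class by
    log sum_j exp(-lam * ||K - z_j||^2) and pick the class with the maximum."""
    scores = []
    for Zq in class_samples:                       # Zq: (d_q, c) samples of one class
        dists = np.sum((Zq - test) ** 2, axis=1)
        scores.append(np.log(np.sum(np.exp(-lam * dists))))
    return int(np.argmax(scores))

rng = np.random.default_rng(10)
non_diab = rng.normal(0.0, 1.0, (50, 4))           # toy non-diabetic samples
diab = rng.normal(2.0, 1.0, (20, 4))               # toy diabetic samples
test = rng.normal(2.0, 1.0, 4)
print(sdc_predict(test, [non_diab, diab]))          # 1 -> diabetic class
```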
  • Support Vector Machines
The SVM classifier is a significant machine learning approach widely used for classification problems, particularly in the phase of nonlinear regression [50]. In this study, three distinct methods are explored for data classification:
SVM-Linear: this method utilizes a linear kernel to classify the data.
SVM-Polynomial: this approach involves the use of a polynomial kernel for data classification.
SVM-Radial Basis Function (RBF): the RBF kernel is used here to classify the data.
These three SVM methods offer different strategies for effectively classifying datasets, allowing researchers to choose the most suitable approach based on their specific classification requirements.
The training time and computational complexity of the SVM depend on the data and classifiers used. When the number of support vectors in the SVM increases, it results in higher computational requirements due to the calculation of floating-point multiplications and additions. To address this issue, K-means clustering techniques have been introduced to reduce the number of support vectors in the SVM. In the linear case, Lagrange multipliers can be employed, and the data points on the borders are expressed as $\nu = \sum_{i=1}^{m} \alpha_i z_i y_i^T$. Here, m represents the number of support vectors, $z_i$ represents the target label for $y_i$, and the following linear discriminant function is used:
$$h(y) = \mathrm{sgn}\!\left( \sum_{i=1}^{m} \alpha_i z_i y_i^T y + C \right)$$
The process of implementing the Support Vector Machine (SVM) involves several key steps.
Step 1: The first step is to use quadratic optimization to linearize and converge the problem. By transforming the primal minimization problem into a dual optimization problem, the objective is to maximize the dual Lagrangian LD with respect to α i :
$$\max L_D = \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} \alpha_i \alpha_j y_i y_j \left( X_i \cdot X_j \right)$$
subject to $\sum_{i=1}^{l} \alpha_i y_i = 0$, where $\alpha_i \geq 0$, $i = 1, 2, 3, \ldots, l$.
Step 2: The next step involves solving the quadratic polynomial programming to obtain the optimal separating hyperplane. The data points with non-zero Lagrangian multipliers ($\alpha_i > 0$) are identified as the support vectors.
Step 3: The optimal hyperplane is determined based on the support vectors, which are the data points closest to the decision boundary in the trained data.
Step 4: K-means clustering is applied to the dataset, grouping the data into clusters according to the conditions from Steps 2 and 3. Three points are randomly chosen from each cluster as the center points, which are representative points from the dataset. Each center point acquires the points around them.
Step 5: When there are six central points, each representing an individual cluster, the SVM training data are acquired through the utilization of kernel methods.
Polynomial Function: $K(X, Z) = \left( X^T Z + 1 \right)^d$
Radial Basis Function: $k(x_i, x_j) = \exp\!\left( -\dfrac{\left\| x_i - x_j \right\|^2}{2\sigma^2} \right)$
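For illustration, the three kernels can be exercised with scikit-learn as sketched below; the hyperparameters are defaults rather than the tuned values used in the study, and the data are toy stand-ins.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(11)
X = rng.standard_normal((70, 287))
y = np.array([0] * 50 + [1] * 20)           # 0 = non-diabetic, 1 = diabetic
for name, clf in [('linear', SVC(kernel='linear')),
                  ('polynomial', SVC(kernel='poly', degree=3)),
                  ('rbf', SVC(kernel='rbf', gamma='scale'))]:
    clf.fit(X, y)
    print(name, round(clf.score(X, y), 3))  # training accuracy on the toy data
```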

7.1. Training and Testing of Classifiers

Due to the limited availability of training data, we employed k-fold cross-validation, a widely used technique for evaluating machine learning models. The methodology described by Fushiki et al. [51] was followed to conduct the k-fold cross-validation. Initially, the dataset was divided into k equally sized subsets or "folds". In each iteration, the model was trained on k − 1 folds and tested on the remaining fold. This process was repeated for all k folds, ensuring that each fold was used once for testing. Consequently, k performance estimates (one for each fold) were obtained. To obtain an overall estimate of the model's performance, the average of these k performance estimates was calculated. After training and validating the model using k-fold cross-validation, it was retrained on the complete dataset to make predictions on new, unseen data. The significant advantage of this method is more reliable model performance compared to other train–test split methods, as the technique maximizes the utilization of the available data. Here, we adopted 10-fold cross-validation (k = 10). Furthermore, the research incorporated 2870 dimensionally reduced features per patient, focusing on a cohort of 20 patients with diabetes and 50 non-diabetic patients. The utilization of cross-validation eliminates any reliance on a specific pattern for the test set, enhancing the robustness of our findings. The training process is regulated by the MSE proposed by Wang et al. [52], which is defined as follows:
$\mathrm{MSE} = \frac{1}{N}\sum_{j=1}^{N}\left(O_j - T_j\right)^{2}$
where $O_j$ is the observed output and $T_j$ is the target value for the j-th sample, and N is the total number of samples.
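As a rough illustration of this protocol, the snippet below runs 10-fold cross-validation and reports the average per-fold MSE computed with the formula above. It is a sketch using scikit-learn; the SVR regressor and the variable names are placeholder assumptions rather than the models evaluated in this study.

```python
# Sketch of 10-fold cross-validation with per-fold MSE; the model choice is illustrative.
import numpy as np
from sklearn.model_selection import KFold
from sklearn.svm import SVR

def cross_validated_mse(X, targets, k=10, seed=0):
    fold_mse = []
    for train_idx, test_idx in KFold(n_splits=k, shuffle=True, random_state=seed).split(X):
        model = SVR(kernel="rbf").fit(X[train_idx], targets[train_idx])
        predictions = model.predict(X[test_idx])
        # MSE = (1/N) * sum_j (O_j - T_j)^2 over the held-out fold
        fold_mse.append(np.mean((predictions - targets[test_idx]) ** 2))
    return float(np.mean(fold_mse))  # average of the k per-fold estimates
```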
Table 4 presents the confusion matrix for detecting diabetes. The terms in Table 4 are defined as follows:
TP—true positive: a patient is accurately classified into the diabetic class.
TN—true negative: a patient is accurately recognized as belonging to the non-diabetic class.
FP—false positive: a patient is inaccurately classified as belonging to the diabetic class when they actually belong to the non-diabetic class.
FN—false negative: a patient is inaccurately classified as being in the non-diabetic class when they should be categorized as belonging to the diabetic class.
Table 4. Confusion matrix for detecting diabetes.
Clinical Situation | Predicted: Diabetic | Predicted: Non-Diabetic
Real values: Diabetic class | TP | FN
Real values: Non-diabetic class | FP | TN
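For clarity, the counts in Table 4 can be accumulated directly from predicted labels, as in the short sketch below; the 1 = diabetic / 0 = non-diabetic coding is an assumption made purely for illustration.

```python
# Illustrative tally of the Table 4 entries (1 = diabetic, 0 = non-diabetic assumed).
import numpy as np

def confusion_counts(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))  # diabetic correctly detected
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))  # non-diabetic correctly detected
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))  # non-diabetic flagged as diabetic
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))  # diabetic case that was missed
    return tp, tn, fp, fn
```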
Table 5 provides insight into the performance of the classifiers without the feature selection method, focusing on the training and testing Mean Squared Error (MSE) across various DR techniques. The training MSE values consistently range between 10−4 and 10−10, while the testing MSE varies from 10−4 to 10−8. Among the classifiers, the SVM (RBF) classifier using the AAA DR technique without feature selection achieves the lowest training and testing MSE, specifically 1.93 × 10−10 and 1.77 × 10−8, respectively. Notably, a lower testing MSE indicates superior classifier performance. It is evident from Table 5 that higher testing MSE values correspond to lower classifier performance, regardless of the DR techniques used.
Table 6 presents the training and testing MSE of the classifiers with the EHO feature selection method across all four DR techniques. The training MSE varies from 10−5 to 10−10, while the testing MSE varies between 10−5 and 10−8. The SVM (RBF) classifier in the AAA DR method with EHO feature selection achieved a minimum training and testing MSE of 1.99 × 10−10 and 2.5 × 10−8, respectively. The Bessel function DR method shows slightly higher training and testing MSE values for the classifiers when compared to the other three DR techniques. All of the classifiers showed slightly enhanced testing performance when compared to the methods without feature selection, indicating an enhancement in classifier performance irrespective of the DR technique.
Table 7 demonstrates the training and testing Mean Squared Error (MSE) performance of classifiers utilizing the Dragonfly Algorithm-based feature selection method across various dimensionality reduction techniques. The training MSE values range from 10−6 to 10−9, while the testing MSE varies between 10−5 and 10−8. The SVM (RBF) classifier, when combined with the Dragonfly feature selection method, achieved a minimal training MSE of 1.66 × 10−9 and a testing MSE of 3.25 × 10−8. Notably, this feature selection method led to improvements in the training and testing performance of all classifiers. This enhancement is reflected in improved accuracy, MCC, and Kappa parameters, regardless of the specific dimensionality reduction technique employed.

7.2. Selection of Target

The target value for the non-diabetic class ($T_{ND}$) is taken at the lower end of the 0→1 scale, and it is mapped according to the following constraint:
$\frac{1}{N}\sum_{i=1}^{N} \mu_i \geq T_{ND}$
Here, $\mu_i$ represents the mean value of the input feature vector for each of the N non-diabetic samples considered for classification. Similarly, for the diabetic class ($T_{Dia}$), the target value is mapped to the upper end of the zero-to-one (0→1) scale. This mapping is established based on the following:
$\frac{1}{M}\sum_{j=1}^{M} \mu_j \leq T_{Dia}$
Here, $\mu_j$ signifies the mean value of the input feature vector for each of the M diabetic cases used for classification. It is important to highlight that the target value $T_{Dia}$ is set higher than the average values of $\mu_i$ and $\mu_j$. The selection of target values also requires the discrepancy between them to be at least 0.5, as expressed by the following:
$\left| T_{Dia} - T_{ND} \right| \geq 0.5$
The targets for the non-diabetic class $T_{ND}$ and the diabetic class $T_{Dia}$ are chosen as 0.1 and 0.85, respectively. Once the targets are fixed, the MSE is used for evaluating the performance of the classifiers. Table 8 shows the optimal parameters selected for the classifiers after the training and testing process.
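The target-mapping rule can be expressed as a small consistency check, sketched below under the assumption that the feature vectors have been scaled to the 0→1 range; the function name and the assertions are illustrative, not part of the original implementation.

```python
# Sketch of the target-selection constraints; assumes features scaled to [0, 1].
import numpy as np

def select_targets(mu_non_diabetic, mu_diabetic, t_nd=0.1, t_dia=0.85):
    """Return (T_ND, T_Dia) after checking the constraints described above."""
    assert t_nd <= np.mean(mu_non_diabetic), "T_ND should sit below the non-diabetic mean"
    assert t_dia >= np.mean(mu_diabetic), "T_Dia should sit above the diabetic mean"
    assert abs(t_dia - t_nd) >= 0.5, "targets must differ by at least 0.5"
    return t_nd, t_dia
```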

8. Results and Findings

The study employs the conventional tenfold testing and training approach, where 10% of the input is dedicated to testing, while the remaining 90% is utilized for training. The selection of performance metrics is pivotal for assessing the efficacy of classifiers. The assessment of classifier performance, especially in binary classification scenarios like distinguishing between diabetic and non-diabetic cases from pancreatic microarray gene data, relies on the utilization of a confusion matrix. This matrix facilitates the computation of performance metrics including accuracy, F1 score, MCC, error rate, FM metrics, and Kappa, which are commonly utilized to gauge the comprehensive performance of the model. The relevant parameters associated with the classifiers for performance analysis are illustrated in Table 9.
The performance of the classifiers was evaluated using several metrics: Acc, F1 score, MCC, ER, FM, and Kappa. Accuracy is the fraction of predictions that are correct and measures the overall performance of the classifier. The F1 score is the harmonic mean of precision and recall and measures the classifier’s ability to identify positive instances correctly while limiting false alarms. The MCC measures the correlation between the observed and predicted classifications and is a more sensitive metric than accuracy or the F1 score, particularly for imbalanced classes. The error rate is the fraction of predictions that are incorrect and is the complement of accuracy. The FM (Fowlkes–Mallows) metric is the geometric mean of precision and recall and reflects the classifier’s ability to identify the positive class correctly from both the prediction and the detection perspectives. Kappa is a statistic that measures the agreement between the observed and predicted classifications, adjusted for chance. The results are tabulated in Table 10.
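These descriptions map directly onto the confusion-matrix formulas listed in Table 9; a compact sketch of those computations is given below. The function name and dictionary keys are illustrative assumptions rather than code from this study.

```python
# Metrics of Table 9 computed from confusion-matrix counts; names are illustrative.
import math

def classification_metrics(tp, tn, fp, fn):
    total = tp + tn + fp + fn
    acc = (tp + tn) / total
    f1 = 2 * tp / (2 * tp + fp + fn)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0
    error_rate = (fp + fn) / total
    fm = math.sqrt((tp / (tp + fp)) * (tp / (tp + fn)))  # Fowlkes-Mallows index
    po = (tp + tn) / total
    pe = ((tp + fp) * (tp + fn) + (fp + tn) * (fn + tn)) / total ** 2
    kappa = (po - pe) / (1 - pe)
    return {"Acc": acc, "F1": f1, "MCC": mcc, "ER": error_rate, "FM": fm, "Kappa": kappa}
```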
Table 10 illustrates the performance analysis of the ten classifiers, considering metrics such as Acc, F1 score, MCC, ER, F-measure, and Kappa values. This analysis is conducted for the four DR methods without the incorporation of the feature selection methods. Table 10 reveals that the EM classifier in the Bessel function DR technique achieves a moderate accuracy of 61.42%, an F1 score of 54.23%, a moderate error rate of 38.57%, and an F-measure of 57.28%. However, the EM classifier exhibits a lower MCC value of 0.3092 and a Kappa value of 0.2645. On the other hand, the SVM (linear) classifier in the Bessel function DR method demonstrates a low accuracy of 52.85% along with a high error rate of 47.15%. Additionally, it exhibits an F1 score of 40% and an F-measure of 41.57%. The MCC and Kappa values for the SVM (linear) classifier are notably low, at 0.06324 and 0.05714, respectively. Across the Bessel function DR technique, all classifiers exhibit poor performance in the various metrics. This trend can be attributed to the intrinsic properties of the Bessel function, which is evident from the non-negative values of the statistical parameters. Similarly, the SVM (RBF) classifier in the context of the DCT DR technique achieves a respectable accuracy of 88.57%, complemented by a low error rate of 11.42%. Furthermore, it attains an F1 score of 81.81% and an F-measure of 82.15%. The MCC and Kappa values of the SVM (RBF) classifier reach 0.7423 and 0.7358, respectively. With the AAA DR technique, the SVM (RBF) classifier exhibits a remarkable accuracy score of 90%, coupled with a low error rate of 10%. This is accompanied by an F1 score of 84.44% and an F-measure of 84.97%. The MCC and Kappa values of the SVM (RBF) classifier are noteworthy, totaling 0.7825 and 0.772, respectively. Remarkably, regardless of the DR technique employed, all classifiers maintain accuracy within the range of 52% to 90%. This is primarily due to the inherent limitations of the DR techniques. Therefore, incorporating feature selection methods is highly recommended to enhance the performance of these classifiers.
Figure 13 provides an overview of the performance analysis of ten classifiers concerning the metrics of accuracy, F1 score, error rate, and F-measure values. This analysis is carried out within the context of four dimensionality reduction methods, specifically without feature selection methods. Table 10 shows that the EM classifier in the Bessel function DR technique achieves a modest accuracy of 61.42%, along with an F1 score of 54.23%. Moreover, it exhibits a moderate error rate of 38.57% and an F-measure of 57.28%. On the other hand, the SVM (linear) classifier in the Bessel function DR method demonstrates a lower accuracy of 52.85%. This classifier is accompanied by a higher error rate of 47.15%, an F1 score of 40%, and an F-measure of 41.57%. Across the performance metrics, all classifiers exhibit suboptimal performance within the Bessel function DR technique. This trend is observed consistently across various measures. However, the SVM (RBF) classifier within the DCT DR technique maintains an impressive accuracy level of 88.57%. Furthermore, it exhibits a commendably low error rate of 11.42%, an F1 score of 81.81%, and an F-measure of 82.15%. Employing the AAA DR technique in the SVM (RBF) classifier results in achieving an elevated accuracy rate of 90%. Additionally, this combination yields a notably low ER of 10% and an F1 score of 84.44%, accompanied by an F-measure of 84.97%.
Table 11 presents an in-depth analysis of the performance of the ten classifiers concerning the four DR methods integrated with the EHO feature selection technique. Notably, the SVM (RBF) classifier within the AAA DR technique achieves an exceptional accuracy of 95.71%. This classifier further demonstrates a commendable F1 score of 92.68%, accompanied by a notably low error rate of 4.29% and an impressive F-measure of 92.71%. Additionally, the SVM (RBF) classifier has a high MCC value of 0.897 and a Kappa value of 0.8965. A contrasting performance is observed with the SVM (Linear) classifier within the Bessel function DR technique, which registers a relatively low accuracy of 50%, coupled with a high error rate of 50%. Further metrics include an F1 score of 36.36% and an F-measure of 37.79%. Notably, the SVM (Linear) classifier yields zero values for both MCC and Kappa. All classifiers exhibit improved accuracy within the DCT, LSLR, and AAA DR techniques. However, the impact of the EHO feature selection method does not translate into substantial enhancements for classifiers employing the Bessel function DR method.
Figure 14 presents the analysis of the ten classifiers concerning the four DR methods combined with the EHO feature selection technique. It is evident from the insights presented in Table 11 that the SVM (RBF) classifier, operating within the AAA DR technique, achieves an impressively high accuracy of 95.71%. Additionally, this classifier demonstrates a notable F1 score of 92.68%, accompanied by a commendably low error rate of 4.29% and an impressive F-measure of 92.71%. In contrast, the SVM (Linear) classifier used within the Bessel function DR technique reflects a lower accuracy of 50%, coupled with a higher error rate of 50%. Correspondingly, the F1 score is registered at 36.36%, and the F-measure reaches 37.79%. Overall, the classifiers exhibit relatively low performance within the context of the Bessel function DR technique.
Table 12 presents the analysis of the ten classifiers concerning the four DR methods combined with the Dragonfly method. As depicted in Table 12, it is evident that the SVM (RBF), operating within the AAA DR technique, achieves an impressively high accuracy rate of 94.28%. Moreover, this classifier demonstrates a commendable F1 score of 90.47%, accompanied by a relatively low error rate of 5.72% and an appreciable F-measure of 90.57%. Furthermore, the SVM (RBF) classifier exhibits notable values of MCC and Kappa, standing at 0.866 and 0.864, respectively. On the other hand, the SVM (Polynomial) classifier, applied within the context of the Bessel function DR technique, achieves a lower accuracy rate of 58.57%. Correspondingly, it registers a higher error rate of 41.43%, along with an F1 score of 43.13% and an F-measure of 44.17%. However, the MCC and Kappa values for the SVM (Polynomial) classifier are notably lower, reaching 0.1364 and 0.1287, respectively. Among the classifiers utilized in the Bessel function DR method, only the SVM (RBF) classifier achieves an accuracy above 78%. Additionally, the SVM (RBF) classifier attains high accuracy in the DCT DR and LSLR DR methods, reaching 91% and 90%, respectively.
Figure 15 illustrates the performance assessment of the ten classifiers concerning the four DR methods, paired with the Dragonfly feature selection technique. It is observed from Table 12 that the SVM (RBF) classifier, within the AAA DR technique, attains a notably high accuracy rate of 94.28%. This classifier also demonstrates a commendable F1 score of 90.47%, coupled with a comparatively low ER of 5.72%, and a noteworthy F-measure of 90.57%. Conversely, the SVM (Polynomial) classifier, employed in the context of the Bessel function DR technique, registers a relatively low accuracy of 58.57%. Correspondingly, it records a higher error rate of 41.43%, accompanied by an F1 score of 43.13%, and an F-measure of 44.17%. Among the four dimensionality reduction methods, the SVM (RBF) classifier consistently achieves individual accuracy levels exceeding 81%. However, it is important to note that the classifier’s performance in the Bessel function DR method, when paired with the Dragonfly feature selection, remains in the lower performance category.
Figure 16 presents the comparative analysis of the MCC and Kappa parameters across the various classifiers for the four different DR techniques. The MCC and Kappa serve as benchmarks, shedding light on the performance outcomes of the classifiers across diverse inputs. In this study, the inputs are categorized into three groups: dimensionally reduced without feature selection, with EHO feature selection, and with Dragonfly feature selection. The classifiers’ performance is evaluated based on the MCC and Kappa values derived from these inputs. The average MCC and Kappa values across the classifiers are calculated to be 0.2984 and 0.2849, respectively. A systematic approach is formulated to assess the classifiers’ performance, drawing insights from Figure 16. The MCC values are categorized into three ranges: 0.0–0.25, 0.251–0.54, and 0.55–0.9. Notably, the classifiers exhibit poor performance within the first range, while the MCC vs. Kappa slope demonstrates a significant upsurge within the second range of MCC values. In contrast, the third range of MCC values corresponds to a higher level of classifier performance, devoid of any substantial anomalies.
Figure 17 shows histograms of the error rate and MCC (%) parameters. It can be seen that the maximum error rate is 50% and the maximum MCC is 90%. The error-rate histogram is skewed toward the right side of the graph, which indicates that, for any of the DR methods and irrespective of the feature selection method, the classifier’s error rate does not exceed 50%. The MCC histogram shows that the classifiers are sparser at the edges, with most values concentrated in the middle range.

8.1. Computational Complexity (CC)

The analysis of the classifiers in this study considers their CC, expressed as a function of the input size n. A complexity of O(n) grows in direct proportion to the number of inputs, whereas a lower CC of O(1) is desirable because the cost remains constant regardless of the input size, a favorable characteristic for any algorithm; for example, doubling n doubles an O(n) cost but leaves an O(1) cost unchanged. If the cost increases logarithmically with n, it is represented as O(logn). Additionally, hybrid classifier models are used that incorporate DR techniques and feature selection methods in their classification process, which adds to the overall complexity.
Table 13 presents the CC of the classifiers without incorporating feature selection methods. A noteworthy observation from the table is that the CC of all of the classifiers is relatively similar, although their performance in terms of accuracy is relatively low. The classifiers employing the Bessel function DR method demonstrate a moderate CC of O(n3logn), while those employing the Discrete Cosine Transform, Least Squares Linear Regression, and Artificial Algae Algorithm exhibit higher CC with improved accuracy, represented by O(2n4log2n), O(2n5log4n), and O(2n5log8n), respectively, when compared to the other classifiers. Additionally, when considering the values of MCC and Kappa, the DCT, LSLR, and AAA methods exhibit similar performance.
Table 14 illustrates the CC of the classifiers utilizing the EHO feature selection method. The table reveals that the CC of all of the classifiers is relatively similar, while their performance demonstrates significant accuracy. Similar to the case without feature selection, the Expectation Maximum classifier exhibits a higher computational complexity of O(n5logn) along with remarkable accuracy. For the DCT, LSLR, and AAA DR methods, the SVM (RBF) classifier reaches a CC of O(2n6log2n), O(2n7log4n), and O(2n7log8n), respectively. Notably, the SVM (RBF) classifier in combination with the EHO feature selection technique achieves the highest accuracy among all classifiers for DCT, LSLR, and AAA, with accuracies of 90%, 88.57%, and 95.71%, respectively. Furthermore, the corresponding Kappa values for these classifiers are 0.7655, 0.65, and 0.8965, indicating their strong performance.
Table 15 provides insights into the CC of the classifiers using the Dragonfly feature selection method. From the table, it can be seen that the CC of all of the classifiers is relatively similar, while their performance exhibits a significant level of accuracy. Notably, all four dimensionality reduction techniques demonstrate the highest CC compared to their counterparts. Specifically, with the Bessel function, DCT, LSLR, and AAA DR methods, the SVM (RBF) classifier reaches a computational complexity of O(8n5log2n), O(8n5log2n), O(8n6log4n), and O(8n6log8n), respectively. Regarding accuracy, the SVM (RBF) classifier with the Bessel function, DCT, LSLR, and AAA DR methods achieves the highest accuracy values of 81.42%, 91.42%, 90%, and 94.28%, respectively. Moreover, the corresponding Kappa values are 0.538, 0.796, 0.772, and 0.864, indicating their robust performance. A comparison with previous work is provided in Table 16.
As observed in Table 16, it is evident that a variety of machine learning classifiers, including SVM (RBF), NB, LoR, DT, NLR, RF, multilayer perceptron, and DNN, have been employed for diabetic classification using clinical databases. The accuracies of these classifiers span the range of 67% to 95%. In contrast, the present investigation focuses on diabetes detection using microarray gene data, where the SVM (RBF) classifier stands out with an accuracy of 95.71%.

8.2. Limitations and Major Outcomes

The findings of this study may be limited to the specific population of type II diabetes mellitus patients and may not be applicable to other populations or different types of diabetes. The analysis in this study relies on microarray gene data, which may not be readily available or accessible in all healthcare settings. The methods proposed in this study, such as microarray gene arrays, may involve complex and expensive procedures that are not feasible for routine clinical practice. The performance of the classifiers in this study may be influenced by the presence of outliers in the data. Outliers can have a significant impact on the accuracy and reliability of the classification results. The developed classification approach, which utilizes various dimensionality reduction techniques and feature selection methods, has demonstrated its potential in effectively screening and predicting diabetic markers, while also identifying associated diseases such as strokes, kidney failure, and neuropathy. An outcome of this study is the establishment of a comprehensive database for the mass screening and sequencing of diabetic genomes. By incorporating microarray gene data and leveraging the proposed classification techniques, this database enables the identification of patterns and trends in diabetes outbreaks associated with different lifestyles.
The ability to detect diabetes in its early stages and predict associated diseases is of utmost importance for chronic diabetic patients. This will facilitate timely interventions, improve disease management, and, ultimately, lead to better patient outcomes. Overall, this study contributes valuable insights to the field and lays the foundation for further investigations into the early detection and management of type II diabetes mellitus patients.

9. Conclusions

The results showed that the classifiers exhibited lower accuracy and weaker performance metrics when using the Bessel function (BF) DR technique, which can be attributed to the inherent limitations of the Bessel function. However, the DCT and LSLR techniques produced improved accuracy and performance metrics for specific classifiers, such as the SVM (RBF) classifier. In particular, the AAA technique, combined with the SVM (RBF) classifier, achieved the highest accuracy of 90% without feature selection. With the EHO feature selection technique, the classifiers achieved the highest accuracy values of 81.42%, 90%, 88.57%, and 95.71% for BF, DCT, LSLR, and AAA, respectively. With the Dragonfly feature selection method, which also showed promising results, the classifiers achieved high accuracy values of 81.42%, 91.42%, 90%, and 94.28% for BF, DCT, LSLR, and AAA, respectively. In terms of computational complexity, we observed that the classifiers exhibited similar complexities across the different dimensionality reduction techniques; however, their performance in terms of accuracy varied significantly. Notably, the SVM (RBF) classifier in combination with the EHO feature selection technique consistently achieved the highest accuracy values across the different dimensionality reduction techniques. In conclusion, this research article presents a novel method for detecting type II DM using microarray gene data. Future work will explore Convolutional Neural Networks (CNNs), Deep Neural Networks (DNNs), LSTM models, and the hyperparameter tuning of classifiers. Moreover, this approach could be extended toward continuous monitoring in clinical practice.

Author Contributions

Conceptualization, D.C.; methodology, D.C. and H.R.; software, D.C.; validation, H.R.; formal analysis, D.C. and H.R.; investigation, D.C. and H.R.; resources, D.C. and H.R.; data curation, H.R.; writing—original draft, D.C.; writing—review and editing, H.R.; visualization, D.C.; supervision, H.R. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Facts & Figures. International Diabetes Federation. Available online: https://idf.org/about-diabetes/facts-figures/ (accessed on 20 August 2021).
  2. Pradeepa, R.; Mohan, V. Epidemiology of type 2 diabetes in India. Indian J. Ophthalmol. 2021, 69, 2932–2938. [Google Scholar] [CrossRef]
  3. Chockalingam, S.; Aluru, M.; Aluru, S. Microarray data processing techniques for genome-scale network inference from large public repositories. Microarrays 2016, 5, 23. [Google Scholar] [CrossRef]
  4. Herman, W.H.; Ye, W.; Griffin, S.J.; Simmons, R.K.; Davies, M.J.; Khunti, K.; Wareham, N.J. Early detection and treatment of type 2 diabetes reduce cardiovascular morbidity and mortality: A simulation of the results of the Anglo-Danish-Dutch study of intensive treatment in people with screen-detected diabetes in primary care (ADDITION-Europe). Diabetes Care 2015, 38, 1449–1455. [Google Scholar] [CrossRef]
  5. Strianese, O.; Rizzo, F.; Ciccarelli, M.; Galasso, G.; D’Agostino, Y.; Salvati, A.; Rusciano, M.R. Precision and personalized medicine: How genomic approach improves the management of cardiovascular and neurodegenerative disease. Genes 2020, 11, 747. [Google Scholar] [CrossRef] [PubMed]
  6. Abul-Husn, N.S.; Kenny, E.E. Personalized medicine and the power of electronic health records. Cell 2019, 177, 58–69. [Google Scholar] [CrossRef] [PubMed]
  7. Schnell, O.; Crocker, J.B.; Weng, J. Impact of HbA1c testing at point of care on diabetes management. J. Diabetes Sci. Technol. 2017, 11, 611–617. [Google Scholar] [CrossRef]
  8. Lu, H.; Chen, J.; Yan, K.; Jin, Q.; Xue, Y.; Gao, Z. A hybrid feature selection algorithm for gene expression data classification. Neurocomputing 2017, 256, 56–62. [Google Scholar] [CrossRef]
  9. American Diabetes Association Professional Practice Committee. 2. Classification and Diagnosis of Diabetes: Standards of Medical Care in Diabetes—2022. Diabetes Care 2022, 45 (Suppl. S1), S17–S38. [Google Scholar] [CrossRef]
  10. Jakka, A.; Jakka, V.R. Performance evaluation of machine learning models for diabetes prediction. Int. J. Innov. Technol. Explor. Eng. Regul. Issue 2019, 8, 1976–1980. [Google Scholar] [CrossRef]
  11. Radja, M.; Emanuel, A.W.R. Performance evaluation of supervised machine learning algorithms using different data set sizes for diabetes prediction. In Proceedings of the 2019 5th International Conference on Science in Information Technology (ICSITech), Jogjakarta, Indonesia, 23–24 October 2019. [Google Scholar] [CrossRef]
  12. Dinh, A.; Miertschin, S.; Young, A.; Mohanty, S.D. A data-driven approach to predicting diabetes and cardiovascular disease with machine learning. BMC Med. Inform. Decis. Mak. 2019, 19, 211. [Google Scholar] [CrossRef]
  13. Yang, T.; Zhang, L.; Yi, L.; Feng, H.; Li, S.; Chen, H.; Zhu, J.; Zhao, J.; Zeng, Y.; Liu, H.; et al. Ensemble learning models based on noninvasive features for type 2 diabetes screening: Model development and validation. JMIR Med. Inform. 2020, 8, e15431. [Google Scholar] [CrossRef]
  14. Muhammad, L.J.; Algehyne, E.A.; Usman, S.S. Predictive supervised machine learning models for diabetes mellitus. SN Comput. Sci. 2020, 1, 240. [Google Scholar] [CrossRef]
  15. Kim, H.; Lim, D.H.; Kim, Y. Classification and prediction on the effects of nutritional intake on overweight/obesity, dyslipidemia, hypertension and type 2 diabetes mellitus using deep learning model: 4–7th Korea national health and nutrition examination survey. Int. J. Environ. Res. Public Health 2021, 18, 5597. [Google Scholar] [CrossRef] [PubMed]
  16. Lawi, A.; Syarif, S. Performance evaluation of naive Bayes and support vector machine in type 2 diabetes Mellitus gene expression microarray data. In Journal of Physics: Conference Series; IOP Publishing: Bristol, UK, 2019; Volume 1341, p. 042018. [Google Scholar]
  17. Ciaramella, A.; Staiano, A. On the role of clustering and visualization techniques in gene microarray data. Algorithms 2019, 12, 123. [Google Scholar] [CrossRef]
  18. Velliangiri, S.; Alagumuthukrishnan, S.; Joseph, S.I.T. A review of dimensionality reduction techniques for efficient computation. Procedia Comput. Sci. 2019, 165, 104–111. [Google Scholar] [CrossRef]
  19. Parand, K.; Nikarya, M. New numerical method based on generalized Bessel function to solve nonlinear Abel fractional differential equation of the first kind. Nonlinear Eng. 2019, 8, 438–448. [Google Scholar] [CrossRef]
  20. Bell, W.W. Special Functions for Scientists and Engineers; Courier Corporation: North Chelmsford, MA, USA, 1967. [Google Scholar]
  21. Kalaiyarasi, M.; Rajaguru, H. Performance Analysis of Ovarian Cancer Detection and Classification for Microarray Gene Data. BioMed Res. Int. 2022, 2022, 6750457. [Google Scholar] [CrossRef]
  22. Ahmed, N.; Natarajan, T.; Rao, K.R. Discrete cosine transform. IEEE Trans. Comput. 1974, C-23, 90–93. [Google Scholar] [CrossRef]
  23. Epps, J.; Ambikairajah, E. Use of the discrete cosine transform for gene expression data analysis. In Proceedings of the Workshop on Genomic Signal Processing and Statistics, Baltimore, MD, USA, 26–27 May 2004; Volume 1. [Google Scholar]
  24. Hotelling, H. Analysis of a complex of statistical variables into principal components. J. Educ. Psychol. 1933, 24, 417–441. [Google Scholar] [CrossRef]
  25. Hastie, T.; Tibshirani, R.; Friedman, J.H.; Friedman, J.H. The Elements of Statistical Learning: Data Mining, Inference, and Prediction; Springer: New York, NY, USA, 2009; Volume 2. [Google Scholar]
  26. Uymaz, S.A.; Tezel, G.; Yel, E. Artificial algae algorithm (AAA) for nonlinear global optimization. Appl. Soft Comput. 2015, 31, 153–171. [Google Scholar] [CrossRef]
  27. Prabhakar, S.K.; Lee, S.W. An integrated approach for ovarian cancer classification with the application of stochastic optimization. IEEE Access 2020, 8, 127866–127882. [Google Scholar] [CrossRef]
  28. Parhi, P.; Bisoi, R.; Dash, P.K. Influential gene selection from high-dimensional genomic data using a bio-inspired algorithm wrapped broad learning system. IEEE Access 2022, 10, 49219–49232. [Google Scholar] [CrossRef]
  29. Ewees, A.A.; Al-Qaness, M.A.; Abualigah, L.; Algamal, Z.Y.; Oliva, D.; Yousri, D.; Elaziz, M.A. Enhanced feature selection technique using slime mould algorithm: A case study on chemical data. Neural Comput. Appl. 2023, 35, 3307–3324. [Google Scholar] [CrossRef] [PubMed]
  30. Wang, G.G. Moth search algorithm: A bio-inspired metaheuristic algorithm for global optimization problems. Memetic Comput. 2018, 10, 151–164. [Google Scholar] [CrossRef]
  31. Lin, Y.; Heidari, A.A.; Wang, S.; Chen, H.; Zhang, Y. An Enhanced Hunger Games Search Optimization with Application to Constrained Engineering Optimization Problems. Biomimetics 2023, 8, 441. [Google Scholar] [CrossRef]
  32. Qiao, Z.; Li, L.; Zhao, X.; Liu, L.; Zhang, Q.; Hechmi, S.; Atri, M.; Li, X. An enhanced Runge Kutta boosted machine learning framework for medical diagnosis. Comput. Biol. Med. 2023, 160, 106949. [Google Scholar] [CrossRef] [PubMed]
  33. He, X.; Shan, W.; Zhang, R.; Heidari, A.A.; Chen, H.; Zhang, Y. Improved Colony Predation Algorithm Optimized Convolutional Neural Networks for Electrocardiogram Signal Classification. Biomimetics 2023, 8, 268. [Google Scholar] [CrossRef] [PubMed]
  34. Izci, D.; Ekinci, S.; Eker, E.; Demirören, A. Biomedical application of a random learning and elite opposition-based weighted mean of vectors algorithm with pattern search mechanism. J. Control. Autom. Electr. Syst. 2023, 34, 333–343. [Google Scholar] [CrossRef]
  35. Peng, L.; Cai, Z.; Heidari, A.A.; Zhang, L.; Chen, H. Hierarchical Harris hawks optimizer for feature selection. J. Adv. Res. 2023. [Google Scholar] [CrossRef]
  36. Su, H.; Zhao, D.; Heidari, A.A.; Liu, L.; Zhang, X.; Mafarja, M.; Chen, H. RIME: A physics-based optimization. Neurocomputing 2023, 532, 183–214. [Google Scholar] [CrossRef]
  37. Wang, G.G.; Deb, S.; Coelho, L.D.S. Elephant herding optimization. In Proceedings of the 2015 3rd International Symposium on Computational and Business Intelligence (ISCBI), Bali, Indonesia, 9 December 2015; IEEE: Piscataway, NJ, USA, 2015; pp. 1–5. [Google Scholar]
  38. Mirjalili, S. Dragonfly algorithm: A new meta-heuristic optimization technique for solving single-objective, discrete, and multi-objective problems. Neural Comput. Appl. 2016, 27, 1053–1073. [Google Scholar] [CrossRef]
  39. Bharanidharan, N.; Rajaguru, H. Dementia MRI image classification using transformation technique based on elephant herding optimization with Randomized Adam method for updating the hyper-parameters. Int. J. Imaging Syst. Technol. 2021, 31, 1221–1245. [Google Scholar] [CrossRef]
  40. Bharanidharan, N.; Rajaguru, H. Performance enhancement of swarm intelligence techniques in dementia classification using dragonfly-based hybrid algorithms. Int. J. Imaging Syst. Technol. 2020, 30, 57–74. [Google Scholar] [CrossRef]
  41. Zhang, G.; Allaire, D.; Cagan, J. Reducing the Search Space for Global Minimum: A Focused Regions Identification Method for Least Squares Parameter Estimation in Nonlinear Models. J. Comput. Inf. Sci. Eng. 2023, 23, 021006. [Google Scholar] [CrossRef]
  42. Draper, N.R.; Smith, H. Applied Regression Analysis; John Wiley & Sons: Hoboken, NJ, USA, 1998; Volume 326. [Google Scholar]
  43. Llaha, O.; Rista, A. Prediction and Detection of Diabetes using Machine Learning. In Proceedings of the 20th International Conference on Real-Time Applications in Computer Science and Information Technology (RTA-CSIT), Tirana, Albania, 21–22 May 2021; pp. 94–102. [Google Scholar]
  44. Prabhakar, S.K.; Rajaguru, H.; Lee, S.-W. A comprehensive analysis of alcoholic EEG signals with detrend fluctuation analysis and post classifiers. In Proceedings of the 2019 7th International Winter Conference on Brain-Computer Interface (BCI), Gangwon, Republic of Korea, 18–20 February 2019; IEEE: Piscataway, NJ, USA, 2019. [Google Scholar]
  45. Liu, S.; Zhang, X.; Xu, L.; Ding, F. Expectation–maximization algorithm for bilinear systems by using the Rauch–Tung–Striebel smoother. Automatica 2022, 142, 110365. [Google Scholar] [CrossRef]
  46. Zhou, W.; Liu, Y.; Yuan, Q.; Li, X. Epileptic seizure detection using lacunarity and Bayesian linear discriminant analysis in intracranial EEG. IEEE Trans. Biomed. Eng. 2013, 60, 3375–3381. [Google Scholar] [CrossRef]
  47. Hamid, I.Y. Prediction of Type 2 Diabetes through Risk Factors using Binary Logistic Regression. J. Al-Qadisiyah Comput. Sci. Math. 2020, 12, 1–11. [Google Scholar] [CrossRef]
  48. Adiwijaya, K.; Wisesty, U.N.; Lisnawati, E.; Aditsania, A.; Kusumo, D.S. Dimensionality reduction using principal component analysis for cancer detection based on microarray data classification. J. Comput. Sci. 2018, 14, 1521–1530. [Google Scholar] [CrossRef]
  49. Zang, F.; Zhang, J.S. Softmax Discriminant Classifier. In Proceedings of the 3rd International Conference on Multimedia Information Networking and Security, Shanghai, China, 4–6 November 2011; pp. 16–20. [Google Scholar]
  50. Yao, X.J.; Panaye, A.; Doucet, J.; Chen, H.; Zhang, R.; Fan, B.; Liu, M.; Hu, Z. Comparative classification study of toxicity mechanisms using support vector machines and radial basis function neural networks. Anal. Chim. Acta 2005, 535, 259–273. [Google Scholar] [CrossRef]
  51. Fushiki, T. Estimation of prediction error by using K-fold cross-validation. Stat. Comput. 2011, 21, 137–146. [Google Scholar] [CrossRef]
  52. Wang, Z.; Bovik, A.C. Mean squared error: Love it or leave it? A new look at signal fidelity measures. IEEE Signal Process. Mag. 2009, 26, 98–117. [Google Scholar] [CrossRef]
  53. Maniruzzaman, M.; Kumar, N.; Abedin, M.M.; Islam, M.S.; Suri, H.S.; El-Baz, A.S.; Suri, J.S. Comparative approaches for classification of diabetes mellitus data: Machine learning paradigm. Comput. Methods Programs Biomed. 2017, 152, 23–34. [Google Scholar] [CrossRef] [PubMed]
  54. Pham, T.; Tran, T.; Phung, D.; Venkatesh, S. Predicting healthcare trajectories from medical records: A deep learning approach. J. Biomed. Inform. 2017, 69, 218–229. [Google Scholar] [CrossRef] [PubMed]
  55. Hertroijs, D.F.L.; Elissen, A.M.J.; Brouwers, M.C.G.J.; Schaper, N.C.; Köhler, S.; Popa, M.C.; Asteriadis, S.; Hendriks, S.H.; Bilo, H.J.; Ruwaard, D.; et al. A risk score including body mass index, glycated hemoglobin and triglycerides predicts future glycemic control in people with type 2 diabetes. Diabetes Obes. Metab. 2017, 20, 681–688. [Google Scholar] [CrossRef]
  56. Arellano-Campos, O.; Gómez-Velasco, D.V.; Bello-Chavolla, O.Y.; Cruz-Bautista, I.; Melgarejo-Hernandez, M.A.; Muñoz-Hernandez, L.; Guillén, L.E.; Garduño-Garcia, J.D.J.; Alvirde, U.; Ono-Yoshikawa, Y.; et al. Development and validation of a predictive model for incident type 2 diabetes in middle-aged Mexican adults: The metabolic syndrome cohort. BMC Endocr. Disord. 2019, 19, 41. [Google Scholar] [CrossRef]
  57. Deo, R.; Panigrahi, S. Performance assessment of machine learning based models for diabetes prediction. In Proceedings of the 2019 IEEE Healthcare Innovations and Point of Care Technologies, (HI-POCT), Bethesda, MD, USA, 20–22 November 2019. [Google Scholar] [CrossRef]
  58. Choi, B.G.; Rha, S.-W.; Kim, S.W.; Kang, J.H.; Park, J.Y.; Noh, Y.-K. Machine learning for the prediction of new-onset diabetes mellitus during 5-year follow-up in non-diabetic patients with cardiovascular risks. Yonsei Med. J. 2019, 60, 191. [Google Scholar] [CrossRef]
  59. Akula, R.; Nguyen, N.; Garibay, I. Supervised machine learning based ensemble model for accurate prediction of type 2 diabetes. In Proceedings of the 2019 Southeast Conference, Huntsville, AL, USA, 11–14 April 2019. [Google Scholar] [CrossRef]
  60. Xie, Z.; Nikolayeva, O.; Luo, J.; Li, D. Building risk prediction models for type 2 diabetes using machine learning techniques. Prev. Chronic Dis. 2019, 16, E130. [Google Scholar] [CrossRef]
  61. Bernardini, M.; Morettini, M.; Romeo, L.; Frontoni, E.; Burattini, L. Early temporal prediction of type 2 diabetes risk condition from a general practitioner electronic health record: A multiple instance boosting approach. Artif. Intell. Med. 2020, 105, 101847. [Google Scholar] [CrossRef]
  62. Zhang, L.; Wang, Y.; Niu, M.; Wang, C.; Wang, Z. Machine learning for characterizing risk of type 2 diabetes mellitus in a rural Chinese population: The Henan rural cohort study. Sci. Rep. 2020, 10, 4406. [Google Scholar] [CrossRef]
  63. Jain, S. A supervised model for diabetes divination. Biosci. Biotechnol. Res. Commun. 2020, 13 (Suppl. S14), 315–318. [Google Scholar] [CrossRef]
  64. Kalagotla, S.K.; Gangashetty, S.V.; Giridhar, K. A novel stacking technique for prediction of diabetes. Comput. Biol. Med. 2021, 135, 104554. [Google Scholar] [CrossRef]
  65. Haneef, R.; Fuentes, S.; Fosse-Edorh, S.; Hrzic, R.; Kab, S.; Cosson, E.; Gallay, A. Use of artificial intelligence for public health surveillance: A case study to develop a machine learning algorithm to estimate the incidence of diabetes mellitus in France. Arch. Public Health 2021, 79, 168. [Google Scholar] [CrossRef] [PubMed]
  66. Deberneh, H.M.; Kim, I. Prediction of Type 2 diabetes based on machine learning algorithm. Int. J. Environ. Res. Public Health 2021, 18, 3317. [Google Scholar] [CrossRef] [PubMed]
  67. Zhang, L.; Wang, Y.; Niu, M.; Wang, C.; Wang, Z. Nonlaboratory based risk assessment model for type 2 diabetes mellitus screening in Chinese rural population: A joint bagging boosting model. IEEE J. Biomed. Health Inform. 2021, 25, 4005–4016. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Workflow diagram.
Figure 2. Flow diagram for Artificial Algae Algorithm.
Figure 3. Histogram of Bessel function technique in the diabetic gene class.
Figure 4. Histogram of Bessel function technique in the non-diabetic gene class.
Figure 5. Normal probability plot showcasing DCT features for the diabetic gene class.
Figure 6. Normal probability plot representing DCT features for the non-diabetic gene class.
Figure 7. Normal probability plot for LSLR DR techniques in diabetic gene class.
Figure 8. Normal probability plot for LSLR DR techniques in non-diabetic gene class.
Figure 9. Scatter plot depicting AAA DR results for both non-diabetic and diabetic classes.
Figure 10. Diagram illustrating the process of the EHO algorithm.
Figure 11. Flowchart of the Dragonfly Optimization algorithm.
Figure 12. Flow diagram of Expectation Maximum.
Figure 13. Different classifiers without feature selection methods.
Figure 14. Different classifiers with EHO feature selection methods.
Figure 15. Different classifiers with Dragonfly feature selection method.
Figure 16. Classifier performance in terms of MCC and Kappa.
Figure 17. Performance of error rate and MCC (%).
Table 1. Pancreatic microarray gene dataset for non-diabetic and diabetic classes.
Type | Total Number | Diabetic Class | Non-Diabetic Class | Total Classes
Pancreatic dataset | 28,735 | 20 | 50 | 70
Table 2. Statistical analysis for different DR techniques.
Statistical ParametersBessel FunctionDiscrete Cosine Transform (DCT)Least Squares Linear Regression (LSLR)Artificial Algae Algorithm (AAA)
DiaNormDiaNormDiaNormDiaNorm
Mean0.0829610.0841621.8820121.8836180.004670.00457121.664120.5492
Variance0.0051650.0053780.508190.5069570.0004320.000417101.6366103.0168
Skewness0.8651690.8561620.1879030.2289240.003787−0.03150.0427440.054472
Kurtosis0.1809260.135504−0.34524−0.40687−0.16576−0.086670.1522720.091169
Pearson CC0.8662640.8592110.981380.9831180.9754460.9773180.98260.985246
CCA0.059040.2602750.0908250.082321
Table 3. Significance of p-values for feature selection methods using t-test across various DR techniques.
Feature SelectionDR TechniquesBessel Function Discrete Cosine Transform (DCT) Least Squares Linear Regression (LSLR)Artificial Algae Algorithm (AAA)
GenesDiaNormDiaNormDiaNormDiaNorm
EHOp-value
< 0.05
0.97210.99980.9940.99960.99610.99990.94660.9605
Dragonflyp-value
< 0.05
0.999850.8760.99560.9980.99510.999310.99360.9977
Table 5. Analysis of MSE for different DR techniques without feature selection.
ClassifiersBessel Function Discrete Cosine Transform (DCT) Least Squares Linear Regression (LSLR)Artificial Algae Algorithm (AAA)
MSE Training SetMSE Testing SetMSE Training SetMSE Testing SetMSE Training SetMSE Testing SetMSE Training SetMSE Testing Set
NLR2.3 × 10−61.76 × 10−36.41 × 10−62.48 × 10−57.75 × 10−65.12 × 10−52.91 × 10−71.6 × 10−5
LR2.41 × 10−59.51 × 10−57.52 × 10−63.11 × 10−52.18 × 10−74.66 × 10−53.67 × 10−81.45 × 10−5
GMM2.1 × 10−51.75 × 10−45.72 × 10−76.8 × 10−63.09 × 10−71.11 × 10−53.76 × 10−65.33 × 10−5
EM1.62 × 10−79.87 × 10−62.71 × 10−61.3 × 10−59.87 × 10−71.99 × 10−58.97 × 10−97.3 × 10−6
BLDC1.4 × 10−62.53 × 10−32.86 × 10−73.94 × 10−54.74 × 10−65.28 × 10−51.43 × 10−71.64 × 10−5
LoR1.2 × 10−62.89 × 10−39.47 × 10−63.58 × 10−58.69 × 10−64.54 × 10−59.26 × 10−81.45 × 10−5
SDC1.9 × 10−62.03 × 10−33.66 × 10−61.07 × 10−52.47 × 10−61.86 × 10−52.31 × 10−95 × 10−6
SVM (L)3.1 × 10−62.7 × 10−38.92 × 10−62.89 × 10−51.09 × 10−54.01 × 10−54.13 × 10−98.2 × 10−6
SVM (Poly)3.6 × 10−52.11 × 10−33.36 × 10−62.11 × 10−51.29 × 10−62.85 × 10−57.84 × 10−94.69 × 10−6
SVM (RBF)4.16 × 10−78.3 × 10−51.57 × 10−82.41 × 10−63.22 × 10−85.64 × 10−61.93 × 10−101.77 × 10−8
Table 6. Analysis of MSE performance for classifiers using the EHO feature selection method across different DR techniques.
ClassifiersBessel Function Discrete Cosine Transform (DCT) Least Squares Linear Regression (LSLR)Artificial Algae Algorithm (AAA)
Training
MSE
Testing MSETraining MSETesting
MSE
Training
MSE
Testing
MSE
Training
MSE
Testing
MSE
NLR4.85 × 10−62.64 × 10−54.13 × 10−52.88 × 10−51.21 × 10−63.64 × 10−57.21 × 10−79.53 × 10−6
LR3.62 × 10−64.79 × 10−56.92 × 10−61.35 × 10−57.72 × 10−61.96 × 10−56.98 × 10−74.23 × 10−6
GMM6.13 × 10−62.26 × 10−47.63 × 10−79.22 × 10−64.57 × 10−61.39 × 10−53.81 × 10−74.52 × 10−6
EM2.19 × 10−71.2 × 10−64.39 × 10−62.25 × 10−54.81 × 10−63.92 × 10−54.67 × 10−71 × 10−5
BLDC4.47 × 10−66.56 × 10−57.94 × 10−75.8 × 10−53.72 × 10−61.56 × 10−53.52 × 10−73.97 × 10−6
LoR3.24 × 10−62.26 × 10−43.32 × 10−61.09 × 10−58.37 × 10−62.26 × 10−57.61 × 10−83.82 × 10−6
SDC9.62 × 10−62.31 × 10−49.13 × 10−74.62 × 10−54.87 × 10−61.52 × 10−59.93 × 10−83.84 × 10−6
SVM (L)4.12 × 10−55.29 × 10−48.47 × 10−74.16 × 10−61.93 × 10−89.61 × 10−61.67 × 10−83.81 × 10−6
SVM (Poly)6.41 × 10−52.34 × 10−42.19 × 10−76.41 × 10−65.77 × 10−81.24 × 10−51.62 × 10−82.05 × 10−6
SVM (RBF)3.72 × 10−72.56 × 10−56.17 × 10−81.35 × 10−66.79 × 10−92.42 × 10−61.99 × 10−102.5 × 10−8
Table 7. Analysis of MSE in classifiers for various DR techniques with the Dragonfly feature selection method.
ClassifiersBessel FunctionDiscrete Cosine Transform (DCT)Least Squares Linear Regression (LSLR)Artificial Algae Algorithm (AAA)
Training MSETesting MSETraining MSETesting MSETraining MSETesting MSETraining MSETesting MSE
NLR3.62 × 10−64.54 × 10−54.16 × 10−61.36 × 10−58.21 × 10−62.72 × 10−53.86 × 10−61.28 × 10−5
LR4.36 × 10−67.12 × 10−52.84 × 10−61.39 × 10−59.4 × 10−63.8 × 10−52.51 × 10−84.32 × 10−6
GMM7.58 × 10−74.71 × 10−55.66 × 10−87.84 × 10−63.61 × 10−62.09 × 10−54.63 × 10−81.02 × 10−5
EM4.79 × 10−73.31 × 10−53.79 × 10−81.68 × 10−55.33 × 10−66.12 × 10−53.43 × 10−81.46 × 10−5
BLDC6.52 × 10−74.16 × 10−52.92 × 10−84.49 × 10−57.54 × 10−89.12 × 10−67.68 × 10−88.1 × 10−6
LoR6.54 × 10−75.04 × 10−57.23 × 10−86.05 × 10−61.92 × 10−72.23 × 10−64.84 × 10−93.36 × 10−6
SDC3.86 × 10−72.57 × 10−58.95 × 10−73.08 × 10−67.52 × 10−86.31 × 10−61.63 × 10−82.52 × 10−6
SVM (L)5.42 × 10−73.51 × 10−58.45 × 10−71.03 × 10−51.41 × 10−72.83 × 10−51.95 × 10−71.7 × 10−6
SVM (Poly)9.67 × 10−77.23 × 10−56.67 × 10−67.08 × 10−66.3 × 10−71.05 × 10−56.42 × 10−85.33 × 10−6
SVM (RBF)8.64 × 10−82.72 × 10−61.82 × 10−89.05 × 10−73.4 × 10−81.69 × 10−61.66 × 10−83.25 × 10−8
Table 8. Selection of optimal parametric values for classifiers.
Classifiers | Description
NLR | Uniform weight w = 0.4, bias b = 0.001, iteratively modified sum of least square error, criterion: MSE
Linear Regression | Uniform weight w = 0.451, bias b = 0.003, criterion: MSE
GMM | Mean covariance of the input samples and tuning parameter using EM steps, criterion: MSE
EM | 0.13 likelihood probability, 0.45 cluster probability, with convergence rate of 0.631, criterion: MSE
BLDC | P(y), prior probability: 0.5, class means: 0.85 and 0.1, criterion: MSE
Logistic regression | Threshold Hθ(x) < 0.48, criterion: MSE
SDC | Γ = 0.5 along with the mean of each class target value as 0.1 and 0.85
SVM (Linear) | C (regularization parameter): 0.85, class weights: 0.4, convergence criterion: MSE
SVM (Polynomial) | C: 0.76, coefficient of the kernel function (gamma): 10, class weights: 0.5, convergence criterion: MSE
SVM (RBF) | C: 1, coefficient of the kernel function (gamma): 100, class weights: 0.86, convergence criterion: MSE
Table 9. Performance metrics.
Metrics | Formula | Assessment Focus
Accuracy | $Acc = \frac{TN + TP}{TN + FN + TP + FP}$ | Fraction of predictions that are correct
F1 Score | $F1 = \frac{2 \times TP}{2 \times TP + FP + FN}$ | Harmonic mean of precision and recall
Matthews Correlation Coefficient (MCC) | $MCC = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$ | Correlation between the observed and predicted classifications
Error Rate | $Error\ rate = \frac{FP + FN}{TP + TN + FP + FN}$ | Fraction of predictions that are incorrect
FM Metric | $FM = \sqrt{\frac{TP}{TP + FP} \times \frac{TP}{TP + FN}}$ | Geometric mean of precision and recall (Fowlkes–Mallows index)
Kappa | $Kappa = \frac{P_o - P_e}{1 - P_e}$, where $P_o = \frac{TP + TN}{TP + TN + FP + FN}$ and $P_e = \frac{(TP + FP)(TP + FN) + (FP + TN)(FN + TN)}{(TP + TN + FP + FN)^2}$ | Agreement between observed and predicted classifications, adjusted for chance
Abbreviations: TP—true positive: an accurate prediction where the true value was positive. TN—true negative: an accurate prediction where the true value was negative. FP—false positive: an inaccurate prediction where the actual value was negative. FN—false negative: an erroneous prediction where the actual value was positive.
Table 10. Parametric analysis of different classifiers through various DR techniques.
Dimensionality ReductionClassifiersParameters
Accuracy
(%)
F1 Score
(%)
MCCError Rate
(%)
FM (%)Kappa
Bessel FunctionNLR54.285740.74070.081345.714242.18310.0743
LR58.571447.27270.189741.428549.13540.1714
GMM57.142848.27580.199542.857150.78330.1732
EM61.428554.23720.309238.571457.28920.2645
BLDC52.8571400.063247.142841.57610.0571
LoR54.285740.74070.081345.714242.18310.0743
SDC54.285740.74070.081345.714242.18310.0743
SVM (L)52.8571400.063247.142841.57610.0571
SVM (Poly)54.285742.85710.108445.714244.72140.0967
SVM (RBF)61.428552.63150.280538.571455.14110.2470
Discrete Cosine Transform (DCT)NLR75.714262.22220.452524.285762.60990.4465
LR71.428556.52170.364628.571457.00880.3577
GMM85.714276.19040.661714.285776.2770.6601
EM8069.56520.56092070.16460.5504
BLDC65.714253.84610.308334.285755.33990.2881
LoR67.142859.64910.407232.857162.49320.3585
SDC81.428572.34040.603218.571473.15640.5882
SVM (L)7060.37730.41623062.27990.3849
SVM (Poly)72.857162.74510.454727.142864.25750.4291
SVM (RBF)88.571481.81810.742311.428582.15840.7358
Least Squares Linear Regression (LSLR)NLR67.142848.88880.254532.857149.19350.2511
LR65.7142520.282934.285753.07230.2695
GMM82.857172.72720.609117.142873.02970.6037
EM72.857162.74510.454727.142864.25750.4291
BLDC62.857151.85180.271137.142853.68750.2479
LoR64.285757.62710.372835.714260.86980.3190
SDC75.714265.30610.495224.285766.43640.4757
SVM (L)64.285754.54540.316235.714256.69470.2857
SVM (Poly)71.428561.53840.435228.571463.24560.4067
SVM (RBF)84.285775.55550.650515.714276.02630.6418
Artificial Algae Algorithm (AAA)NLR8066.66660.52542066.74240.5242
LR8068.18180.54242068.46530.5377
GMM85.714277.27270.675714.285777.5940.6698
EM84.285775.55550.650515.714276.02630.6418
BLDC78.571468.08510.538221.428568.8530.5248
LoR77.142869.23070.562222.857171.15120.5254
SDC85.714278.26080.691814.285778.93520.6788
SVM (L)82.8571750.645417.142876.06390.625
SVM (Poly)87.1428800.716512.857180.49840.7069
SVM (RBF)9084.44440.78251084.97060.7720
Table 11. Performance metrics with Elephant Herding Optimization (EHO) feature selection method for different DR techniques.
Dimensionality ReductionClassifiersParameters
Accuracy
(%)
F1 Score
(%)
MCCError Rate
(%)
FM (%)Kappa
Bessel FunctionNLR71.428561.53840.435228.571463.24560.4067
LR62.8571500.244837.142851.3870.2288
GMM54.285740.74070.081345.714242.18310.0743
EM81.428572.34040.603218.571473.15640.5882
BLDC6046.15380.18134047.43420.1694
LoR54.285740.74070.081345.714242.18310.0743
SDC54.285740.74070.081345.714242.18310.0743
SVM (L)5036.363605037.79640
SVM (Poly)52.8571400.063247.142841.57610.0571
SVM (RBF)71.4285600.410728.571461.23720.3913
Discrete Cosine Transform (DCT)NLR7061.81810.44273064.2540.4
LR78.571468.08510.538221.428568.8530.5248
GMM81.428572.34040.603218.571473.15640.5882
EM72.857164.15090.479627.142866.17240.4435
BLDC85.714277.27270.675714.285777.5940.6698
LoR81.428571.11110.584518.571471.55420.5767
SDC85.714277.27270.675714.285777.5940.6698
SVM (L)88.571480.95230.729811.428581.04430.7281
SVM (Poly)84.285775.55550.650515.714276.02630.6418
SVM (RBF)9083.72090.76941083.92540.7655
Least Squares Linear Regression (LSLR)NLR67.142859.64910.407232.857162.49320.3585
LR74.2857640.474625.714265.31970.4521
GMM74.285765.38460.498725.714267.19840.4661
EM65.714257.14280.361534.285759.62850.3225
BLDC75.714265.30610.495224.285766.43640.4757
LoR72.857161.22440.431027.142862.28410.4140
SDC8068.18180.54242068.46530.5377
SVM (L)85.714276.19040.661714.285776.2770.6601
SVM (Poly)8069.56520.56092070.16460.5504
SVM (RBF)88.571481.81810.742311.428582.15840.7358
Artificial Algae Algorithm (AAA)NLR81.428573.46930.623618.571474.74090.5991
LR87.1428800.716512.857180.49840.7069
GMM85.714278.26080.691814.285778.93520.6788
EM81.428573.46930.623618.571474.74090.5991
BLDC87.1428800.716512.857180.49840.7069
LoR87.142879.06970.702112.857179.26290.6985
SDC88.571480.95230.729811.428581.04430.7281
SVM (L)97.142894.73680.93022.8571494.86830.9278
SVM (Poly)88.571481.81810.742311.428582.15840.7358
SVM (RBF)95.714292.68290.89704.2857192.71050.8965
Table 12. Performance metrics of different classifiers with the four DR techniques and the Dragonfly feature selection method.
DRClassifiersParameters
Accuracy
(%)
F1 Score
(%)
MCCError Rate
(%)
FM (%)Kappa
Bessel FunctionNLR64.285757.62710.372835.714260.86980.3190
LR60440.15514044.90730.1478
GMM64.285756.14030.343835.714258.81720.3027
EM78.571461.53840.467321.428561.55870.4670
BLDC65.7142520.282934.285753.07230.2695
LoR64.285750.98030.263735.714252.20930.2489
SDC72.857159.57440.408327.142860.24640.3981
SVM (L)67.142856.60370.352932.857158.38740.3263
SVM (Poly)58.571443.13720.136441.428544.17710.1287
SVM (RBF)81.428566.66660.538418.571466.68860.5380
Discrete Cosine Transform (DCT)NLR8068.18180.54242068.46530.5377
LR78.571468.08510.538221.428568.8530.5248
GMM84.285775.55550.650515.714276.02630.6418
EM74.285765.38460.498725.714267.19840.4661
BLDC85.714278.26080.691814.285778.93520.6788
LoR88.5714800.7211.4285800.72
SDC88.571481.81810.742311.428582.15840.7358
SVM (L)82.857173.91300.626417.142874.54990.6146
SVM (Poly)84.285775.55550.650515.714276.02630.6418
SVM (RBF)91.428585.71420.79798.5714285.81160.7961
Least Squares Linear Regression (LSLR)NLR75.714262.22220.452524.285762.60990.4465
LR65.714253.84610.308334.285755.33990.2881
GMM75.714262.22220.452524.285762.60990.4465
EM6048.14810.20784049.85270.1900
BLDC81.428572.34040.603218.571473.15640.5882
LoR80650.5120650.51
SDC87.142879.06970.702112.857179.26290.6985
SVM (L)7060.37730.41623062.27990.3849
SVM (Poly)81.428572.34040.603218.571473.15640.5882
SVM (RBF)9084.44440.78251084.97060.7720
Artificial Algae Algorithm (AAA)NLR78.571468.08510.538221.428568.8530.5248
LR87.142879.06970.702112.857179.26290.6985
GMM81.428573.46930.623618.571474.74090.5991
EM8068.18180.54242068.46530.5377
BLDC82.8571750.645417.142876.06390.625
LoR88.571480.95230.729811.428581.04430.7281
SDC88.571481.81810.742311.428582.15840.7358
SVM (L)82.857173.91300.626417.142874.54990.6146
SVM (Poly)85.714278.26080.691814.285778.93520.6788
SVM (RBF)94.285790.47610.86605.7142890.57890.8640
Table 13. CC of the classifiers without feature selection methods.
Table 13. CC of the classifiers without feature selection methods.
| Classifiers | Bessel Function | Discrete Cosine Transform (DCT) | Least Squares Linear Regression (LSLR) | Artificial Algae Algorithm (AAA) |
|---|---|---|---|---|
| NLR | O(n² log n) | O(n² log n) | O(n³ log 2n) | O(n³ log 4n) |
| LR | O(n² log n) | O(n² log n) | O(n³ log 2n) | O(n³ log 4n) |
| GMM | O(n² log 2n) | O(n² log 2n) | O(n³ log 2n) | O(n³ log 4n) |
| EM | O(n³ log n) | O(n³ log n) | O(n³ log 2n) | O(n³ log 4n) |
| BLDC | O(n³ log n) | O(n³ log n) | O(2n³ log 2n) | O(2n³ log 4n) |
| LoR | O(2n² log n) | O(2n² log n) | O(2n⁴ log 2n) | O(2n⁴ log 4n) |
| SDC | O(n³ log n) | O(n³ log n) | O(n⁴ log 2n) | O(n⁴ log 4n) |
| SVM (L) | O(2n³ log n) | O(2n³ log n) | O(2n⁴ log 2n) | O(2n⁴ log 4n) |
| SVM (Poly) | O(2n³ log 2n) | O(2n³ log 2n) | O(2n⁴ log 4n) | O(2n⁴ log 8n) |
| SVM (RBF) | O(2n⁴ log 2n) | O(2n⁴ log 2n) | O(2n⁵ log 4n) | O(2n⁵ log 8n) |
Table 14. CC of the classifiers with EHO feature selection method.
| Classifiers | Bessel Function | Discrete Cosine Transform (DCT) | Least Squares Linear Regression (LSLR) | Artificial Algae Algorithm (AAA) |
|---|---|---|---|---|
| NLR | O(n⁴ log n) | O(n⁴ log n) | O(n⁵ log 2n) | O(n⁵ log 4n) |
| LR | O(n⁴ log n) | O(n⁴ log n) | O(n⁵ log 2n) | O(n⁵ log 4n) |
| GMM | O(n⁴ log 2n) | O(n⁴ log 2n) | O(n⁵ log 2n) | O(n⁵ log 4n) |
| EM | O(n⁵ log n) | O(n⁵ log n) | O(n⁵ log 2n) | O(n⁵ log 4n) |
| BLDC | O(n⁵ log n) | O(n⁵ log n) | O(2n⁵ log 2n) | O(2n⁵ log 4n) |
| LoR | O(2n⁴ log n) | O(2n⁴ log n) | O(2n⁵ log 2n) | O(2n⁵ log 4n) |
| SDC | O(n⁵ log n) | O(n⁵ log n) | O(n⁶ log 2n) | O(n⁶ log 4n) |
| SVM (L) | O(2n⁵ log n) | O(2n⁵ log n) | O(2n⁶ log 2n) | O(2n⁶ log 4n) |
| SVM (Poly) | O(2n⁵ log 2n) | O(2n⁵ log 2n) | O(2n⁶ log 4n) | O(2n⁶ log 8n) |
| SVM (RBF) | O(2n⁶ log 2n) | O(2n⁶ log 2n) | O(2n⁷ log 4n) | O(2n⁷ log 8n) |
Table 15. CC of the classifiers with Dragonfly feature selection method.
| Classifiers | Bessel Function | Discrete Cosine Transform (DCT) | Least Squares Linear Regression (LSLR) | Artificial Algae Algorithm (AAA) |
|---|---|---|---|---|
| NLR | O(4n³ log n) | O(4n³ log n) | O(4n⁴ log 2n) | O(4n⁴ log 4n) |
| LR | O(4n³ log n) | O(4n³ log n) | O(4n⁴ log 2n) | O(4n⁴ log 4n) |
| GMM | O(4n³ log 2n) | O(4n³ log 2n) | O(4n⁴ log 2n) | O(4n⁴ log 4n) |
| EM | O(4n⁴ log n) | O(4n⁴ log n) | O(4n⁴ log 2n) | O(4n⁴ log 4n) |
| BLDC | O(4n⁴ log n) | O(4n⁴ log n) | O(8n⁴ log 2n) | O(8n⁴ log 4n) |
| LoR | O(8n³ log n) | O(8n³ log n) | O(8n⁵ log 2n) | O(8n⁵ log 4n) |
| SDC | O(4n⁴ log n) | O(4n⁴ log n) | O(4n⁵ log 2n) | O(4n⁵ log 4n) |
| SVM (L) | O(8n⁴ log n) | O(8n⁴ log n) | O(8n⁵ log 2n) | O(8n⁵ log 4n) |
| SVM (Poly) | O(8n⁴ log 2n) | O(8n⁴ log 2n) | O(8n⁵ log 4n) | O(8n⁵ log 8n) |
| SVM (RBF) | O(8n⁵ log 2n) | O(8n⁵ log 2n) | O(8n⁶ log 4n) | O(8n⁶ log 8n) |
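Comparing Tables 13–15 row by row, the tabulated costs with feature selection appear to differ from the no-selection case by a consistent pattern: each EHO entry is the corresponding Table 13 entry multiplied by roughly n², and each Dragonfly entry by roughly 4n (for example, NLR with the Bessel DR method goes from O(n² log n) to O(n⁴ log n) and O(4n³ log n), respectively). The sketch below illustrates those relative overheads for the SVM (RBF) + Bessel combination; the sample sizes are arbitrary stand-ins, and the expressions are read literally off the tables (with "log 2n" interpreted as log(2n)), so it is a rough comparison of growth rates, not a runtime measurement.

```python
# Rough, illustrative comparison of the relative cost of adding feature
# selection, using the SVM (RBF) + Bessel-function entries of Tables 13-15.
# Expressions are read off the tables; the n values below are arbitrary.
import math

def no_fs(n):         # Table 13: O(2 n^4 log 2n)
    return 2 * n**4 * math.log(2 * n)

def eho_fs(n):        # Table 14: O(2 n^6 log 2n)
    return 2 * n**6 * math.log(2 * n)

def dragonfly_fs(n):  # Table 15: O(8 n^5 log 2n)
    return 8 * n**5 * math.log(2 * n)

for n in (10, 100, 1000):
    base = no_fs(n)
    print(f"n={n:5d}  EHO/no-FS = {eho_fs(n) / base:12.0f}x   "
          f"Dragonfly/no-FS = {dragonfly_fs(n) / base:8.0f}x")
# The ratios come out to exactly n^2 and 4n, matching the pattern noted above.
```

On this reading, the Dragonfly wrapper is the cheaper of the two searches for large n, even though the EHO-selected features combined with AAA and SVM (RBF) gave the best reported accuracy of 95.714%.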
Table 16. Comparison with previous work.
| S. No | Author (with Year) | Description of the Population | Data Sampling | Machine Learning Parameter | Accuracy (%) |
|---|---|---|---|---|---|
| 1 | Maniruzzaman et al. (2017) [53] | PIDD (Pima Indian diabetic dataset) | Cross-validation: K2, K4, K5, K10, and JK | LDA, QDA, NB, GPC, SVM, ANN, AB, LoR, DT, RF | ACC: 92 |
| 2 | Pham et al. (2017) [54] | Diabetes: 12,000; aged between 18 and 100; age (mean): 73 | Training set: 66%; tuning set: 17%; test set: 17% | RNN, CLST Memory (C-LSTM) | ACC: 79 |
| 3 | Hertroijs et al. (2018) [55] | Total: 105,814; age (mean): greater than 18 | Training set: 90%; test set: 10%; fivefold cross-validation | Latent Growth Mixture Modeling (LGMM) | ACC: 92.3 |
| 4 | Arellano-Campos et al. (2019) [56] | Baseline: 7636; follow-up: 6144; diabetes: 331; age: 32–54 | K = 10 cross-validation and bootstrapping model | Cox proportional hazard regression | ACC: 75 |
| 5 | Deo et al. (2019) [57] | Total: 140; diabetes: 14 (imbalanced); age: 12–90 | Training set: 70%; test set: 30%; fivefold cross-validation, holdout validation | BT, SVM (L) | ACC: 91 |
| 6 | Choi et al. (2019) [58] | Total: 8454; diabetes: 404; age: 40–72 | Tenfold cross-validation | LoR, LDA, QDA, KNN | ACC: 78, 77, 76, 77 |
| 7 | Akula et al. (2019) [59] | PIDD; Practice Fusion dataset; total: 10,000; age: 18–80 | Training set: 800; test set: 10,000 | KNN, SVM, DT, RF, GB, NN, NB | ACC: 86 |
| 8 | Xie et al. (2019) [60] | Total: 138,146; diabetes: 20,467; age: 30–80 | Training set: ~67%; test set: ~33% | SVM, DT, LoR, RF, NN, NB | ACC: 81, 74, 81, 79, 82, 78 |
| 9 | Bernardini et al. (2020) [61] | Total: 252; diabetes: 252; age: 54–72 | Tenfold cross-validation | Multiple-instance learning boosting | ACC: 83 |
| 10 | Zhang et al. (2020) [62] | Total: 36,652; age: 18–79 | Tenfold cross-validation | LoR, classification and regression tree, GB, ANN, RF, SVM | ACC: 75, 80, 81, 74, 86, 76 |
| 11 | Jain et al. (2020) [63] | Control: 500; diabetes: 268; age: 21–81 | Training set: ~70%; test set: ~30% | SVM, RF, k-NN | ACC: 74, 74, 76 |
| 12 | Kalagotla et al. (2021) [64] | Pima Indian dataset | Holdout, k-fold cross-validation | Stacking multi-layer perceptron, SVM, LoR | ACC: 78 |
| 13 | Haneef et al. (2021) [65] | Total: 44,659; age: 18–69; data are imbalanced | Training set: 80%; test set: 20% | LDA | ACC: 67 |
| 14 | Deberneh et al. (2021) [66] | Total: 535,169; diabetes: 4.3%; prediabetes: 36%; age: 18–108 | Tenfold cross-validation | RF, SVM, XGBoost | ACC: 73, 73, 72 |
| 15 | Zhang et al. (2021) [67] | Total: 37,730; diabetes: 9.4%; age: 50–70 (imbalanced) | Training set: ~80%; test set: ~20%; tenfold cross-validation | Bagging boosting, GBT, RF, GBM | ACC: 82 |
| 16 | This article | Nordic Islet Transplantation program | Tenfold cross-validation | Bessel function, DCT, LSLR, and AAA | ACC: 95 |
LDA—Linear Discriminant Analysis; QDA—Quadratic Discriminant Analysis; NB—Naïve Bayes; GPC—Gaussian Process Classification; SVM—Support Vector Machine; ANN—Artificial Neural Network; AB—AdaBoost; LoR—Logistic Regression; DT—Decision Tree; RF—Random Forest; RNN—Recurrent Neural Network; CLST Memory (C-LSTM)—Convolutional Long Short-Term Memory; BT—Bagged Tree; KNN/k-NN—k-Nearest Neighbor; GB—Gradient Boost; NN—Neural Network; GBT—Gradient Boosted Tree; ACC—accuracy.