Performance Analysis of Conventional Machine Learning Algorithms for Diabetic Sensorimotor Polyneuropathy Severity Classification

Background: Diabetic sensorimotor polyneuropathy (DSPN), a major form of diabetic neuropathy, is a complication that arises in long-term diabetic patients. Although the application of machine learning (ML) in disease diagnosis is a common and well-established field of research, its application to DSPN diagnosis using composite scoring techniques such as the Michigan Neuropathy Screening Instrument (MNSI) is very limited in the existing literature. Method: In this study, the MNSI data were collected from the Epidemiology of Diabetes Interventions and Complications (EDIC) clinical trials. Two datasets with different MNSI variable combinations, chosen based on the results of the eXtreme Gradient Boosting feature ranking technique, were used to analyze the performance of eight conventional ML algorithms. Results: The random forest (RF) classifier outperformed the other ML models on both datasets. However, all ML models showed almost perfect reliability based on kappa statistics and a high correlation between the predicted output and the actual class of the EDIC patients when all six MNSI variables were considered as inputs. Conclusions: This study suggests that an RF-based classifier using all MNSI variables can help to predict DSPN severity, which will help to enhance the medical facilities for diabetic patients.


Introduction
Diabetes mellitus (DM) is one of the fastest rising health concerns of the 21st century. The number of patients affected by DM worldwide increased from 151 million in 2000 to 463 million in 2019, over just 20 years [1]. The International Diabetes Federation (IDF) estimated that by 2045 approximately 700 million people worldwide will be affected by diabetes [1]. DM is a common yet costly metabolic disease in which long-term uncontrolled blood glucose levels seriously damage different organs of the body [2][3][4][5]. Among all the complications that arise from DM, diabetic sensorimotor polyneuropathy (DSPN) is a very common form of neuropathy caused by diabetes. It affects limb nerves, especially in the lower limbs. Globally, 40 to 60 million people with diabetes suffer from lower limb complications due to DSPN [1]. Long-term DSPN leads to ulceration and amputation, significantly increasing the chance of early death and reducing the quality of life. Globally, one lower limb amputation occurs every 30 s due to DSPN [6]. Henceforth, early identification of DSPN to provide proper treatment to prevent the life-threatening condition [...] man corneal in-vivo. Much research is being conducted on automating the CCM system using ML for a more accurate, reliable, and reproducible diagnosis of DSPN [25][26][27]. However, as CCM uses corneal images for identifying DSPN, it requires specialized personnel and equipment, which makes it difficult to offer in regular healthcare facilities. In the initial stage, composite scores (NDS, MNSI, etc.) are widely used for screening DSPN signs and symptoms [12]. Intelligent systems using these DSPN scoring techniques are a potential solution to the lack of uniformity and agreement in DSPN severity grading because of their capacity for reliable, accurate, and reproducible diagnosis.
In the literature, a few intelligent systems, such as the fuzzy inference system (FIS) [28][29][30], multicategory support vector machine (SVM) [31], and adaptive neuro-fuzzy inference system (ANFIS) [32], have been reported to use composite scoring methods for stratification of DSPN severity. The DSPN classifiers based on fuzzy systems are not reliable because the FIS relies on an if-then rule base set by experts or researchers based on human experience. Kazemi et al. [31] developed a multiclass SVM-based DSPN severity classifier using NDS; however, their reported accuracy was only 76%. Fahmida et al. [32] developed an ANFIS for DSPN severity classification using MNSI with an accuracy of 91%; however, they considered only three MNSI variables (questionnaire, vibration perception, and tactile sensitivity) to design their model. MNSI has been recommended in the position statement by the ADA for the clinical diagnosis of DSPN [11]. The MNSI is simple, inexpensive, and can be administered by any healthcare professional treating diabetic patients. The reliability and accuracy of the MNSI have been discussed elsewhere [10,33]. Therefore, this research proposes an ML-based DSPN severity classifier using MNSI.
In the literature, conventional ML algorithms such as support vector machines (SVM) [34], k-nearest neighbor (KNN) [35], random forest (RF) [36], and artificial neural network (ANN) [37] are used in various disease diagnosis problems. Although the application of ML in disease diagnosis is a common and well-established field of research, the application of ML in DSPN diagnosis using composite scoring techniques like MNSI is very limited in the existing literature. More studies are required to understand the performance of different ML techniques in DSPN diagnosis and stratification. In this research, we aim to observe the performance of eight conventional ML algorithms for severity classification of DSPN using MNSI: support vector machine (SVM), k-nearest neighbor (KNN), random forest (RF), discriminant analysis classifier (DAC), ensemble classifier (EA), naive Bayes (NB), linear regression (LR), and artificial neural network (ANN). A descriptive statistical analysis will be performed to evaluate the performance of these algorithms in DSPN severity classification. We aim to classify DSPN patients into four severity classes (absent, mild, moderate, and severe) with good classification accuracy using different conventional ML algorithms.
The novelty of this research work is the implementation and performance analysis of different conventional ML-based intelligent classifiers that can classify DSPN severity levels using MNSI scores. This study will benefit DSPN patients, as well as diabetic patients in general, through accurate, reliable, and early identification and stratification of DSPN, and will help them to receive early treatment to prevent severe complications like ulceration and amputation. As the classifiers use the MNSI, they can be used during regular checkups in ordinary healthcare centers. This study will also investigate the effect of MNSI variables on DSPN severity classification using feature ranking, and will identify the best performing ML algorithms with different MNSI variable combinations in the stratification of DSPN. To date, the identification and stratification of DSPN are based on offline analysis by experts. This study can support health professionals in accurate, reliable, and real-time decision-making. In addition, the problems caused by the lack of uniformity and agreement in severity grading by different experts can be addressed using an ML-based intelligent DSPN severity classifier. Therefore, this research aims to analyze the performance of different conventional ML-based intelligent classifiers for screening and stratification of DSPN severity and to find the best performing classifier and MNSI variables.

Data Acquisition
In this research, data were collected from the Epidemiology of Diabetes Interventions and Complications (EDIC) clinical trials, which have been conducted by the National Institute of Diabetes and Digestive and Kidney Diseases (Bethesda, Maryland, USA) to annually assess DSPN among type 1 diabetic patients since 1994 [38,39]. This clinical trial, which initially started with 1375 patients, is still ongoing. Eight EDIC years of MNSI data were collected, with 10,543 samples in total. MNSI is used to annually screen for DSPN among the participants enrolled in the EDIC trials.

Data Imputation
The MNSI dataset had a total of 10,543 samples, with 363 blank entries. After removing the blank entries with missing values for MNSI variables, 10,180 samples were retrieved. The k-nearest neighbor [35] data imputation technique was used to fill the missing data.
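The idea behind k-nearest neighbor imputation can be illustrated with a minimal sketch. This is not the in-house code used in the study: the function name `knn_impute`, the value of `k`, the Euclidean distance over observed features, and the use of the neighbor mean are all assumptions made for illustration.

```python
import math


def knn_impute(rows, k=3):
    """Fill None entries with the mean of the k nearest complete rows.

    Hypothetical helper: distance is Euclidean, computed only over the
    features that are observed in the incomplete row.
    """
    complete = [r for r in rows if None not in r]
    if not complete:
        raise ValueError("at least one complete row is required")
    imputed = []
    for row in rows:
        if None not in row:
            imputed.append(list(row))
            continue
        observed = [i for i, v in enumerate(row) if v is not None]
        # Rank the complete rows by distance on the observed features only.
        neighbours = sorted(
            complete,
            key=lambda c: math.sqrt(sum((c[i] - row[i]) ** 2 for i in observed)),
        )[:k]
        filled = [
            v if v is not None else sum(n[i] for n in neighbours) / len(neighbours)
            for i, v in enumerate(row)
        ]
        imputed.append(filled)
    return imputed
```

For example, a row such as `[1, None, 0]` would receive, in its second position, the average of that feature over the `k` complete rows closest to it on the first and third features.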

Data Augmentation
The imputed EDIC dataset with 10,180 samples was imbalanced. Duplicate data were removed from the dataset, keeping only the first occurrence of each combination. The Synthetic Minority Oversampling Technique (SMOTE) [40] was used to balance the dataset without introducing overfitted data. In-house code written in Python 3.7 was used for data imputation and augmentation.
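The two augmentation steps described above (dropping duplicates, then oversampling the minority classes) can be sketched as follows. This is an illustrative outline only, not the study's code: the helper names `drop_duplicates` and `smote`, the neighbor count `k`, and the fixed random seed are assumptions. The interpolation step follows the general SMOTE idea of placing a synthetic sample on the segment between a minority sample and one of its nearest minority-class neighbors.

```python
import random


def drop_duplicates(rows):
    """Keep only the first occurrence of each row, preserving order."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic samples for one minority class.

    Each synthetic sample lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (m for m in minority if m is not base),
            key=lambda m: sum((a - b) ** 2 for a, b in zip(base, m)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return synthetic
```

In a workflow like the one described, `smote` would be called once per under-represented class, with `n_new` chosen so that every class reaches the same target size.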

DSPN Severity Scoring for MNSI
There are two parts to the MNSI [10] scoring system. The first part is a questionnaire consisting of 15 yes/no questions related to the patient's symptoms. The second part consists of five clinical examinations: the appearance of the foot (AF), ulceration (Ulc), ankle reflexes (AR), vibration perception (VP), and tactile sensitivity (TS). The detailed scoring mechanism is described in [10]. In this study, a total of six MNSI variables (Questionnaire, AF, Ulc, AR, VP, and TS) were used. The preprocessed MNSI dataset was graded using the scoring technique proposed by Watari et al. [30]. The score x ranged from 0 to 10, and the severity classes were divided as follows:
(i) x ≤ 2.5: absent neuropathy
(ii) 2.5 < x < 5.0: mild neuropathy
(iii) 5.0 ≤ x < 8.0: moderate neuropathy
(iv) x ≥ 8.0: severe neuropathy
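The severity thresholds listed above can be written as a small mapping function. The sketch below assumes the score has already been computed with the technique in [30]; the function name `dspn_severity` is hypothetical, and the integer labels follow the coding used later for the model outputs (0: absent, 1: mild, 2: moderate, 3: severe).

```python
def dspn_severity(score):
    """Map a 0-10 severity score to a class label.

    0: absent, 1: mild, 2: moderate, 3: severe.
    """
    if not 0 <= score <= 10:
        raise ValueError("score must lie between 0 and 10")
    if score <= 2.5:
        return 0  # absent neuropathy
    if score < 5.0:
        return 1  # mild neuropathy
    if score < 8.0:
        return 2  # moderate neuropathy
    return 3  # severe neuropathy
```

Note that the boundary values fall in different directions: a score of exactly 2.5 is graded as absent, whereas scores of exactly 5.0 and 8.0 are graded as moderate and severe, respectively.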

Feature Ranking
An eXtreme Gradient Boosting (XGBoost) [41] algorithm-based feature ranking model was developed to observe the effects of the MNSI variables on DSPN diagnosis. XGBoost is a decision-tree-based ensemble machine learning algorithm that can estimate the importance of the different features used by a prediction model. The preprocessed dataset was used to rank the MNSI features according to their impact on DSPN identification. The design of the XGBoost model has been discussed in our previous study [32].

ML Model Development Using MNSI Data
This study focuses on the performance analysis of DSPN severity classifiers based on different conventional ML algorithms using MNSI variables. We considered eight conventional supervised ML algorithms: support vector machine (SVM), k-nearest neighbor (KNN), random forest (RF), discriminant analysis classifier (DAC), ensemble classifier (EA), naive Bayes (NB), linear regression (LR), and artificial neural network (ANN). All the ML models were designed using MATLAB ver. R2020a (The MathWorks, Inc., Natick, MA, USA), with two different input combinations from the MNSI variables and the DSPN severity level as output. Stratified 10-fold cross-validation was used to train and test the designed ML models. The performance of the designed ML models was evaluated using confusion matrices and the calculation of different performance parameters. A multiclass SVM model was considered in this work. The KNN model was designed for 20 nearest neighbors. The RF model was designed with 100 bagged decision trees. The ANN was designed with 100 hidden layers and trained for 100 epochs for each fold.
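Stratified k-fold cross-validation keeps the class proportions the same in every fold. The models in this study were built in MATLAB; the Python sketch below only illustrates the splitting logic, and the function name `stratified_kfold`, the round-robin assignment, and the random seed are assumptions.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, n_splits=10, seed=0):
    """Yield (train_indices, test_indices) with class balance preserved."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(n_splits)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's samples round-robin across the folds.
        for pos, idx in enumerate(indices):
            folds[pos % n_splits].append(idx)
    for i in range(n_splits):
        test = sorted(folds[i])
        train = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        yield train, test
```

With a balanced set of four classes and 1200 samples per class, each of the 10 test folds produced this way holds 120 samples per class, and the remaining nine folds form the training set.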

Statistical Analysis
SPSS software (version 21.0; SPSS Inc., Chicago, IL, USA) was used for statistical analysis. All the statistical analyses of the baseline characteristics of the EDIC patients were performed for the DSPN and non-DSPN groups and expressed as mean ± standard deviation (SD). Analysis of variance (ANOVA) was used to assess the statistical significance of the variables. An independent t-test was used to obtain the 95% confidence intervals (95% CI). Pearson's correlation coefficient was used to measure the correlation between the different variables and the DSPN classes. For the performance analysis of the ML models, ANOVA was used to assess statistical significance, Cohen's kappa statistic [42] was used to assess the reliability of the ML models' performance, and the Matthews correlation coefficient [43] was used to measure the correlation between the observed and predicted classifications. Statistical significance was considered at p < 0.05.
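Both agreement measures used for the ML models can be computed directly from a confusion matrix. The sketch below is illustrative and is not the SPSS procedure used in the study. It assumes the matrix rows are true classes and the columns are predicted classes, and it uses the standard multiclass generalization of the Matthews correlation coefficient; the function names are hypothetical.

```python
import math


def cohen_kappa(cm):
    """Cohen's kappa: (observed - chance agreement) / (1 - chance agreement)."""
    n = sum(map(sum, cm))
    row_totals = [sum(row) for row in cm]        # true class counts
    col_totals = [sum(col) for col in zip(*cm)]  # predicted class counts
    observed = sum(cm[i][i] for i in range(len(cm))) / n
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    return (observed - expected) / (1 - expected)


def matthews_corrcoef(cm):
    """Multiclass Matthews correlation coefficient from a confusion matrix."""
    s = sum(map(sum, cm))                        # total number of samples
    c = sum(cm[i][i] for i in range(len(cm)))    # correctly classified samples
    t = [sum(row) for row in cm]                 # true class counts
    p = [sum(col) for col in zip(*cm)]           # predicted class counts
    numerator = c * s - sum(pk * tk for pk, tk in zip(p, t))
    denominator = math.sqrt(
        (s * s - sum(pk * pk for pk in p)) * (s * s - sum(tk * tk for tk in t))
    )
    return numerator / denominator if denominator else 0.0
```

Both measures equal 1 for a perfectly diagonal confusion matrix and fall toward 0 as the predictions approach chance-level agreement with the true classes.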

MNSI Dataset
The baseline demographic variables of the EDIC patients were examined to understand the characteristics of the patients and are shown in Table 1. The average age of the EDIC patients in the first year was 35.93 ± 6.945 years (657 male, 598 female), and the mean diabetic duration was 14.56 ± 4.906 years. Initially, we retrieved data for 957 non-neuropathic patients and 298 neuropathic patients, a total of 1255 patients, from the first year of the EDIC trials. The eight years of the EDIC dataset contained 8819 absent, 1075 mild, 245 moderate, and 40 severe samples. After processing the EDIC dataset with the data imputation and augmentation techniques, the final dataset was prepared with 1200 samples per class. As in our previous study [32], we observed the importance index of all MNSI variables using the XGBoost model. From Figure 1, we can observe that the questionnaire has an importance index of 0.35, whereas the clinical tests are ranked as VP, TS, AR, and AF, with importance indices between 0.10 and 0.20, and Ulc has the lowest index of 0.05 [32]. Two datasets were prepared based on the feature ranking results. In the first dataset (dataset-1), the top three MNSI variables from feature ranking (questionnaire, VP, and TS) were considered as input variables to train the ML models. A study by Watari et al. [30] also used these three variables to classify DSPN patients' severity using a fuzzy system. In the second dataset (dataset-2), all six MNSI variables (questionnaire, AF, Ulc, AR, VP, and TS) were considered as inputs to train the ML models.
Therefore, dataset-1 consisted of three inputs (VP and TS scores, each ranging from 0 to 2, and the questionnaire score, ranging from 0 to 13) and one output, the DSPN severity level (0: absent, 1: mild, 2: moderate, 3: severe). Dataset-2 consisted of six inputs (vibration perception, tactile sensitivity, ankle reflexes, appearance of the feet, and ulceration, each ranging from 0 to 2, and the questionnaire score, ranging from 0 to 13) and the same output, the DSPN severity level (0, 1, 2, 3).

Performance Evaluation of ML Models
Two datasets were used to train eight conventional ML models (RF, SVM, EA, KNN, DAC, NB, LR, and ANN) as DSPN severity classifiers, so 16 models were trained in total. For the classification models, 10-fold stratified cross-validation was used; in each case, nine folds were used as training data and one fold, with 120 samples per class, as test data. Tables 2 and 3 show the performance evaluation of the ML-based DSPN severity classifiers for 10-fold cross-validation using dataset-1 and dataset-2, respectively. Figures S1 and S2 (Supplementary Materials) show the confusion matrices for all the ML classifiers using dataset-1 (Table 2) and dataset-2 (Table 3). For dataset-1, RF had better accuracy (91.87 ± 1.42), sensitivity (91.8 ± 1.66), and specificity (97.23 ± 0.55) than the other ML algorithms, followed by ANN and SVM, which showed the second-best performance for dataset-1. All three exhibited high correlation coefficients and substantial reliability based on the kappa value. The outputs of all the ML classifiers showed a statistically significant relationship with the test set results. For dataset-2, RF had better accuracy (98.50 ± 0.74), sensitivity (98.58 ± 1.67), and specificity (99.50 ± 0.24) than the other ML algorithms, followed by SVM and EA, which showed the second-best performance for dataset-2. However, ANN showed poor performance for dataset-2 with 10-fold cross-validation, with a high standard deviation in its performance parameters. All three of these ML models (RF, SVM, and EA) exhibited high correlation coefficients and near-perfect reliability based on kappa values. The outputs of all the ML classifiers showed a statistically significant relationship with the test set results for dataset-2. From Tables 2 and 3, it is visible that the performance of all the ML classifiers improved with dataset-2 in comparison with dataset-1.
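The accuracy, sensitivity, and specificity reported above can be derived from a multiclass confusion matrix by treating each class in turn as the positive class (one-vs-rest). The sketch below is an illustration under that assumption; how the per-class values were averaged in the study is not stated here, and the function names are hypothetical. Rows are taken to be true classes and columns predicted classes, and every class is assumed to occur at least once.

```python
def accuracy(cm):
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(map(sum, cm))
    return sum(cm[i][i] for i in range(len(cm))) / total


def per_class_metrics(cm):
    """One-vs-rest sensitivity and specificity for each class."""
    total = sum(map(sum, cm))
    metrics = []
    for k in range(len(cm)):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                  # class k samples predicted as other
        fp = sum(row[k] for row in cm) - tp   # other samples predicted as class k
        tn = total - tp - fn - fp
        metrics.append({
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
        })
    return metrics
```

Averaging the per-class values (a macro average) gives a single sensitivity and specificity per fold, and the mean and standard deviation over the 10 folds can then be reported in the form used in Tables 2 and 3.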

Discussion
Diabetic sensorimotor polyneuropathy (DSPN) is one of the major length-dependent complications of diabetes mellitus (DM). Since the 1900s, research has been ongoing to establish a standardized diagnosis method for DSPN. To date, the diagnosis and severity stratification of DSPN require manual grading by specialized experts, which is always subjective and varies depending on the screening method. According to the study in [7], less than one-third of physicians are able to identify the signs of DSPN, resulting in misleading diagnoses and contributing to high rates of morbidity and mortality. Although a variety of screening and diagnostic methods are available for DSPN, most of them require expensive equipment and specialized personnel to analyze the test results; some of the methods are invasive and painful; some are not reproducible; and some have contradictory outcomes due to a lack of standardization in diagnostic measures. Moreover, among health professionals, there is a lack of understanding regarding the thorough diagnosis, control, and treatment process for DSPN [8]. Therefore, for early identification and stratification of DSPN severity, a simple, cost-effective, reproducible, and accurate diagnosis method is necessary, one that can be applied globally to address the lack of understanding among health professionals regarding DSPN.
Nowadays, machine learning approaches are being researched in different aspects of healthcare systems because of their flexibility, cost-effectiveness, self-learning capacity, and ability to work as a second-opinion system for health professionals with accurate and reliable performance. Intelligent healthcare systems are capable of providing better patient satisfaction and helping health professionals with accurate, reliable, and real-time diagnosis, thus improving the healthcare facilities for DM patients. Intelligent systems using ML algorithms have now been widely researched for different biomedical systems, and special importance is given to their application in disease diagnosis and the minimization of health risks [21][22][23][24][44][45][46][47][48]. Like other life-threatening diseases, DSPN has also caught researchers' attention for the development of artificial intelligence-based diagnostic systems [25][26][27][28][29][30][31][32]48].
In the literature, detection and stratification of DSPN severity have been reported using the FIS, ANFIS, SVM, and ANN algorithms [28][29][30][31][32]48]. DSPN exhibits non-linear characteristics and progresses differently in every patient. As the FIS is developed using an if-then rule base, characterizing the non-linear DSPN characteristics relies on expert knowledge and is prone to human error, so the performance can be biased. Duckstein et al. [28] used electrophysiological examination for diabetic neuropathy classification using a fuzzy inference system. Picon et al. [29] used four input variables: symptom assessment and sign examination from MNSI, diabetic duration, and HbA1c. They proposed a fuzzy inference system that was based on expert knowledge. Watari et al. [30] also used the same fuzzy model to classify DSPN into four classes and considered only three MNSI parameters (symptom assessment, vibration perception, and tactile sensitivity) as the model input. However, as a fuzzy system works with if-then rules, it requires trained professionals to set the rules; these rules can vary because they are subject to the healthcare professionals' evaluation and are therefore prone to human error. Kazemi et al. [31] developed a multicategory SVM model for DSPN severity classification based on NDS; however, the performance of the model is not reliable, with a reported accuracy of 76%. In the study in [32], an ANFIS-based DSPN severity classifier was designed using the same three MNSI variables that were proposed in [30]. That study reported an accuracy of 91% using the three MNSI variables, whereas in our study we observed that the results improved significantly when all the MNSI variables were considered in designing the ML models. In the study in [48], an ANN model was developed for the diagnosis of DSPN using NCS, but no severity classification was studied.
This research aims to develop different conventional ML-based DSPN severity classifiers for accurate and reliable stratification of DSPN severity. Eight conventional ML models (SVM, KNN, RF, EA, NB, DAC, ANN, and LR) were trained to classify DSPN patients into four severity groups: absent, mild, moderate, and severe. In this study, we considered only conventional machine learning models for developing the severity classifiers. Deep learning models were not studied here, as they are mainly used in complex classification and regression problems where the data have high dimensions and complex features [49]. As we intend to develop a simple and cost-effective DSPN severity classifier, using deep models could introduce higher costs because of their complex computational models [50].
As DSPN exhibits non-linear characteristics, the data used to train the model play a crucial role. The performance of the ML models depends on how well the data reflect the real situation. Therefore, for better accuracy of the models, we considered a database from the EDIC trial, a large and continuing clinical trial that uses MNSI to follow up the enrolled patients' DSPN condition annually [10,38,39]. As the models were trained with a real dataset, they can accurately learn the non-linear characteristics of DSPN. As the MNSI variables are semi-quantitative or non-quantitative tests, the classifier can be easily deployed in any regular healthcare facility. As the EDIC trials cover a wide range of demographic variables from 29 different clinical centers, this dataset is realistic for observing different classes of DSPN severity across a varied population.
Two datasets were used to train the ML models. For both datasets, the RF model performed better than the other ML models used in this study. For the models trained with dataset-1 (the top three MNSI variables from feature ranking), the performance of the ML models can be ranked as RF > EA > ANN > SVM > KNN > NB > DAC > LR. All the ML models using dataset-1 showed substantial reliability, with kappa values between 0.66 and 0.78 [42], which indicates that the inputs used in dataset-1 are moderately accurate for identifying DSPN severity. From the performance analysis of the different ML models, it can be seen that three variables alone are not enough to accurately identify DSPN severity, even though these variables received a high importance index in the feature ranking. With dataset-2, where all the MNSI variables were considered, all the ML models exhibited very good performance except ANN. The performance of ANN did not improve much after using all the MNSI variables, and it had a higher standard deviation in performance, indicating that in some of the folds of the cross-validation process ANN was not able to train properly and performed poorly. For dataset-2, the performance of the ML models can be ranked as RF > SVM > EA > KNN > NB > DAC > LR > ANN. In addition, when all six MNSI variables were used to train the ML models, the kappa values for the models were between 0.89 and 0.98, which indicates that the models are in almost perfect agreement [42] with the data and that the variables used in dataset-2 are highly accurate for identifying DSPN using ML models. The classes predicted by the ML models and the true classes have a higher correlation for dataset-2 than for dataset-1. From this study, we recommend that all six MNSI variables be considered in DSPN severity grading for higher model accuracy.
According to the International Diabetes Federation [1], in 2019 almost 463 million people were affected by diabetes, and 50% of them suffer from DSPN. USD 760 billion is spent on diabetes, and the health expenditure for diabetic patients increases with severity [1,51]. Enhancing the awareness of DSPN among patients, as well as the performance of the diagnosis methods, will help to improve the healthcare facilities for diabetic patients. As almost 50% of DM patients are affected by DSPN at some point during the course of DM, the global expenditure can be significantly reduced if an improved, cost-effective, accurate, and reliable diagnosis method is deployed that can help with real-time DSPN severity identification, allow early detection and treatment of diabetic neuropathy, and prevent severe complications like foot ulceration and amputation. DSPN severity classifiers based on ML algorithms are capable of providing all these benefits to DM patients. They will also help to overcome the shortcomings of the available conventional diagnosis methods, which rely on offline analysis by healthcare professionals, leading to a delayed and sometimes biased diagnosis of DSPN. The analysis results showed that the RF model outperformed the other ML models with all MNSI variables for DSPN severity classification. This RF-based DSPN severity classifier can be used as a support system for healthcare professionals for more accurate, reliable, and faster identification and stratification of DSPN. A limitation of this study is that it was conducted using the EDIC dataset, which only recruited type 1 diabetic patients. The effect of DSPN in type 2 patients and their severity classification using MNSI still need to be studied. Nerve conduction studies (NCS) have been considered the gold standard for DSPN identification and stratification.
In the future, we aim to use NCS and other DSPN risk factors together with MNSI for severity identification and stratification using ML models. A prediction system could also be incorporated with an RF-based DSPN classifier so that health professionals can predict patients' future conditions from their previous and present conditions. This will help to identify high-risk individuals ahead of time so that proper treatment can be provided to avoid extreme situations.

Conclusions
DSPN is one of the most common forms of diabetic neuropathy (DN) and almost 90% of the DN patients suffer from it. Diagnosis of DSPN is complicated because of contradictory and subjective diagnosis techniques.
Although many diagnostic and composite scoring techniques have been reported and many studies are being conducted to validate these systems, they lack consistency and are sensitive to population size. Machine learning techniques can be a good solution to this issue. The application of ML in different aspects of the biomedical sector has shown promise in improving performance over the usual methods. In this research, we studied the performance of different conventional ML techniques (RF, SVM, KNN, EA, NB, DAC, ANN, LR) in the diagnosis and stratification of DSPN. We used the MNSI composite scoring technique for DSPN diagnosis and observed the importance of the MNSI variables in DSPN identification and stratification. From this analysis, we found that the random forest model using all MNSI variables performs best in DSPN stratification. Therefore, a random forest-based MNSI scoring technique can help healthcare professionals to identify DSPN patients and grade their severity. This type of system can overcome the problem of inconsistency and lack of agreement between professionals on the diagnostic criteria for DSPN.