A Machine Learning-Based Severity Prediction Tool for the Michigan Neuropathy Screening Instrument

Diabetic sensorimotor polyneuropathy (DSPN) is a serious long-term complication of diabetes, which may lead to foot ulceration and amputation. Among the screening tools for DSPN, the Michigan neuropathy screening instrument (MNSI) is frequently deployed, but it lacks a straightforward rating of severity. A DSPN severity grading system has been built and simulated for the MNSI, utilizing longitudinal data captured over 19 years from the Epidemiology of Diabetes Interventions and Complications (EDIC) trial. Machine learning algorithms were used to establish the MNSI factors and patient outcomes to characterise the features with the best ability to detect DSPN severity. A nomogram based on multivariable logistic regression was designed, developed and validated. The extra tree model was applied to identify the top seven ranked MNSI features that identified DSPN, namely vibration perception (R), 10-gm filament, previous diabetic neuropathy, vibration perception (L), presence of callus, deformities and fissure. The nomogram’s area under the curve (AUC) was 0.9421 and 0.946 for the internal and external datasets, respectively. The probability of DSPN was predicted from the nomogram and a DSPN severity grading system for MNSI was created using the probability score. An independent dataset was used to validate the model’s performance. The patients were divided into four different severity levels, i.e., absent, mild, moderate, and severe, with cut-off values of 10.50, 12.70 and 15.00 for a DSPN probability of less than 50, 75 and 100%, respectively. We provide an easy-to-use, straightforward and reproducible approach to determine prognosis in patients with DSPN.


Introduction
Diabetic sensorimotor polyneuropathy (DSPN) leads to ulceration and amputation which are independently associated with increased mortality [1]. Early identification of DSPN is key to improve risk factors that may prevent the progression of DSPN [2][3][4][5]. The American Diabetic Association (ADA) [1] and Toronto consensus statements [6] recommended that the diagnosis of DSPN should be based on an assessment of symptoms and signs and nerve conduction studies (NCS). A number of diagnostic techniques are available for DSPN [1,[7][8][9][10], alongside several composite scoring methods for severity stratification [11][12][13]. The Toronto consensus endorsed the use of a composite screening technique for defining the severity of DSPN [6].
The Michigan neuropathy screening instrument (MNSI) is a commonly utilized composite scoring technique recommended in the ADA position statement [1] for the clinical diagnosis of DSPN. It is a simple, inexpensive, reliable, and accurate assessment [13,14] that has been used to identify DSPN in many studies and clinical trials [14][15][16][17][18][19][20]. Neuropathy symptoms are assessed from 15 yes/no questions and neuropathy signs are assessed from five simple clinical tests. A patient is considered to have DSPN if the total score is ≥7 or ≥2 on the MNSI questionnaire or clinical tests, respectively [13]. However, there is controversy on the optimal cut-off value for identifying DSPN, with studies suggesting different cut-offs ranging from 2 to 1.5 [18], 2.5 [18][19][20], 3 [18] and 4 [21]. Moghtaderi et al. [18] reported an MNSI cut-off of 2 with a reliability of 0.81. Other studies have reported 80% sensitivity and 95% specificity and good repeatability for an MNSI examination cut-off ≥2.0 [13]. Herman et al. [19] suggested the use of MNSI in clinical trials due to its ease of use compared to NCS. However, the MNSI lacks a standardized grading system for severity classification.
Recently, machine learning (ML) approaches have been successfully used to solve different disease prediction and classification problems [22][23][24], because of their ability and reliability in extracting information from complex, non-linear, or incomplete data, supporting healthcare professionals in decision-making [25,26]. The fuzzy inference system (FIS) [27,28], multi-category support vector machine (SVM) learning [29], and adaptive fuzzy inference system (ANFIS) [30], have been reported to aid in the identification and stratification of diabetic neuropathy (DN). However, fuzzy systems-based classifiers do not appear to be reliable because they make use of the if-then rule-based set. Kazemi et al. [29], put forward a DSPN severity classifier based on a multiclass SVM, utilizing the neuropathy disability score (NDS), and reported an accuracy of 76%. Haque et al. [30] used ANFIS to report an accuracy of 91% for DSPN severity classification based on three MNSI variables (vibration perception, questionnaire, and tactile sensitivity). Reddy et al. [31] identified various risk factors for DN and proposed a Radial basis function (RBF) network for DN prediction, but only achieved 68.18% accuracy. Chen et al. [32] developed a prediction model to identify diabetic peripheral neuropathy (DPN) using MNSI by applying logistic regression (LR) and reported the value of the concordance index (c-index) to be 0.75.
We have deployed ML to develop a DSPN severity grading system from MNSI data. Initially, the most appropriate MNSI features were identified from a nomogram based on multivariable logistic regression and this was then developed and validated for classifying the severity of DSPN.

Database Description
Two different Michigan neuropathy screening instrument datasets were collected. The first dataset was sourced from the Epidemiology of Diabetes Interventions and Complications (EDIC) study [33,34]. In EDIC, the MNSI was used annually to assess DSPN in patients with type 1 diabetes [33,34]. A detailed description of the EDIC trial procedures and baseline characteristics of the patients have been reported previously [33][34][35]. Validation of our model was achieved in an independent MNSI dataset made available by Watari et al. [28] and is comprised of 102 patients with 21 MNSI variables: 15 questionnaires, vibration perception (L), vibration perception (R), 10-gm filament (combined results from both legs), the appearance of deformities (combined results from both legs), the appearance of callus (combined results from both legs), the appearance of fissure (combined results from both legs). For consistency we considered 21 variables from both data sets to design our prediction model.

Data Imputation
In practice, missing values in clinical data from larger clinical trials such as EDIC are quite a common phenomenon. Because the training of ML models depends highly on the dataset provided, missing data can be misleading for ML model training. To overcome this issue, data imputation techniques were applied [36]. MNSI data from 19 years of EDIC trials with 14,166 samples were collected. Many duplicate responses were removed, and 3754 unique samples were retrieved. In this study, missing data were calculated by the multiple imputations by chained equations (MICE) technique [37,38].

Feature Ranking
To ascertain the best possible combination of MNSI features to identify DSPN, three different feature ranking techniques, namely random forest (RF), [39] multi-tree extreme gradient boost (XGBoost) [40], and extremely randomized trees (extra tree) [41] techniques were used, and the best-performing algorithm was identified and reported. The in-house code for data imputation and feature ranking was written using Python 3.7.

Logistic Regression Classifiers
A supervised logistic regression classifier was utilized [42] for validating the performance of the top entries of the feature ranking. Logistic regression is commonly used for biomedical classification tasks [42,43], and in this case could assess the association of multiple variables with an outcome, e.g., DSPN or non-DSPN. The dataset was partitioned into a 70/30 ratio for the train and test set. The LR model was trained using five-fold cross-validation. Different performance parameters were calculated for evaluating the model's performance.

Development and Validation of Logistic Regression-Based Nomogram
A diagnostic nomogram was constructed by Zlotnik and Abraira [44] using multivariate logistic regression analysis in Stata/MP software (StataCorp LLC, College Station, TX, USA). The multivariate logistic regression model was developed for two classes: DSPN and non-DSPN. The coefficients calculated from the LR model were used to calculate linear prediction as shown in Equations (1) and (2). Using Equation (2), we calculated the probability of having DSPN, as shown in Equation (3).
The top-ranked features (i.e., the independent variables) exhibiting the best performance with the LR classifier were used to create the logistic regression-based nomogram. Calibration curves were plotted for evaluating the performance of the model. Utilizing the Stata tool, we also performed the decision curve analysis (DCA) for identifying the threshold values for clinically useful nomograms.

Development and Validation of Severity Grading Score
From the nomogram, a four-class DSPN severity scoring technique was proposed based on the probable cut-off values. The performance of the proposed grading system was validated with EDIC ground truth and the grading system proposed in [28].

Patients' Characteristics and Clinical Outcomes
The EDIC patients' baseline demographic variables are presented in Table 1. More details on EDIC patients can be found in other studies [33][34][35]. From the collected dataset, 3754 unique data samples were retrieved after removing duplicate responses. Among the 3754 unique samples, 2177 samples were from non-DSPN and the remaining 1577 samples were from DSPN patients. Figure 1 demonstrates the top-10 ranked MNSI features, as identified by the extra tree feature ranking technique. These are sensitivity to the 10gm filament, vibration perception (L), vibration perception (R), the appearance of callus, appearance of deformities, previous diabetic neuropathy, the appearance of fissure, numb leg, burning leg, and response to bed cover touch. The results of the Xgboost and RF feature ranking techniques are shown in Supplementary Figures S1 and S2. There is no difference in the ranked features by the extra tree and RF technique. Therefore, we studied the extra tree and Xgboost technique to find the combination of features with the best performance.

Patients' Characteristics and Clinical Outcomes
The EDIC patients' baseline demographic variables are presented in Table 1. More details on EDIC patients can be found in other studies [33][34][35]. From the collected dataset, 3754 unique data samples were retrieved after removing duplicate responses. Among the 3754 unique samples, 2177 samples were from non-DSPN and the remaining 1577 samples were from DSPN patients. Figure 1 demonstrates the top-10 ranked MNSI features, as identified by the extra tree feature ranking technique. These are sensitivity to the 10-gm filament, vibration perception (L), vibration perception (R), the appearance of callus, appearance of deformities, previous diabetic neuropathy, the appearance of fissure, numb leg, burning leg, and response to bed cover touch. The results of the Xgboost and RF feature ranking techniques are shown in Supplementary Figures S1 and S2. There is no difference in the ranked features by the extra tree and RF technique. Therefore, we studied the extra tree and Xgboost technique to find the combination of features with the best performance.

Univariate Logistic Regression Model for Identifying Variables Significantly Associated with DSPN
Both the top 9 and top 10 features had an AUC of 0.96 for the data imputed utilizing MICE and the extra tree feature-ranking technique ( Figure 2). Visually, it seems that

Univariate Logistic Regression Model for Identifying Variables Significantly Associated with DSPN
Both the top 9 and top 10 features had an AUC of 0.96 for the data imputed utilizing MICE and the extra tree feature-ranking technique ( Figure 2). Visually, it seems that model performance was saturated after the top 9 features. To confirm and identify the best possible combination of the features, we used logistic regression classifiers for performance evaluation. In order to determine how the ranked features performed for identifying DSPN, the logistic regression classifier was trained with the top-1 to top-15 feature combination. Table 2 demonstrates the weighted average performance and the overall accuracies of other matrices for different models, utilizing the top-1 to top-15 features for the five-fold cross-validation through a logistic regression classifier, together with the confusion matrices for each of the cases. With more than the top-10 features, there was no major change in the performance of the logistic regression classifier. The results from the LR classifier using the top-10 ranked features for the Xgboost technique are reported in Supplementary Table S1. The top-10-ranked features utilizing the extra tree technique have the best performance for the diagnosis of DSPN and non-DSPN patients compared to the Xgboost technique. The top-10 feature combinations provide the best performance accuracy of 92% for DSPN identification (Table 2). Although, the top-7 feature exhibits a reasonable performance in identifying DSPN and non-DSPN classes with 90% sensitivity and specificity; hence, balanced performance in identifying both classes. To establish the best feature combination between the two we considered both the top 7 and top 10 feature models. Tables 3 and 4 show the LR models for the top 7 and top 10 features, respectively. In LR models, the z-value indicates the contribution of each variable used in the model to predict the output. As seen in Tables 3 and 4 all the features were statistically significant with a p-value less than 0.05. To choose the best performing model, between the top 10 and top 7 feature LR models, both models were implemented on the EDIC and independent test set from Watari et al. [28]. Table 5 shows the performance evaluation metrics for both models. The top-10 features model has an accuracy of 91% on the EDIC dataset and an accuracy of 86% on the independent dataset from Watari et al. [28] (Table 5). However, The top-10-ranked features utilizing the extra tree technique have the best performance for the diagnosis of DSPN and non-DSPN patients compared to the Xgboost technique. The top-10 feature combinations provide the best performance accuracy of 92% for DSPN identification (Table 2). Although, the top-7 feature exhibits a reasonable performance in identifying DSPN and non-DSPN classes with 90% sensitivity and specificity; hence, balanced performance in identifying both classes. To establish the best feature combination between the two we considered both the top 7 and top 10 feature models.  Tables 3 and 4 show the LR models for the top 7 and top 10 features, respectively. In LR models, the z-value indicates the contribution of each variable used in the model to predict the output. As seen in Tables 3 and 4 all the features were statistically significant with a p-value less than 0.05. To choose the best performing model, between the top 10 and top 7 feature LR models, both models were implemented on the EDIC and independent test set from Watari et al. [28]. Table 5 shows the performance evaluation metrics for both models. The top-10 features model has an accuracy of 91% on the EDIC dataset and an accuracy of 86% on the independent dataset from Watari et al. [28] (Table 5). However, the top-7 features model exhibited consistently high and comparable performance on the EDIC and Watari et al. [28] data sets with an accuracy of 90% and 91%, respectively. Given that the LR model with the top-7 feature combination has reliable performance on both datasets we developed the nomogram and the severity grading system from the top-7 feature combinations: 10-gm filament, vibration perception (L), vibration perception (R), appearance of callus, appearance of deformities, previous diabetic neuropathy, appearance of fissure.      Figure 5 shows the nomogram generated using multivariate logistic regression for DSPN probability prediction utilizing the top-7 MNSI features. The nomogram spreads over 10 rows. The top 1-7 rows represent seven MNSI variables, together with a scale indicating the corresponding responses. The eighth row is the score scale for the responses of the seven variables. Row 9 is the probability axis indicating the probability of DSPN in patients based on the MNSI responses. Row 10 is the total score scale, where all the scores for each MNSI response are added to calculate the final score. Figure 6 demonstrates an  Figure 5 shows the nomogram generated using multivariate logistic regression for DSPN probability prediction utilizing the top-7 MNSI features. The nomogram spreads over 10 rows. The top 1-7 rows represent seven MNSI variables, together with a scale indicating the corresponding responses. The eighth row is the score scale for the responses of the seven variables. Row 9 is the probability axis indicating the probability of DSPN in patients based on the MNSI responses. Row 10 is the total score scale, where all the scores for each MNSI response are added to calculate the final score. Figure 6 demonstrates an example scoring system based on a nomogram, for a DSPN patient who possesses the variable values at baseline. Individual scores for each predictor were computed and added to calculate the total score. The calculated DSPN probability is 98% and according to Table 6 the patient has severe DSPN. The DSPN probability of a patient can also be calculated using Equations (4) and (5), which were derived from the LR model for the top 7 features (Table 3). For each MNSI response, a score was generated by the nomogram. Supplementary  Table S2 (Supplementary Materials) shows the MNSI responses and their corresponding score. All the scores corresponding to the MNSI responses were added together to obtain the total score. The total score was then used to calculate the DSPN probability from the nomogram. Using the total score and corresponding probability, we developed a four-class severity grading system as shown in Table 6. The probability values less than 50%, between 50% and 75 %, between 75% and 90%, and more than 90% were categorized into absent, mild, moderate, and severe groups, respectively. For each MNSI response, a score was generated by the nomogram. Supplementary  Table S2 (Supplementary Material) shows the MNSI responses and their corresponding score. All the scores corresponding to the MNSI responses were added together to obtain the total score. The total score was then used to calculate the DSPN probability from the nomogram. Using the total score and corresponding probability, we developed a fourclass severity grading system as shown in Table 6. The probability values less than 50%, between 50% and 75 %, between 75% and 90%, and more than 90% were categorized into absent, mild, moderate, and severe groups, respectively.

Evaluation of Performance of the Nomogram Model
We applied the developed grading system on the train, test and independent test set and classified patients into four different classes of DSPN severity, namely absent (non-DSPN), mild, moderate and severe DSPN. For the EDIC train and test set, the patient's severity classes were cross correlated with the EDIC binary ground truth (Tables 7 and 8). For the EDIC training set (  (Table 9), 93.1% of the patients classified as absent were non-DSPN while the remaining 6.9% had DSPN. In patients classified as mild, there was an equal number of both DSPN and non-DSPN patients (i.e., 50% of each). For both the moderate and severe DSPN; no patient was mis-classified. Watari et al. [28] put forward a DSPN severity grading system by utilizing a fuzzy inference system (FIS) using three MNSI variables (questionnaire, vibration perception, and 10-gm monofilament) and a patient severity class using their grading system was available. In Table 10, we compare their results with our prediction models on the same MNSI data set. According to Watari et al. [28], among 102 patients, 29, 25, 27, and 21 had absent, mild, moderate, and severe DSPN, whereas based on our proposed model, 59, 10, 9 and 25 patients had absent, mild, moderate and severe DSPN (Table 10) showing a lack of agreement between the grading systems. Furthermore, according to the EDIC definition of DSPN [14,34], there were 59 non-DSPN patients and 43 DSPN patients in the study by Watari et al. [28] (Table 11). However, the fuzzy system classified 29 as non-DSPN and 73 as DSPN and as per our proposed grading system, the dataset had 58 non-DSPN and 44 DSPN patients, indicating that the proposed grading system agrees with the EDIC definition of DSPN [14,34]. However, because Watari et al. [28] selected only three variables, i.e., questionnaire, vibration perception, and tactile sensitivity for input, because the fuzzy inference system is an if/then rule-based system, there is the possibility of bias due to an inadequate number of variables for the identification of DSPN. Our prediction model could detect the moderate and severe DSPN groups accurately without any misclassification of the training, test, and independent test datasets, and additionally demonstrated better accuracy in identifying the absent DSPN class patients (Tables 7-9).

Evaluation of Performance of the Nomogram Model
We applied the developed grading system on the train, test and independent test set and classified patients into four different classes of DSPN severity, namely absent (non-DSPN), mild, moderate and severe DSPN. For the EDIC train and test set, the patient's severity classes were cross correlated with the EDIC binary ground truth (Tables 7 and 8). For the EDIC training set ( Table 7), out of the 1526 patients classified by the proposed grading system as absent, 89.2% were non-DSPN while the remaining 10.8% had DSPN as per the EDIC ground truth. In 292 mild DSPN patients, 55.5% were non-DSPN while    The difference in DSPN identification as per the EDIC definition and fuzzy model suggests a need to improve the latter. There was an association between different DSPN severity classes in the independent test set and the grading by Watari et al. [28]. Watari et al. [28] had 29 absent, 25 mild, 27 moderate, and 21 severe patients, whereas our model predicted 59 absent, 10 mild, 9 moderate, and 25 severe cases (Table 10) in the same groups as Watari et al. [28]. Our nomogram-based model is more robust because it considers all the important MNSI parameters in DSPN prediction and severity grading compared to only a few parameters in the fuzzy model. This scoring technique based on a nomogram can diagnose and infer the DSPN severity of patients into absent, mild, moderate, and severe (please refer to Table 6).

Discussion
Diabetic neuropathy may be classified as sensorimotor polyneuropathy or autonomic neuropathy. This research has focused on sensorimotor polyneuropathy as it has significant consequences in relation to foot ulceration, amputation and increased mortality. Whilst the ADA position statement advocates the use of symptoms, signs, and electrophysiology [1], other guidelines have suggested the use of quantitative sensory testing and intraepidermal nerve fibre density (IENFD) for diagnosing DSPN [6][7][8][9][10]. However, neurophysiology and IENFD are expensive, require specialized personnel, and are not suitable for large clinical trials. Composite screening methods that assess symptoms and signs of DSPN have been used widely [12] and include the MNSI which has been used in epidemiological studies [13][14][15][16][17], large clinical trials such as DCCT⁄EDIC [33][34][35] and the Action to Control Cardiovascular Disease in Diabetes (ACCORD) [45].
The MNSI questionnaire and examination can identify the presence of clinical neuropathy but have not been validated to grade the severity of DSPN as per the neuropathy disability score (NDS) or the neuropathy symptom score (NSS) [11,12]. Feldman et al. [13] advised that patients with a positive MNSI should undergo assessment of the Michigan diabetic neuropathy score (MDNS), which includes a clinical examination and nerve conduction studies (NCS). However, NCS have a large inter-individual variability and moderate reproducibility and are therefore not suitable for large clinical trials, unless the outcome is standardised using a central reading facility. A simple and reliable DPSN severity scoring system is highly desirable to identify patients with mild disease, in addition to those at high risk of foot ulceration.
Using a state-of-the-art machine learning model, we have designed and deployed a prediction scoring system utilizing MNSI to classify patients in the DCCT/EDIC clinical trial into absent, mild, moderate and severe DSPN. Of note, the original dataset from the EDIC clinical trial had missing and duplicate responses for many patients and therefore after eliminating duplicate samples, we imputed the dataset utilizing the MICE algorithm to predict the missing values. The MNSI variables were ranked after taking into consideration their importance index for DSPN identification using various feature ranking techniques. The extra tree algorithm was found to be the best-performing algorithm for identifying the best combination of MNSI variables. The logistic regression classifier was trained for the top 1 to 15 feature combinations using five-fold cross-validation for identifying the best combination of features. Two models with the top 7 and top 10 variables showed promising results with AUCs of 94% and 96%, respectively. The top 10 models showed better AUC, sensitivity, and accuracy compared to the top 7 ranked features model when validated on an external independent dataset by Watari et al. [28]. However, only marginal improvements were achieved by using the top-10 ranked features model from the top 7 feature model, therefore the top 7 ranked features were selected to develop the nomogram using a multivariant logistic regression model. On the basis of this nomogram, the DSPN severity grading system was proposed based on the predicted DSPN probability and total score on MNSI.
A major strength of our study is that it was undertaken using data from a large number of patients in the established DCCT/EDIC trials. Our model could infer moderate and severe DSPN without any misclassification for the train, test, and independent test set, and also exhibited high accuracy for absent DSPN. Although, misclassification was evident for those with mild DSPN, using MNSI as ground truth may not be adequate as it relies primarily on identifying large fibre damage with the possibility of missing earlier small fibre damage evident in mild DSPN. Furthermore, the model performed well in patients with either type 1 or type 2 diabetes. However, this model has only been validated with the performance of the FIS model used by Watari et al. [28]. In the future, we plan to validate the model performance utilising NCS and NDS to improve reproducibility and robustness of the model. In conclusion, we have designed, implemented and validate a DSPN severity scoring system based on a machine learning model, utilizing MNSI which could aid researchers and clinicians as an auxiliary decision-making system. This study highlights the potential for machine learning-based applications to diagnose and stage DSPN severity.

Conclusions
The detection of early DSPN is key to preventing foot ulceration, amputation and increased mortality in patients with diabetes. MNSI, originally developed to screen for DSPN, has been used widely in epidemiological studies and even in clinical trials, even though it lacks a severity grading system. In this study, we have applied ML-based approaches to develop a DSPN severity grading system for MNSI. Using the extra tree feature ranking technique, we have identified the seven best MNSI features i.e., vibration perception (R), 10-gm monofilament, presence of diabetic neuropathy, vibration perception (L), the appearance of callus, deformities and fissure for identifying DSPN. These features were used to develop a nomogram-based probability model, and from the probability model, a severity scoring technique was proposed and validated in three data sets. MNSI could therefore be easily used to detect DSPN severity in large clinical trials.
Supplementary Materials: The following supporting information can be downloaded at: https: //www.mdpi.com/article/10.3390/diagnostics13020264/s1. Figure S1. Top-ranked 10 features identified using Random Forest algorithm from data imputed using MICE algorithm. Figure S2. Top-ranked 10 features identified using Xgboost algorithm from data imputed using MICE algorithm. Table S1. Comparison of the average performance matrix and confusion matrix from five-fold crossvalidation for top 1 to 10 features using MICE data imputation and logistic regression classification techniques for XGBoost feature selection algorithms. Table S2. MNSI variables score from the generated nomogram.

Data Availability Statement:
The datasets used in this study is not publicly available.