Prediction of Diabetic Sensorimotor Polyneuropathy Using Machine Learning Techniques

Diabetic sensorimotor polyneuropathy (DSPN) is a major complication in patients with diabetes mellitus (DM), and early detection or prediction of DSPN is important for preventing or managing neuropathic pain and foot ulcers. Our aim was to determine whether machine learning (ML) techniques are more useful than traditional statistical methods for predicting DSPN in DM patients. Four hundred seventy DM patients were classified into four groups (normal, possible, probable, and confirmed) based on clinical and electrophysiological findings of suspected DSPN. Three ML methods, XGBoost (XGB), support vector machine (SVM), and random forest (RF), and their combinations were used for analysis. RF showed the best area under the receiver operating characteristic curve (AUC, 0.8250) for differentiating between two categories, namely criteria by clinical findings (normal, possible, and probable groups) and those by electrophysiological findings (confirmed group), and the result was superior to that of logistic regression analysis (AUC = 0.6620). Average values of serum glucose, International Federation of Clinical Chemistry (IFCC), HbA1c, and albumin levels were identified as the four most important predictors of DSPN. In conclusion, machine learning techniques, especially RF, can predict DSPN in DM patients effectively, and electrophysiological analysis is important for identifying DSPN.


Introduction
Type 2 diabetes mellitus (T2DM), the most common form of diabetes, is a major disease worldwide [1], and its incidence is increasing with aging and lifestyle changes [2]. There is evidence that half of T2DM patients experience neurological disorders and progressive disability of nerve fibers in the course of diabetes, and serious neurological symptoms lead to poor quality of life [3]. Diabetic sensorimotor polyneuropathy (DSPN) is a common neurological complication resulting from neuroinflammation, mitochondrial dysfunction, and apoptosis due to hyperglycemia, dyslipidemia, and altered insulin signaling, and it leads to various symptoms and signs, including neuropathic pain, decreased sensation, and foot ulceration [4,5]. The management of DSPN is not limited to controlling hyperglycemia; multidisciplinary programs, such as patient education, lifestyle modification, and physical activity, are required to control various physical and psychological symptoms and foot complications [6]. Therefore, early detection and prediction of DSPN are very important in DM patients.
The classification of DSPN has been defined in previous studies [7][8][9][10]. Typical DSPN is the most common form in DM patients and is a chronic, symmetrical, and length-dependent sensorimotor polyneuropathy [11]. Tesfaye et al. defined the minimal criteria for typical DSPN to estimate severity: possible, probable, confirmed, and subclinical based on clinical symptoms and signs and electrophysiology [7]. Numerous staging and scoring systems have been developed to assess the severity of DSPN; however, choosing the optimal scoring system is difficult because previous studies disagree on which system is effective [12][13][14]. Electrophysiological assessments, including nerve conduction studies (NCS), are important for diagnosing DSPN objectively [15,16]; however, special equipment is needed, and these assessments cannot be performed routinely for patients without clinical symptoms or signs because of the discomfort caused by electrical stimulation or needle insertion. Because the pathophysiology of diabetic neuropathy reveals a broad spectrum of axonal involvement and segmental demyelination, electrophysiological findings also indicate both axonal degeneration and demyelination [17]. Numerous predisposing factors for the development of DSPN have been found [18][19][20][21]. DSPN is significantly correlated with poor glucose control [18,19], longer duration of diabetes, poor metabolic management, smoking and the presence of cardiovascular disease, and DSPN severity is correlated with hypertension, dyslipidemia, microalbuminuria, alcohol consumption, and body mass index [20,21]. Most previous studies on the prediction of DSPN used various statistical methods. While traditional statistical methods draw only population inferences from clinical information, recently developed machine learning (ML) methods focus on developing predictive models from general-purpose learning algorithms [22]. Therefore, ML is considered to be a better way to predict DSPN in DM patients.
ML is a computationally broad and powerful data mining technique that can accommodate a large set of candidate variables as inputs to identify factors related to the outcome of interest [23], and it develops algorithms that learn patterns and decision rules from medical data for tasks such as early detection, prediction, and diagnosis. Recent studies have used various ML techniques to predict complications, including retinopathy, nephropathy, foot ulceration and DSPN, in T2DM patients [24][25][26][27][28], and ML was effective for prediction of DSPN severity [24], 3-year complication developments [25], high-risk retinopathy, and numerous complications in nonadherent T2DM [27]. Haque et al. found that machine learning algorithms, especially random forest (RF), were effective in predicting DSPN severity based on a scoring system using the Michigan Neuropathy Screening Instrument [29], which is not widely used, and that study assessed only type 1 diabetes mellitus (T1DM) patients.
The purpose of the current study was to delineate whether machine learning techniques are more useful than traditional statistical methods for predicting DSPN in type 2 DM patients, and whether the widely used classification for DSPN, which is based on clinical and electrophysiological findings, is amenable to the use of predictive models.

Subjects
Medical records of patients with T2DM who visited Dankook University Hospital for the management of DM were collected, and 746 subjects were initially enrolled (Figure 1). Patients were diagnosed with T2DM by a physician at the Department of Endocrinology, based on the guideline of the American Diabetes Association [30]. Patients who did not undergo electrophysiological studies (n = 206) or had incomplete clinical data (n = 53) were excluded at first, and then patients who had other types of polyneuropathies, including heavy alcohol use (n = 3), hepatic failure (n = 2), renal failure (n = 4), chemotherapy for malignancy (n = 7), and typical musculoskeletal anomalies (n = 1), were subsequently excluded. As a result, 470 patients were included in the study (Figure 1). This study was approved by the Dankook University Hospital Institutional Review Board (IRB No. 2019-12-009).

Classification
Subjects were classified into 4 groups according to definitions of minimal criteria for typical DSPN based on the area of clinical care by Tesfaye et al. [7]: normal, possible, probable, and confirmed. The normal group (n = 93) consisted of subjects without any neurological symptoms or signs as previously described [7], and the possible group (n = 91) comprised subjects with one of the neurological symptoms or signs. The probable group (n = 13) comprised subjects with two or more neurological symptoms or signs. The confirmed group (n = 273) consisted of subjects with abnormal electrophysiological findings and neurological symptoms or signs. Electrophysiological assessments were performed according to the guidelines of the American Academy of Neurology [16], and NCS and electromyography of the upper and lower extremities were conducted. According to electrophysiological findings, the confirmed group was divided into two subgroups: a demyelinated subgroup (n = 87) of subjects who predominantly showed demyelination, and a mixed subgroup (n = 186) of subjects who showed both demyelination and abnormal spontaneous activities during needle electromyography (Figure 1).

Clinical Data
All subjects' clinical information, such as baseline characteristics, past medical history, current health status, diabetic complications, and medications, was analyzed. Baseline characteristics included age, sex, weight, height, body mass index (BMI), disease duration (from initial diagnosis of T2DM to the date of the last follow-up at the hospital), smoking (current smoking, past smoking, or nonsmoking), family history of T2DM, and diabetes education. Past medical history included hypertension (HTN), dyslipidemia, and history of stroke and coronary artery disease. HTN was defined as systolic blood pressure > 140 mmHg, diastolic blood pressure > 90 mmHg or the use of antihypertensive medications. Diabetic retinopathy was included in diabetic complications. Medications for DM, HTN and dyslipidemia were included; medications for DM were metformin, sulfonylureas, thiazolidinediones (TZDs), dipeptidyl peptidase-4 inhibitors (DPP4is), sodium-glucose cotransporter-2 inhibitors (SGLT2is), and insulin; medications for HTN were calcium channel blockers (CCBs), angiotensin-converting-enzyme inhibitors (ACEis), angiotensin II receptor blockers (ARBs), beta blockers (BBs) and thiazides; and medications for dyslipidemia were statins. BMI was calculated as weight in kilograms divided by the square of height in meters.
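The two derived clinical variables defined above can be written down directly. The following is a minimal sketch, assuming hypothetical function and parameter names (`bmi`, `has_hypertension`, `on_antihypertensives`) that do not come from the study itself.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight in kilograms divided by the square of height in meters."""
    return weight_kg / height_m ** 2


def has_hypertension(systolic: float, diastolic: float,
                     on_antihypertensives: bool) -> bool:
    """HTN as defined in the text: systolic > 140 mmHg, diastolic > 90 mmHg,
    or current use of antihypertensive medication."""
    return systolic > 140 or diastolic > 90 or on_antihypertensives
```

Note that both blood pressure thresholds are strict inequalities, so a reading of exactly 140/90 mmHg without medication is not counted as HTN under this definition.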

Laboratory Data
A total of 432 laboratory codes from blood and urine tests were obtained from all subjects. To identify the optimal number of laboratory codes, we divided subjects into a control group (n = 197) with normal electrophysiological findings and a test group (n = 273) with abnormal electrophysiological findings within the criteria of DSPN (Figure 2). Forty-eight codes could be obtained for more than half of the subjects (n = 98) in the control group, and 62 codes could be obtained for more than half of the subjects (n = 135) in the test group (Figure 2a). When the results of the two groups were combined, 39 laboratory codes were ultimately selected (Figure 2b). Each laboratory code was assessed several times during the follow-up periods (range: 31 to 18,368 days, mean value: 5202.9 days), and various changes in the values were observed within the period (Figure 2c).
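The coverage filter described above can be sketched as follows. The text does not state exactly how the two groups' results were combined, so this sketch assumes the intersection of the two per-group code sets; the function names and the toy data are hypothetical.

```python
from typing import Dict, Set


def codes_covering_majority(codes_by_subject: Dict[str, Set[str]]) -> Set[str]:
    """Return the laboratory codes recorded for more than half of the subjects."""
    n_subjects = len(codes_by_subject)
    counts: Dict[str, int] = {}
    for codes in codes_by_subject.values():
        for code in codes:
            counts[code] = counts.get(code, 0) + 1
    return {code for code, count in counts.items() if count > n_subjects / 2}


def select_codes(control: Dict[str, Set[str]],
                 test: Dict[str, Set[str]]) -> Set[str]:
    """Assumption: 'combining' the groups means keeping codes that pass
    the majority filter in BOTH the control and the test group."""
    return codes_covering_majority(control) & codes_covering_majority(test)


# Toy data (not from the study): subject id -> set of available lab codes.
control_group = {
    "c1": {"glucose", "hba1c"},
    "c2": {"glucose"},
    "c3": {"glucose", "hba1c", "crp"},
}
test_group = {
    "t1": {"glucose", "crp"},
    "t2": {"glucose", "crp", "hba1c"},
}
selected = select_codes(control_group, test_group)
```

In the toy data only `glucose` is available for a majority of subjects in both groups, so it is the only code retained.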
Three methods were used to standardize the values of laboratory codes for ML analysis. Method 1 uses the average value of each laboratory code during the follow-up period, method 2 uses the first value of each laboratory code, obtained when T2DM was initially diagnosed at the hospital, and method 3 uses the pattern of laboratory code changes. The pattern was encoded as −1, 0, or 1 by comparing the initial value with the overall average of the values excluding the initial value. If the initial value was lower than this average by 10% or more, the pattern was −1; if the difference was less than 10%, the pattern was 0; and if the initial value was higher than this average by 10% or more, the pattern was 1.
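The three summaries can be illustrated with a short sketch. It assumes that the values of one laboratory code are given in chronological order, that the difference is measured relative to the average of the later values, and that this average is positive; the function name `summarize_lab_series` is hypothetical.

```python
from statistics import mean
from typing import List, Tuple


def summarize_lab_series(values: List[float],
                         threshold: float = 0.10) -> Tuple[float, float, int]:
    """Return (method 1 average, method 2 initial value, method 3 pattern).

    `values` must be in chronological order and contain at least two
    measurements, because the pattern compares the first value with the
    average of the remaining ones.
    """
    if len(values) < 2:
        raise ValueError("need an initial value and at least one later value")
    average = mean(values)          # method 1: average over follow-up
    initial = values[0]             # method 2: first recorded value
    later_mean = mean(values[1:])   # average excluding the initial value
    # Assumption: relative difference is taken with respect to later_mean.
    relative = (initial - later_mean) / later_mean
    if relative <= -threshold:
        pattern = -1                # initial value at least 10% lower
    elif relative >= threshold:
        pattern = 1                 # initial value at least 10% higher
    else:
        pattern = 0                 # difference below 10%
    return average, initial, pattern
```

For example, a series that starts at 8.0 and then stays at 10.0 is encoded as −1, because the initial value lies 20% below the average of the later values.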

Machine Learning Analysis
First, to determine which variable set to use for the classification model, a random forest (RF) model was trained and tested on different variable combinations. As described above, there are four variable sets: clinical data and methods 1, 2, and 3 for laboratory data. RF was trained with all possible combinations of the four variable sets. Because of the limited sample size, the sample was divided into ten groups, and each group was used in turn as the test set. For each test set, the remainder of the samples were divided into a training set and a validation set at a 4:1 ratio while preserving the percentage of samples for each class. Fivefold cross-validation was performed for each test set, and the final performance was defined as the average of the performance over the 10 iterations [31]. The combination set of clinical data and methods 1 and 3 for laboratory data (105 variables in total) showed the best performance in classifying patients [area under the curve (AUC) = 0.8350 and accuracy = 74.85%, Table 1]; therefore, this combination set was used as the input for model training.
Table 1 note: method 1 = average value of each laboratory code during the follow-up period; method 2 = the first value of each laboratory code when T2DM was diagnosed initially; method 3 = the pattern of laboratory code changes (−1, 0, or 1). Abbreviations: AUC = area under the curve.
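The splitting scheme described above can be sketched in plain Python. This is an illustration under stated assumptions rather than the study's implementation: the helper names `stratified_folds` and `nested_splits` are hypothetical, the outer ten groups are assumed to be stratified like the inner ones, and stratification is done by dealing shuffled samples of each class round-robin into folds.

```python
import random
from typing import Dict, Iterator, List, Sequence, Tuple


def stratified_folds(labels: Sequence[int], indices: Sequence[int],
                     k: int, seed: int) -> List[List[int]]:
    """Split `indices` into k folds with roughly equal class proportions."""
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = {}
    for i in indices:
        by_class.setdefault(labels[i], []).append(i)
    folds: List[List[int]] = [[] for _ in range(k)]
    offset = 0
    for cls in sorted(by_class):
        members = by_class[cls]
        rng.shuffle(members)
        for j, i in enumerate(members):
            folds[(offset + j) % k].append(i)  # deal samples round-robin
        offset += len(members)
    return folds


def nested_splits(labels: Sequence[int], outer_k: int = 10, inner_k: int = 5,
                  seed: int = 0) -> Iterator[Tuple[List[int], List[int], List[int]]]:
    """Yield (train, validation, test) index lists.

    Each of the outer_k groups serves once as the test set; the remaining
    samples are split into inner_k stratified folds, so each inner split
    is a 4:1 train/validation division when inner_k is 5.
    """
    all_idx = list(range(len(labels)))
    outer = stratified_folds(labels, all_idx, outer_k, seed)
    for o, test in enumerate(outer):
        test_set = set(test)
        rest = [i for i in all_idx if i not in test_set]
        for val in stratified_folds(labels, rest, inner_k, seed + 1 + o):
            val_set = set(val)
            train = [i for i in rest if i not in val_set]
            yield train, val, test


# Binary labels with the group sizes reported in the text (197 vs. 273).
labels = [0] * 197 + [1] * 273
splits = list(nested_splits(labels))
```

With 470 samples this produces 50 splits (10 test groups times 5 inner folds), each with 47 test samples. A model would be fitted on `train`, tuned on `val`, and scored on `test`, and the scores averaged over the ten test groups.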
The DSPN predictor model was trained with the input variables identified above, and its performance was tested with the same procedure used to identify the input variables. Three ML algorithms were used: XGBoost (XGB) [32], support vector machine (SVM) [33], and random forest (RF) [23]. They were used alone or in combinations of two or more, that is, as an ensemble that fuses what the different models have learned in order to improve performance and reduce overfitting [34]. Among the various ensemble methods, model averaging, which averages the predicted values of several models, was used in this work. AUC, accuracy, sensitivity, and specificity were used as performance metrics.
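Model averaging and the four metrics can be sketched as follows. The sketch assumes an unweighted mean of each model's predicted probability for the positive class and a 0.5 decision threshold, neither of which is specified in the text; the function names and the toy probabilities are hypothetical.

```python
from typing import Dict, List, Sequence


def average_probabilities(model_probs: Sequence[Sequence[float]]) -> List[float]:
    """Model averaging: unweighted mean of the per-sample probabilities."""
    n_models = len(model_probs)
    return [sum(sample) / n_models for sample in zip(*model_probs)]


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random
    negative, counting ties as one half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def binary_metrics(y_true: Sequence[int], scores: Sequence[float],
                   threshold: float = 0.5) -> Dict[str, float]:
    """Accuracy, sensitivity, and specificity at a fixed threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, scores):
        predicted = 1 if s >= threshold else 0
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted == 1:
            fp += 1
        else:
            tn += 1
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Toy predictions from two hypothetical models for four samples.
y = [1, 0, 1, 0]
xgb_probs = [0.9, 0.2, 0.6, 0.4]
rf_probs = [0.7, 0.4, 0.2, 0.8]
ensemble = average_probabilities([xgb_probs, rf_probs])
```

The pairwise AUC computation is quadratic in the number of samples, which is acceptable for a cohort of a few hundred patients.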
Finally, feature importance was extracted from the best of the 7 models (XGB, SVM, RF, ensemble of XGB and SVM, ensemble of XGB and RF, ensemble of SVM and RF, and ensemble of XGB, SVM, and RF). If the best model was an ensemble of two or more models, the average of the feature importances obtained from each constituent model was used as the feature importance of the ensemble. Next, the models were retrained and evaluated while adding input features one by one, from the most to the least important, to select the best set of features for DSPN prediction. Performance was better with the top 69 features for AUC and the top 38 features for accuracy than with all 105 features.
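The feature accumulation step can be sketched as a simple loop. The callback `evaluate`, which stands for retraining and scoring a model on a given feature list, is a hypothetical placeholder, as are the function names and the toy scoring rule.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def average_importance(importances: Sequence[Dict[str, float]]) -> Dict[str, float]:
    """Average per-feature importance across the models of an ensemble."""
    features = importances[0].keys()
    return {f: sum(imp[f] for imp in importances) / len(importances)
            for f in features}


def best_top_k(importance: Dict[str, float],
               evaluate: Callable[[List[str]], float]) -> Tuple[int, float]:
    """Add features from most to least important and return the number
    of features (k) that gives the best score, with that score."""
    ranked = sorted(importance, key=lambda f: importance[f], reverse=True)
    best_k, best_score = 0, float("-inf")
    for k in range(1, len(ranked) + 1):
        score = evaluate(ranked[:k])  # retrain and score on the top-k features
        if score > best_score:
            best_k, best_score = k, score
    return best_k, best_score


# Toy example: two informative features and one that hurts performance.
toy_importance = average_importance([
    {"glucose": 1.0, "hba1c": 0.7, "noise": 0.1},
    {"glucose": 0.8, "hba1c": 0.9, "noise": 0.1},
])


def toy_evaluate(features: List[str]) -> float:
    useful = len(set(features) & {"glucose", "hba1c"})
    return 0.5 + 0.1 * useful - (0.05 if "noise" in features else 0.0)


k, score = best_top_k(toy_importance, toy_evaluate)
```

Running the loop once with AUC and once with accuracy as the score is one way the two different optima reported above (69 and 38 features) could arise.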

Statistics
To compare the predictive performance of the ML models with that of traditional statistical methods, conventional statistical analyses were also carried out. All statistical analyses were performed with SPSS 26 (IBM, Armonk, NY, USA). The Shapiro-Wilk test was performed to assess the normal distribution of all quantitative data from each group. Categorical parameters were compared by likelihood ratio, and numerical parameters among groups were compared by one-way analysis of variance (ANOVA) and the Games-Howell post hoc test. Logistic regression was performed using statistically significant parameters and parameters that were identified to be important in previous studies, and the AUC, accuracy, sensitivity, and specificity were analyzed. p-values less than 0.05 were considered to indicate statistical significance.

Baseline Characteristics among the Four Groups
When comparing baseline characteristics among the four groups, disease duration was significantly longer in the confirmed group than in the normal and possible groups (4543.18 ± 2849.75 days and 4464.03 ± 2934.87 days vs. 5686.67 ± 3648.57 days in the normal, possible, and confirmed groups, respectively), and height was greater in the confirmed group than in the normal group (1.61 ± 0.09 m vs. 1.64 ± 0.09 m in the normal and confirmed groups, respectively). BMI and the initial values of BST and HbA1c were also different between the confirmed group and normal group and between the confirmed group and possible group (Table 2). The incidence of diabetic retinopathy was higher in the confirmed group (51.6%) than in the other groups (23.1-28.6%). Age; sex; weight; incidence of hypertension and dyslipidemia; smoking habit; past medical history of coronary artery disease, cerebrovascular disease, and stroke; and number of subjects who received diabetes education were not different among the groups (Table 2). Medications for diabetes control were different among groups; metformin (89.2-94.5%), sulfonylureas (68.1-68.8%), dipeptidyl peptidase-4 inhibitors (66.7-71.4%), and sodium-glucose cotransporter-2 inhibitors (17.2-20.9%) were used by a higher proportion of subjects in the normal and possible groups, whereas the proportion of subjects in the confirmed group who used insulin (65.6%) was higher than that in other groups (Table 2).

Identification of an Appropriate Classification for Prediction Using Machine Learning Analysis
Using ML algorithms, four groups of normal (A), possible (B), probable (C), and confirmed (D) samples were analyzed with various combinations. When comparing all groups separately (A vs. B vs. C vs. D) using the combined analysis of XGB and RF, the AUC was 0.8546, and the accuracy was 60.85% (Table 3). One of the classifications set to three groups (combination of A and B vs. C vs. D) showed the highest AUC value (0.8925) using the same analysis (XGB + RF); however, this classification was not appropriate because the number of group C patients was small (n = 13), which can result in imbalanced results [35]. When looking at the classification that combined group C with other groups, rather than alone, the classification with the combination of A, B and C vs. D showed a higher value of AUC (0.8250) than the other classifications and the highest value of accuracy (74.47%) (Table 3). Therefore, we performed all ML analyses and statistics based on this classification (A + B + C vs. D).

Identification of an Appropriate ML Algorithm for the Prediction of DSPN and Analysis of Predictive Values
When we compared various ML techniques (XGB, SVM, RF, and their combinations), RF showed the best AUC (0.8250) and accuracy (74.47%), and its sensitivity and specificity (0.7940 and 0.6720, respectively) were also higher than those of any other single algorithm or combination (Table 4). Logistic regression analysis was performed to compare the combination of normal, possible, and probable groups with the confirmed group using meaningful parameters of the following basic characteristics and laboratory data: disease duration, initial value of HbA1c, DM retinopathy, family history of DM, use of metformin and insulin, serum levels of glucose, HDL cholesterol, albumin, and creatinine. The results of logistic regression analysis showed lower AUC (0.6620) and specificity (0.3519) values than RF. The receiver operating characteristic (ROC) curves of each ML algorithm and logistic regression analysis are shown in Figure 3. The AUC of RF was the highest (0.8250) among the 7 ML models, as described earlier, whereas the AUC of logistic regression was the lowest (0.6620).

Development of a Decision-Making Model Using Influential Features from the RF Algorithm
RF analysis using the classification of the combination of the normal, possible, and probable groups versus the confirmed group was used to derive influential features, which consisted of clinical data and methods 1 and 3 for laboratory data. When these features were accumulated in order of importance score, the AUC and accuracy increased and then peaked at a certain number of features (Figure 4a,b). The AUC reached its maximum (0.8302) with 69 features, and the accuracy reached its maximum (76.17%) with 38 features (Figure 4a,b). From this classification, the average value of HbA1c was identified as the first single discriminator for group determination between the combination of the normal, possible, and probable groups and the confirmed group (Figure 4c). The top 69 influential features are shown in Table 5. The average serum glucose level during the follow-up period was the most important feature (importance score = 0.997768) for determining the group in the classification, followed in order of importance score by the average values of the International Federation of Clinical Chemistry (IFCC; 0.794161), HbA1c (0.789265), and albumin levels (0.731579) during the follow-up period (Table 5).
Table 5. Top 69 influential features in the classification of the combination of the normal, possible, and probable groups versus the confirmed group.

ML Analysis of the Confirmed Group to Identify Demyelinated and Mixed Types of DSPN
We compared the demyelinated subgroup with the mixed subgroup, as shown in electrophysiological studies of the confirmed group, using various ML algorithms and logistic regression analysis (Table 6). ML analysis revealed that the combination of XGB and SVM models showed the highest AUC and accuracy values of 0.5698 and 67.78%, respectively, whereas the statistical method using logistic regression showed a higher AUC value (0.6350). However, the overall AUC values of all ML algorithms and logistic regression analysis were much lower than the AUC value (0.8250) when RF was used to compare the combination of the normal, possible, and probable groups versus the confirmed group, and the specificity was quite low (0 and 0.3889 for RF and logistic regression, respectively) to predict the two subgroups within the confirmed group (Table 6).

Discussion
Interest in machine learning algorithms is increasing widely in the medical field because they can be used to predict disease development and generate semantic interpretations [36]. In the field of endocrinology, the prediction of diabetes is expected to be very useful for preventing disease progression and complications [37]. In this study, we performed conventional statistical analysis as well as various ML algorithms and compared their predictive power, expressed as AUC and accuracy. Logistic regression analysis, a traditional statistical method, has an obvious limitation compared with ML analysis: only a small number of clinical and laboratory variables (9 of more than 400) were used in the statistical processing, which inevitably resulted in a poor AUC, whereas the ML analysis could include more than 100 meaningful variables. Classical statistics usually draw population inferences but become less precise when the number of input variables exceeds the number of subjects; an appropriate ML method can therefore help overcome this limitation [22].
As in all other fields, for the results of ML analysis to be more accurate, the input data must contain extensive and accurate information. Laboratory data are usually obtained numerous times for a single subject during the follow-up period, and effective processing of meaningful data can have a significant impact on the establishment of predictive models. In this study, we tried various methods to optimize input data during the preprocessing step, especially for standardization of laboratory tests conducted at various time points. First, from the 432 types of laboratory data received for all patients, only the 39 types obtained for more than half of all patients were retained. Then, depending on the timing of the laboratory data received, data were summarized as the average, initial value, and change pattern of each value, and we found that the average and change pattern were meaningful parameters for ML analysis. Through these preprocessing steps, we are confident that we have increased the reliability of laboratory data and created a more accurate predictive model. Previous studies that built predictive models of DSPN using ML algorithms in diabetic patients (Table 7) did not explain which time point was used or whether the amount of change in the laboratory data was considered, nor did they describe the imputation process used to handle missing data [24,25,27,38]. In addition, they did not provide any diagnostic tools, such as a decision tree or nomogram, except Dagliati et al. [25]. Various criteria for defining DSPN have been developed, and many of them have been designed to classify the severity of DSPN based on clinical signs and symptoms alone [39] or in combination with physical examination [40,41] or electrophysiological findings [7,10].
Neurological signs, especially sensory abnormalities, are sensitive and specific findings for diagnosing DSPN and have been correlated with electrophysiological findings in previous studies [12,42,43]; however, we found that clinical data alone, categorized into the normal, possible, and probable groups defined in a previous study [7], were not effective in predicting DSPN in T2DM patients. Other studies have revealed that clinical symptoms and signs are too variable and inaccurate [44] and do not correlate well with the development of pathophysiological changes in the peripheral nervous system [13]. On the basis of our results, we confirmed that severity grading based on clinical symptoms and signs is not helpful and that electrophysiological assessment is essential in predicting DSPN. However, small fiber involvement, which frequently occurs in early DSPN, is not identified by conventional NCS. Therefore, more specialized diagnostic tools such as quantitative sensory testing, skin biopsy, and corneal confocal microscopy are needed to identify small fiber damage [45,46].
We failed to classify the demyelinated and mixed types in the confirmed group in this study. Axonal involvement is frequently observed in DSPN, as is demyelination [17], and even axonal loss, which precedes demyelination, in sural nerves or plantar nerves of DSPN patients might be a primary finding [47,48]. Electrophysiological analysis, which shows decreased conduction velocity of sensory and motor nerves, decreased compound muscle action potential, and prolonged latency of F-wave, is considered to be highly sensitive for early diagnosis of DSPN [16,49], but NCS cannot be used to assess therapeutic effects in diabetic patients [49]. Electromyography can be useful for detecting abnormal spontaneous activities in distal muscles in moderate to severe DSPN [50], and this test is also useful for ruling out other neuropathies, such as radiculopathies, mononeuropathies, or myopathies. In this study, we could not find axonal involvement without demyelination within DSPN patients. In T2DM, segmental demyelination is prominent with a milder axonal involvement whereas axonal loss is more severe in T1DM [51,52]. Initially, we considered abnormal electromyographic findings with abnormal NCS (mixed type) to be advanced or severe type DSPN, and diabetic patients with mixed type DSPN might show abnormal clinical and laboratory findings more frequently than those with demyelinated type DSPN. However, ML analysis and logistic regression did not effectively suggest any difference between the demyelinated and mixed types. Therefore, electrophysiological analysis is necessary to differentiate these two types of diabetic patients.
Numerous ML algorithms have been used to predict DM and diabetic complications such as retinopathy, nephropathy, foot ulceration and DSPN [24][25][26][27][28][29]. XGB is a scalable end-to-end tree boosting system [32] and is more suitable for small sample sizes unless the data are not highly dispersed when predicting glucose variability in T2DM patients [53]. SVM was used for microarray or high-dimensional data and is suitable for predicting DSPN in DM patients with a clinical data-based classification [24] and distinguishing retinopathy between diabetic patients and normal controls [26]. RF is an ensemble of decision trees and can minimize the individual error of trees [23]. RF has shown good performance in predicting the development and classification of DSPN based on clinical symptoms and examinations of type 1 diabetic patients [29]. Logistic regression analysis is a common statistical method used to develop a model for binary outcomes in the medical field [54] and can also be used as a supervised learning technique in ML methods. Even though various ML algorithms have been successfully developed as predictive models for the purpose of preventing the occurrence of diseases or their complications, some recent studies have shown that logistic regression has similar results to ML analysis [55,56], and attempts to combine logistic regression and ML methods also appear to enhance the performance of statistical methods in an automated manner [57]. In our study, the AUC of RF was superior to that of logistic regression when subjects were classified into two groups: confirmed vs. other combinations (Table 4), but the AUC of logistic regression was higher than that of ML algorithms for comparison between the demyelinated and mixed subgroups within the confirmed group ( Table 6). The development of proper hybrid models for statistical and ML algorithms might increase the power of DSPN prediction in future studies.
In previous studies, numerous predisposing factors have been associated with DSPN in diabetic patients, particularly duration of diabetes and HbA1c in T2DM patients [21,58]; moreover, old age, increased height, obesity, higher body mass index, poor glucose control, alcohol abuse, smoking, hypertension, cardiovascular disease, low level of HDL, dyslipidemia, hypertriglyceridemia, and microalbuminuria have also been shown to be risk factors in previous studies [18][19][20][21][58][59][60][61]. We found that the average values of numerous laboratory datapoints during the follow-up period (serum glucose, IFCC, HbA1c, albumin, and differential counts of lymphocytes and neutrophils) were important predisposing factors, as were clinical data such as height and disease duration (Table 5). Albumin has important antioxidant and anti-inflammatory properties, and a lower level of serum albumin was associated with the prevalence of DSPN or peripheral nerve dysfunction in T2DM patients in previous studies [62,63]. In our study, the average value of HbA1c was the most sensitive node of a decision tree among the influential features, and the average differential counts of lymphocytes and neutrophils formed the second node (Figure 4c). Although there is no standardized decision-making algorithm for DSPN diagnosis, HbA1c qualifies as an important diagnostic criterion for DSPN because HbA1c is a major risk factor for microvascular complications and is closely associated with DSPN in T2DM [64]. The neutrophil-lymphocyte ratio is an inflammatory marker and an important factor that predicts cardiovascular disease [65] and foot ulcer infection [66] in diabetic patients. Neutrophil level was also the most sensitive node for decision making in DSPN prediction in a previous study [67], and a higher neutrophil-lymphocyte ratio might be related to a chronic inflammatory process and increase the risk of DSPN [68].
In this study, we analyzed a small sample, and the probable group in particular was small (n = 13), which might cause problems for pattern recognition and poor accuracy [69]. Many studies in the medical field have only a small number of patients. In this study, we tried to increase the accuracy by dividing the patients into ten groups for use as a test set and a tenfold stratified cross validation set to compensate for the small sample size [31], but a more accurate prediction might be achieved with a larger number of diabetic patients. We further plan to perform ML analysis to predict various complications in diabetic patients in a prospective multicenter study and to develop an application attached to an existing electronic health record system for easier transfer of patient data that can assist in predicting complications in diabetic patients. In addition, it was difficult to use a deep learning model because an insufficient sample size can lead to overfitting. If sufficient data are accumulated, it will be possible to build a deep learning model using time-series laboratory data or to apply transfer learning to DSPN patients using models pre-trained on all diabetic patients.

Conclusions
In this study, we revealed that the ML algorithms, whose AUC values were superior to logistic regression, can be applied to type 2 DM patients to predict DSPN and that the classification depending only on clinical symptoms and signs of suspected DSPN was not appropriate for the application of ML algorithms to develop prediction models. In addition, ML algorithms cannot predict the type of electrophysiological features in DSPN, namely, demyelinated and mixed subgroups. We concluded that ML techniques, especially RF, can predict DSPN effectively when comparing the combination of the normal, possible, and probable groups with the confirmed group of DM patients and that electrophysiological analysis is important for identifying DSPN.

Informed Consent Statement:
Not applicable.

Data Availability Statement:
The data presented in this study are available from the corresponding author upon reasonable request.