Predicting CTS Diagnosis and Prognosis Based on Machine Learning Techniques

Carpal tunnel syndrome (CTS) is a clinical disease that occurs due to compression of the median nerve in the carpal tunnel. The determination of the severity of carpal tunnel syndrome is essential to provide appropriate therapeutic interventions. Machine learning (ML)-based modeling can be used to classify diseases, make decisions, and create new therapeutic interventions. It is also used in medical research to implement predictive models. However, despite the growth in medical research based on ML and Deep Learning (DL), CTS research is still relatively scarce. While a few studies have developed models to predict diagnosis of CTS, no ML model has been presented to classify the severity of CTS based on comprehensive clinical data. Therefore, this study developed new classification models for determining CTS severity using ML algorithms. This study included 80 patients with other diseases that have an overlap in symptoms with CTS, such as cervical radiculopathysasas, de quervian tendinopathy, and peripheral neuropathy, and 80 CTS patients who underwent ultrasonography (US)-guided median nerve hydrodissection. CTS severity was classified into mild, moderate, and severe grades. In our study, we aggregated the data from CTS patients and patients with other diseases that have an overlap in symptoms with CTS, such as cervical radiculopathysasas, de quervian tendinopathy, and peripheral neuropathy. The dataset was randomly split into training and test data, at 70% and 30%, respectively. The proposed model achieved promising results of 0.955%, 0.963%, and 0.919% in terms of classification accuracy, precision, and recall, respectively. In addition, we developed a machine learning model that predicts the probability of a patient improving after the hydro-dissection injection process based on the aggregated data after three different months (one, three, and six). The proposed model achieved accuracy after six months of 0.912%, after three months of 0.901%, and after one month 0.877%. The overall performance for predicting the prognosis after six months outperforms the prediction after one and three months. We utilized statistics tests (significance test, Spearman’s correlation test, and two-way ANOVA test) to determine the effect of injection process in CTS treatment. Our data-driven decision support tools can be used to help determine which patients to operate on in order to avoid the associated risks and expenses of surgery.


Overview
Carpal tunnel consists of transverse carpal ligaments and carpal bones. The tunnel is comprised of nine flexor tendons and the median nerve. CTS is the most common entrapment neuropathy and involves pressuring of the median nerve within the carpal tunnel, depending on the severity and duration of neural compression [1]. The impact of carpal tunnel syndrome on productivity, function, and quality of life, as well as the significant costs associated with its management, are associated with a significant socioeconomic burden [2]. The prevalence and severity of CTS increase with age, and women are three times more likely to have CTS than men. Others risk factors include a family history of CTS and a personal history of diabetes, obesity, hypothyroidism, pregnancy, and rheumatoid arthritis. The risk of CTS is also significantly increased by work-related activities that require a high degree of repetition and force or using hand-operated vibratory tools [3][4][5][6].

Problem Statement
The diagnosis of CTS requires listening to the patient's description of the characteristic timing and distribution of the symptoms and examining the hands to find obvious signs. Electrophysiological studies (EPS) are among the para-clinical diagnostic techniques that are most effective for identifying CTS. They are used by clinicians to group patients based on the severity of their conditions and the need for post-operative care. While NCS has been the gold standard for the diagnosis of CTS, there are still some limitations, because some patients with clinically diagnosed CTS show normal findings [7][8][9]. False-negative results can still happen because tests focus primarily on large, myelinated fibers rather than the small fibers that are responsible for pain. Therefore, clinical history, physical examination, and electrophysiological studies are used to make diagnoses [10]. Imaging techniques can reveal information about the median nerve's morphology and any local compressive causes. A growing number of studies use ultrasound to evaluate cross-sectional area (CSA) and median nerve compression at the carpal tunnel. It is currently the most accepted method for diagnosing CTS [11] and is economical compared to other imaging methods, such as magnetic resonance imaging (MRI) [12][13][14][15].
A scoring tool has been developed to check for CTS in asymptomatic people, but the claims that the symptoms are only potential ones are called into question. The Michigan Hand Outcomes Questionnaire (MHQ) and the Boston Carpal Tunnel Questionnaire (BCTQ) are two other questionnaires that have been developed to quantify CTS symptoms based on patient history following diagnosis and treatment [16]. Several studies have utilized such ML models to detect CTS diagnosis as well as monitor progress. They utilize various types of data, such as MRI images, CT images, and clinical and historical data. For example, ref. [17] utilized a DL model to detect ultrasound (US) image features to make CTS diagnoses. The highest performance was 0.95%, with 0.94 in terms of accuracy and recall. Other studies [18,19] utilized a convolutional neural network (CNN) model to detect CTS diagnosis based on CT scan images. The developed model was validated using CT images aggregated from 53 patients and achieved an accuracy of 0.94%. The same model was used in [20][21][22][23], showing promising results. However, these studies have several limitations, including the following: (1) They depended on MRI and CT scan images, which are considered costly and may not be available for diagnosis. (2) The studies did not consider the overlap of CTS symptoms with other diseases, including cervical radiculopathysasas, de quervian tendinopathy, and peripheral neuropathy. (3) Several studies did not consider clinical, personal, and historical data, which have a significant effect in disease diagnosis as well as treatment. In this study, we utilized the ML model to perform CTS diagnosis based on patient historical, personal, and clinical examination data and US images and electrodiagnostics, and then predicted the probability of the patient improving [24][25][26].

Paper Contribution
The main contribution of our study can be summarized in the following points.
(1) Aggregate data about 80 CTS patients include 40 left hands and 40 right hands with different health status (mild, moderate, and severe) and 80 non-CTS patients with overlapping disease symptoms, including cervical radiculopathysasas, de quervian tendinopathy, and peripheral neuropathy. (2) We build a Machine Learning model (bagging using random forest) that can distinguish between CTS and non-CTS patients based on BCTQ data and nerve conduction, and compare the results with several traditional machine learning models. (3) We track and monitor the results after 1, 3, and 6 months through repeating the clinical examination tests and questionnaire. (4) We build a machine learning model that predicts the probability of the patient improving after the hydro-dissection injection process based on the aggregated data after 3 different periods (1, 3, and 6 months). (5) We use statistical tests such as ANOVA, t-test, and z-test to distinguish between patient status before and after the injection process and specify the features that have a significant impact on the probability of improvement.

Paper Organization
The rest of the paper is organized as follows. Section 2 details the literature review. Section 3 describes the data set that is used in the study. Section 4 clarifies the proposed work. The results and discussion are presented in Section 5. The paper concludes and future work is mentioned in Section 6.

Related Work
In this section, we discuss some recent research studies based on artificial intelligence to classify and diagnose CTS. Several attempts have been made to enhance the performance of CTS patients and to predict symptom improvement by utilizing machine learning and deep learning [27]. In Dougho Park et al. [28], seven ML algorithms (Neural Network (NN), Support Vector Machines (SVM), k-Nearest Neighbors (KNN), Classification And Regression Tree, Random Forest (RF), Stochastic Gradient Boosting (SGB), and eXtreme Gradient Boosting (XGB)) were estimated based on 1037 CTS hands with 11 variables each and were retroactively analyzed. In these datasets, CTS was corroborated, and its sharpness was categorized into mild, moderate, and severe by using electrodiagnosis in these datasets. XGB had the highest accuracy in multiclass classification, and its accuracy was 76.6% in test prediction as well as 76.1% during training. Konstantinos I. Tsamis et al. [29] used five machine learning techniques (Logistic Regression (LR), SVM, KNN, Decision Tree (DT), and Naive-Bayes (NB)) to examine the feasibility of automatically identifying median nerve mononeuropathy using conventional electrodiagnostic criteria utilized in clinical practice. Based on the NCS signal data analysis gathered from 38 volunteers, the outcomes of the classifiers were verified through neurophysiological and clinical diagnosis. With the most accurate classifier (SVM), automated classification between patients and controls achieved an accuracy of 0.9513 compared to NCS and 0.8906 compared to clinical diagnosis. The results show the ability of automated identification of carpal tunnel syndrome and can be employed in decision making, ultimately eliminating human error. US is the most popular imaging modality used to diagnose CTS to make up for the lack of nerve electrical inspection. Recently, using medical images, artificial intelligence algorithms have been used to accurately and human-error-free diagnose musculoskeletal diseases. For example, You-Wei Wang et al. [30] applied their study to 50 patients (8 healthy patients and 42 CTS patients), each having two cases (right and left hand). A Deep Similarity Learning (DeepSL) model known as MNT-DeepSL was proposed by the authors. The model's formation was used to trace the problem of median nerve location on US images. The tracing of the median nerve is connected to image representation, input decision rules, a deep learning model called ResNet, and a different layer. The accuracy of the model's performance was 0.9, which allows the physiatrist to locate the median nerve in the continuous US images. A deep and transfer learning system was developed by Issei Shinohara et al. [31] using US images to diagnose CTS accurately based on 60 cases (30 healthy patients and 42 CTS patients) Three DL models (ResNet50, MobileNet_v2, and EfficientNet) popularly used for medical image classification were selected for this study. The accuracy of ResNet-50 and MobileNet_v2 was the highest at 0.90, with a precision of 0.86, recall of 1.00 (all models), and 0.92 for F-measure. R.T. Festen et al. [32] rated two-dimensional median nerve characteristics and mobility accurately based on US images using a medium-sized dataset and U-Net model. Hafane et al. [33] combined a CNN with the probabilistic gradient vector flow (PGVF) method to identify the median nerve. CNN identifies the region of interest (ROI) around the nerve and a dataset collected from US images elicited from 10 videos. The results show a median Dice Similarity Coefficient (DSC) of 0.85. A prediction model was created and verified by Hoogendam L. et al. [20] for the likelihood of patient-reported symptom improvement 6 months following carpal tunnel release (CTR). One statistical algorithm and two machine learning algorithms (LR, RF, and gradient boosting machines) were taken into consideration for training predictors based on a cohort of 2119 patients who underwent a mini-open CTR and completed the BCTQ preoperatively and 6 months postoperatively [34]. In the validation data set, a gradient boosting machine with five predictors had the best calibration and discriminative performance (area under the curve). It had a satisfactory calibration, a sensitivity of 0.77, a specificity of 0.55, and an area under the curve of 0.723. The positive predictive value was 0.50, and the negative predictive value was 0.81. Bowman A et al. [35] used 35 neurons in the hidden layer, a sigmoidal transfer function, and conjugate gradient backpropagation with Powell-Beale restarts. The average AUC for this combination was 0.763 (95% CI 0.758-0.769).
Fariborz Faeghi et al. [12] proposed a diagnostic system for CTS based on radiomic features extracted from MNs in ultrasound images (so-called ultrasomal features). This is known as a Computer Aided Diagnostic (CAD) system, and the radiologists' performance was evaluated. A CAD system can help radiologists accurately diagnose CTS. An SVM classifier was used, and an optimum accuracy of 90.1% was achieved. Haiying Zhou et al. [21] proposed a deep learning framework for carpal tunnel segmentation using MR images, known as Deep CTS. Deep CTS can effectively segment the CTS area and correct the intersection when combining the results. They applied their study to 333 CTS images and achieved an accuracy of 0.63. Conrad J. Harrison et al. [36] developed flowcharts with a machine learning technique entitled chi-squared automatic interaction detection, which could enable clinicians and patients to understand the chances of a patient improving with surgery. They also developed ML algorithms using QuickDASH response data from a regional database and achieved an accuracy of 0.72 and 0.76. A.A. Ardakani et al. [37] used a CTS support vector machine (SVM) and convolutional neural network (CNN). A total of 200 wristbands were included, including 100 CTS wristbands and 100 control wristbands. CNN had the best performance, with an ACC of 0.970, while SVM achieved an ACC of 0.925 in the testing dataset to diagnose CTS.
Despite their adequate performance, these studies have different limitations, which can be summarized as follows.
(1) the building of CTS classification models based on one type of data, which affects model performance. Therefore, in our study, we decided to combine historical data, medical examination data, and NCS and CSA data to provide a robust and accurate model, depending on an ensemble to effectively address various data.

Dataset Collection
Between April 2019 and April 2020, the dataset was obtained retrospectively from the Neurology department at Kaferelshikh University Hospitals in Egypt. In all, 160 patients took part in this study, divided into two groups: (1) 80 patients diagnosed with CTS via ultrasound (US) guided median nerve hydro-dissection, and (2) 80 patients with other diseases that have an overlap in symptoms with CTS, such as cervical radiculopathysasas, de quervian tendinopathy, and peripheral neuropathy, from the Neurology department. The current study was submitted to the IRB at Kaferelshikh University's Faculty of Medicine for approval. All patients were assured confidentiality and personal privacy throughout the trial. Patients felt free to leave the research at any time without consequence. The information acquired was not and will not be used for any other purpose.

Study Cohorts
The dataset for CTS patients was aggregated using the following inclusion criteria: (I) patients between the ages of 20 and 60; (II) CTS manifestations; (III) NCS show delayed latency of the sensory or motor conduction of the median nerve; and (IV) patients did not respond to medical treatment at least three months after the onset of symptoms. Data from healthy normal volunteers was also gathered from Kaferelshikh university hospitals' inpatient and outpatient clinics, which were matched for age and gender. Pregnant women, as well as those with thoracic outlet or brachial plexopathy, were excluded from the study. All of the patients volunteered to take part in the trial.

Aggregated Features
For each patient, the following data were aggregated: Firstly, personal, and historical data: The patient's historical data, including personal data (i.e., age, gender, BMI, occupation, marital status, special habits, family history of similar conditions, and previous surgical and medical problems).
Secondly, medical questionnaire: All patients were evaluated using a computerized CTS sheet, which included all characteristics from the BCTQ (Appendix A). The BCTQ is a patient-reported questionnaire that is used to assess the intensity of symptoms as well as the overall performance of the patient's functions. It discusses the most-often-used tools for diagnosing and evaluating CTS. BCTQ has two models: symptom severity scale (SSS) and functional status scale (FSS). Both modules work independently and can be used either together or separately.
SSS is an eleven-question test that evaluates a variety of criteria, including pain, paresthesia, numbness, weakness, nocturnal symptoms, tingling, and motor skill problems. FSS is an eight-question questionnaire that assesses the overlap between a patient's symptoms and functions through questions on activities such as reading, bathing, carrying a grocery basket, holding a book, and so on. The BCTQ's major strength is how quick, simple, and easy it is to administer. In our dataset, the questionnaire is repeated for all patients after one, three, and six months of treatment to record and track patient status. It should be noted that the score only considers the patient's symptoms from the previous two weeks.
Patients' medical records (pain, paresthesia, numbness, weakness, and nocturnal symptoms) and BCTQ functional status (writing, buttoning, holding, grasping, opening jars, household tasks, bathing, and dressing) as well as pre-intervention and post-intervention data at one, three, and six months were used.
Thirdly, Ultrasonographic examination. A single sonographer is used to determine the CSA. It is calculated on the US machine using the tracing feature (in mm 2 at the distal wrist crease), then directly traced around the inner border of the epineurium with no weaving between each fascicle [38]. This method was shown to be more accurate than the ellipsoid method. CTS is defined as a median nerve area more than 9 mm 2 . In this study, we followed the classifications of El Miedany et al. [39] of the CTS severity scale according to the CSA Diagnostics 2023, 13, 492 6 of 18 (up to 13.0 mm 2 for mild, 13.0 to 15.0 mm 2 for moderate, and more than 15.0 mm 2 for severe) [2]. All measurements were taken three times, with the average result used for statistical analysis. Figure 1 shows the examination was repeated after 1, 3, and 6 months to track the progress of the treatment. All individuals in this trial received an injection of (1 mL lidocaine, 2 mL [8 mg]). vention data at one, three, and six months were used.
Thirdly, Ultrasonographic examination. A single sonographer is used to determine the CSA. It is calculated on the US machine using the tracing feature (in mm 2 at the distal wrist crease), then directly traced around the inner border of the epineurium with no weaving between each fascicle [38]. This method was shown to be more accurate than the ellipsoid method. CTS is defined as a median nerve area more than 9 mm 2 . In this study, we followed the classifications of El Miedany et al. [39] of the CTS severity scale according to the CSA (up to 13.0 mm 2 for mild, 13.0 to 15.0 mm 2 for moderate, and more than 15.0 mm 2 for severe) [2]. All measurements were taken three times, with the average result used for statistical analysis. Figure 1 shows the examination was repeated after 1, 3, and 6 months to track the progress of the treatment. All individuals in this trial received an injection of (1 mL lidocaine, 2 mL [8 mg]). Fourthly, NCS was performed by using an NCS Apparatus (NIHON KOHDIN), model MEB-9400K, Serial number SNI-00833, in the Neurology department, Kaferelshikh university hospitals. It was conducted for all patients in our study. Abnormalities on electrophysiological testing were considered the first criterion standard for CTS diagnosis. The cut-off point utilized in the NCS was the median nerve distal sensory latency of 3.5 milliseconds. Distal motor latency (DML) was increased by 90%, to >4.5 milliseconds onset latency [40]. Figure 2 shows normal ana abnormal of NCS for CTS and control subjects.   Fourthly, NCS was performed by using an NCS Apparatus (NIHON KOHDIN), model MEB-9400K, Serial number SNI-00833, in the Neurology department, Kaferelshikh university hospitals. It was conducted for all patients in our study. Abnormalities on electrophysiological testing were considered the first criterion standard for CTS diagnosis. The cut-off point utilized in the NCS was the median nerve distal sensory latency of 3.5 milliseconds. Distal motor latency (DML) was increased by 90%, to >4.5 milliseconds onset latency [40]. Figure 2 shows normal ana abnormal of NCS for CTS and control subjects.
jars, household tasks, bathing, and dressing) as well as pre-intervention and post-inter-vention data at one, three, and six months were used.
Thirdly, Ultrasonographic examination. A single sonographer is used to determine the CSA. It is calculated on the US machine using the tracing feature (in mm 2 at the distal wrist crease), then directly traced around the inner border of the epineurium with no weaving between each fascicle [38]. This method was shown to be more accurate than the ellipsoid method. CTS is defined as a median nerve area more than 9 mm 2 . In this study, we followed the classifications of El Miedany et al. [39] of the CTS severity scale according to the CSA (up to 13.0 mm 2 for mild, 13.0 to 15.0 mm 2 for moderate, and more than 15.0 mm 2 for severe) [2]. All measurements were taken three times, with the average result used for statistical analysis. Figure 1 shows the examination was repeated after 1, 3, and 6 months to track the progress of the treatment. All individuals in this trial received an injection of (1 mL lidocaine, 2 mL [8 mg]). Fourthly, NCS was performed by using an NCS Apparatus (NIHON KOHDIN), model MEB-9400K, Serial number SNI-00833, in the Neurology department, Kaferelshikh university hospitals. It was conducted for all patients in our study. Abnormalities on electrophysiological testing were considered the first criterion standard for CTS diagnosis. The cut-off point utilized in the NCS was the median nerve distal sensory latency of 3.5 milliseconds. Distal motor latency (DML) was increased by 90%, to >4.5 milliseconds onset latency [40]. Figure 2 shows normal ana abnormal of NCS for CTS and control subjects.

Outlier Detection
Outlier detection is the technique of identifying outliers among normal items. Performing outlier detection during data preparation is considered a critical step, since it has a significant impact on the classification and clustering models' performance. Several techniques are utilized to solve the outlier problem; statistical techniques include proximity-based models, distance-based methods, etc. Despite the good performance of Diagnostics 2023, 13, 492 7 of 18 the above-mentioned methods, in this study we choose to depend on our medical expert to specify and resolve data outliers.

Data Imputation
Missing values is a very common problem, especially with medical data. It could occur due to corruption or collection error. Missing values could negatively affect the classifier performance in terms of bias affect. Several basic methods are used to fill numeric values, such as mean, max, min and others, to fill categorical values such as the most frequent item [41]. In our study we have a small number of missing values in each column, ranging from (2-5) for each column. In order to gain a high accuracy in data after imputation, we choose a variable strategy for handling missing values, known as multivariate imputation by chained equations (MICE) [42,43]. This approach generates n full datasets by substituting n distinct values for missing variables. It then analyses n datasets and pools/combines them to produce a combined resulting dataset. This technique outperforms single imputation methods, but it requires more computational power.

Data Scaling
The goal of data scaling methods for ML is to identify the areas of current machine learning where scalability plays a significant role and should be properly implemented to reduce uncertainty, incorrect results, or increases in cost/processing time. This method of data scaling changes the value of any feature's smallest value to 0, while changing its largest value to 1 [44]. In this study, applied data scaling is based on the following equation:

Proposed Work Proposed Machine Learning Model
The initial stage in bagging is to generate several models with different datasets based on the bootstrap sampling technique [45][46][47]. Each set of samples includes random samples from the original data. Each training sample was generated with the same size; however, some samples repeated in several training samples, while other samples appeared just in one training sample. Thus, if the original data set has N size, the size of each generated set also has the same size.
The second stage is to generate several models by applying the same model on the generated sample sets. In RF, we select random features to construct the optimum split.
Unlike DT, which attempts to split based on the best feature to optimize errors, random forest involves random selection of features to construct the best split. On the other hand, when using bagging, DT always searches for the best feature for splitting. This leads to better correlation, where using bagging with different splitting of features leads to less correlation among subtrees. Therefore, using random forest with bagging results in constructing each tree with random samples, with each split built with a random sample of predictors. As shown in Figure 3

Results
This section is divided into two main sections: the first predicts CTS diagnosis based on several data sources in categories between CTS and non-CTS patients, and the second predicts the improvement of patient health status after the injection process after 1, 3, and 6 months.

Results
This section is divided into two main sections: The first predicts CTS diagnosis based on several data sources in categories between CTS and non-CTS patients, and the second predicts the improvement of patient health status after the injection process after 1, 3, and 6 months.

Evaluation Metrics
The evaluation metrics of the CTS detection scenario include the following: accuracy, F1-score, sensitivity, specificity, and precision. These evaluation measures were computed using the TN (true negative), TP (true positive), FN (false negative), and FP (false positive) calculations. The number of correctly categorized negative and positive instances, respec-tively, is defined as TN and TP. The quantity of incorrectly identified positive and negative instances is also defined as FN and FP, respectively

Predicting CTS Diagnosis
In this section, we use the data from the 160 patients included in this study. The patients were divided as follows: 80 patients diagnosed with CTS with different status (mild, moderate, and severe) and 80 patients without CTS, but having other diseases having an overlap with CTS. The total dataset was randomly divided into training and testing with percentages of 70% and 30%, respectively. As we can see in Table 1, the utilized models were able to differentiate between CTS and non-CTS patients with adequate performance ranging from 0.900% to 0.955%. As we can observe, LR and NB give the same performance: (ACC = 0.900, AUC = 0.924 for LR) and (ACC = 0.903, AUC = 0.921). MLP gives improved performance of about 3% in most evaluation metrics. The same applies for DT, where the proposed algorithm gives the best performance of (ACC = 0.955, AUC = 0.946). Predicting CTS prognosis will help differentiate between the CTS cases and other cases that may have similar symptoms. Figure 4 show the learning and validation curve for the proposed model.

Predicting Prognosis after One Month
In this section, we evaluate the performance of single classifiers from different types, including the linear model and statistical model, in predicting the prognosis after one month of medication. From Table 2, we can observe the following: (1) using LR gives the worst performance, achieving (p = 0.835%, R = 0.805%, F-measure = 0.824%, and ACC = 0.817%), followed by NB. (2) Using the SVC classifier gives adequate performance in terms of different evaluation metrics, achieving (p = 0.844%, R = 0.8159%, F-measure = 0.823%,

Predicting Prognosis after One Month
In this section, we evaluate the performance of single classifiers from different types, including the linear model and statistical model, in predicting the prognosis after one month of medication. From Table 2, we can observe the following: (1) using LR gives the worst performance, achieving (p = 0.835%, R = 0.805%, F-measure = 0.824%, and ACC = 0.817%), followed by NB. (2) Using the SVC classifier gives adequate performance in terms of different evaluation metrics, achieving (p = 0.844%, R = 0.8159%, F-measure = 0.823%, and ACC = 0.833%). Using the MLP classifier gives the best performance in terms of traditional machine learning. It achieves metrics (p = 0.835%, R = 0.814%, F-measure = 0.814%, and ACC = 0.827%). The proposed algorithm improves the performance by 2-5% in terms of different evaluation metrics. It achieves (p = 0.875%, R = 0.876%, F-measure = 0.864%, and ACC = 0.877%). To visualize the performance of our proposed model. Figure 5 shows the learning and validation curve in both training and testing stages.

Predicting Prognosis after Three Months
In this section, we evaluate the performance of in predicting the prognosis after three months of medication. From Table 3, we can observe the following: (1) using LR gives the worst performance, achieving (p = 0.831%, R = 0.825%, F-measure = 0.831%, and ACC = 0.825%). (2) SVC and NB give similar performances; SVC achieved 0.831%, 0.820%, 0.838%, 0.818%, and 0.866% in terms of accuracy, P, R, and F measures, and AUC, respectively; the best performance obtained from DT with max depth = 6 according to different evaluation metrics, achieving (p = 0.855%, R = 0.852%, F-measure = 0.852%, and ACC = 0.900%). The proposed algorithm outperforms other algorithms by about 3-4%. It achieves (p = 0.911%, R = 0.900%, F-measure = 0.898%, and ACC = 0.901%). The model's performance improved in predicting the prognosis in three months over predicting after one month by about 1-4% in most models. Figure 6 shows the learning and validation curve in both training and testing stages. Table 3. Results of the prognosis model after 3 months.

Predicting Prognosis after Three Months
In this section, we evaluate the performance of in predicting the prognosis after three months of medication. From Table 3, we can observe the following: (1) using LR gives the worst performance, achieving (p = 0.831%, R = 0.825%, F-measure = 0.831%, and ACC = 0.825%). (2) SVC and NB give similar performances; SVC achieved 0.831%, 0.820%, 0.838%, 0.818%, and 0.866% in terms of accuracy, P, R, and F measures, and AUC, respectively; the best performance obtained from DT with max depth = 6 according to different evaluation metrics, achieving (p = 0.855%, R = 0.852%, F-measure = 0.852%, and ACC = 0.900%). The proposed algorithm outperforms other algorithms by about 3-4%. It achieves (p = 0.911%, R = 0.900%, F-measure = 0.898%, and ACC = 0.901%). The model's performance improved in predicting the prognosis in three months over predicting after one month by about 1-4% in most models. Figure 6 shows the learning and validation curve in both training and testing stages.

Predicting Prognosis after Six Months
This section investigates the ability of our proposed ML model in predicting the prognosis of CTS after 6 months of medication. Table 4 shows the results of using traditional ML models as well as our proposed model. From that table, we can observe the following.
(1) LR and NB give similar performance in terms of different metrics; NB achieved 0.821%, 0.813, and 0.823% and LR achieved 0.823%, 0.846%, and 0.8012% in terms of accuracy, precision, and recall, respectively. The best performance was obtained from the MLP classifier with one hidden layer according to different evaluation metrics, achieving (p = 0.85%, R = 0.833%, F-measure = 0.844%, and ACC = 0.832%). The proposed algorithm outperforms other algorithms by about 3-4% (p = 0.898%, R = 0.909%, F-measure = 0.898%, and ACC = 0.912%). The overall performance for predicting the prognosis after six months outperforms the prediction after one and three months.
The reason behind that improvement is based on several points, including (1) severity of the syndrome at presentation (decreased motor amplitude); (2) the extent of thenar muscle atrophy, and the patient's primary employment (dentist, computer engineer, housewife, etc.); (3) using hormonal contraception via injection or oral means helps recurrence of entrapment. Obesity (BMI > 25), bilateral entrapment, hypothyroidism, uncontrolled diabetes, and uremia are additional systemic conditions that negatively affect the outcome of CTS treatment. Figure 7 shows the learning curve and the validation curve of the proposed model.

Model
Training Score Testing Score Accuracy Precision Recall F-Measure AUC

Predicting Prognosis after Six Months
This section investigates the ability of our proposed ML model in predicting the prognosis of CTS after 6 months of medication. Table 4 shows the results of using traditional ML models as well as our proposed model. From that table, we can observe the following. (1) LR and NB give similar performance in terms of different metrics; NB achieved 0.821%, 0.813, and 0.823% and LR achieved 0.823%, 0.846%, and 0.8012% in terms of accuracy, precision, and recall, respectively. The best performance was obtained from the MLP classifier with one hidden layer according to different evaluation metrics, achieving (p = 0.85%, R = 0.833%, F-measure = 0.844%, and ACC = 0.832%). The proposed algorithm outperforms other algorithms by about 3-4% (p = 0.898%, R = 0.909%, F-measure = 0.898%, and ACC = 0.912%). The overall performance for predicting the prognosis after six months outperforms the prediction after one and three months. The reason behind that improvement is based on several points, including (1) severity of the syndrome at presentation (decreased motor amplitude); (2) the extent of thenar muscle atrophy, and the patient's primary employment (dentist, computer engineer, housewife, etc.); (3) using hormonal contraception via injection or oral means helps recurrence of entrapment. Obesity (BMI > 25), bilateral entrapment, hypothyroidism, uncontrolled diabetes, and uremia are additional systemic conditions that negatively affect the outcome of CTS treatment. Figure 7 shows the learning curve and the validation curve of the proposed model.

Comparison with Other Work
Carpal tunnel syndrome (CTS) is a clinical disease caused by compression of the median nerve in the carpal tunnel. Identifying the severity of CTS is essential to providing appropriate therapeutic interventions. Few studies have used a private small dataset that includes patient historical data, MRI images, US images, and CT images. Some studies used SVM to classify CTS patients. For example, ref. [28] utilized XGB with sample size 1073 [254(+), 761(−)] and achieved 0.76 in terms of ACC. In [29], the authors used SVM with a sample size of 64 [46(+), 19(−)] and achieved ACC = 0.9513, but the developed model may not be robust due to the small size of the dataset.
In [12], the authors used a sample size of 122 [65(+), 57(−)] and achieved the best performance of 0.901 using SVM. Though this study achieved adequate performance in terms of ACC, it showed inconsistency between true negative (TN) and true positive (TP) values.
Our proposed model utilized bagging with RF in 160 [80(+), 80(−)], achieving 0.955 and 0.946 in terms of ACC and AUC, respectively. Table 5 details the comparison with other studies in CTS classification. There are several reasons that our proposed model is superior to the state of the art: (1) Most studies have tried to categorize between CTS patients and normal patients, which may not make sense from the medical side. In our study, we aggregate the data from CTS patients and patients with other diseases that have an overlapping symptom with CTS, such as cervical radiculopathysasas, de quervian tendinopathy, and peripheral neuropathy. (2) Our proposed model considers historical data, which has a significant effect in disease diagnosis. (3) Our model did not show differences between training and testing and showed adequate consistency between TP and FP, which indicates the consistently of our model. Therefore, the proposed method could be considered as a generalized and robust model that could be used as an alternative to the clinical CTS diagnosis. Table 6 shows the comparison between other studies in predicting prognosis. Very few studies are concerned with predicting improvement during the treatment process. All of these studies build their model on data after patient surgery. For example, [20] used gradient boosting to predict the probability of improvement after surgery based on the data aggregated from 2119 patients. They concluded with a model that could predict the

Comparison with Other Work
Carpal tunnel syndrome (CTS) is a clinical disease caused by compression of the median nerve in the carpal tunnel. Identifying the severity of CTS is essential to providing appropriate therapeutic interventions. Few studies have used a private small dataset that includes patient historical data, MRI images, US images, and CT images. Some studies used SVM to classify CTS patients. For example, ref. [28] utilized XGB with sample size 1073 [254(+), 761(−)] and achieved 0.76 in terms of ACC. In [29], the authors used SVM with a sample size of 64 [46(+), 19(−)] and achieved ACC = 0.9513, but the developed model may not be robust due to the small size of the dataset.
In [12], the authors used a sample size of 122 [65(+), 57(−)] and achieved the best performance of 0.901 using SVM. Though this study achieved adequate performance in terms of ACC, it showed inconsistency between true negative (TN) and true positive (TP) values.
Our proposed model utilized bagging with RF in 160 [80(+), 80(−)], achieving 0.955 and 0.946 in terms of ACC and AUC, respectively. Table 5 details the comparison with other studies in CTS classification. There are several reasons that our proposed model is superior to the state of the art: (1) Most studies have tried to categorize between CTS patients and normal patients, which may not make sense from the medical side. In our study, we aggregate the data from CTS patients and patients with other diseases that have an overlapping symptom with CTS, such as cervical radiculopathysasas, de quervian tendinopathy, and peripheral neuropathy. (2) Our proposed model considers historical data, which has a significant effect in disease diagnosis. (3) Our model did not show differences between training and testing and showed adequate consistency between TP and FP, which indicates the consistently of our model. Therefore, the proposed method could be considered as a generalized and robust model that could be used as an alternative to the clinical CTS diagnosis.  Table 6 shows the comparison between other studies in predicting prognosis. Very few studies are concerned with predicting improvement during the treatment process. All of these studies build their model on data after patient surgery. For example, [20] used gradient boosting to predict the probability of improvement after surgery based on the data aggregated from 2119 patients. They concluded with a model that could predict the progress with an AUC of 0.7229; in [36], the authors used XGB and achieved AUC = 0.791. We chose to make three models, to predict the improvement after one, three, and six months based on the data from questionnaire and the nerve condition. The proposed model achieved promising results in terms of different metrics for predicting after one month, (ACC = 0.875, AUC = 0.839), after three months (ACC = 0.901, AUC = 0.895), and after six months (ACC = V, AUC = 0.903). Accordingly, our model could be utilized to identify patients who may benefit from decompression. Our data-driven decision support tools can be used to help determine which patients to operate on, to avoid the associated risks and expenses of surgery.

Statistical Analysis
First, with regard to family history the statistical test shows that a median abnormality or prior carpal tunnel surgery had a substantial impact on CTS with a positive family history of the condition (chi square = 20.484, p < 0.001). Second, with regard to CSA with a cut-off point of 11 mm 2 , there was a statistically significant rise in CSA cases compared to controls. With a sensitivity of 95% and a specificity of 100%, this is regarded as a superb test for differentiating CTS patients from controls. Table 7 shows that there is a statistically significant interaction between the three US groups diagnosed with CTS (mild, moderate, and severe). Third, with regard to the questionnaire, we discovered a highly significant reduction in symptoms measured by the SSS, FSS, and pain analogue scale, as well as a reduction in CSA, after 1, 3, and 6 months post-injection when we assessed patient status and compared the baseline evaluation with the results after the injection process. Table 8 shows that, in terms of the medical questionnaire, there is a statistically significant interaction between the three groups of diagnosed CTS (mild, moderate, and severe).

Conclusions and Future Work
CTS can be differentiated easily from many other medical disorders based on ML; carpometacarpal arthritis of the thumb NCS shows normal values of terminal latency, conduction velocity, and amplitude. Dequervian neuropathy also shows normal parameter of NCS. Peripheral neuropathy shows abnormality within NCS according to the type of neuropathy: demyelinating or axonal neuropathy prolonged latency, decreased amplitude, and conduction velocity. Finally, the most common disorder confused with CTS is cervical radiculopathy, mainly C6 radiculopathy; this can also be easily differentiated by ML. In this study we develop the model as follows: First, we developed an ML model that could distinguish between CTS patients and non-CTS patients based on BCTQ data and nerve conduction and then compared the results with several machine learning models. Second, a prediction model was established for measuring the improvement as a result of the treatment process among CTS patients after one, three, and six months. The model was trained and validated using CTS patient data. The model displayed reasonable discriminative ability, high sensitivity, and reasonable specificity. In the future, we plan to extend our work to (1) try different hyper parameter tuning algorithms and compare the performance; (2) gather insights from different datasets to ensure comprehensiveness; (3) test our developed model using several real datasets to ensure robustness and generalization ability. Funding: This research received no external funding.

Institutional Review Board Statement:
The study did not require ethical approval. The dataset was obtained retrospectively from the Neurology department at Kaferelshikh University Hospitals in Egypt (Approval code: MKSU 50-12-25).

Informed Consent Statement: Not applicable.
Data Availability Statement: Data available upon request.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The