An Artificial Neural Network Approach and a Data Augmentation Algorithm to Systematize the Diagnosis of Deep-Vein Thrombosis by Using Wells’ Criteria

Abstract: The use of a back-propagation artificial neural network (ANN) to systematize the reliability of a Deep Vein Thrombosis (DVT) diagnostic by using Wells’ criteria is introduced herein. In this paper, a new ANN model is proposed to improve the Accuracy when dealing with a highly unbalanced dataset. To create the training dataset, a new data augmentation algorithm is proposed, based on statistical data (the prevalence of DVT) from real cases reported in the literature and from a public hospital. This algorithm is used to generate one dataset of 10,000 synthetic cases. Each synthetic case has nine risk factors according to Wells’ criteria, and the use of two additional factors, gender and age, is proposed. A training scheme was established based on interviews with medical specialists. In addition, a new algorithm is presented to improve the Accuracy and Sensitivity/Recall. With the proposed algorithm, two decision thresholds were found: the first one, 0.484, improves Accuracy; the second one, 0.138, improves Sensitivity/Recall. The Accuracy achieved is 90.99%, which is greater than that obtained with other related machine learning methods. The proposed ANN model was validated with the k-fold cross validation technique using a dataset with 10,000 synthetic cases. The test was performed by using 59 real cases obtained from a regional hospital, achieving an Accuracy of 98.30%.


Introduction
Venous thromboembolism (VTE), which includes deep-vein thrombosis (DVT) and pulmonary embolism (PE), may affect up to 900,000 patients per year in countries like the United States of America (USA), with more than 300,000 deaths per year [1]. DVT is a vascular condition in which a venous thrombus breaks off and travels through the bloodstream and, if it reaches the lungs, it might cause a fatal pulmonary embolism (PE) [2][3][4]. VTE occurs for the first time in approximately 100 persons per 100,000 inhabitants per year in the USA and rises exponentially from <5 cases per 100,000 persons <15 years old to 500 cases (0.5%) per 100,000 persons at 80 years of age. It is associated with substantial morbidity [4]. The activities of modern life such as trans-oceanic flights [5], the demand [...]
Data augmentation is a technique commonly used in M-L to increase the dataset that is used in the learning process. It consists of generating new cases from the original dataset without altering the pattern of the data. In medical environments, it is mostly used to increase the image dataset for image-based diagnosis; see, for example, [37,55,56,57]. In [58], the authors argue that the medical and M-L communities are relying on the promise of artificial intelligence (AI) to transform medicine through enabling more accurate decisions and personalized treatment. However, progress is slow. Legal and ethical issues around privacy and the use of patient data without consent are among the limiting factors in data sharing, resulting in a significant barrier to accessing routinely collected electronic health records (EHR) for the machine learning community. They therefore proposed a novel framework for generating synthetic data that closely approximates the joint distribution of variables in an original EHR dataset, providing a readily accessible, legally and ethically appropriate solution to support more open data sharing and enabling the development of AI solutions.
In addition, in [52], the authors demonstrated that it is possible to augment clinical data to improve the performance of automatic predictive systems. They introduced two methods to create synthetic clinical histories (trajectories) based on existing data: the first one extracts subsequences of trajectories to emphasize the transitions between hospital admissions; the second one benefits from the hierarchical structure of standard diagnosis codes (like ICD-9) to create trajectories whose characteristics resemble those of real-world clinics. In [59], the authors argue that M-L has made a significant impact in medicine and cancer research; however, its impact in these areas has been undeniably slower and more limited than in other application domains. A major reason for this has been the lack of availability of patient data to the broader M-L research community, in large part due to patient privacy protection concerns. High-quality, realistic, synthetic datasets can be leveraged to accelerate methodological developments in medicine. By and large, medical data are high dimensional and often categorical. These characteristics pose multiple modeling challenges. They evaluated three classes of synthetic data generation approaches: probabilistic models, classification-based imputation models, and generative adversarial neural networks. However, to our understanding, little research has been done on data augmentation for non-image-based diagnosis in hospital work. Therefore, in this paper, a new data augmentation algorithm is proposed by using a mixed patterns approach. It takes into account two patterns: (i) a dataset of 59 instances from a public hospital and (ii) the distribution of instances between classes reported in [60].
To the best of our knowledge and based on the reviewed literature, some works report the use of machine-learning techniques as a support tool for the diagnosis of thromboembolic diseases [39][40][41][42][43][44][45][46]. However, there are still open problems to be solved, such as the development of new methods and algorithms to systematize the diagnosis of DVT. Accordingly, the aim of this paper is to introduce a new prediction model that uses a back-propagation neural network, based on the Wells' probabilistic method, to guide the design of computer systems that support clinical decision-making for the diagnosis of DVT in primary care. The main contribution of this research is therefore a smart tool that physicians in a hospital's primary care can use to improve the early detection of DVT in lower limbs. The tool is based on an intelligent system trained with 10,000 synthetic cases and validated by using k-fold cross validation and the hold-out model. External validation is performed using 59 real cases from a public hospital in Mexico. The system uses 11 predictors (risk factors) generated with a new data augmentation algorithm inspired by [61][62][63][64], according to statistical data of real cases reported in [60] and from a public hospital.
The rest of the work is organized as follows: Section 2 presents the protocol currently followed by hospitals to confirm or rule out the diagnosis of DVT. Section 3 describes the proposed method based on a back-propagation type artificial neural network to obtain the Prediction Model that is the basis of a CDSS. Section 4 presents the results obtained in the evaluation of the proposed prediction model and a discussion of the results. Finally, Section 5 summarizes the conclusions of the paper.

Hospital Protocol for DVT Diagnosis
DVT is a condition that, due to the unspecific nature of its symptoms, can be confused with other illnesses. A differential diagnosis is required to confirm the existence of DVT. In many countries, including Mexico, the primary care physicians do not have the equipment or experience to perform imaging tests such as ultrasound or venography, resorting, in the best case, to using probabilistic models. The clinical diagnosis of DVT alone is unreliable; therefore, clinical probability models such as the Wells criteria [18,60,65] and the Oudega Rule [19] have been developed to guide their investigation, diagnosis, and treatment [66]. The most studied and validated model is the one suggested by Wells [67], which classifies patients into three groups according to the probability of having DVT.
To perform a good diagnosis, Wells established a number of risk factors. To calculate the probability of suffering from DVT, the factors listed in Table 1 must be taken into consideration, and, if an alternative diagnosis is found at least as likely as DVT, two points must be subtracted from the sum [18,60,65]. Once the sum has been completed, the likelihood of suffering from DVT is classified according to the score obtained as low risk (−2 to 0 points), moderate risk (1 to 2 points), or high risk (3 to 8 points) [18,60,65].
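The scoring rule above can be sketched in a few lines of Python. This is only an illustration of the arithmetic described in the text, not a clinical tool; the factor names and the function names `wells_score` and `risk_group` are hypothetical labels introduced here.

```python
# Hypothetical field names for the nine one-point Wells factors of Table 1.
WELLS_FACTORS = (
    "active_cancer",
    "paralysis_or_cast",
    "bedridden_or_surgery",
    "localised_tenderness",
    "entire_leg_swollen",
    "calf_swelling_3cm",
    "pitting_edema",
    "collateral_veins",
    "previous_dvt",
)


def wells_score(findings, alternative_diagnosis_likely=False):
    """Add one point per observed factor; subtract two points when an
    alternative diagnosis is at least as likely as DVT."""
    score = sum(1 for name in WELLS_FACTORS if findings.get(name, False))
    if alternative_diagnosis_likely:
        score -= 2
    return score


def risk_group(score):
    """Map a score to the three groups given in the text:
    low (0 or less), moderate (1 to 2), high (3 or more)."""
    if score <= 0:
        return "low"
    if score <= 2:
        return "moderate"
    return "high"
```

For instance, a patient with active cancer and pitting edema and no likely alternative diagnosis scores two points and falls in the moderate group.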

Table 1. Wells criteria [18,60,65].
Clinical Feature | Score
Active cancer (patient either receiving treatment for cancer within the previous 6 months or currently receiving palliative treatment) | 1
Paralysis, paresis, or recent cast immobilization of the lower extremities | 1
Recently bedridden for ≥3 days, or major surgery within the previous 12 weeks requiring general or regional anesthesia | 1
Localised tenderness along the distribution of the deep venous system | 1
Entire leg swollen | 1
Calf swollen at least 3 cm larger than that on the asymptomatic side (measured 10 cm below tibial tuberosity) | 1
Pitting edema confined to the symptomatic leg | 1
Collateral superficial veins (non-varicose) | 1
Previously documented deep vein thrombosis | 1
Alternative diagnosis at least as likely as deep vein thrombosis | −2
In patients with symptoms in both legs, the more symptomatic leg is used.

Figure 1 shows the protocol to be followed for the diagnosis of DVT [68]. Patients who have a low risk or probability of suffering from DVT undergo a blood test called D-dimer [45], which shows high values in thromboembolic states due to protein degradation. Some instruments provide their measurements in FEU (Fibrinogen Equivalent Units) and others in D-DU (D-dimer units), with threshold values of 500 and 230, respectively [69]. When the D-dimer is negative, the diagnosis is ruled out. However, when the D-dimer is greater than the threshold value [70], and when the calculated probability/risk is moderate to high, the recommended initial diagnostic method is a Doppler ultrasound [68,71]. Moreover, compression ultrasonography (CUS) is the first-line imaging test in the diagnostic management of suspected deep vein thrombosis (DVT) of the lower extremity [14].
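The decision flow described above could be outlined as follows. This is a minimal sketch under stated assumptions, not the protocol of Figure 1 itself: it assumes that a moderate or high risk leads directly to ultrasound, that a low-risk patient with a D-dimer above the threshold is also referred to ultrasound, and that the thresholds are expressed in the reporting unit of the instrument. The function name `next_step` and the returned labels are hypothetical.

```python
# Threshold values quoted in the text for the two reporting units [69].
D_DIMER_THRESHOLDS = {"FEU": 500, "D-DU": 230}


def next_step(risk, d_dimer=None, unit="FEU"):
    """Return a label for the next diagnostic step (illustrative only).

    risk    -- "low", "moderate" or "high" (Wells risk group)
    d_dimer -- measured D-dimer value, or None if not yet measured
    unit    -- "FEU" or "D-DU", the unit reported by the instrument
    """
    if risk in ("moderate", "high"):
        # Assumption: moderate/high risk goes straight to imaging.
        return "doppler_ultrasound"
    if d_dimer is None:
        return "order_d_dimer"
    if d_dimer > D_DIMER_THRESHOLDS[unit]:
        return "doppler_ultrasound"
    # Negative D-dimer in a low-risk patient: diagnosis ruled out.
    return "dvt_ruled_out"
```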
Although Venography [66,67] is considered the gold standard for the diagnosis of DVT [14], its use in clinical applications is limited, mainly because it is an invasive method that requires an injection of contrast (biomaterial) to observe the displacement of blood flow in the circulatory system, which, in addition to being expensive, is risky for the patient's health. In contrast, Doppler ultrasound [72,73] is a safe, non-invasive, easy-to-use imaging test with a high sensitivity of 88-98% and a specificity of 97-100% [74,75]. Therefore, it is one of the fundamental diagnostic methods in multiple disciplines and medical specialties [76]. Ultrasonic equipment relies on the Doppler Effect to analyze blood flow, which makes the diagnosis of thromboembolic diseases possible.

Proposed Method
In Mexico, it is quite difficult to have access to clinical files or statistical data on the occurrence of certain medical cases, at least not with the detail required to conduct research such as the one reported in [18,45,60,65]. Therefore, this paper proposes using a data augmentation technique [61][62][63][64] based on statistical data of real cases reported in [60] and from a public hospital, with the purpose of building a new dataset of synthetic cases, represented by a matrix with 10,000 cases, to be used in the training and validation of the proposed ANN model. Validation/test is performed with the well-known k-fold cross validation method. The external validation/test was performed by using historical data with 59 real cases from a public hospital. The study was conducted in accordance with the Declaration of Helsinki, and the protocol was approved on 23 June 2020 by the Ethics Committee of Autonomous University of Baja California, Mexico with project identification code POSG/020-1-03.

Artificial Neural Network
Figure 2 depicts the simplest model of the operation of an ANN, which is the perceptron developed by Frank Rosenblatt [77]. An ANN is made of artificial neurons connected to one another, emulating the functioning of biological neurons in the human brain. A set of input values (x_1, x_2, ..., x_n) is connected to an artificial neuron, which is switched on by an activation function. Each input value is assigned a weight (w); the products of all the inputs and their corresponding weights are added before moving on to the activation function. Unlike the simple perceptron, the multilayer perceptron is made of one or more hidden layers, which make it capable of solving nonlinear problems [77,78]. There are several algorithms to design the architecture of an ANN, such as the geometric pyramid rule [78,79], the rule based on the mean square error (MSE) [80], and the application of evolutionary algorithms [81].
In practice, a successful ANN configuration depends on the experience of the designer and on evaluating different architectures. The number of neurons used in the hidden layers will determine the Accuracy of the ANN model. To select the best ANN model, 100 different ANN architectures were considered and evaluated with the rule based on the MSE [80]. The architecture with the best Accuracy is depicted in Figure 3. ANNs are machine-learning algorithms designed to analyze data without a pre-existing hypothesis as to any associations that may exist [45].
Figure 3. Architecture of the proposed back-propagation ANN model.
For a proper classification/prediction, in addition to the network design, a training algorithm is required. In an ANN, the back-propagation algorithm uses the difference between the produced result and the desired result to change the "weights" of the connections among the artificial neurons. The importance of this process is that, as the network is trained, the neurons of the intermediate layers organize themselves in such a way that the different neurons learn to recognize different characteristics of the total input space. A back-propagation ANN works under the supervised learning scheme, so a set of known data is required to train the network, describing the values of the input variables (predictors) as well as the expected output for each set [77]. In the research reported in [45], the authors argue that the use of artificial neural network analysis can improve the risk-stratification of patients presenting with suspected DVT. Furthermore, they conclude that this approach may improve the analysis of complex data to support decision-making in other areas of clinical medicine [45]. In [46], the author argues that predictive analytics by deep machine-learning will be the next generation tool to improve health care; he concludes by encouraging physicians to keep an open mind about artificial intelligence and deep machine-learning, and to embrace the application and use of predictive algorithms that undoubtedly will unfold over the next decade. This is one of the key pathways to cost-effective, efficient, and safe health care. In addition, he argues, we must overcome the fear of the black box concept of artificial intelligence, and physicians need to be confident that large, well-managed datasets can produce tools that will improve patient care.
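The weighted-sum neuron and the error-driven weight update described in this section can be illustrated for a single neuron. The sketch below assumes a hyperbolic tangent activation and a plain gradient-descent step on the squared error; it is meant only to show the principle and does not claim to reproduce the training algorithm used for the proposed model. The names `neuron_output` and `backprop_step` are hypothetical.

```python
import math


def neuron_output(inputs, weights, bias):
    """Weighted sum of the inputs followed by a tanh activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return math.tanh(z)


def backprop_step(inputs, weights, bias, target, learning_rate=0.01):
    """One gradient-descent update for a single tanh neuron.

    The error (produced minus desired output) is propagated through
    the derivative of tanh, 1 - y**2, to obtain the weight changes.
    """
    y = neuron_output(inputs, weights, bias)
    error = y - target
    grad_z = error * (1.0 - y * y)  # d(0.5 * error**2) / dz
    new_weights = [w - learning_rate * grad_z * x
                   for w, x in zip(weights, inputs)]
    new_bias = bias - learning_rate * grad_z
    return new_weights, new_bias
```

Repeating the step moves the neuron output towards the desired value; in a multilayer network the same error signal is passed backwards layer by layer.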

Data Augmentation Algorithm
Data Augmentation consists of artificially increasing the volume of the training dataset by applying several distortions to the original information without altering the spatial pattern of target classes [63]. Usually, the distortions are performed during training time, which allows for doing it on the fly without saving the new information [63]. Data augmentation, which applies deformations and transformations on annotated training samples to generate new training data, is an elegant solution [61][62][63][64]. The main objective of the data augmentation algorithm is to generate a set of synthetic data that adheres to the data observed in real life [61][62][63][64]. Algorithm 1 describes the proposed data augmentation approach, which generates each case of the synthetic dataset used for training and validation of the proposed back-propagation ANN model. The first task is to calculate the percentage of positive and negative cases present for each type of risk probability of the occurrence of DVT, in addition to the percentage with which each of the factors of the Wells Score was observed [18,45,60,65] in the real cases to which we had access. To calculate the percentage of suspected cases of DVT in each type of risk proposed by Wells [65], historical data from [60] were taken, where it is mentioned that, of all the cases observed, 19% were diagnosed as DVT, while the remaining 81% had a different diagnosis. Furthermore, in [60], it is mentioned that only 5% of the cases detected as Low Risk were diagnosed as positive for DVT, while 17% were diagnosed as positive in Medium Risk, and 53% in High Risk.
Based on the above, this paper proposes a model of linear equations described by (1) to determine the percentage of cases that were presented in each type of risk.

$$
\begin{aligned}
0.05\,x + 0.17\,y + 0.53\,z &= 19 \\
0.95\,x + 0.83\,y + 0.47\,z &= 81
\end{aligned}
\tag{1}
$$

where x is the percentage of Low Risk cases, y of Medium Risk cases, and z of High Risk cases. Solving (1), it is obtained that x = 60.9% of the cases were classified as Low Risk, y = 13.2% as Medium Risk, and z = 25.9% as High Risk. Subsequently, with this information, the distribution of suspected cases of DVT diagnosed as positive and negative is calculated for each type of risk in a new dataset of 10,000 training and validation cases, as shown in Table 2. This generates an imbalanced data distribution, which could lead to misclassification. However, according to [82], the skill of master diagnosticians was not due to a distinctive reasoning process, but instead depended on a clinician's ability to access knowledge from past experience to generate short lists of possible diagnoses. For this reason, it is preferable to train the predictive model on a dataset where the distribution of suspected and positive cases of DVT is consistent with the way the DVT phenomenon occurs, in order to take into account the impact that each of the symptoms has on the DVT diagnosis, just as physicians do.
Then, the calculation of the distribution of Wells' factors in the proposed dataset is based on the analysis of the real cases observed. For this purpose, the occurrences of each factor are counted in the statistical data of real cases recorded in a Mexican hospital, and each count is then divided by 280, the total number of Wells' score risk factors identified in the study, as described in Equation (2):
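The arithmetic of this step can be checked with a short script. It is a sketch under assumptions: it takes the reported risk shares and the positive rates from [60] as given, rounds case counts to whole patients (so the figures may differ slightly from those of Table 2), and expresses the factor distribution as a simple ratio over the 280 observed factors. The names `case_counts` and `factor_share` are hypothetical.

```python
# Positive rates per risk group from [60] and the reported risk shares.
POSITIVE_RATE = {"low": 0.05, "medium": 0.17, "high": 0.53}
RISK_SHARE = {"low": 0.609, "medium": 0.132, "high": 0.259}
N_CASES = 10_000


def case_counts(n_cases=N_CASES):
    """Split the synthetic cases by risk group and diagnosis."""
    counts = {}
    for risk, share in RISK_SHARE.items():
        n_risk = round(n_cases * share)
        n_positive = round(n_risk * POSITIVE_RATE[risk])
        counts[risk] = {"positive": n_positive,
                        "negative": n_risk - n_positive}
    return counts


def factor_share(times_observed, total_observed=280):
    """Share of one Wells factor among all factors seen in the real cases."""
    return times_observed / total_observed


# The reported shares should reproduce the overall 19% prevalence.
overall_positive_rate = sum(RISK_SHARE[r] * POSITIVE_RATE[r]
                            for r in RISK_SHARE)
counts = case_counts()
```

With the reported shares, `overall_positive_rate` comes out very close to 0.19, which agrees with the prevalence quoted from [60].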

Algorithm 1 Data Augmentation
Therefore, the matrix containing the training dataset is generated row by row, where each row represents a patient, i.e., gender, age, and the nine risk factors of the Wells score. Initially, the number of synthetic patient cases to be generated is determined, and the type of risk and the corresponding diagnosis of each of them are calculated. Likewise, the number of occurrences of each of the risk factors of the Wells score is calculated [60,65] to know to how many patients that risk factor will be assigned. In each row, the number of Wells risk factors that will be marked as present in the patient is determined randomly. In low risk cases, only one factor is specified in each row; in medium risk cases, the rows may contain two or three observed factors, while, in high risk cases, the number of observed factors is four or more. Once the number of risk factors is determined, a random number is generated for each of them to indicate which risk factor is observed in that patient; if the random number represents a risk factor that has already been marked as observed in the patient, a new random number is generated until one is found that represents a factor not previously observed in the patient. The corresponding column is then set to 1, and the number of cases to which this factor will be assigned is reduced by one. Finally, the value corresponding to the diagnosis of each patient's case is assigned, which is 0 for negative cases and 1 for positive cases.
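The row-by-row generation described above might be sketched as follows. This is an illustrative reading of the prose, not a transcription of Algorithm 1. Several details are assumptions made here: gender and the age code are drawn uniformly at random, factor quotas are derived from a caller-supplied share per factor, and a factor whose quota is exhausted is only chosen when no other unmarked factor remains. The function name `generate_cases` and its parameters are hypothetical.

```python
import random

N_FACTORS = 9  # Wells risk factors used as predictors
RISK_SHARE = {"low": 0.609, "medium": 0.132, "high": 0.259}
POSITIVE_RATE = {"low": 0.05, "medium": 0.17, "high": 0.53}
# Number of factors per row: one (low), two or three (medium), four or more (high).
FACTOR_RANGE = {"low": (1, 1), "medium": (2, 3), "high": (4, N_FACTORS)}


def generate_cases(n_cases, factor_share, seed=0):
    """Return rows of [gender, age_code, f1..f9, diagnosis]."""
    rng = random.Random(seed)

    # 1. Decide the risk group and diagnosis of every synthetic patient.
    plan = []
    for risk, share in RISK_SHARE.items():
        n_risk = round(n_cases * share)
        n_positive = round(n_risk * POSITIVE_RATE[risk])
        plan += [(risk, 1)] * n_positive + [(risk, 0)] * (n_risk - n_positive)
    rng.shuffle(plan)

    # 2. Decide how many factors each row gets and derive per-factor quotas.
    n_marks = [rng.randint(*FACTOR_RANGE[risk]) for risk, _ in plan]
    quota = [round(sum(n_marks) * share) for share in factor_share]

    # 3. Fill each row, never marking the same factor twice in a patient.
    rows = []
    for (risk, diagnosis), k in zip(plan, n_marks):
        marked = [0] * N_FACTORS
        for _ in range(k):
            free = [i for i in range(N_FACTORS) if not marked[i]]
            with_quota = [i for i in free if quota[i] > 0]
            chosen = rng.choice(with_quota or free)
            marked[chosen] = 1
            quota[chosen] -= 1
        gender = rng.randint(0, 1)      # assumption: uniform
        age_code = rng.randint(1, 9)    # assumption: uniform over the codes
        rows.append([gender, age_code] + marked + [diagnosis])
    return rows
```

Choosing among the still-unmarked factors has the same effect as redrawing a random number until an unmarked factor is found, while avoiding the retry loop.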

Pre-Processing Scheme of the Dataset
From each medical record of a regional hospital, the data that coincided with the first nine of the criteria established by Wells for the detection of DVT were extracted; in addition, this paper proposes to use gender and age, which help to improve the risk-stratification of patients presenting with suspected DVT [45]. From the obtained data, a matrix was built, composed of clinical cases in which the diagnosis of DVT can be positive or negative. The rows of the matrix represent the clinical cases or patients, while the columns represent the inputs to the expert system, that is, the first nine criteria taken from the Wells model, in addition to the patient's gender and age. Each of the inputs to the system was associated with a real numerical value to form the numerical matrix that would serve as training data for the system. To identify the patient's gender, the value zero was assigned to the male gender and one to the female gender. Then, the clinical cases were grouped using patient age ranges as shown in Table 3. Each interval was coded by a numerical value between 1 and 9. Table 4 shows that, for each Wells criterion confirmed by the patient or the physician, the value of one is assigned, while, for its absence, the value of zero is assigned. Table 5 shows the proposed prediction model for DVT Diagnosis. It consists of an input layer with 11 predictors (age, gender, cancer, immobilization, surgery, tenderness, leg swollen, calf swollen, edema, superficial veins, previous DVT), three hidden layers (150-100-50), and an output layer, which is the DVT diagnosis. Figure 3 depicts the proposed ANN model: the 11 predictors mentioned in Table 5 are taken as input, followed by a set of hidden layers and an output layer, which is the diagnosis of DVT in a patient.
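The encoding of one clinical case into the 11 numerical predictors can be sketched as below. The gender coding and the 0/1 coding of the criteria follow the text; the age interval boundaries are hypothetical placeholders, since the actual ranges are those of Table 3, and the names `age_code` and `encode_case` are introduced here for illustration.

```python
# Order of the nine Wells predictors as listed for Table 5.
FACTOR_ORDER = (
    "cancer", "immobilization", "surgery", "tenderness", "leg_swollen",
    "calf_swollen", "edema", "superficial_veins", "previous_dvt",
)

# Hypothetical upper bounds of the first eight age intervals (codes 1 to 8);
# older patients receive code 9. Table 3 defines the real intervals.
AGE_BIN_UPPER = (19, 29, 39, 49, 59, 69, 79, 89)


def age_code(age):
    """Map an age in years to an interval code between 1 and 9."""
    for code, upper in enumerate(AGE_BIN_UPPER, start=1):
        if age <= upper:
            return code
    return 9


def encode_case(gender, age, factors):
    """Return [age, gender, nine Wells criteria] as numbers.

    gender  -- "male" (coded 0) or "female" (coded 1)
    factors -- dict mapping a criterion name to True when it is present
    """
    gender_value = 0 if gender == "male" else 1
    criteria = [1 if factors.get(name, False) else 0 for name in FACTOR_ORDER]
    return [age_code(age), gender_value] + criteria
```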
The proposed ANN model consists of an input layer with 11 neurons, three hidden layers with 150, 100, and 50 neurons, respectively, and an output layer with a neuron indicating the diagnosis result of DVT. The activation function used in the input layer and in the hidden layers corresponds to the hyperbolic tangent (tansig), while a linear (purelin) function was used in the output layer.
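A forward pass through a network of this shape can be written in plain Python to make the layer sizes and activation functions concrete. This is a minimal sketch, not the trained model: the weights are random placeholders, the hyperbolic tangent is applied in the three hidden layers and a linear function in the output layer, and the names `init_network` and `forward` are hypothetical.

```python
import math
import random

LAYER_SIZES = [11, 150, 100, 50, 1]  # inputs, three hidden layers, output


def init_network(seed=0):
    """Create random placeholder weights and zero biases for every layer."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(LAYER_SIZES, LAYER_SIZES[1:]):
        scale = 1.0 / math.sqrt(n_in)
        weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def forward(layers, predictors):
    """Propagate 11 predictors through the network and return one value."""
    activation = list(predictors)
    last = len(layers) - 1
    for index, (weights, biases) in enumerate(layers):
        z = [sum(w * a for w, a in zip(row, activation)) + b
             for row, b in zip(weights, biases)]
        # tanh in the hidden layers, linear (identity) in the output layer
        activation = z if index == last else [math.tanh(v) for v in z]
    return activation[0]
```

The returned value is the raw network output, which is later compared against a decision threshold to obtain the diagnosis.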

Training Process for DVT Diagnostic
Traditionally, medical diagnosis is regarded by physicians as an art and depends heavily on the knowledge and experience of each one of them. The diagnostic process is carried out through a combination of activities performed by physicians, which may include physical examination, interview with the patient, review of clinical history, and interpretation of laboratory analysis, among others. This information is related to known real and academic medical cases, family history and, in some cases, the opinion of colleagues. However, data science opens up new possibilities for medical diagnosis since it allows the analysis of large amounts of data, such as the symptoms associated with patient conditions, which can be used in the diagnosis of people with similar symptoms. Nevertheless, one of the hypotheses that arose during this work is that cases in which patients show similar symptoms but opposite diagnoses negatively affect the diagnostic Accuracy of a Decision Support System, in this case one based on the proposed ANN model. For this reason, it was deemed necessary to identify and characterize the information analysis and to establish a training scheme for generating an ANN model that improves the Accuracy of computer-assisted DVT diagnosis in primary care. To understand the diagnostic process, interviews were conducted with five physicians of different degrees of specialty, who were consulted on how to confirm the suspicion that a patient is suffering an episode of Deep Venous Thrombosis. They were placed in a primary care setting, with the probabilistic models, particularly the Wells criteria [18,60,65], as the only additional tool to their knowledge.
In order to train the ANN and determine the experiments that lead to the optimal configuration of the network, the dataset with 11 predictors (see Table 5) was obtained with the data augmentation Algorithm 1 described in Section 3.2, using statistical data of real cases reported in [60] and from a regional hospital. Figure 4 depicts an overview of the training dataset for the proposed ANN model; all available cases in the dataset were taken into account along with their original diagnosis. It can be seen that 8151 instances have a probability of deep vein thrombosis (DVT) less than 50%, while 570 instances have a 50% probability of DVT and 1279 instances have a probability of DVT greater than 50%. The ANN model depicted in Figure 3 was trained with the Conjugate Gradient Back-propagation Algorithm with Fletcher-Reeves Restarts, using the following hyperparameters: a maximum of 1000 epochs, a learning rate of 1 × 10^-2, random weight initialization, and a minimum target error of 0.001 to 0.09. The configuration of the computer is: CPU 2.9 GHz Intel Core i5, RAM 8 GB 1867 MHz DDR3, Intel Iris Graphics 6100 1536 MB, macOS Sierra operating system, and Matlab software. Figure 5 depicts the training performance; it can be seen that the best training performance, measured as Mean Square Error (MSE), is 0.007 at epoch 1000.

Algorithm to Improve Accuracy/Recall
Algorithm 2 describes the proposed approach to tune the classification threshold in order to maximize the Accuracy or the Sensitivity/Recall of DVT classification. The output of the proposed ANN model is classified through a threshold function, which was defined through a search process: the Accuracy or Recall of the proposed ANN model is evaluated on a dataset for a set of candidate thresholds in the range of 0 to 0.99, spaced 0.001 apart. For each threshold, the Accuracy or Recall is obtained.
Thus, for best Accuracy, the threshold function is given by (3), which indicates that, when the output x of the neural network is greater than or equal to 0.484, the diagnosis of DVT is considered positive (f(x) = 1); otherwise, it is negative (f(x) = 0):

$$
f(x) = \begin{cases} 1, & x \geq 0.484 \\ 0, & \text{otherwise} \end{cases}
\tag{3}
$$

However, for best Sensitivity/Recall, the threshold function is given by (4), which indicates that, when the output of the neural network is greater than or equal to 0.138, the diagnosis of DVT is considered positive (f(x) = 1); otherwise, it is negative (f(x) = 0):

$$
f(x) = \begin{cases} 1, & x \geq 0.138 \\ 0, & \text{otherwise} \end{cases}
\tag{4}
$$
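The threshold search described in this section can be sketched as a simple sweep. This is an illustration of the idea rather than Algorithm 2 itself: the tie-breaking rule (keeping the highest threshold among those with the best score) is an assumption made here, and the function names `accuracy`, `recall`, and `best_threshold` are hypothetical.

```python
def accuracy(outputs, labels, threshold):
    """Fraction of cases whose thresholded output matches the label."""
    hits = sum((o >= threshold) == (y == 1) for o, y in zip(outputs, labels))
    return hits / len(labels)


def recall(outputs, labels, threshold):
    """Fraction of positive cases that are classified as positive."""
    tp = sum(1 for o, y in zip(outputs, labels) if y == 1 and o >= threshold)
    fn = sum(1 for o, y in zip(outputs, labels) if y == 1 and o < threshold)
    return tp / (tp + fn) if tp + fn else 0.0


def best_threshold(outputs, labels, metric, start=0.0, stop=0.99, step=0.001):
    """Evaluate every candidate threshold and return the best one.

    Assumption: among thresholds with the same score, the highest is kept.
    """
    n_steps = int(round((stop - start) / step))
    best_t, best_score = start, -1.0
    for i in range(n_steps + 1):
        t = round(start + i * step, 3)
        score = metric(outputs, labels, t)
        if score >= best_score:
            best_t, best_score = t, score
    return best_t
```

The same sweep serves both goals: passing `accuracy` as the metric yields a threshold tuned for Accuracy, and passing `recall` yields one tuned for Sensitivity/Recall.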

Results
This section presents the results obtained from the performance of the proposed method. First, a k-fold cross validation by using the 10,000 synthetic cases from the dataset is presented. Subsequently, an analysis of the results from the perspective of the dataset is performed. In addition, comparison of results versus related work is discussed. Later, the results obtained from the validation/test with statistical data of 59 real cases from a regional hospital are shown. Finally, a usage scenario is presented.

K-Fold Cross Validation
Regarding the test/validation of the prediction model, there are two main methods that are used as selection criteria for a prediction model: (i) the hold-out model and (ii) the k-fold cross validation model. Both share the characteristic of using a percentage of the dataset for training and retaining a portion for validation. Cross validation is widely adopted as a predictive model selection criterion [83]. Basically, it consists of using a portion of the dataset to build the model and holding out another portion of the dataset to validate it. The main difference lies in the way the data are used for the training and validation process: while the hold-out method is carried out only once, the k-fold cross validation method is carried out k times (see Figure 6) and the results of the classification in each iteration are averaged. Therefore, in this article, we decided to perform the test/validation using k-fold cross validation, since it helps validate the entire synthetic dataset instead of relying on a single random selection of the training and hold-out datasets; one of the important aspects to consider when using neural networks is that the model should be independent of the dataset used for training and validation. For this purpose, the k-fold cross validation method is perhaps the most widely used to validate the degree of Accuracy of a neural network model regardless of the dataset. It consists of dividing the dataset into k segments, and, in each of k rounds, a different segment is chosen to validate the model, while the remaining k-1 segments are used to train the neural network, as shown in Figure 6. The Accuracy of the model is taken as the average of the Accuracy obtained in each iteration. According to [84], the most recommended and most commonly used value of k is 10. Therefore, 10 segments of 1000 cases each are used in this paper. Subsequently, the artificial neural network is trained and validated with these data blocks.
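The partitioning and averaging steps of k-fold cross validation can be sketched as follows. The sketch is generic: the shuffling of the cases before splitting is an assumption, and `train_and_score` is a hypothetical callback standing in for whatever routine trains a model on the training indices and returns its Accuracy on the validation indices.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle the sample indices and split them into k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    fold_size = n_samples // k
    folds = [indices[i * fold_size:(i + 1) * fold_size] for i in range(k)]
    folds[-1].extend(indices[k * fold_size:])  # leftovers go to the last fold
    return folds


def cross_validate(n_samples, train_and_score, k=10, seed=0):
    """Average the score over k rounds, each validating on a different fold."""
    folds = k_fold_indices(n_samples, k, seed)
    scores = []
    for i, validation in enumerate(folds):
        training = [idx for j, fold in enumerate(folds) if j != i
                    for idx in fold]
        scores.append(train_and_score(training, validation))
    return sum(scores) / k
```

With 10,000 cases and k = 10, each round trains on 9000 cases and validates on the remaining 1000.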
Table 6 shows the confusion matrix of two-class classification [85]. Four categories can be observed: (i) positive success (True Positive), which occurs when both the output of the case to be validated and the output estimated by the artificial neural network coincide in a positive diagnosis of DVT; (ii) negative success (True Negative), which occurs when both the output of the case to be validated and the output estimated by the artificial neural network coincide in a negative diagnosis of DVT; (iii) False Positive, which occurs when the output of the case to be validated is a negative diagnosis, while the ANN model estimates a positive diagnosis; and (iv) False Negative, which occurs when the case is validated as a positive diagnosis, while the ANN model estimates a negative diagnosis. The performance evaluation of the proposed ANN model was initiated by calculating the Sensitivity, Specificity, Precision, and Accuracy [85,86]. The Sensitivity, also known as Recall [86], measures the proportion of positives that are correctly identified as such and can be calculated by (5). Similarly, the Specificity measures the proportion of negatives that are correctly identified as such [86] and can be calculated by (6). The Precision is the proportion of true positives among the positive predictions [86]; it can be calculated by (7), and the Accuracy by using (8):

$$
\mathrm{Sensitivity} = \frac{\mathrm{TruePositives}}{\mathrm{TruePositives} + \mathrm{FalseNegatives}} \tag{5}
$$

$$
\mathrm{Specificity} = \frac{\mathrm{TrueNegatives}}{\mathrm{FalsePositives} + \mathrm{TrueNegatives}} \tag{6}
$$

$$
\mathrm{Precision} = \frac{\mathrm{TruePositives}}{\mathrm{TruePositives} + \mathrm{FalsePositives}} \tag{7}
$$

$$
\mathrm{Accuracy} = \frac{\mathrm{TruePositives} + \mathrm{TrueNegatives}}{\mathrm{TruePositives} + \mathrm{FalsePositives} + \mathrm{TrueNegatives} + \mathrm{FalseNegatives}} \tag{8}
$$

Table 7 shows the results of 10-fold cross-validation obtained using the proposed ANN model; it can be observed that the average Accuracy is 90.51% with a standard deviation (Std. Dev.) of 2.67. The True Positive cases are 12.39%, True Negative cases are 78.12%, while False Positive cases are 2.86% and False Negative cases are 6.63%. Furthermore, it can be seen that the standard deviation for False Positive cases is 9.54, while, for False Negatives, it is 23.37. Given the average Accuracy shown in Table 7, it can be said that the proposed ANN model has a confidence range of 90.51%. In addition, it can be observed that the average Specificity is 96.46% with a standard deviation of 1.20, while the average Precision is 80.96% with a standard deviation of 7.40. In order to get a stronger validation of the data independence, an additional validation was performed using the hold-out model with an 80-20% ratio, obtaining an Accuracy of 90.35% by using a threshold of 0.484, with an MSE of 0.0714. The Sensitivity is 64.30%, the Precision is 81.13%, and the Specificity is 96.47%. Therefore, it can be seen that the results obtained with both validations (k-fold cross validation and hold-out model) are similar.
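The four measures can be computed directly from the confusion-matrix counts. The short sketch below applies the definitions of Sensitivity, Specificity, Precision, and Accuracy to the average percentages quoted for the 10-fold cross validation; the function name `classification_metrics` is hypothetical. Note that metrics computed from averaged counts need not coincide exactly with the per-fold averages reported in Table 7.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute the four measures from confusion-matrix counts or percentages."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (fp + tn),
        "precision": tp / (tp + fp),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Average percentages quoted in the text for the 10-fold cross validation.
metrics = classification_metrics(tp=12.39, tn=78.12, fp=2.86, fn=6.63)
```

Here the Accuracy evaluates to 0.9051, matching the reported average, while the Specificity and Precision land within a fraction of a percentage point of the per-fold averages.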

Results from the Perspective of the Dataset
As mentioned in Section 1, medical diagnosis has a high degree of uncertainty due to the complexity of biological systems, which can lead to two people with similar characteristics showing different symptoms in response to the same health condition or, on the contrary, showing opposite health conditions even with the same symptomatology. In terms of the ANN prediction model, this corresponds to clinical cases in which all the risk factors serving as input predictors to the system are identical, yet the resulting diagnosis given as output by the CDSS differs.
On the other hand, ROC curves and Precision-Recall (PR) curves are the most popular ways to estimate the performance of ANN inference methods [87]. ROC curves do not emphasize a particular interval of values of the ratio between positives and negatives and therefore favor methods that are good for a large range of such values. If one knows, for example, that the ratio between positives and negatives will be very low when applying the classification model, then one is typically only interested in the bottom-left part of the ROC curve [87]. PR curves, on the other hand, provide a better picture of the performance of a method when the ratio between positives and negatives in the test data is close to the ratio one expects when practically applying the model [87]. Binary classification problems are usually substantially imbalanced in favor of the negative class, since positive cases are a small fraction of all cases. This speaks in favor of the PR curve over the ROC curve [87]. Therefore, Figure 7 shows the PR curve of the proposed classification model using a threshold of 0.484; it shows the result of varying the pivot that distinguishes a positive case from a negative one. As can be observed, a trade-off must be established between reducing False Positives and reducing False Negatives. When False Positives are minimized, False Negatives increase, and vice versa; choosing the appropriate configuration for diagnosis is therefore a design decision. Regarding the area under the curve (AUC), the ROC-AUC is 0.9601 and the PR-AUC is 0.9114.
This paper presents an alternative to evaluate the diagnosis of DVT by using an ANN model based on the Wells' score [60]. As a variant, in this paper, the age and gender of the patients are added to improve the Accuracy of the prediction model.
For this reason, an analysis was carried out to determine the impact that each input has on the final result; with all inputs present, the Accuracy is 90.99%. Table 8 shows the effect of omitting each risk factor (input) in turn. The input with the greatest impact on the result is Age, since eliminating it produces an Accuracy of 84.59%, a decrease of 6.40 percentage points with respect to the value of 90.99% obtained by the trained model considering all the inputs. By comparison, removing any one of the other variables yields an average Accuracy of 88.17% (standard deviation of 0.5%). This can be explained by the stratification that each variable (input) contributes to the ANN model: practically all the inputs offer a binary segmentation, so it is not surprising that they have a similar influence on the model's Accuracy, whereas Age can assume nine different values, which allows the ANN model to distinguish cases better.
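The size of the effect described above can be checked with a few lines of arithmetic. The sketch below uses only the Accuracy values quoted in the text; the variable names are illustrative, and the per-input values of Table 8 are not reproduced here.

```python
# Accuracy values (in %) quoted in the text.
baseline_accuracy = 90.99            # all inputs present
accuracy_without_age = 84.59         # Age input removed
mean_accuracy_without_other = 88.17  # average when one of the other inputs is removed

# Drop in Accuracy, in percentage points, caused by removing an input.
drop_age = baseline_accuracy - accuracy_without_age
drop_other = baseline_accuracy - mean_accuracy_without_other

print(f"Drop when Age is removed: {drop_age:.2f} points")
print(f"Average drop for the other inputs: {drop_other:.2f} points")
print(f"Ratio: {drop_age / drop_other:.2f}")
```

Under these figures, removing Age costs a little more than twice as much Accuracy as removing a typical binary input.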

Results Comparison
To clarify the contribution of this paper, this section compares: (i) the results of the proposed ANN model; (ii) the evaluation of the cases of the dataset by the traditional Wells' criteria, as shown in Table 9; and (iii) other M-L approaches [35,36,88,89,90] applied to the same dataset, as shown in Table 10. Table 9 shows that, when seeking to maximize Accuracy, the proposed scheme has an average Accuracy of 90.99% and a Specificity of 96.31%; this Accuracy is 17.17% greater than that of the traditional method reported by Wells [60], which does not consider gender and age. However, when seeking to maximize the Sensitivity/Recall, the proposed scheme obtains a Sensitivity/Recall of 84.01%, an Accuracy of 84.95%, and a Specificity of 85.17%.
Table 10 compares the results with those of other machine-learning approaches. The proposed ANN model presents a Specificity of 96.31%, a Sensitivity/Recall of 68.35%, a Precision of 81.30%, and an Accuracy of 90.99%; in most cases, these values are higher than those of the other M-L methods. The exception is the linear SVM method [35,36,88], which has a Specificity of 100% but a Sensitivity/Recall of 0%, meaning that all of its predicted diagnoses are negative. The advantage of the proposed ANN model is attributed to its use of an optimized threshold to make the decision in diagnosing DVT and to the fact that ANN models work better with unbalanced datasets [85,91,92]. This is the reason why the ANN model was used in this paper.
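The effect of the two decision thresholds compared above (0.484 to favor Accuracy and 0.138 to favor Sensitivity/Recall) can be illustrated with a small sketch. The scores and labels below are invented for illustration only and are not taken from the dataset; the helper names and the convention that a score greater than or equal to the threshold counts as positive are likewise assumptions.

```python
def classify(scores, threshold):
    """Label a case positive (1) when its score reaches the threshold (assumed convention)."""
    return [1 if s >= threshold else 0 for s in scores]


def sensitivity_specificity(labels, predictions):
    """Compute Sensitivity and Specificity from true labels and predictions."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical ANN outputs and true diagnoses (toy data, not from the paper).
scores = [0.05, 0.10, 0.20, 0.30, 0.45, 0.60, 0.15, 0.40, 0.55, 0.90]
labels = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]

# Threshold reported as maximizing Accuracy.
sens_acc, spec_acc = sensitivity_specificity(labels, classify(scores, 0.484))
# Threshold reported as improving Sensitivity/Recall.
sens_rec, spec_rec = sensitivity_specificity(labels, classify(scores, 0.138))

print(f"threshold 0.484: sensitivity={sens_acc:.2f}, specificity={spec_acc:.2f}")
print(f"threshold 0.138: sensitivity={sens_rec:.2f}, specificity={spec_rec:.2f}")
```

On this toy data, lowering the threshold recovers every positive case at the cost of more False Positives, which mirrors the direction of the trade-off reported in Table 9.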

External Validation
Once the proposed ANN model was trained and validated with the k-fold cross-validation technique, an external validation was performed using the information from 59 real clinical cases provided by a public hospital during 2017 and 2018. Figure 10 shows the results of evaluating the real clinical cases by using the proposed prediction model. Only 1 of the 36 DVT cases was diagnosed as negative when, in fact, it was positive; that is, an Accuracy of 98.30% was achieved (58 of 59 hits), corresponding to a diagnostic error of 1.70%.
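The external-validation figures can be reproduced from the counts given above. In the sketch below, the False Positive and True Negative counts are not stated explicitly in the text; they are inferred from the statement that the single misclassified case was a DVT case (36 DVT cases, 59 cases in total, 58 hits), so they should be read as an assumption.

```python
total_cases = 59
dvt_cases = 36          # real positive cases
false_negatives = 1     # DVT case diagnosed as negative

# Inferred counts (assumption: the only error is the False Negative above).
true_positives = dvt_cases - false_negatives   # 35
true_negatives = total_cases - dvt_cases       # 23
false_positives = 0

accuracy = (true_positives + true_negatives) / total_cases
sensitivity = true_positives / (true_positives + false_negatives)
specificity = true_negatives / (true_negatives + false_positives)

print(f"Accuracy:    {100 * accuracy:.1f}%")
print(f"Sensitivity: {100 * sensitivity:.1f}%")
print(f"Specificity: {100 * specificity:.1f}%")
```

With only 59 cases, a single additional error would change the Accuracy by about 1.7 percentage points, so these figures are best read together with the sample size.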

Usage Scenario
The proposed CDSS uses an ANN model based on the Wells' criteria for the diagnosis of DVT. It is recommended that this intelligent system complement the protocol that is currently used, as illustrated in Figure 11. It is proposed that the physician performs the physical examination and interviews the patient, and then enters the data obtained into the proposed intelligent system. If both the physician and the system indicate that the patient does not have DVT, the patient will not receive treatment. If the system yields a positive DVT diagnosis, but the physician determines otherwise, then the patient should go home and watch for the appearance of new symptoms or the worsening of existing ones. If the physician determines that the physical examination suggests DVT, but the system diagnoses otherwise, the physician will make the decision, with the suggestion that the patient be kept under observation. Finally, if both the system and the physician determine that the patient is suffering from DVT, treatment of DVT should be initiated.
Figure 11. Suggested use of the proposed CDSS in primary care modules.
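The four combinations described in this usage scenario can be summarized as a small decision function. This is only an illustrative sketch of the suggested protocol; the function name, argument names, and returned strings are hypothetical, and the final decision always remains with the physician.

```python
def recommend_action(physician_suspects_dvt: bool, system_predicts_dvt: bool) -> str:
    """Map the physician's assessment and the CDSS output to the suggested action."""
    if physician_suspects_dvt and system_predicts_dvt:
        # Both agree on a positive diagnosis.
        return "initiate DVT treatment"
    if physician_suspects_dvt:
        # Physician suspects DVT, system does not: the physician decides.
        return "physician decides; keep patient under observation"
    if system_predicts_dvt:
        # System flags DVT, physician does not.
        return "send home; watch for new or worsening symptoms"
    # Both agree on a negative diagnosis.
    return "no treatment"


for physician in (True, False):
    for system in (True, False):
        print(physician, system, "->", recommend_action(physician, system))
```

Writing the protocol this way makes explicit that the system never overrides the physician: a disagreement leads to observation or self-monitoring, not to automatic treatment or discharge.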

Limitations of the Proposed Approach
The dataset used in the training and validation of the proposed classification model is generated according to the proposed Algorithm 1 and, although it is based on actual occurrences according to statistical data reported by Wells [60], it does not faithfully represent the behavior of the population at risk of suffering from DVT. Nevertheless, the results with real data suggest an Accuracy of 98.30%. Given the impact that decisions based on the proposed classification model can have on the health of patients, it is suggested to take these results as evidence of the effectiveness of ANNs in the diagnosis of DVT and to look for a dataset with more real cases to train the ANN model before putting it into practice.

Conclusions
This paper presented a diagnostic model for suspected DVT cases based on a back-propagation ANN. In this study, an ANN model was used to improve the learning Accuracy when dealing with a highly unbalanced dataset. The training was performed according to Wells' criteria and considering age and gender. Because of the small amount of historical data in Mexican hospitals on cases with DVT symptoms, a data augmentation technique to train the ANN model is proposed, which helps to improve Accuracy in the DVT diagnosis. The k-fold cross-validation results show a diagnostic Accuracy of 90.99% on the cases of a synthetic dataset and an Accuracy of 98.30% on 59 real cases from a Mexican hospital. This could be achieved because the ANN model makes the diagnosis by using a threshold of 0.484, found to maximize Accuracy. On the other hand, with a threshold of 0.138, the proposed ANN model improves the classification of positive cases without significantly affecting the classification of negative cases. The development and implementation of a CDSS using the ANN model for the diagnosis of DVT in primary care improves the Accuracy of early diagnosis, thus decreasing the flow of patients arriving at the emergency department. As a direct consequence, morbidity and mortality rates in patients in primary care would be reduced; in addition, fewer economic resources in medical units would be consumed. The proposed approach improves Accuracy by 17.17% versus traditional diagnosis using Wells' criteria without considering gender and age. The proposed ANN model was compared with other well-known machine-learning approaches, and it was observed that the Accuracy obtained is better than that of the related approaches because they do not use an optimized threshold to make the decision.
Therefore, it is concluded that the presented approach contributes significantly to the early diagnosis of DVT by combining a probabilistic model such as the Wells' criteria with gender, age, and a back-propagation ANN to help physicians make a more reliable and accurate decision when making a DVT diagnosis.
As future work, the proposed method and algorithms can be implemented on embedded systems and FPGAs, with the purpose of developing intelligent instrumentation that is portable and reliable.