Prediction of the Uric Acid Component in Nephrolithiasis Using Simple Clinical Information about Metabolic Disorder and Obesity: A Machine Learning-Based Model

There is a great need for a diagnostic tool using simple clinical information collected from patients to diagnose uric acid (UA) stones in nephrolithiasis. We built a predictive model making use of machine learning (ML) methodologies entering simple parameters easily obtained at the initial clinical visit. Socio-demographic, health, and clinical data from two cohorts (A and B), both diagnosed with nephrolithiasis, one between 2012 and 2016 and the other between June and December 2020, were collected before nephrolithiasis treatment. A ML-based model for predicting UA stones in nephrolithiasis was developed using eight simple parameters—sex, age, gout, diabetes mellitus, body mass index, estimated glomerular filtration rate, bacteriuria, and urine pH. Data from Cohort A were used for model training and validation (ratio 3:2), while data from Cohort B were used only for validation. One hundred and forty-six (13.3%) out of 1098 patients in Cohort A and 3 (4.23%) out of 71 patients in Cohort B had pure UA stones. For Cohort A, our model achieved a validation AUC (area under ROC curve) of 0.842, with 0.8475 sensitivity and 0.748 specificity. For Cohort B, our model achieved 0.936 AUC, with 1.0 sensitivity, and 0.912 specificity. This ML-based model provides a convenient and reliable method for diagnosing urolithiasis. Using only eight readily available clinical parameters, including information about metabolic disorder and obesity, it distinguished pure uric acid stones from other stones before treatment.


Introduction
Urolithiasis is a common disease worldwide. The prevalence and incidence of these urinary stones are increasing [1]. This rise in global prevalence is a major contributing factor to increases in healthcare costs associated with nephrolithiasis [2]. Moreover, recurrence rates range between 50% and 80%, depending on the type of stone [3].
In urolithiasis, uric acid (UA) stones account for 10% to 15% of all stones [4]. Unlike other types of stones, uric acid stones have been associated with several features of metabolic syndrome and nutrient partitioning disorders including diabetes mellitus and obesity [5][6][7][8][9]. In addition, unlike other stones which require surgery, most uric acid stones can be treated conservatively [10,11]. Thus, it is important to be able to tell the difference between uric acid and non-uric acid stone diseases prior to surgery. Uric acid stones are usually diagnosed through the analysis of stones after they have been retrieved via surgery or medical expulsive therapy. Although recent studies have reported that dual-energy computed tomography (CT) can be an accurate test for distinguishing uric acid from non-uric acid stone diseases prior to treatment [12], these studies have used different cutoffs and CT machine types [13,14], making replication difficult. In addition, dual-energy CT produces a relatively high radiation dose and the equipment is prohibitively expensive for most clinics around the world.
Previously, we used structural equation modeling to construct a model for the pathway analysis of UA nephrolithiasis identifying several simple clinical factors affecting UA stone disease [6]. Among these clinical factors, we found eight variables, namely sex, age, gout, diabetes mellitus (DM), body mass index (BMI), estimated glomerular filtration rate (eGFR), bacteriuria, and urine pH, that have indirect and direct effects on the formation of uric acid stones [6,7]. The aim of this study was to build a reliable diagnostic model making use of machine learning using these variables including information about metabolic disorder and obesity to predict the uric acid component in nephrolithiasis and test it. The eight variables chosen were simple clinical factors easily obtained during the patient's first clinical visit.

Selection and Description of Participants
In this retrospective study, we collected data for two cohorts (A and B), patients diagnosed with nephrolithiasis at Kaohsiung Medical University Hospital, Kaohsiung, Taiwan between January 2012 and December 2016 and between June 2020 and December 2020, respectively. Cohort A was a group of patients we enrolled in a previous study [6,7]. The participant selection criteria for Cohort B were the same as those for Cohort A. We included patients diagnosed with nephrolithiasis who had received genitourinary surgery (ureteroscopic lithotripsy, percutaneous nephrolithotomy, or open nephrolithotomy). We excluded any of these patients if they had genitourinary tract tumors, kidney transplants, genitourinary tract anomalies, recurrent nephrolithiasis, mixed stones containing more than one stone component, or renal replacement therapy, such as hemodialysis. Patients younger than 18 years and those without detailed medical records were also excluded.
This study was approved by the Kaohsiung Medical University Chung-Ho Memorial Hospital Institutional Review Board (KMUHIRB-E(I)-20210061) and was conducted according to the principles of the Declaration of Helsinki. The informed consent requirement was waived by Kaohsiung Medical University Chung-Ho Memorial Hospital Institutional Review Board (KMUHIRB-E(I)-20210061) due to the retrospective nature of this study and the minimal risk involved.

Data Description and Processing
In addition to the response variable, eight variables were used for data analysis and prediction model building. These included two sociodemographic variables (age and sex), three health-related variables (BMI and the comorbidities DM and gout), and three clinical variables (urine pH and bacteriuria, and eGFR). Socio-demographic, health, and clinical information was collected prior to treatment for nephrolithiasis.

Response variable
Stones were analyzed using infrared spectroscopy performed by a medical technologist and confirmed by two urologists. They were defined as 1 (uric acid stone) or 0 (other stone).

Socio-demographic characteristics
Age and gender were recorded. Age was entered as a numeric variable and gender as a binary variable, 1 (male) or 0 (female).

Health information
Health information items were patient BMIs, and whether or not the patient had a history of DM or gout. BMI was entered as a numeric variable with one decimal fraction. DM was entered as 1 (having a history) or 0 (having no history). Gout was recorded similarly, 1 (having a history) or 0 (having no history).

Clinical information
Urinalysis results, including bacteriuria and urine pH, were entered. Urine pH was entered as a numeric variable with one decimal fraction, while bacteriuria was entered as 1 (having bacteriuria) or 0 (having no bacteriuria). eGFR was calculated using the isotope dilution mass spectrometry traceable

Model Development, Fitting, and Evaluation
A model for predicting uric acid stones in nephrolithiasis was developed using machine learning methodologies [16]. The mathematical representation of this model can be expressed as follows: where x represents the input vector of the eight decision variables, and y the model output, which takes a value between 0 and 1. The model output is expressed as the likelihood that stones were composed of uric acid. The decision variables x 1 , . . . , x 8 carry the values of "gender", "age", "eGFR", "urine pH", "BMI", "DM", "gout" and "bacteriuria", respectively. The function f (x) has the following form: where a 1 , · · · , a 6 , b are model parameters, and each f i (·) function is a nonlinear function generated by a fully connected two-layer neural network. Each neural network had an output dimension of 20 for the first layer and 1 for the second layer. More specifically, functions f 1 to f 6 have the following forms: where each of A 1,1 to A 8,1 is a 20-by-1 vector variable which contains 20 parameters, each of A 1,2 to A 6,2 is a 1-by-20 vector variable which also contains 20 parameters, and each of B 1,2 to B 6,2 is a scalar parameter; σ i,1 and σ i,2 , i = 1, . . . , 6 are standard nonlinear activation functions commonly used in building a neural network. In total, the nonlinear function f (x) contains 293 trainable model parameters. The model was built using the Python programming language and its built-in libraries commonly used for machine learning practice. The variables were trained by applying the built-in optimization engine with a weighted binary cross entropy function as its objective.
Cohort A data were used for model training and model validation, while Cohort B data were only used for validation. Group A data were randomly divided into training (60%) and validation (40%) sets. We calculated the prevalence of patients with pure uric acid and non-uric acid stone in both training and validation datasets. There was very little statistical difference between training and validation sets with regard to sociodemographic characteristics, medical history, and bacteriuria.
The model parameters were trained based on the training dataset (Cohort A) and predictive power was evaluated using the validation datasets (Cohorts A and B), using an area under the curve (AUC) in the receiver operating characteristic (ROC) analysis. The optimal cutoff point on the ROC curve was determined based on Youden's index [17], which in turn was used to calculate the sensitivity, specificity, positive predictive value, and negative predictive value for the prediction models applied to the validation data. Finally, the prediction model was applied to data obtained from the Cohort B to determine whether the uric-acid stone patients in Cohort B could be correctly identified by the model.

Characteristics of Study Samples
Cohort A consisted of 1098 patients, 146 (13.3%) with pure uric acid stones and 952 (86.7%) with non-uric acid stones (Tables 1 and 2). Cohort B consisted of 71 patients, including three male patients with pure uric acid stones (Tables 3 and 4). Statistical similarity between patients with non-uric acid nephrolithiasis in Cohort A and those in B was observed. It is also observed that the mean values of the chosen variables (or the "percentages" in the case of binary variables) are statistically different between patients with pure uric acid stones and those with non-uric acid stones. Student's t-test was applied to verify how significant these differences are, and the resulting p-values are also reported in Tables 1-4. All patients in Cohort A and Cohort B did not take certain medications, such as potassium citrate, sodium citrate, sodium bicarbonate, potassium acid phosphate, and acetazolamide, which could impact urine pH one year prior to the study data collection.

Model Performance
The model we developed exhibited good discriminatory power in predicting who would have pure uric acid nephrolithiasis, which it achieved using only a small amount of information that can be or is often collected at the initial patient clinical visit. The model achieved a validation AUC of 0.842 (95% CI, 0.800-0.885) when applied to the Cohort A validation dataset. The optimal cut-off value, which was determined by Youden's index for pure uric acid nephrolithiasis, was 0.470. Using this cut-off value, the model had a sensitivity of 0.848 (95% CI, 0.756-0.939) and a specificity of 0.748 (95% CI, 0.704-0.792). The positive predictive value (PPV) was 0.343 (95% CI, 0.266-0.419) and the negative predictive value (NPV) was 0.969 (95% CI, 0.950-0.989). The overall correction ratio was 76.13%.
The model was also applied to patients in Cohort B. The AUC of the ROC curve was found to be 0.936 (95% CI, 0.854-1.0). The optimal cut-off value for this ROC curve was 0.688. Using this cut-off value, the model had a sensitivity of 1.0 (95% CI, 1.0-1.0) and a specificity of 0.912 (95% CI, 0.844-0.979). The PPV was 0.333 (95% CI, 0.025-0.641) and the NPV was 1.0 (95% CI, 1.0-1.0). The overall correction ratio was 91.55%. Even when a cut-off value of 0.470 obtained from Cohort A was applied, the model was able to correctly predict all three pure uric acid nephrolithiasis patients in Cohort B, achieving a 1.0 sensitivity, though the specificity was reduced to 0.647 and the overall correction ratio reduced to 66.2%.
Note that the validation set from the cohort group A has 13.41% uric acid stone patients and 86.59% non-uric acid stone patients. Therefore, applying a "random guess" predictive mechanism (i.e., those with 50% sensitivity and 50% specificity) on this set of patients would result in a PPV of 0.1341 and a NPV of 0.8659. Likewise, for patients in cohort B, the PPV and NPV would be 0.0423 and 0.9577, respectively. Our model achieved a PPV of 0.343 and an NPV of 0.969 for the validation set from the cohort group A, and a PPV of 0.333 and an NPV of 1.0 for the cohort group B. Although there is still room for improvement, these results clearly demonstrate the predictive power of our model.
The validation ROC curves of the model are shown in Figure 1.

Discussion
In urolithiasis disease, uric acid stones account for 10% to 15% of all urinary stones. Because the chemical properties of uric acid stones are very different from those of other stones (particularly calcium-containing stones), the treatment and prevention of uric acid stones could be improved with a diagnostic approach that can exclude non-uric acid stones [10,11]. This study showed that eight simple clinical variables can be used to build a diagnostic model to distinguish pure uric acid stones from non-uric acid ones with very good accuracy. These variables, including age, sex, past medical history, routine urine analysis, and eGFR, can be easily obtained during the first visit to an emergency room or clinic. These findings can help first-line medical staff to distinguish uric acid stones from non-uric acid stones when patients are diagnosed as having nephrolithiasis. This knowledge could make possible the timely oral administration of chemolysis agents to dissolve the uric acid stones.
A meta-analysis of studies investigating the prediction of uric acid stones using dual-energy CT recently showed that this approach had a highly accurate prediction rate [12]. However, the studies reviewed used different methods and machines. For example, in an early retrospective study on CT image prediction for uric acid stones, Ascenti et al. used 120 kV dual-energy CT and post-processing software (Syngo Kidney Stone, Siemens Healthcare) to predict uric acid stones [18]. Zhang et al. performed a prospective study in 2016 using dual-source dual-energy computed tomography to predict the stone components [19]. There are some problems with the use of CT machines. First, the CT machines used in previous studies were different, with some not commonly used in clinical practice. Second, radiation exposure should be kept to a minimum. In low-dose non-contrast abdominal CT for ureter stone diagnosis, some studies limited the radiation dose to be no more than 3 mSv, while others have limited it to only 1-1.5 mSv [20]. A relatively high radiation dose (8-10 mSv) was used by Zhang et al. to analyze stone components [19]. Third, dual-energy CT machines are more expensive than single-energy CT machines. Most abdominal CT scans are single-energy. Although single-energy CT scans were used to predict uric acid stones in 2021, many sophisticated post-CT image processing methods, such as Laplacian filtering, were required [21]. Thus, image-based diagnostic tools may be impractical in clinical use. We believe our model and methodology, which use only simple clinical variables, is easier to perform and avoids subjecting patients to radiation. The model can also be easily applied in clinics or even in areas where health professionals and medical resources, including CT machines, are not widely available.
It is generally believed that having a large number of decision variables is necessary to successfully build a predictive model based on machine learning algorithms. However, having many decision variables often makes it difficult to determine possible causality [22]. In a recent study, Kazemi et al. used more than 40 variables to build diagnostic tools based on machine learning to predict the stone components [23]. Among these variables, some were not confirmed to be related to the formation of stones, such as marital status and smoking. In a previous study, we used a structural equation model to analyze the relationship between uric acid stones and variables that had some evidence for urolithiasis formation [6,7]. We found eight variables including information about metabolic disorder and obesity to be related to the development of uric acid nephrolithiasis. In this study, we used these eight variables to build a diagnostic model based on machine learning methodologies. This study found that our model had good predictive power and worked well. Compared with previous methods [23], the model we use here requires fewer computational resources and can be used after the initial clinical visit of the patient to provide a rapid and reliable alternative diagnostic tool for the identification of pure uric acid nephrolithiasis.
Although the AUCs of our diagnostic model for the two groups of patients were relatively high, there were still some false-positive outcomes. Some studies have proposed that uric acid crystals form the nidus for the deposition of other stone components [24][25][26]. Additionally, one epidemiological study found patients with hyperuricosuria and gouty diathesis to have a higher incidence of non-uric acid nephrolithiasis [27]. These findings might explain the false positive results. Our model also predicted uric acid stones in some patients who were finally diagnosed with non-uric acid stones. We found that these patients had similar etiologies and clinical characteristics to patients with uric acid nephrolithiasis, so even when misdiagnosed by our model, these patients would more than likely benefit from the same treatment and nephrolithiasis prevention measures.
This study has some limitations. First, the urine pH in this study may have some bias. It is known that the urine pH may be affected by some medications and dietary habits, which were not explicitly considered in the study. This is a subject to be investigated in future studies. Moreover, as we have pointed out in the previous section, the lower prevalence of pure uric stones naturally leads to the tendency of low PPV and high NPV. Our model more than doubled the PPV from a random guess, and yet only reached around 34%. There is room for further improvement, and this will be a focus of future studies. Finally, this study did not assess the efficiency of chemolysis medical treatment in patients who were predicted to have uric acid stones and we did not assess the stone recurrence. In the future, a prospective study can be conducted to evaluate the efficiency of chemolysis treatment on patients predicted to have uric acid stones by our current model. More advanced models based on machine learning methodologies might be further developed to predict uric acid stone recurrence.

Conclusions
In summary, this ML-based model provides a simple, convenient and reliable method for the diagnosis of uric acid stones. With only eight easily obtained clinical parameters including information about metabolic disorder and obesity, this ML-based model can distinguish pure uric acid stones from other stones without advanced equipment before urolithiasis treatment. This diagnostic approach may help to optimize timely treatment strategies for urolithiasis.