You are currently viewing a new version of our website. To view the old version click .
Pathogens
  • Article
  • Open Access

Published: 12 October 2025

Application of Machine Learning Algorithms in Urinary Tract Infections Diagnosis Based on Non-Microbiological Parameters

,
,
,
and
1
Servicio de Medicina Preventiva y Salud Publica, Hospital Universitario Virgen de las Nieves, 18014 Granada, Spain
2
PhD Program in Clinical Medicine and Public Health, University of Granada, 18016 Granada, Spain
3
Department of Microbiology, School of Medicine, University of Granada, 18016 Granada, Spain
4
Instituto de Investigacion Biosanitaria ibs, 18012 Granada, Spain
Pathogens2025, 14(10), 1034;https://doi.org/10.3390/pathogens14101034 
(registering DOI)
This article belongs to the Section Epidemiology of Infectious Diseases

Abstract

Urinary tract infections (UTIs) are among the most common pathologies, with a high incidence in women and hospitalized patients. Their diagnosis is based on the presence of clinical symptoms and signs in addition to the detection of microorganisms in urine trough urine cultures, a time-consuming and resource-intensive test. The goal was to optimize UTI detection through artificial intelligence (machine learning) using non-microbiological laboratory parameters, thereby reducing unnecessary cultures and expediting diagnosis. A total of 4283 urine cultures from patients with suspected UTIs were analyzed in the Microbiology Laboratory of the University Hospital Virgen de las Nieves (Granada, Spain) between 2016 and 2020. Various machine learning algorithms were applied to predict positive urine cultures and the type of isolated microorganism. Random Forest demonstrated the best performance, achieving an accuracy (percentage of correct positive and negative classifications) of 82.2% and an area under the ROC curve of 87.1%. Moreover, the Tree algorithm successfully predicted the presence of Gram-negative bacilli in urine cultures with an accuracy of 79.0%. Among the most relevant predictive variables were the presence of leukocytes and nitrites in the urine dipstick test, along with elevated white cells count, monocyte count, lymphocyte percentage in blood and creatinine levels. The integration of AI algorithms and non-microbiological parameters within the diagnostic and management pathways of UTI holds considerable promise. However, further validation with clinical data is required for integration into hospital practice.

1. Introduction

Urinary tract infections (UTIs), both complicated and uncomplicated, are highly prevalent in the general population, with a higher incidence in women, children under two years old, catheterized individuals, and patients with diabetes mellitus []. Globally, in 2019, the age-standardized incidence rate was 404.61 UTI cases per million inhabitants, with 30.12 cases per million in North America and 31.83 cases per million in Western Europe []. In Spain, the 2024 Nosocomial Infection Prevalence Study (EPINE) estimated the prevalence of UTIs at 20.13% among hospitalized patients, with Escherichia coli being the most frequently isolated microorganism []. In our local setting, prevalence rises to 23.53%, with E. coli responsible for 26.67% of UTIs [].
The diagnosis of UTIs is based on the presence of clinical symptoms and signs in addition to the detection of microorganisms in urine trough urine cultures. In the era of Artificial Intelligence (AI), new opportunities arise to develop tools that optimize healthcare resource utilization, reduce diagnostic test turnaround times and associated costs, and enhance patient safety and satisfaction. AI not only improves outcome prediction by adapting existing resources to clinical demands but also enables the detection of potential anomalies in patient care processes. These techniques foster new approaches to information management in general and UTI diagnostics in particular, which would be unfeasible with conventional methods. AI facilitates the integration of multiple real-time data sources, aiding in both diagnosis and treatment decision-making [].
The application of AI in disease diagnostics represents a significant breakthrough in modern medicine, enabling faster, more accurate, and more efficient diagnoses. Algorithms such as Random Forest, artificial neural networks, and gradient boosting have demonstrated their ability to identify complex patterns in clinical and laboratory data, surpassing traditional methods in sensitivity and specificity []. Additionally, AI-driven applications allow for the processing and analysis of structured information, such as anonymized clinical data, through statistical models or machine learning approaches that complement medical decision-making. These tools offer predictive insights, prioritize cases based on input data, and support evidence-based clinical decisions []. However, AI remains in an exploratory phase, requiring further studies analyzing large, heterogeneous, and prospective datasets to reach full-scale implementation []. While promising, these advancements necessitate additional validation before being fully integrated into routine clinical practice [].
UTIs are a common pathology that demands substantial healthcare resources for diagnosis. AI models could not only optimize UTI detection but also reduce unnecessary or duplicated urine cultures and alleviate laboratory workloads. The diagnostic process could potentially be streamlined through automated algorithms and advanced machine learning techniques, which allow for UTI modeling while also identifying correlations with other analytical parameters. The objective of this study was to define, apply, and validate different classification algorithms to model UTI diagnosis and identify the most frequently implicated microbial groups using machine learning techniques based on non-microbiological laboratory parameters.

2. Materials and Methods

2.1. Population, Sample and Source of Information

A cross-sectional descriptive study was conducted at University Hospital Virgen de las Nieves (Granada, Spain), a reference healthcare center serving a population of approximately 334,000 inhabitants. The hospital houses several specialized laboratories, including Microbiology, Hematology, and Biochemistry. The study population consisted of patients with suspected UTIs referred from Specialized Care, including hospitalized patients as well as those from Outpatient Clinics and the Emergency Department.
A case was defined as any urine culture performed in the Microbiology Laboratory between January 2016 and December 2020, regardless of patient age or sex, provided that documented results from a urine dipstick test and basic hematological parameters (red blood cell, white blood cell, and platelet counts) were available (Laboratoire Sysmex, Sant Just Desvern, Barcelona, Spain). Urine cultures were excluded if the sample origin was unclear or if patient age and sex were not recorded.
Urine sample culturing followed the laboratory’s standardized protocol. A calibrated 1-μL inoculating loop was used, and samples were cultured on UriSelect4 chromogenic medium (Bio-Rad, Barcelona, Spain) and incubated at 36 ± 1 °C for 24 h. For patients attended in the Nephrology Department, an additional blood agar plate (BD, Madrid, Spain) was incubated in CO2. Colony counts were evaluated using the following cut-off values: in urine samples obtained by spontaneous miction or from a permanent catheter, UTI was defined by a count of ≥100,000 CFU (colony forming units)/mL or count of >10,000 CFU/mL for a single microorganism; in urine samples obtained from a temporary urethral catheter a count of ≥10,000 CFU/mL for one or two microorganisms was considered significant, and urine cultures with the growth of >2 microorganisms were excluded [,]. Microorganisms were subsequently identified using MALDI-TOF mass spectrometry (Biotyper, Bruker Daltonics, Billerica, MA, USA) and/or the MicroScan WalkAway system (Beckman-Coulter, Brea, CA, USA).

2.2. Variables and Statistical Analysis

The response variables were positive urine culture (yes/no) and the type of isolated microorganism (Gram-negative bacillus, Gram-positive coccus, yeast, others), later dichotomized as Gram-negative bacillus/others. The independent variables included were those indicated in Table 1. Only hematological and biochemical variables with less than 10% missing values were included in the analysis.
Table 1. List of the 42 variables included in the study.
The statistical analysis followed several steps:
  • Descriptive Analysis: Qualitative variables were summarized using frequencies and percentages, while quantitative variables were described using means and standard deviations (for symmetric distributions) or medians and interquartile ranges (for asymmetric distributions).
  • Estimation of the Percentage of Positive Urine Cultures (positive urine cultures divided by the total number of urine cultures). This was accompanied by a 95% confidence interval.
  • Bivariate Analysis: Associations between quantitative variables and urine culture results were tested using Student’s t-test (for equal variances) or Welch’s t-test (for unequal variances). Associations with qualitative variables were assessed using Pearson’s chi-square test. Statistical significance was set at p < 0.05.
  • Machine Learning Algorithm Application:
    • Variable Recoding: All categorical variables were previously converted to dichotomous (positive-negative), except for months and sample type that were recoded as dummy variables.
    • Dataset Splitting: The dataset was divided into a training set (80%) for model development and a test set (20%) for validation, ensuring sufficient data for both stages to enhance model generalizability [].
    • Missing Data Imputation: Missing values were imputed in the training set using the k-Nearest Neighbors (kNN) method, which identifies the k closest neighbors based on dataset variables and imputes missing values using the mean. Subsequently, imputation was performed on the test set using the imputation model learned on the training set.
    • Variable Normalization: To ensure all variables contributed equally to the model and to mitigate differences in measurement scales, quantitative values were rescaled between 0 and 1.
    • Pearson Correlation Analysis: Highly correlated variables (r > 0.7) were removed to reduce redundancy.
    • Variable Importance Assessment: The most relevant variables for the training set were selected using the Random Forest method with cross validation with 10 folds.
    • Machine Learning Algorithm Implementation: The following widely used algorithms were applied (the cross-validation method with 10 iterations of resampling was used for training). All the algorithms were initialized with the same seed to avoid reproducibility:
      Logistic Regression—A classification model predicting binary outcomes using a logistic function.
      k-Nearest Neighbors (kNN)—Classifies a data point based on the majority class of its k nearest neighbors using a distance metric.
      Decision Trees—Splits data into successive decisions based on feature values, forming a tree structure.
      Support Vector Machine (SVM)—Identifies the hyperplane that maximizes class separation.
      Random Forest—An ensemble method averaging predictions from multiple decision trees.
      Gradient Boosting Machine (GBM)—A sequential model where each iteration corrects the errors of the previous one.
      Artificial Neural Network (NNet)—A multi-layered model that processes data hierarchically through interconnected nodes (neurons).
    • Model Prediction and Validation: Performance was evaluated using a confusion matrix (true positives, true negatives, false positives, and false negatives) in the test set.
    • Accuracy Calculation: Defined as the proportion of correct predictions over the total number of predictions.
    • Sensitivity, Specificity, and ROC Curve Analysis: Performance comparisons among models were conducted using receiver operating characteristic (ROC) curves, with sensitivity and specificity reported alongside 95% confidence intervals.
The statistical analysis was conducted using RStudio (version 2023.06.1+524), and the machine learning algorithms were implemented via the “caret”package (version 7.0.1) in R. The caret package (classification and regression training) includes functions to perform complex methods for classification and regression of variables (https://topepo.github.io/caret/index.html) (accessed on 3 February 2025).

2.3. Ethical Considerations

Urine cultures and analytical parameters were obtained as part of standardized hospital procedures and were not linked to additional research interventions. Consequently, as this was a retrospective study, informed patient consent was not required for data analysis. Patient records were anonymized using an identification number, and data were processed in aggregate to prevent patient identification. The study received approval from the provincial ethics committee on 21 December 2020 (reference code 1671-N-20). A data management plan was developed to define the life cycle of data, its processing and management in accordance with EU Regulation 2016/679 on the protection of individuals with regard to the processing of personal data (GDPR).

3. Results

Between January 2016 and December 2020, a total of 6747 urine samples for urine culture were received in the Microbiology laboratory, of which 66 were discarded due to coding issues. After verifying the availability of documented results from a urine dipstick test, as well as hematological and biochemical results, and excluding cases with more than 10% missing values, the final sample consisted of 4283 urine cultures. Of the 42 variables included in the study, 36 had a range of missing values between 0.3% and 7.5%.
Table 2 presents the values obtained for the study variables, expressed as mean ± standard deviation for quantitative variables and as percentages for qualitative variables. The mean age of the patients was 56.4 ± 23.1 years. Of the total urine cultures, 4108 (95.9%) were from adults (aged 18 years or older), and 51.5% were from female patients. The majority of samples were collected as spontaneous miction (83.8%), during January (9.6%), and originated from community settings (emergency services and outpatient clinics) in 91.9% of cases. The proportion of urine cultures with a glucose level of 0 mg/dL was 90.4%, while ketone bodies, proteins, and nitrites had values of 0 mg/dL in 80.0%, 44.2%, and 83.1% of cases, respectively. Additionally, the absence of red and white blood cells was observed in 29.8% and 36.4% of the samples, respectively.
Table 2. Values obtained for all study variables, bivariate analysis according to positive or negative urine culture, and results of the comparison tests for two samples (Student’s t-test, Welch’s test, and Chi-square test).
A total of 38.6% of urine cultures were positive (95% CI: 37.2–40.1%), with a single microorganism isolated in 86.8% of these cases and two microorganisms in the remaining cases. Gram-negative bacilli were identified in 68.8% of positive cultures, with E. coli being the most frequently isolated bacterium, while Gram-positive cocci were found in 20%, with Enterococcus faecalis being the most prevalent species. In the bivariate analysis (Table 2), significant differences were observed between positive and negative urine cultures for all study variables (p < 0.05), except for the month of sample collection, glucose, bilirubin, and ketone bodies in the urine dipstick test, as well as lymphocyte count, mean corpuscular and platelet volume in the red and platelet series, and the hemolytic, icteric, and lipemic indices in the hematological analysis.
Among positive urine cultures, higher values were observed for patient age, female sex, samples collected during the summer months and December, those obtained from permanent and temporary urethral catheterization, and those of community origin. Significantly higher values were also found for pH, hematuria (≥2+), protein levels (≥1 mg/dL), the presence of nitrites, leukocytes (≥3+), creatinine, basophil count, immature granulocyte count and percentage, leukocyte count, lymphocyte percentage, monocyte count, neutrophil count and percentage, platelet count, and red cell distribution width. The remaining variables were significantly higher in negative urine cultures.
The dataset was split into 80% training data (3427 records) and 20% test data (856 records), with both subsets maintaining a positive urine culture rate of 38.6%, identical to the full dataset.
A correlation analysis (Figure 1) identified seven variables (neutrophil count, neutrophil percentage, hemoglobin, hematocrit, eosinophil percentage, mean corpuscular hemoglobin and immature granulocyte count) that were highly correlated with others (significant correlation >0.7) and were excluded from subsequent analyses. After applying the variable importance algorithm using “ref”with Random Forest, all variables were selected as important, indicating that none were discarded due to lack of relevance in the model.
Figure 1. Correlation between highly correlated variables and the rest of variables. v1: age; v2: glucose; v3: bilirrubin; v4: ketone bodies; v5: density; v6: pH; v7: hematuria; v8: proteins; v9: urobilinogen; v10: leukocytes; v11: creatinine; v12: basophil percentage; v13: basophil count; v14: eosinophil count; v15: immature granulocyte percentage; v16: white blood cell count; v17: lymphocyte percentage; v18: lymphocyte count; v19: monocyte percentage; v20: monocyte count; v21: platelet count; v22: mean platelet volume; v23: mean corpuscular hemoglobin concentration; v24: red cell distribution with index; v25: red blood cell count; v26: mean corpuscular volume; v27: hemolytic index; v28: icteric index; v29: lipemic index; v30: sex; v31: origin; v32: nitrites; v33: temporary urethral catheter v34: permanent catheter; from v35 to v45: January to November.
The seven selected machine learning algorithms were trained on the training set with the following parameters: k = 9 in kNN, mtry = 2 in RF, sigma = 0.01530869 and C = 0.5 in SVM, n.trees = 150, interaction.depth = 3, shrinkage = 0.1 and n.minobsinnode = 10 in GBM. The percentage of minority group values was almost 39%, therefore it was not considered an unbalanced study. After applying them to the test set for validation, the Random Forest and Gradient Boosting Machine algorithms demonstrated the best performance, with accuracies of 82.2% and 78.2%, respectively. Meanwhile, the Support Vector Machine algorithm, neural network and logistic regression yielded more modest results, with accuracies around 77%. The decision tree models and kNN also performed well overall, although their accuracy was slightly lower (Table 3).
Table 3. Application of the algorithms on the test set in the determination of positive urine cultures, according to decreasing order of accuracy.
The ROC curves (Figure 2) generated for each model generally indicated high discriminative power, with an area under the curve (AUC) exceeding 87% for Random Forest and over 83% for Gradient Boost Machine, Support Vector Machine, Neural Network and Logistic Regression (Table 3). In comparison, the kNN and Decision Tree models had lower AUC values, suggesting a reduced ability to distinguish between positive and negative urine cultures.
Figure 2. ROC curve of the algorithms applied in the detection of positive urine cultures. GBM: Gradient Boost Machine; kNN: k Nearest Neighbourg; Logistic: Logistic Regresion; NNet: Neural Network; RF: Random Forest; SVM: Support Vector Machine; Tree: Decision Tree. Dashed line: baseline.
The analysis of variable importance in the Random Forest model, which had the highest discriminative power, identified the most influential predictors of positive urine cultures as the presence of leukocytes and nitrites and density in the urine dipstick test, along with elevated white cells count, monocyte count, lymphocyte percentage in blood and creatinine levels. Conversely, the least important variables were sample origin, glucose and month of year.
Among the positive urine culture samples (n = 1655), the isolated microorganism variable was dichotomized into two groups: Gram-negative bacilli and others. The percentage of isolates in the Gram-negative bacilli group was 66.8% (CI95%: 64.6–69.1%), mostly located in the first isolate. A bivariate analysis was performed to compare predictive variables, identifying 18 significant variables. Lower mean values were observed in the Gram-negative bacilli group for age, number of isolation, basophil percentage, lymphocyte percentage, monocyte percentage, eosinophil count and percentage, red cell distribution width, hospital origin, male sex, and urine samples obtained via permanent bladder catheterization (p < 0.05).
After imputing missing values to the training sample and subsequently eliminating the most correlated variables, the variable importance procedure was applied to determine the most significant variables for the application of the algorithms. The resulting variables are the 15 indicated in Table 4.
Table 4. Results of the application of the algorithms on the test set in the determination of Gram-negative bacilli according to decreasing order of accuracy (n = 1655).
The seven machine learning algorithms were trained in the train set and applied to the test set to predict whether the isolated microorganism was a Gram-negative bacillus. The estimated parameters were as follows: k = 5 in kNN, mtry = 8 in RF, sigma = 0.06761723 and C = 0.25 in SVM, n.trees = 50, interaction.depth = 2, shrinkage = 0.1 and n.minobsinnode = 10 in GBM. The best predictive performance was achieved by the Tree algorithm (77.3% accuracy), with a sensitivity of 90.5% (Table 4). Neural Network algorithm, GBM algorithm, Logistic Regression and Support Vector Machine provided the same accuracy (76.4%) and similar sensitivities (86–92%). The most important predictive variables in this model were number of isolation, red cell distribution width index, eosinophil count and monocyte count, while origin, proteins (dipstick test) and mean platelet volume were the least important.

4. Discussion

The present study highlights the advantage of applying machine learning algorithms to a set of urine samples to automate the detection of positive urine cultures solely based on laboratory parameters. The study demonstrated that the Random Forest algorithm yielded the most satisfactory results, with an area under the ROC curve (AUC) of 87.10%, showing good specificity in detecting negative urine cultures (92.76%) and moderate sensitivity for detecting positive cases (65.56%). The novelty of the study is that only laboratory parameters were used without including other types of clinical variables, risk factors, patient history or treatments, a fact that has been extensively studied by most of the existing publications on the subject that were found, which are cited in this article. Biochemical and hematological variables scarcely studied in the literature have been included, none of them in our geographical area, with a high prevalence of UTI, which adds relevance and interest to the findings, since with a more limited set of data, results similar to some of the published studies that do consider them are obtained.
Various studies have applied machine learning techniques for UTI detection based on different variables. A study conducted in the United Kingdom, where UTIs were one of the five leading causes of hospital admissions (accounting for approximately 9%), reported an accuracy of 85% in correctly classifying UTIs using variables related to sleep patterns []. Ozkan et al. developed an algorithm with 98% accuracy for diagnosing UTIs based on medical history, clinical examination, and ultrasound techniques []. Aydin et al. found an accuracy of 91% in a pediatric study, linking UTIs with the presence of nitrites, leukocytes, catheterization, blood samples, and gender []. The work by Gadalla et al. identified a model in women using clinical variables, biomarkers, and urine turbidity, achieving 70% accuracy []. In an emergency department setting, a maximum accuracy of 87% was obtained []. Finally, a study with a larger sample size than the previous ones predicted UTIs with 98% accuracy using hemogram variables, along with patient age and gender [].
In hospitalized geriatric patients, a relationship has been found between UTIs and bloodstream bacteremia, after evaluating the utility of parameters such as white blood cell count, creatinine concentration, and erythrocyte sedimentation rate (ESR), which significantly increase infection severity. Bacteremia can lead to severe complications requiring more intensive care. This underscores the importance of accurately assessing laboratory parameters in the management of UTIs in older adults []. A correlation has also been observed between hematological parameters and urine culture results in pregnant women, with higher white blood cell levels in cases of asymptomatic bacteriuria []. In patients with permanent urinary catheters, the presence of leukocytes and red blood cells in urine may reflect colonization rather than true infection. Differentiating between these conditions requires a thorough clinical assessment, including evaluation of symptoms, inflammatory markers, and response to therapy, variables not addressed in the present study. Therefore, our findings, based solely on laboratory parameters, should be interpreted with caution in this subgroup, and always in conjunction with clinical judgment to avoid potential misclassification.
Although urine density showed statistically significant differences between groups, the absolute difference was minimal (1.017 vs. 1.018) and likely lacks clinical relevance. This finding, while statistically robust due to the large sample size, should be interpreted with caution, as it does not reflect a physiologically meaningful difference in the context of urinary tract infection diagnosis. On the other hand, the non-linear distribution of hematuria, proteinuria, and leukocyturia observed in this study may reflect the influence of multiple coexisting conditions and the limitations of single-marker interpretation. Mild alterations may be related to non-infectious etiologies, while higher grades are more often associated with true urinary tract infections. These findings highlight the importance of interpreting urinalysis results in the broader clinical context and avoiding over-reliance on isolated parameters.
The present study shows similar results to those reported by Taylor [] who, using 37 variables, demonstrated that the Random Forest algorithm was the most accurate, with an AUC of 87.4%. The main difference between both studies is that Taylor’s included variables from medical history, as well as patient signs and symptoms. Similarly, in the study by Nyman [], the Random Forest algorithm stood out among the applied methods, reducing the workload of cultures by 46% by identifying negative samples, while maintaining a sensitivity of 95% and a specificity of 72%, based on 33 predictive variables derived from flow cytometry. Likewise, Dedeené [] identified a logistic regression model using flow cytometry parameters, achieving an AUC of 85.8%, whereas applying a neural network resulted in an AUC of 88.4%. Del Ben [] developed a machine learning model based on decision trees to improve urine sample screening and reduce microbiology laboratory workload. The model achieved a sensitivity of 94.5% and allowed for the identification of negative samples, further reducing unnecessary cultures by 16%, with an estimated financial impact of EUR 40,000 annually. Although our study obtained more moderate sensitivity values, in terms of accuracy they are comparable to those cited, with the advantage of having analyzed only laboratory variables, which are easier to obtain in our environment than clinical variables. These findings suggest that applied methodologies have the potential to optimize clinical diagnostic processes, saving both time and significant costs.
Regarding the prediction of the microorganism responsible for the UTI, our results suggest that five of the seven algorithms used (Tree, Neural Network, GBM and SVM) can be applied to predict that the causative agent belongs to the Gram-negative bacilli group (including E. coli). Using these algorithms, an AUC of 78.5–82.1% was obtained, with a sensitivity around 90% in all cases, although with moderate specificity. Our figures are slightly higher than those reported by Choi [,], who obtained an AUC of approximately 74% using automated urine tests, while AI models, particularly XGBoost, achieved a high ROC AUC (>90%). It should be noted that in Choi’s study, samples from more than 50,000 patients were used, about ten times more than those included in this study, providing much greater statistical power and consistency to their results. Machine learning algorithms are specifically designed to work with large datasets; although they can function with smaller ones, they may overfit the training data, reducing their ability to generalize to new cases. Additionally, the variability in smaller datasets may be insufficient to capture complex patterns []. Despite including a small sample in our study, the high sensitivity values obtained show that the algorithms could be used as a preliminary screening tool in the detection of Gram-negative bacilli, including E. coli.
With regard to the number of isolates in positive ITUs, some studies show that the isolation of two microorganisms is often labeled as contamination and therefore many models exclude them. At the methodological level, there are studies that specifically predict mixed growth as a sign of contamination, showing that it is an important source of false positives if not modeled separately. At the same time, recent reviews highlight that some polymicrobial cultures may represent real infections and that systematically excluding them can introduce bias and lead to the loss of clinically relevant cases, so they propose reconsidering their inclusion in the diagnosis []. In our study, most monomicrobial cultures were caused by Gram-negative bacilli. Likewise, in polymicrobial cultures, at least one of the isolates was a Gram-negative bacillus, supporting their predominant role in urinary tract infections and highlighting their relevance in the prediction of these pathogens. Microbiological confirmation of the causative microorganism and its antimicrobial susceptibility profile is essential for the rational selection of antibiotic therapy, prevention of resistance, and individualized management of urinary tract infections. The use of the proposed algorithms should always be complemented by microbiological and clinical interpretation, ensuring maximum efficacy and safety for the patient.
Beyond this limitation, the main drawback of the present study is the exclusive consideration of laboratory variables for developing the algorithms. The hospitalization and emergency information systems at our hospital are separate from the laboratory systems, making it highly complex to cross-reference databases for obtaining other types of clinical or hospital admission variables. Nevertheless, very satisfactory results have been obtained, which would likely have been even better had we been able to complement the study with these additional variables. According to clinical guidelines, the first urine sample obtained in suspected urinary tract infection should be used for both biochemical analysis and urine culture, ensuring proper collection technique to minimize contamination. Burton demonstrated that by applying machine learning algorithms, laboratory workload can be reduced by up to 41% [].
The algorithms found have shown satisfactory results for the overall sample, but when particularized to positive urine cultures, they have turned out to be very modest. Therefore, it is necessary to expand the sample and/or set of predictor variables in order to achieve more satisfactory predictions in this group.
The main strength, on the other hand, is that by predicting the positivity or negativity of urine cultures using only laboratory parameters obtained from routine work, the cost of the determinations is affordable, and the necessary information for making predictions is much simpler and faster to collect.
Exploratory models based on machine learning have proven to be highly valuable tools in the field of medical diagnostics in general and in the detection of positive urine cultures in particular. These models allow for the analysis of large volumes of clinical and/or laboratory data, identifying patterns that might go unnoticed with traditional approaches. By learning from historical data, algorithms can improve the accuracy of UTI detection by evaluating multiple variables simultaneously, optimizing the diagnostic process, reducing false negatives, and facilitating earlier intervention. The ability to adjust models to local datasets or specific populations further enhances their utility, as they can be tailored to particular patient characteristics, improving personalized treatment.
Finally, the integration of AI algorithms and non-microbiological parameters within the diagnostic and management pathways of UTI holds considerable promise. Nevertheless, it is imperative to acknowledge that such tools must be regarded as complementary rather than substitutive, as microbiological confirmation of the causative pathogen and its antimicrobial susceptibility remains the cornerstone of safe and rational management. The advancement of this field is contingent upon the integration of innovative computational models with conventional microbiological methodologies. This integration will promote the development of more personalized, efficient, and evidence-based clinical strategies.

5. Conclusions

The use of machine learning models has proven effective in predicting UTIs with high-moderate sensitivity and specificity, based on laboratory parameters. The presence of leukocytes and nitrites in the urine dipstick test, along with elevated white cells count, monocyte count, lymphocyte percentage in blood and creatinine levels, were identified as key markers for diagnosing these infections. These models are easy to implement in healthcare settings, as they can be integrated into routine clinical software applications, providing clinicians with a valuable tool to diagnose UTIs and determine their most likely etiology.
This study is exploratory, as a first step towards the detection of UTIs using machine learning models, so the conclusions are an advance in the application of these models. The interpretations of these results should be made with caution, in the sense that confirmatory studies with more samples and more variables are needed to validate the results obtained.

Author Contributions

Conceptualization, M.M.R.d.Á., A.S.-P., J.M.N.-M. and J.G.-F.; methodology, M.M.R.d.Á.; software, M.M.R.d.Á.; validation, M.M.R.d.Á.; formal analysis, M.M.R.d.Á. and C.B.-R.; investigation, M.M.R.d.Á.; resources, J.M.N.-M. and J.G.-F.; data curation, M.M.R.d.Á.; writing—original draft preparation, M.M.R.d.Á., A.S.-P., J.G.-F. and C.B.-R.; writing—review and editing, all authors; visualization, M.M.R.d.Á. and A.S.-P.; supervision, A.S.-P. and J.G.-F.; project administration, J.G.-F. All authors have read and agreed to the published version of the manuscript.

Funding

This research received external funding from Research Group CTS-521 of University of Granada-Junta de Andalucia, Spain.

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki and was approved by the Granada Provincial Ethics Committee (Spain) (protocol code 1671-N-20 and date of approval 21 December 2020).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Acknowledgments

To Mª Carmen Olvera Porcel and Rubén Alba Ruiz from the Research Management and Support Unit of the Hospital Universitario Virgen de las Nieves (Granada, Spain), for technical support and bibliographic search.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
UTIUrinary Tract Infection
AIArtificial Intelligent
EPINENosocomial Infection Prevalence Study (Spain)
CFUColony Forming Units
RBCRed Blood Cell Series
WBCWhite Blood Cell Series
PSPlatelet Series
RFRandom Forest
SVMSupport Vector Machine
GBMGradient Boost Machine
NNetNeural Network
LRLogistic Regression
TreeDecision Tree
kNNk Nearest-Neighbourg
CI95%95% Confidence Interval
ROCReceiver Operating Characteristic curve
AUCArea Under ROC curve

References

  1. Medina Polo, J.; Arribi Vilela, A.; Candel González, F.J.; Salinas Casado, J. Actualización de la Infección Urinaria en Urología. Available online: https://www.aeu.es/UserFiles/files/ManualInfeccionesUrinarias.pdf (accessed on 8 October 2024).
  2. Yang, X.; Chen, H.; Zheng, Y.; Qu, S.; Wang, H.; Yi, F. Disease burden and long-term trends of urinary tract infections: A worldwide report. Front. Public Health 2022, 10, 888205. [Google Scholar] [CrossRef]
  3. Sociedad Española de Medicina Preventiva, Salud Pública y Gestión Sanitaria. Prevalencia de Infecciones (Relacionadas con la Asistencia Sanitaria y Comunitarias) y uso de Antimicrobianos en Hospitales de Agudos; Sociedad Española de Medicina Preventiva, Salud Pública y Gestión Sanitaria: Madrid, Spain, 2024. [Google Scholar]
  4. Sociedad Española de Medicina Preventiva, Salud Pública y Gestión Sanitaria. Informe Preliminar Epine Año 2024; H. Universitario Virgen de las Nieves: Granada, Spain, 2024. [Google Scholar]
  5. Dai, Y.; Sullivan, B.; Montout, A.; Dillon, A.; Waller, C.; Acs, P.; Denholm, R.; Williams, P.; Hay, A.D.; Santos-Rodriguez, R.; et al. Explainable AI for Classifying UTI Risk Groups Using a Real-World Linked EHR and Pathology Lab Dataset. arXiv 2025, arXiv:2411.17645. [Google Scholar] [CrossRef]
  6. Park, J.I.; Bliss, D.Z.; Chi, C.-L.; Delaney, C.W.; Westra, B.L. Knowledge Discovery with Machine Learning for Hospital-Acquired Catheter-Associated Urinary Tract Infections. Comput. Inform. Nurs. 2020, 38, 28–35. [Google Scholar] [CrossRef] [PubMed]
  7. Ruiz, R.B.; Velásquez, J.D. Inteligencia artificial al servicio de la salud del futuro. Rev. Med. Clin. Condes 2022, 34, 84–91. [Google Scholar] [CrossRef]
  8. Naik, N.; Talyshinskii, A.; Shetty, D.K.; Hameed, B.M.Z.; Zhankina, R.; Somani, B.K. Smart Diagnosis of Urinary Tract Infections: Is Artificial Intelligence the Fast-Lane Solution? Curr. Urol. Rep. 2024, 25, 37–47. [Google Scholar] [CrossRef]
  9. Goździkiewicz, N.; Zwolińska, D.; Polak-Jonkisz, D. The Use of Artificial Intelligence Algorithms in the Diagnosis of Urinary Tract Infections-A Literature Review. J. Clin. Med. 2022, 11, 2734. [Google Scholar] [CrossRef]
  10. Rojo, M.D.; Bautista, M.F.; Gutiérrez-Fernández, J. Procedimiento Normalizado de Trabajo. Cultivo Cuantitativo de Orina para Estudio de Microorganismos Aerobios/Facultativos de Crecimiento Rápido (Vol. 8). PNT-OR-01. Available online, 2015. Available online: https://doi.org/10.6084/m9.figshare.1149879 (accessed on 4 March 2025).
  11. Rodríguez Del Águila, M.M.; Sorlózano-Puerto, A.; Fernández-Sierra, M.A.; Navarro Marí, J.M.; Gutiérrez Fernández, J. Características sociodemográficas y factores de riesgo asociados a las bacteriurias significativas en un área de salud del sudeste español [Sociodemographic characteristics and risk factors associated to significative bacteriuria in a Spanish health area]. Rev. Española Quimioter. 2022, 35, 382–391. [Google Scholar] [CrossRef] [PubMed]
  12. Gholamy, A.; Kreinovich, V.; Kosheleva, O. Why 70/30 or 80/20 Relation Between Training and Testing Sets: A Pedagogical Explanation. Departmental Technical Reports (CS). 2018. Available online: https://scholarworks.utep.edu/cs_techrep/1209 (accessed on 17 March 2025).
  13. Enshaeifar, S.; Zoha, A.; Skillman, S.; Markides, A.; Acton, S.T.; Elsaleh, T.; Kenny, M.; Rostill, H.; Nilforooshan, R.; Barnaghi, P. Machine learning methods for detecting urinary tract infection and analysing daily living activities in people with dementia. PLoS ONE 2019, 14, e0209909. [Google Scholar] [CrossRef] [PubMed]
  14. Ozkan, I.A.; Koklu, M.; Sert, I.U. Diagnosis of urinary tract infection based on artificial intelligence methods. Comput. Methods Programs Biomed. 2018, 166, 51–59. [Google Scholar] [CrossRef]
  15. Aydin, D.B.; Er, O. A new proposal for early stage diagnosis of urinary tract infection using computers aid systems. Sak. Univ. J. Comput. Inf. Sci. 2018, 1, 1–9. [Google Scholar]
  16. Gadalla, A.A.H.; Friberg, I.M.; Kift-Morgan, A.; Zhang, J.; Eberl, M.; Topley, N.; Weeks, I.; Cuff, S.; Wootton, M.; Gal, M.; et al. Identification of clinical and urine biomarkers for uncomplicated urinary tract infection using machine learning algorithms. Sci. Rep. 2019, 9, 19694. [Google Scholar] [CrossRef]
  17. Taylor, R.A.; Moore, C.L.; Cheung, K.-H.; Brandt, C. Predicting urinary tract infections in the emergency department with machine learning. PLoS ONE 2018, 13, e0194085. [Google Scholar] [CrossRef] [PubMed]
  18. Burton, R.J.; Albur, M.; Eberl, M.; Cuff, S.M. Using artificial intelligence to reduce diagnostic workload without compromising detection of urinary tract infections. BMC Med. Inform. Decis. Mak. 2019, 19, 171. [Google Scholar] [CrossRef]
  19. Tartar, A.S.; Balin, S.O. Geriatric urinary tract infections: The value of laboratory parameters in estimating the need for bacteremia and Intensive Care Unit. Pak. J. Med. Sci. 2019, 35, 215–219. [Google Scholar] [CrossRef] [PubMed]
  20. Sukri, M. Correlation between Hematologic Parameters and Asymptomatic Bacteriuria in the Full-Term Pregnant Patients in Dr. Kariadi Hospital, Semarang, Indonesia. Diponegoro Int. Med. J. 2022, 3, 7–13. [Google Scholar] [CrossRef]
  21. Nyman, J. Machine Learning Approaches for Detection of Urinary Tract Infections. OpenAIRE—Explore. Available online: https://explore.openaire.eu/search/publication?articleId=od_______264::cd9221b3567c3df8acb3b5d09a4a89b1 (accessed on 9 June 2023).
  22. Dedeene, L.; Van Elslande, J.; Dewitte, J.; Martens, G.; De Laere, E.; De Jaeger, P.; De Smet, D. An artificial intelligence-driven support tool for prediction of urine culture test results. Clin. Chim. Acta 2024, 562, 119854. [Google Scholar] [CrossRef]
  23. Del Ben, F.; Da Col, G.; Cobârzan, D.; Turetta, M.; Rubin, D.; Buttazzi, P.; Antico, A. A fully interpretable machine learning model for increasing the effectiveness of urine screening. Am. J. Clin. Pathol. 2023, 160, 620–632. [Google Scholar] [CrossRef]
  24. Choi, M.H.; Kim, D.; Bae, H.G.; Kim, A.-R.; Lee, M.; Lee, K.; Lee, K.-R.; Jeong, S.H. Predictive performance of urinalysis for urine culture results according to causative microorganisms: An integrated analysis with artificial intelligence. J. Clin. Microbiol. 2024, 62, e0117524. [Google Scholar] [CrossRef]
  25. Choi, M.H.; Kim, D.; Park, Y.; Jeong, S.H. Development and validation of artificial intelligence models to predict urinary tract infections and secondary bloodstream infections in adult patients. J. Infect. Public Health 2024, 17, 10–17. [Google Scholar] [CrossRef]
  26. Domingos, P. A few useful things to know about machine learning. Commun. ACM 2012, 55, 78–87. [Google Scholar] [CrossRef]
  27. Moreland, R.B.; Brubaker, L.; Wolfe, A.J. Polymicrobial urine cultures: Reconciling contamination with the urobiome while recognizing the pathogens. Front. Cell. Infect. Microbiol. 2025, 15, 1562687. [Google Scholar] [CrossRef] [PubMed]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Article Metrics

Citations

Article Access Statistics

Article metric data becomes available approximately 24 hours after publication online.