Brave New World of Artificial Intelligence: Its Use in Antimicrobial Stewardship—A Systematic Review

Antimicrobial resistance (AMR) is a growing public health problem in the One Health dimension. Artificial intelligence (AI) is emerging in healthcare, since it is helpful to deal with large amounts of data and as a prediction tool. This systematic review explores the use of AI in antimicrobial stewardship programs (ASPs) and summarizes the predictive performance of machine learning (ML) algorithms, compared with clinical decisions, in inpatients and outpatients who need antimicrobial prescriptions. This review includes eighteen observational studies from PubMed, Scopus, and Web of Science. The exclusion criteria comprised studies conducted only in vitro, not addressing infectious diseases, or not referencing the use of AI models as predictors. Data such as study type, year of publication, number of patients, study objective, ML algorithms used, features, and predictors were extracted from the included publications. All studies concluded that ML algorithms were useful to assist antimicrobial stewardship teams in multiple tasks such as identifying inappropriate prescribing practices, choosing the appropriate antibiotic therapy, or predicting AMR. The most extracted performance metric was AUC, which ranged from 0.64 to 0.992. Despite the risks and ethical concerns that AI raises, it can play a positive and promising role in ASP.


Introduction
One Health is, according to the One Health High-Level Expert Panel, "an integrated, unifying approach that aims to sustainably balance and optimise the health of people, animals and ecosystems" [1]. This inextricable link between these actors applies to various fields of health and, inherently, to the growth of antimicrobial resistance (AMR).
AMR is a growing public health problem because it reduces the effectiveness of antimicrobial therapy and increases the severity, incidence, and cost of infection [2]. AMR's emergence, evolution, and spread stem from (i) the widespread and inadequate antimicrobial use in animals and clinical practice, (ii) contaminated environments, and (iii) insufficient infection control measures [3]. This increases the threat of the emergence of super-resistant bacteria [4]. The rapid development and dissemination of mechanisms of resistance, through antibiotic resistance genes (ARGs), to antibiotics used in the clinical setting, together with the slow and infrequent arrival of new antimicrobials in recent years, make AMR one of the most severe threats to global public health in the 21st century.
AMR levels are detected by antimicrobial susceptibility testing (AST). However, this method involves culture of the microorganisms, which can take 2-5 days. This delay in prescribing the most effective antimicrobial prolongs empiric therapy and contributes to the rise of AMR. Measures to combat this include improved communication and education about the topic, adequate hygiene for infection control, surveillance practices, antimicrobial stewardship, swifter methods for AMR identification, and the use of vaccines and bacteriophages [2,3].
Antimicrobial stewardship programs (ASPs) are a set of interventions aimed at optimizing the use of antimicrobials and, therefore, reducing costs, improving therapeutic outcomes, and reducing AMR [5]. ASPs were introduced in 1974 by McGowan and Finland [6], are applied to human healthcare, animal health, and the environment, and involve the optimal selection, dosage, and duration of therapy as well as the control of its use, which can be achieved with programs that recommend the appropriate adjustments. Typically, an ASP may involve pharmacists and infectious diseases physicians, and the tools available for these teams include limiting formularies, restricting certain classes of antimicrobials, cycling of antibiotics, decision support, and staff education about the optimal antimicrobial considering the patient [5]. These interventions are primarily used in hospital settings such as in intensive care units (ICUs), pediatrics, and neutropenic patients [7-9]. Still, efforts should be made for their application in outpatient settings to achieve a significant impact on the reduction of AMR [10]. The measurement of the impacts of ASPs can be categorized into antibiotic use, process and quality measures, costs, and clinical outcome measures, with the latter being the most relevant focus in practice [11]. There are challenges in implementing ASPs, including a lack of motivation for change and awareness, a lack of oversight and control of antimicrobial use in many countries, and over-the-counter therapy [12].
Artificial intelligence (AI) began developing in the 1950s, and its first use in healthcare was in the form of expert systems, which were based on rules provided by medical experts, but were never applied in practice [13]. Machine learning (ML) was developed to overcome the main limitation of expert systems, namely the need to capture a large number of rules: ML can find new rules from the data provided, depending on their quality and volume [13], and it benefits mainly from the enormous amount of health data gathered after the implementation of electronic health records. As some real patient situations are more complex and heterogeneous than a single guideline or the experience of an expert can cover, ML can be a tool to help decision-making in these situations, since it can analyze a great number of electronic records in a way similar to experts' logical deduction. ML algorithms can be supervised or unsupervised, and some examples include support vector machines, artificial neural networks, random forests, decision trees, and logistic regression [10,14]. Previous studies have shown that this technology has been used in numerous healthcare fields, including infectious diseases [13]. It has proved useful in the prediction [15] and early detection [16] of sepsis, diagnosis of infection [17], prediction of treatment success [18], prediction of antimicrobial resistance [19], and treatment selection [20], meaning that it may be an effective tool for antimicrobial stewardship teams to put into practice to improve their programs.
This systematic review aims to explore the use of AI in ASPs and summarizes the predictive performance of ML algorithms used in antimicrobial stewardship, compared with clinical and antimicrobial stewardship teams' decisions, in inpatients and outpatients who need antimicrobial prescription. Studies were selected and screened from January 2010 until December 2022 in the electronic bibliographic databases of PubMed, Scopus, and Web of Science by using a combination of terms such as artificial intelligence, antimicrobial resistance, and stewardship. The protocol of this review was registered in the PROSPERO database (CRD42023470594).

Characteristics of the Included Studies
A total of 4658 citations were identified from the three databases and, after removing the duplicates, 2839 were eligible for screening. A total of 1086 articles were assessed for eligibility and eighteen [20-37] were included in this systematic review (Figure 1). Most studies were excluded because they did not study the application of machine learning models or their predictive performance, or because they were not applied to hospital inpatients and outpatients with infections, such as studies in vitro or regarding drug development.
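The counts reported above imply how many records dropped out at each stage of the selection flow. The short sketch below only does that subtraction; the stage labels are illustrative names chosen here, not terms taken from the PRISMA flowchart, and the text does not break the exclusions down any further.

```python
# Counts reported in the text; variable names are illustrative labels.
identified = 4658          # citations retrieved from the three databases
eligible_for_screening = 2839  # remaining after duplicate removal
assessed_for_eligibility = 1086
included = 18

# Records dropping out between consecutive stages.
removed_before_screening = identified - eligible_for_screening
excluded_at_screening = eligible_for_screening - assessed_for_eligibility
excluded_at_eligibility = assessed_for_eligibility - included

print(removed_before_screening, excluded_at_screening, excluded_at_eligibility)
```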

Characteristics of the eighteen included studies are available in Table 1. All the studies were published since 2016 and in English. One of the studies is an abstract presentation at a congress in video format [37]. One of the studies was from a low-/middle-income country [36], with the rest being from high-income countries. The number of features included in the machine learning algorithms ranged from 6 to 788. The patients included were from different settings; one (5.5%) study was designed for outpatients [35], and two were only applied to ICU patients [26,29]. The number of patients ranged from 48 (on a validation set) to 382,943. Two [20,34] of the studies had a prospective design, with the remaining being retrospective observational studies.

Risk of Bias/Quality Assessment
All the studies were rated as being of "fair quality" by the NIH Quality Assessment Tool for Observational Cohort and Cross-Sectional Studies; fourteen studies were rated as 57.1% and four [22,26,32,35] were rated as 64.3%. The participation rate, variation in amount or level of exposure, and loss to follow-up criteria were not applied to any of the studies. Only one study [35] provided a sample size justification or power description. No study reported information about blinding the assessors, and only three studies [22,26,32] met the criterion on the statistical adjustments of potential confounding variables. The answer to each of the fourteen criteria, as well as the quality rating, are available in Table 3.
The features included in the algorithms were divided into the following groups: demographics, adult patients, pediatric patients, clinical, laboratory/microbiological, comorbidities, type of infection, and ICU. The most used features were demographical followed by laboratory/microbiological. Information about the features used in each study is available in Table 2. The most common validation method was k-fold cross-validation (fivefold and tenfold) to avoid overfitting. Not all included studies provided information about handling missing data or methods to avoid overfitting, and two studies did not reference the model validation method [20,37]. Corbin et al. [22] replicated the process on an external validation cohort in Boston.
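Since k-fold cross-validation was the most common validation method among the included studies, a minimal sketch of how the folds are formed may help. It is a generic illustration in plain Python, not the procedure of any particular study; the function name `k_fold_indices`, the seed, and the sample count are assumptions made for the example.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once, reproducibly
    # Spread samples as evenly as possible: the first n_samples % k folds
    # receive one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size

# Hypothetical example: 23 patients split into five folds.
folds = list(k_fold_indices(n_samples=23, k=5))
for train_idx, val_idx in folds:
    print(len(train_idx), len(val_idx))
```

Each patient is used for validation exactly once and for training in the remaining folds, so the averaged validation score is less sensitive to one favourable split than a single hold-out set would be.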

The risk of bias (ROB) and the applicability for model prediction of the eighteen included studies were also assessed by PROBAST (Table 4). Only two studies were ranked as being of "low concern" in the analysis domain [22,24]; six studies were defined as being of "unclear concern" [20,21,23,25,26,29], and ten were ranked as being of "high concern" [27,28,30-37]. In these studies, no information was provided regarding how missing data had been handled. Overall, only one study was rated as having a low ROB [24]. Regarding applicability, one study ranked as being of "high concern" and as having a high ROB due to the lack of participant information and lack of definition of the inclusion and exclusion criteria [27].

Predictive Performance of Artificial Intelligence Algorithms
The most evaluated performance metric was AUC, which ranged from 0.64 to 0.992 (the highest value was obtained by the multilayer perceptron). This algorithm also achieved the highest sensitivity (0.967) and specificity (0.992) for auditing appropriate surgical antimicrobial prophylaxis. The highest precision was achieved by the gradient boosted tree, with an average precision of 0.99 for the selection of vancomycin + meropenem. The other main results are available in Table 1. All the studies concluded that ML algorithms were useful to assist antimicrobial stewardship teams in multiple tasks such as identifying inappropriate prescribing practices [20], choosing the appropriate antibiotic therapy [22,23,34,36], auditing surgical antimicrobial prophylaxis [24], predicting personal risk of treatment-induced emergence of resistance [25], estimating patient outcomes under the contrasting scenarios of stopping or continuing antibiotic treatment [26], predicting AMR [27], and identifying patients at low risk of bacterial infections [29].
Regarding the choice of the most appropriate antibiotic therapy, the model with the best performance was random forest, with an area under the curve of 0.80 (95% CI 0.66-0.94) for the prediction of susceptibility to ceftriaxone, 0.74 (0.59-0.89) for ampicillin and gentamicin, and 0.85 (0.70-1.00) for susceptibility to neither [36].
Logistic regression achieved a 67% reduction in second-line antibiotics relative to clinicians and an 18% reduction in inappropriate antibiotic therapy [35].

Main Findings
A systematic review of the utility of AI in antimicrobial stewardship for inpatients and outpatients who needed antimicrobial decisions was conducted, and eighteen studies were included. Logistic regression and random forest were the most used algorithms. AUC was the most common predictive performance measure, and the highest value was obtained by the multilayer perceptron [24]. The most studied application of AI in ASPs was the use of AI for choosing the appropriate antibiotic therapy. In one study, the algorithm used was a semi-supervised decision support system [21]; the remaining algorithms applied supervised ML algorithms, which are generally used to make predictions. All the studies concluded that AI algorithms can help choose the best antimicrobial therapy, benefiting, for example, the control of AMR. These results are aligned with what has been found about AI use in infectious diseases, since other systematic reviews summarize its applicability in antimicrobial susceptibility testing [14], predicting antimicrobial resistance [38], prediction of treatment success, diagnosis of infection, and prediction of sepsis [13].

AI and Antimicrobial Stewardship
Although AI can be helpful in addressing the large amount of data gathered nowadays and performing repetitive tasks, there are some risks and ethical concerns that must be considered, for example, the possibility of the algorithm making associations between features and outcomes that are not relevant or are without physiological/clinical rationale, blind obedience to or overdependence on AI, and liability or accountability in case of mistakes [39]. Clinical decisions are complex and include factors about the patient, the disease, the economy, or the environment, so the algorithm should not make the final decision on its own. The "black box" nature of AI raises concerns, since these algorithms cannot explain the underlying mechanism that generates their outputs, and the source of the input data may be unknown. This has a significant impact on transparency and trust [40,41]. In response to the rise of AI health technologies, the WHO published six regulatory areas of AI for health, including the transparency of development processes, external data validation, cybersecurity, and data protection [42]. The WHO emphasizes the need for collaboration between regulators, patients, healthcare professionals, industry, and governments to ensure the compliance of AI models with regulation. The application of AI in antimicrobial stewardship programs is still very limited, as seen in the few studies included in this systematic review. The methodological heterogeneity and the reduced number of diseases in which AI has been applied in ASPs restrict the widespread use of ML models in antimicrobial stewardship. Tools based on AI for this purpose are still in a development phase before they can be safely implemented in healthcare.
Addressing the perception among some clinicians that the use of AI in antimicrobial stewardship is more of a mirage than a reality necessitates a clear discussion on its evident benefits. Implementing AI requires a calculated investment in technology and skilled data analysts, with the scale dependent on each hospital's needs. A thorough cost/benefit analysis is vital, showcasing the expenses and expected advancements in healthcare efficiency and patient care quality. Embracing AI, despite initial doubts, is crucial for the evolution of antimicrobial stewardship, moving the perception of antimicrobial stewardship from skepticism to accepted implementation.

Limitations of the Studies Included
The research on AI applications in ASP is mostly from high-income countries, which can introduce bias in the algorithms and inequalities in healthcare because it does not represent the entire population [43]. This may happen because low- and middle-income countries may face more challenges in implementing systems that allow for the collection of large amounts of structured health data, access to health is scarcer, and the financial support for implementing AI algorithms needs to be improved. Efforts should be made to include data from these populations in training and validation datasets.
A publicly recognized tool for quality and risk assessment of ML prediction models is still needed. PROBAST and the National Institutes of Health (NIH) Quality Assessment Tool for Observational Cohort and Cross-Sectional Studies were used for a more complete assessment. For most studies, there was a lack of information about sample size justification or power description and a poor description of the statistical adjustment of confounders. This is a concern, since AI algorithms can provide biased results if the information input is subject to uncontrolled biases. Bolton et al. [26] consider that the models may have "learned" the association between less severe patients receiving fewer antibiotics and, therefore, having a shorter ICU length of stay, causing some confounding. PROBAST assessment of ROB raises concerns, since only one study was ranked as "low concern". This is mainly due to the analysis domain, as not all the included studies provided information about the handling of missing data or the methods to avoid overfitting, and two studies [20,37] did not report information on validation methods. Only one of the studies performed external validation of the model [22], which raises concerns about the generalizability of the algorithms used in the other studies. Eickelberg and colleagues [29] state that their future research will focus on external validation and clinical utility assessment of the models. The lack of participant information and of a definition of the inclusion and exclusion criteria also raises concerns about the applicability and biases of the study's conclusions [27]. It is relevant to note that providing participants' information can minimize or highlight biases that can influence the application of the algorithms in specific populations in which they were not studied. Kanjilal et al. [35] admit this limitation in their study.
The features selected for the ML algorithms were adequate, since they gather information that influences therapy decisions and patients' outcomes, mainly raising low concerns. Studies on AI use in health should provide all the features included so there is more transparency and understanding of the processes involved. This will allow for analysis of whether the features have a medical reasoning behind the clinical outcome.

Limitations of the Review
There are some limitations to this review. The literature search was limited to PubMed, Web of Science, and Scopus articles, with no other bibliographic databases having been searched. Because this is a recent and fast-moving research topic, these findings may soon need to be complemented with more recent data.
Publication bias is a possible limitation of this review, since it is likely that the studies with more favorable results have higher chances of being accepted for publication. Due to the diversity of the included studies (including differences in outcomes, assessed features, and the algorithms used), we could not perform a meta-analysis.
It must be kept in mind that the AI algorithms are not implemented to substitute the healthcare professionals who make up antimicrobial stewardship teams but rather to assist in decision-making, mainly when a considerable amount of health data are gathered every day.
In the future, it would be interesting to research the integration of AI in ASPs, its adoption by healthcare professionals, usability and applicability, and their knowledge about the potential of using AI as a tool [44].

Materials and Methods
The systematic review was carried out in accordance with the Cochrane Handbook for Systematic Reviews of Interventions [45]; in addition, we followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) [46] checklist for the review (Supplementary Materials, Table S1).

Data Source and Search Strategy
The electronic bibliographic databases of PubMed, Scopus, and Web of Science were searched using a combination of MeSH terms and/or keywords regarding broad domains such as artificial intelligence, antimicrobial resistance, and stewardship. For this search strategy, the following query was used: ("artificial intelligence" OR "machine learning" OR "deep learning") AND ("antibiotic resistance" OR "antibiotic resistant" OR "antifungal resistance" OR "antifungal resistant" OR "antimicrobial resistance" OR "antimicrobial resistant" OR "antibiotic susceptibility" OR "antifungal susceptibility" OR "antimicrobial susceptibility" OR "drug resistance" OR "drug resistant"). Additionally, and to avoid any bibliography loss, the terms ("artificial intelligence" OR "machine learning" OR "deep learning") AND (stewardship) were included. Studies were selected and screened from January 2010 until December 2022, when the search results were last consulted. The search included all publication types except reviews or systematic reviews, and no language restrictions were applied.
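The two Boolean queries above can be assembled from term lists, which makes them easier to audit and reuse. The sketch below only reproduces the query strings as quoted in the text; the helper `any_of` is a name invented for this example, and database-specific field tags, date filters, and the way the two queries were combined are not specified in the text and are left out.

```python
def any_of(terms):
    """Join quoted phrases with OR inside one pair of parentheses."""
    return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"

ai_terms = ["artificial intelligence", "machine learning", "deep learning"]
resistance_terms = [
    "antibiotic resistance", "antibiotic resistant",
    "antifungal resistance", "antifungal resistant",
    "antimicrobial resistance", "antimicrobial resistant",
    "antibiotic susceptibility", "antifungal susceptibility",
    "antimicrobial susceptibility",
    "drug resistance", "drug resistant",
]

# Main query: AI terms combined with resistance/susceptibility terms.
query_resistance = f"{any_of(ai_terms)} AND {any_of(resistance_terms)}"
# Supplementary query: AI terms combined with the unquoted word stewardship.
query_stewardship = f"{any_of(ai_terms)} AND (stewardship)"

print(query_resistance)
print(query_stewardship)
```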

Eligibility Criteria
Studies were included in this review if they assessed the performance of artificial intelligence models in ASP applied to hospital inpatients and outpatients with infections that needed antimicrobial treatment. We excluded (1) studies conducted only in vitro; (2) studies addressing non-infectious diseases such as cancer, epilepsy, or other neurologic diseases; (3) studies addressing the application of AI in food or animal production, drug development, disease diagnosis or survival, or studies focusing on HIV, parasitic diseases, or tuberculosis; and (4) studies not focusing on bacterial infections.
This review intended to study the performance of AI algorithms for antimicrobial stewardship. The question being addressed can be expressed as follows: P: Inpatients and outpatients who need an antimicrobial prescription; I: Machine learning models used in antimicrobial stewardship; C: Clinical or antimicrobial stewardship teams' decision; O: Predictive performance of ML algorithms (area under the curve (AUC), sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), etc.).
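Most of the outcome measures named in the PICO question follow directly from the four cells of a confusion matrix. The sketch below uses the standard definitions; the function name and the counts in the example are hypothetical and are not taken from any included study.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Standard metrics from true/false positive and negative counts."""
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate (recall)
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # positive predictive value (precision)
        "npv": tn / (tn + fn),          # negative predictive value
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }

# Hypothetical counts for 100 prescriptions flagged by a model.
metrics = confusion_metrics(tp=40, fp=10, fn=5, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Unlike sensitivity and specificity, PPV and NPV depend on how common the outcome is in the study population, which is one reason why values from studies in different settings are hard to compare directly.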

Data Extraction and Synthesis
The extracted studies were uploaded to EndNote™ 20 and Rayyan software [47] for duplicate removal, quality assessment, and further selection. Studies were selected first by title and abstract screening and then by full text reading. Both processes were independently performed by two reviewers (RPS and SCO) in a blinded, standardized manner. Eighteen studies were included in the systematic review (Figure 1).
A form was developed to extract the data from the included studies uniformly and consistently. We retrieved data on the study type, year of publication, country, study time frame, target population (demographic data), number of patients, hospital setting, type of infection, study objective, ML algorithms used, training data sets, number of features, data source (clinical and/or laboratory data), predictors, performance validation and metrics (AUC, sensitivity, specificity), and clinical outcome. Two authors (RPS and SCO) extracted data from primary studies independently.

Risk of Bias (ROB) Assessment
To evaluate the risk of bias of the studies included in this review, the National Institutes of Health (NIH) Quality Assessment Tool for Observational Cohort and Cross-Sectional Studies and PROBAST (a tool to assess the risk of bias and applicability of prediction model studies) were used [48-50]. For the NIH tool, a three-point scale was used to grade the potential source of bias as good, fair, or poor. Regarding PROBAST, the risk of bias and applicability were assessed focusing on four domains (participants, predictors, outcomes, and analysis), which were evaluated for each included study. The risk of bias was defined as "high risk/concern" if the item's answer was "No" or "Probably no" and "Unclear risk" if relevant information was absent. No studies were excluded based on quality. ROB assessment was performed independently by all authors.
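The PROBAST rule described above can be written as a small decision function. This is a simplified sketch of the rule as stated in the text: the answer labels, function names, and the example answers are assumptions, and the roll-up from domains to an overall rating (any high domain gives high, otherwise any unclear domain gives unclear) follows the usual PROBAST convention rather than anything spelled out here. In practice PROBAST also leaves room for reviewer judgement, which this sketch ignores.

```python
HIGH_ANSWERS = {"No", "Probably no"}
MISSING_ANSWER = "No information"

def rate_domain(answers):
    """Rate one domain from the answers to its signalling questions."""
    if any(answer in HIGH_ANSWERS for answer in answers):
        return "high"
    if any(answer == MISSING_ANSWER for answer in answers):
        return "unclear"
    return "low"

def rate_overall(domain_ratings):
    """Combine domain ratings: the worst rating drives the overall judgement."""
    ratings = set(domain_ratings.values())
    if "high" in ratings:
        return "high"
    if "unclear" in ratings:
        return "unclear"
    return "low"

# Hypothetical study: missing-data handling is not reported in the analysis domain.
example_answers = {
    "participants": ["Yes", "Probably yes"],
    "predictors": ["Yes", "Yes", "Yes"],
    "outcomes": ["Yes", "Probably yes"],
    "analysis": ["Yes", "No information", "Yes"],
}
domain_ratings = {domain: rate_domain(a) for domain, a in example_answers.items()}
overall = rate_overall(domain_ratings)
print(domain_ratings, overall)
```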

Data Analysis
The predictive performance of the AI algorithms was extracted using the metrics reported by each study, which included the area under the curve, specificity, sensitivity, precision, and accuracy (Table 1).
A meta-analysis was not conducted, due to the heterogeneity between the populations, algorithms, features, and aim of the studies included.
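Because AUC was the metric extracted most often, a short sketch of what it measures may be useful. One standard interpretation is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The code below computes it that way by comparing all positive/negative pairs; the function name, scores, and labels are hypothetical, and this direct pairwise form is meant for illustration rather than for large datasets.

```python
def auc_by_pairs(scores, labels):
    """AUC as the share of positive/negative pairs ranked correctly."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))

# Hypothetical predicted probabilities of resistance and observed outcomes.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0]
example_auc = auc_by_pairs(scores, labels)
print(round(example_auc, 3))
```

An AUC of 0.5 corresponds to ranking no better than chance and 1.0 to perfect ranking, which gives some context for the reported range of 0.64 to 0.992.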

Conclusions
This systematic review focuses on various tasks where AI can be a supplemental tool for antimicrobial stewardship teams, benefiting the patient and the healthcare providers. It can assist in the identification of inappropriate prescriptions, the choice of appropriate antibiotic therapy, or the estimation of patient outcomes. This is essential in the One Health dimension, because preventing AMR and multiresistant microorganisms in humans interdependently benefits the health of animals, plants, and ecosystems. The supervised machine learning module of antimicrobial prescription surveillance systems and random forest could be useful tools for guiding the most appropriate antibiotic therapy. AI can assist antimicrobial stewardship teams, aiming at better control of AMR; thus, AI can be a valuable tool against this growing global health issue.

Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.

Figure 1. PRISMA flowchart representing the systematic search of the relevant studies.

Figure 3. Frequency of the most used performance metrics (AUC: area under the curve; PPV: positive predictive value; NPV: negative predictive value; TP: true positive; FP: false positive; FN: false negative; TN: true negative).

Table 1. Characteristics of the included studies.

Table 2. Characteristics of the features of the included studies.

Table 3. Risk of bias assessment of the included studies by NIH Quality Assessment Tool for Observational Cohort and Cross-Sectional Studies.
Notes: NA: not applicable; NR: not reported.