
Using Machine Learning Techniques to Aid Empirical Antibiotic Therapy Decisions in the Intensive Care Unit of a General Hospital in Greece

1 School of Science and Technology, Hellenic Open University, 26335 Patras, Greece
2 IT Department, Sismanogleio General Hospital, 15126 Marousi, Greece
3 Department of Quality Control, Research and Continuing Education, Sismanogleio General Hospital, 15126 Marousi, Greece
4 Intensive Care Unit, Sismanogleio General Hospital, 15126 Marousi, Greece
5 Microbiology Laboratory, Sismanogleio General Hospital, 15126 Marousi, Greece
6 2nd Internal Medicine Department, Sismanogleio General Hospital, 15126 Marousi, Greece
7 1st Internal Medicine Department, Sismanogleio General Hospital, 15126 Marousi, Greece
8 1st Surgery Department, Sismanogleio General Hospital, 15126 Marousi, Greece
* Author to whom correspondence should be addressed.
Antibiotics 2020, 9(2), 50; https://doi.org/10.3390/antibiotics9020050
Received: 8 January 2020 / Revised: 26 January 2020 / Accepted: 27 January 2020 / Published: 31 January 2020
(This article belongs to the Section Antibiotics Use and Antimicrobial Stewardship)

Abstract

Hospital-acquired infections, particularly in the critical care setting, have become increasingly common during the last decade, with Gram-negative bacterial infections presenting the highest incidence among them. Multi-drug-resistant (MDR) Gram-negative infections are associated with high morbidity and mortality, with significant direct and indirect costs resulting from long hospitalization due to antibiotic failure. Because patients in the intensive care unit (ICU) are critically ill, the time needed to identify bacteria and their resistance to antibiotics is crucial. As common antibiotic resistance tests require more than 24 h after the sample is collected to determine sensitivity to specific antibiotics, we suggest applying machine learning (ML) techniques to assist the clinician in determining whether bacteria are resistant to individual antimicrobials by knowing only a sample’s Gram stain, site of infection, and patient demographics. In our single-center study, we compared the performance of eight machine learning algorithms in predicting antibiotic susceptibility. The demographic characteristics of the patients are considered for this study, as well as data from cultures and susceptibility testing. Applying machine learning algorithms to antimicrobial susceptibility data that are readily available from the Microbiology Laboratory alone, without any of the patient’s clinical data, can provide informative antibiotic susceptibility predictions to aid clinicians in selecting appropriate empirical antibiotic therapy, even in resource-limited hospital settings. These strategies, when used as a decision support tool, have the potential to improve empiric therapy selection and reduce the antimicrobial resistance burden.
Keywords: antibiotic resistance; antimicrobial resistance; intensive care unit; ICU; machine learning; prediction; artificial intelligence; ML techniques

1. Introduction

The rapid emergence of antibiotic-resistant infections during the last decade constitutes a worldwide problem with increasing health and economic costs [1]. As stated in a recently published European Centre for Disease Prevention and Control (ECDC) study, about 33,000 people die each year as a direct consequence of an infection due to bacteria resistant to antibiotics [2]. Healthcare-associated infections (HAIs) account for the major burden of these multidrug-resistant infections, while last-line treatments, such as carbapenems and colistin, are becoming less effective, narrowing the available therapeutic options [3].
Data from the European Antimicrobial Resistance Surveillance Network (EARS-Net) suggest that in 2015, Greece was among the countries with the greatest burden of infections due to antibiotic-resistant bacteria in the EU and European Economic Area (EEA) [4], with carbapenem- and colistin-resistant infections presenting the major problem [2,4,5]. In 2014, the Hellenic Center for Disease Control and Prevention (HCDCP) reported a mean incidence of 0.48 per 1000 patient-days and a crude 28-day mortality rate of 34.4% for infections caused by carbapenem-resistant Gram-negative pathogens in acute care hospitals in Greece [6].
In our recent study [7], we compared the resistance levels of Pseudomonas aeruginosa, Acinetobacter baumannii, and Klebsiella pneumoniae isolates between the intensive care unit (ICU) and other facilities in two consecutive years (2017 and 2018), in one of the largest public tertiary hospitals in Greece, to implement more effective strategies for the reduction of multidrug resistance. By using the same antimicrobial susceptibility dataset from the Microbiology Laboratory, we proposed a methodology [8] that enables clinicians to select the most appropriate antibiotic based on statistically significant sensitivity results, which are specific for their own department.
Many hospitals focus on early detection of serious infections, especially in ICUs. It has been shown that the earlier the proper antibiotic treatment starts, the lower the mortality rate [9,10]. From a clinical point of view, the detection of antimicrobial resistance before culture and sensitivity results are available will reduce the time required to take important actions, such as isolating the patient or initiating appropriate empirical therapy.
Advances in artificial intelligence (AI) have transformed the healthcare innovation environment, contributing to improved health outcomes while reducing healthcare costs. AI now opens up possibilities in healthcare that were previously regarded as infeasible. For example, with the digitization of health records, mining of unstructured medical data is now possible, and clinicians can use its results to make evidence-based decisions.
Machine learning (ML) techniques could be used to establish a clinical decision support system that helps clinicians make effective choices. The scientific literature shows promising results for ML techniques in healthcare, particularly in antimicrobial resistance research [11,12,13,14]. In this article, we propose using ML techniques to predict antimicrobial resistance based only on data available in the hospital information system of the Microbiology Laboratory, such as the type of sample, Gram stain, and previous antibiotic susceptibility testing, together with patient demographics (age/gender).

2. Methodology and Results

We analyzed the Microbiology Laboratory data from ICU patients in a public tertiary hospital in Greece over a 2-year period (2017 and 2018). The dataset of 23,067 instances contains the attributes of gender (binary), age (numerical), type of sample (categorical), Gram stain (binary), antibiotics (categorical), and finally the class attribute, which in our case is the antimicrobial susceptibility (binary). The samples examined were blood, tracheobronchial aspirates/bronchoalveolar lavage fluid, urine, skin/wounds/soft tissue specimens, intravascular catheters, and pleural and peritoneal fluid. In the present study, clinical data of the patients, such as the source of infection acquisition (e.g., community or hospital acquired), and the presence of active infection or colonization, have not been included. The following table (Table 1) includes simple summary statistics of our dataset.
Among the many existing machine learning systems, we chose the WEKA workbench (Data Mining Software in Java) [15] for this study. It is one of the most popular open-source machine learning toolkits and contains a wide range of learning algorithms.
To assess the performance of the final model [16], some data must be set aside and not used during training so that we may compare what is known about these data to what our algorithms will predict. This is the test set. If we use all of our data to train a model, and then use the same data for testing, we run the risk of overfitting, i.e., learning tiny details of the training data that will be of little use with new data.
A good way to make the most of our data is to use all of our data for training as well as for testing, but not at the same time. To do this, we divide our data into a number of equal-sized subsets, called folds. For each fold, we remove it from the training set, build a model on the other folds, and then test on the withheld portion. If we have k folds, then this is called k-fold cross-validation. Cross-validation is widely regarded as a reliable way to assess the quality of results from machine learning techniques when data are all in one set. In our analysis, we have used 10-fold cross-validation.
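The fold procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the WEKA procedure used in the study: the function names (`k_fold_indices`, `cross_validate`) and the majority-label baseline learner are hypothetical, the folds are not stratified by class, and the number of instances is assumed to be at least k.

```python
import random

def k_fold_indices(n_instances, k=10, seed=0):
    """Split the indices 0..n_instances-1 into k folds of near-equal size."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, starting at position i.
    return [indices[i::k] for i in range(k)]

def cross_validate(instances, labels, train_fn, predict_fn, k=10, seed=0):
    """Return the mean accuracy over k train/test rounds."""
    folds = k_fold_indices(len(instances), k, seed)
    accuracies = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(len(instances)) if i not in held_out]
        # Build the model on the other folds only.
        model = train_fn([instances[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        # Test on the withheld fold.
        correct = sum(predict_fn(model, instances[i]) == labels[i]
                      for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / k

# Hypothetical baseline learner: always predict the majority training label.
def train_majority(xs, ys):
    return max(set(ys), key=ys.count)

def predict_majority(model, x):
    return model
```

Every instance is used for testing exactly once and for training k - 1 times, which is the property the text relies on.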
To find the best classifier, we consider the following quantities, as reported by WEKA [17,18]:
(a) TP Rate: rate of true positives (instances correctly classified as a given class);
(b) FP Rate: rate of false positives (instances falsely classified as a given class);
(c) Precision: the number of instances that truly belong to a class divided by the total number of instances classified as that class;
(d) Recall: the number of instances correctly classified as a given class divided by the actual total in that class (equivalent to TP rate);
(e) F-Measure: the harmonic mean of precision and recall, 2 × Precision × Recall / (Precision + Recall), used as a general indicator of the quality of the model;
(f) MCC: the Matthews correlation coefficient, calculated from all four values of the confusion matrix;
(g) ROC Area: the area under the Receiver Operating Characteristic (ROC) curve (AUC). The ROC curve is a graphical plot that illustrates the performance of a binary classifier as its discrimination threshold is varied; the area under it summarizes how well the classifier separates the two groups (here, susceptible and resistant);
(h) PRC Area: the area under the precision-recall curve (PRC), which shows the relationship between precision and recall (sensitivity).
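As a worked illustration of the quantities listed above, the following Python sketch computes them for a binary problem. The function names are hypothetical, the confusion-matrix counts in the usage note are invented for illustration and are not results of this study, and the ROC area is computed by direct pairwise comparison of scores rather than by WEKA's own routine. The sketch assumes no denominator is zero.

```python
import math

def binary_metrics(tp, fp, fn, tn):
    """Metrics for the class treated as 'positive', from confusion-matrix counts."""
    tp_rate = tp / (tp + fn)              # TP rate, identical to recall
    fp_rate = fp / (fp + tn)
    precision = tp / (tp + fp)
    f_measure = 2 * precision * tp_rate / (precision + tp_rate)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {"tp_rate": tp_rate, "fp_rate": fp_rate, "precision": precision,
            "recall": tp_rate, "f_measure": f_measure, "mcc": mcc}

def roc_area(scores, labels):
    """Fraction of (positive, negative) pairs in which the positive instance
    receives the higher score; ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For example, `binary_metrics(60, 20, 40, 80)` gives a TP rate of 0.6, an FP rate of 0.2, and a precision of 0.75. The pairwise form of `roc_area` also makes clear that the ROC area measures ranking quality, which is not the same thing as the proportion of correct predictions.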

2.1. LIBLINEAR-L2-Regularized L1- and L2-loss Support Vector Classification (SVC)

LIBLINEAR is an open-source library for large-scale linear classification [19]. It supports logistic regression and linear support vector machines.
Given training vectors $x_i \in R^n$, $i = 1, \ldots, l$, in two classes, and a vector $y \in R^l$ such that $y_i \in \{1, -1\}$, a linear classifier generates a weight vector $w$ as the model. The decision function is
$$\mathrm{sgn}(w^T x)$$
L2-regularized L2-loss SVC solves the following primal problem:
$$\min_{w} \ \frac{1}{2} w^T w + C \sum_{i=1}^{l} \left( \max(0,\ 1 - y_i w^T x_i) \right)^2$$
and its dual form is:
$$\min_{a} \ \frac{1}{2} a^T \bar{Q} a - e^T a$$
$$\text{subject to} \quad 0 \le a_i \le U, \quad i = 1, \ldots, l$$
where $e$ is the vector of all ones, $\bar{Q} = Q + D$, $D$ is a diagonal matrix, and $Q_{ij} = y_i y_j x_i^T x_j$. For L2-loss SVC, $U = \infty$ and $D_{ii} = \frac{1}{2C}$, $\forall i$.
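To make the primal problem concrete, the sketch below evaluates the L2-regularized L2-loss objective for a given weight vector and applies the sign decision function. It does not solve the optimization problem and is not LIBLINEAR code; the function names and the toy vectors in the usage note are assumptions made for illustration.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def l2_loss_svc_objective(w, xs, ys, C=1.0):
    """(1/2) w^T w + C * sum_i max(0, 1 - y_i w^T x_i)^2"""
    regularizer = 0.5 * dot(w, w)
    squared_hinge = sum(max(0.0, 1.0 - y * dot(w, x)) ** 2
                        for x, y in zip(xs, ys))
    return regularizer + C * squared_hinge

def predict(w, x):
    """Decision function sgn(w^T x); zero is mapped to +1 here by convention."""
    return 1 if dot(w, x) >= 0 else -1
```

With `w = [1.0, -1.0]`, instances `[[2.0, 0.0], [0.0, 0.5]]`, and labels `[1, -1]`, the first instance has margin 2 and contributes no loss, while the second has margin 0.5 and contributes 0.25, so the objective is 1.0 + 0.25 = 1.25 for C = 1.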
The results of applying this technique are shown in the following table (Table 2).

2.2. LIBSVM C-Support Vector Classification

Support Vector Machines (SVMs) are a set of related supervised learning methods, which are popular for performing classification, regression, and other learning tasks. LIBSVM [20] is an integrated software package for SVM classification. One of the SVM formulations of LIBSVM is the C-Support Vector Classification (C-SVC). Given training vectors $x_i \in R^n$, $i = 1, \ldots, l$, in two classes, and a vector $y \in R^l$ such that $y_i \in \{1, -1\}$, C-SVC [21,22] solves the following primal optimization problem:
$$\min_{w, b, \xi} \ \frac{1}{2} w^T w + C \sum_{i=1}^{l} \xi_i$$
$$\text{subject to} \quad y_i \left( w^T \phi(x_i) + b \right) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \ldots, l$$
where $\phi(x_i)$ maps $x_i$ into a higher-dimensional space, and $C > 0$ is the regularization parameter. The corresponding dual form is:
$$\min_{a} \ \frac{1}{2} a^T Q a - e^T a$$
$$\text{subject to} \quad y^T a = 0, \quad 0 \le a_i \le C, \quad i = 1, \ldots, l$$
where $e$ is the vector of all ones and $Q$ is an $l$ by $l$ positive semidefinite matrix,
$$Q_{ij} = y_i y_j \left( \phi(x_i)^T \phi(x_j) \right)$$
By using the primal-dual relationship, the optimal $w$ satisfies
$$w = \sum_{i=1}^{l} y_i a_i \, \phi(x_i)$$
and the decision function is
$$\mathrm{sgn}\left( w^T \phi(x) + b \right) = \mathrm{sgn}\left( \sum_{i=1}^{l} y_i a_i \left( \phi(x_i)^T \phi(x) \right) + b \right)$$
The results of applying this technique are shown in the following table (Table 3).
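The decision function of C-SVC can be sketched as follows, assuming the dual variables have already been obtained by a solver. The inner product of the mapped vectors is replaced by a kernel function; the radial basis function kernel, its `gamma` value, and the toy numbers in the usage note are assumptions for illustration, since the text does not state which kernel was used.

```python
import math

def rbf_kernel(u, v, gamma=0.5):
    """Stands in for phi(u)^T phi(v); the RBF choice is an assumption."""
    return math.exp(-gamma * sum((ui - vi) ** 2 for ui, vi in zip(u, v)))

def csvc_decision(x, train_xs, ys, a, b, kernel=rbf_kernel):
    """sgn( sum_i y_i a_i K(x_i, x) + b ) for already-trained a and b."""
    score = sum(y_i * a_i * kernel(x_i, x)
                for x_i, y_i, a_i in zip(train_xs, ys, a)) + b
    return 1 if score >= 0 else -1
```

For instance, with training vectors `[[0.0, 0.0], [2.0, 2.0]]`, labels `[1, -1]`, dual variables `[1.0, 1.0]`, and `b = 0.0`, a query close to the first vector is assigned to class 1 and a query close to the second to class -1. Only training vectors with nonzero dual variables (the support vectors) contribute to the sum.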

2.3. Sequential Minimal Optimization (SMO)

Sequential Minimal Optimization (SMO) [23] is a simple algorithm that quickly solves the SVM quadratic programming (QP) optimization problem without extra matrix storage and without invoking an iterative numerical routine for each sub-problem. At every step, SMO chooses to solve the smallest possible optimization problem. For the standard SVM QP problem, the smallest possible optimization problem involves two Lagrange multipliers, because the Lagrange multipliers must obey a linear equality constraint. SMO selects two Lagrange multipliers to optimize together at each step, finds the optimal values for these multipliers, and updates the SVM to reflect the new optimal values [24].
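The joint optimization of two multipliers can be illustrated with the analytic update commonly given for SMO. This is a sketch under assumptions, not WEKA's SMO implementation: the function and argument names are hypothetical, the error terms are taken as E = f(x) - y, the selection heuristics and the update of the bias term are omitted, and the pair is left unchanged when no progress is possible.

```python
def smo_pair_update(a_i, a_j, y_i, y_j, E_i, E_j, K_ii, K_jj, K_ij, C):
    """One analytic SMO step on the multiplier pair (a_i, a_j)."""
    # Box bounds for a_j that keep both multipliers in [0, C] while
    # preserving the equality constraint y_i*a_i + y_j*a_j = const.
    if y_i != y_j:
        low, high = max(0.0, a_j - a_i), min(C, C + a_j - a_i)
    else:
        low, high = max(0.0, a_i + a_j - C), min(C, a_i + a_j)
    eta = K_ii + K_jj - 2.0 * K_ij   # curvature along the constraint line
    if eta <= 0 or low >= high:
        return a_i, a_j              # skip this pair
    new_j = a_j + y_j * (E_i - E_j) / eta
    new_j = min(high, max(low, new_j))            # clip to the box
    new_i = a_i + y_i * y_j * (a_j - new_j)       # restore the constraint
    return new_i, new_j
```

Because the sub-problem has a closed-form solution, no numerical QP routine is needed inside the loop, which is the point made in the paragraph above.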
The results of applying this technique are shown in the following table (Table 4).

2.4. Instance-Based Learning (k-Nearest Neighbors)

Instance-based learning approaches [25], such as the k-nearest neighbors (kNN) algorithm, adopt a straightforward approach to estimating real-valued or discrete-valued target functions [26,27]. Predicting the output for a new input vector involves collecting and aggregating the outputs of similar instances in the stored training data. Unlike many other techniques, which create a single global approximation to the target function, instance-based algorithms can build a new local approximation for each query instance. This gives them the ability to capture very complicated relationships between attributes and outputs. A drawback is that, if the target variable depends on only a few of the attributes, instances that are truly very similar may nevertheless lie at a large distance from each other [28,29].
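A minimal kNN classifier along these lines is sketched below. It assumes purely numerical attribute vectors and Euclidean distance, with a simple majority vote among the k nearest stored instances; the function name and the toy data in the usage note are hypothetical, and the encoding of the categorical attributes used in the study is not shown.

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest
    stored training instances (Euclidean distance)."""
    neighbours = sorted((math.dist(x, query), y)
                        for x, y in zip(train_x, train_y))
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]
```

With stored instances `[[0, 0], [0, 1], [1, 0], [5, 5], [5, 6]]` labeled `["S", "S", "S", "R", "R"]`, the query `[0.2, 0.2]` is classified as `"S"` and the query `[5, 5.5]` as `"R"` for k = 3. No model is built in advance: all the work happens at query time.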
The results of applying this technique are shown in the following tables (Table 5 and Table 6).

2.5. J48

The classification algorithm J48 is the WEKA implementation of Quinlan's C4.5 algorithm for building decision trees [30]. C4.5 uses the gain ratio to select features and construct the decision tree. It can be regarded as a statistical classifier, and it handles both continuous and discrete features. The C4.5 algorithm is widely used because of its quick classification and high precision.
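The gain ratio used for feature selection can be illustrated for a single discrete attribute as follows. This is a sketch of the standard definition (information gain divided by split information), not J48 code; the function names and the toy attribute values in the usage note are assumptions, and the handling of continuous attributes and missing values is omitted.

```python
import math
from collections import Counter

def entropy(items):
    """Shannon entropy (in bits) of the value distribution in `items`."""
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in Counter(items).values())

def gain_ratio(attribute_values, labels):
    """Information gain of splitting on the attribute, divided by the
    entropy of the split itself (split information)."""
    n = len(labels)
    groups = {}
    for value, label in zip(attribute_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    split_info = entropy(attribute_values)
    return info_gain / split_info if split_info > 0 else 0.0
```

An attribute that separates the classes perfectly, such as values `["neg", "neg", "pos", "pos"]` against labels `["R", "R", "S", "S"]`, has a gain ratio of 1, whereas an attribute unrelated to the class, such as `["neg", "pos", "neg", "pos"]` against the same labels, has a gain ratio of 0. At each node, the tree builder would pick the attribute with the highest value.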
The results of applying this technique are shown in the following table (Table 7).

2.6. Random Forest

The random forest machine learner is a meta-learner, meaning that it consists of many individual learners (trees). The random forest uses multiple random tree classifications to vote on an overall classification for the given set of inputs. In general, each individual learner's vote is given equal weight. In Breiman’s later work [31], this algorithm was modified to perform both unweighted and weighted voting. The forest chooses the classification that receives the most votes.
A random forest is a classifier consisting of a collection of tree-structured classifiers $\{ h(x, \Theta_\kappa),\ \kappa = 1, \ldots \}$, where the $\{ \Theta_\kappa \}$ are independent, identically distributed random vectors, and each tree casts a unit vote for the most popular class at input $x$.
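The unweighted voting step can be sketched as below. Training of the individual trees (bootstrap sampling and random feature selection) is not shown; the three hand-written "trees", their thresholds, and the attribute names are arbitrary placeholders invented for illustration and carry no clinical meaning.

```python
from collections import Counter

def forest_predict(trees, x):
    """Each tree casts one unit vote; return the class with the most votes."""
    votes = Counter(tree(x) for tree in trees)
    return votes.most_common(1)[0][0]

# Hypothetical stand-ins for trained trees (NOT learned from the study data).
toy_trees = [
    lambda x: "resistant" if x["gram_stain"] == "negative" else "susceptible",
    lambda x: "resistant" if x["age"] > 65 else "susceptible",
    lambda x: "susceptible" if x["sample"] == "urine" else "resistant",
]
```

For an instance on which two of the three placeholder trees answer "resistant", `forest_predict(toy_trees, x)` returns "resistant". An odd number of trees avoids ties in the two-class case.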
The results of applying this technique are shown in the following table (Table 8).

2.7. RIPPER

RIPPER [32] is an acronym for repeated incremental pruning to produce error reduction. Classes are examined in order of increasing size, and an initial set of rules for each class is generated using incremental reduced-error pruning. An extra stopping condition is introduced that depends on the description length (DL) of the examples and the set of rules [33]. The DL formula takes into account the number of bits required to send a set of examples with respect to a set of rules, the number of bits required to send a rule with k conditions, and the number of bits needed to send the integer k, times an arbitrary factor of 50 percent, to compensate for potential inconsistency in the attributes.
The results of applying this technique are shown in the following table (Table 9).

2.8. Multilayer Perceptron (MLP)

The multilayer perceptron (MLP) classifier in WEKA uses backpropagation to learn a multi-layer perceptron that classifies instances. MLP is an artificial neural network model that maps input data to a set of suitable outputs [15,16]. This type of neural network is known as a supervised network because, in order to learn, it needs a desired output. The goal of this type of network is to create a model that correctly maps the input to the output using historical data, so that when the desired output is unknown, the model can be used to generate the output. As its name suggests, an MLP consists of multiple layers of nodes in a directed graph, with each layer fully connected to the next. The network can be built by hand or set up using a simple heuristic. The nodes in this network are all sigmoid.
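The forward pass of such a network, with fully connected layers of sigmoid nodes, can be sketched as follows. Training by backpropagation is not shown, and the function names, layer sizes, and weights are illustrative assumptions rather than the configuration used in the study.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(inputs, weights, biases):
    """One fully connected layer: every node sees every input."""
    return [sigmoid(sum(w * x for w, x in zip(node_weights, inputs)) + b)
            for node_weights, b in zip(weights, biases)]

def mlp_forward(inputs, layers):
    """Feed the inputs through each (weights, biases) layer in turn."""
    activations = inputs
    for weights, biases in layers:
        activations = layer_forward(activations, weights, biases)
    return activations
```

A network given as `[(hidden_weights, hidden_biases), (output_weights, output_biases)]` has one hidden layer; the single output of a binary classifier can be read as a score between 0 and 1 and thresholded to obtain the predicted class.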
The results of applying this technique are shown in the following table (Table 10).
According to Table 11, considering the weighted average values, the Multilayer perceptron and J48 (C4.5) algorithms outperform the other models with respect to the ROC area, with values of 0.726 and 0.724, respectively. RIPPER achieves the best F-measure, with a value of 0.678.
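A weighted average of a per-class metric is usually formed by weighting each class by its number of instances; we assume this is how the weighted averages compared above are obtained. The sketch below shows the arithmetic with invented numbers that are not taken from Table 11, and the function name is hypothetical.

```python
def weighted_average(per_class_values, class_counts):
    """Average a per-class metric, weighting each class by its instance count."""
    total = sum(class_counts.values())
    return sum(per_class_values[c] * class_counts[c]
               for c in class_counts) / total
```

For example, a metric of 0.8 on a class with 300 instances and 0.6 on a class with 100 instances gives (0.8 × 300 + 0.6 × 100) / 400 = 0.75, so the larger class dominates the summary value.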

3. Discussion

It is well known that the ICU environment presents the greatest burden of multidrug-resistant infections among hospital wards. As time is critical, rapid confirmation of the pathogen and its susceptibility profile enables tailored and effective therapy and increases the chance of a favorable outcome [9,10].
Recently, machine learning (ML) algorithms have been proposed to predict antibiotic resistance phenotypes based on genomic features analysis with promising results [34,35]. The implementation of these techniques is nevertheless more expensive and complicated compared to standard antibiotic susceptibility testing.
The aim of the present study was to investigate whether readily available susceptibility data from the Microbiology Department, together with simple demographic data, could be used in an algorithm to predict antibiotic resistance and guide antibiotic empirical prescription in critically ill patients in a timely and cost-effective manner.
The methods proposed in this paper will allow us to anticipate culture and sensitivity results from the Microbiology Laboratory. The early detection of patients at high risk for resistance to one or more families of antibiotics may lead to useful knowledge of the patient and hospital ecosystem, and subsequently to better management of healthcare resources. Firstly, as an immediate benefit, it could support the physician in selecting the appropriate empiric therapy. Secondly, targeted empirical therapy may limit antibiotic misuse and, over time, reduce the prevalence of antibiotic-resistant bacteria. In addition, patients with multidrug-resistant infections could be isolated to prevent potential outbreaks of resistant bacteria and thus avoid inadvertent spread to other ICU patients. Such interventions could result in lower mortality, lower workload, lower hospital costs, and a decrease in infections during ICU stays.
Our methodology is based solely on data of the Microbiology Laboratory that already exist in the hospital’s Laboratory Information System. Similar studies [11,12,13,14] use ML techniques to predict antimicrobial susceptibility with many more attributes, including clinical data of the patients, and other useful information related to the domain examined. The purpose of our study is to present a low-cost approach that may be used in any ICU, requiring only the existence of an elementary information system of the Microbiology Laboratory (sometimes that could be a simple database). Among the various ML models examined, the best performance achieved was an ROC area of 0.726. This value measures discrimination rather than the proportion of correct predictions: given one susceptible and one resistant result chosen at random, the model would rank them correctly about 72.6% of the time, based solely on the source of the specimen and the presumed site of infection, the Gram stain of the pathogen, and previous susceptibility data. The performance of the techniques that we present in this study would likely improve substantially if the antimicrobial susceptibility datasets included the patient’s clinical information as well. Additionally, we note that, had this research been conducted with the view of actually providing information that would be integrated into the clinician’s everyday practice, a more professional data processing package would have been required, and substantially more studies would have to be conducted to boost the statistical confidence of our results. For example, a more thorough line of investigation could have aimed to assess (and, subsequently, control) the degree of bias possibly introduced by the existence of multiple samples from a given patient, since this raises the possibility that patients with one resistant organism (or with an organism with resistance to a specific antibiotic) will have other resistant organisms (or the same resistance mechanisms in multiple species of bacteria) due to shared (unmeasured) risk factors and/or horizontal gene transfer.
Techniques such as boosting can reduce bias, and we expect to examine them in future work. At this stage, we consider our results promising in that they demonstrate the apparent feasibility and relative ease with which readily available data can be used to provide rule-of-thumb, actionable information to time-pressed clinicians. Thus, the key message from our investigation is that, even with the most elementary data, one can take several steps towards improving ICU performance.

4. Materials and Methods

This study examines the performance of eight machine learning models based on data of the Microbiology Laboratory from ICU patients in a public tertiary hospital in Greece. It is a general 12-bed ICU with mixed medical and surgical cases.

4.1. Samples-Source of Isolates

During the two years (January 2017–December 2018), a total of 888 clinical samples from 345 ICU patients were included in this study and processed by the Microbiology Laboratory according to established protocols [36,37,38]. The types of samples examined and their percentages are presented in Section 2 (Table 1). Blood cultures were incubated in the BacT/Alert system (bioMerieux). Isolation and identification of pathogens were carried out according to classical microbiological procedures [39].

4.2. Antimicrobial Susceptibility Data

Antimicrobial susceptibility testing was performed by the MicroScan system (Siemens), according to Clinical and Laboratory Standards Institute (CLSI) guidelines [40,41], and the results were confirmed, when necessary, using a gradient minimum inhibitory concentration (MIC) determining method following the manufacturer’s guidelines (e.g., the E-test bioMerieux, Sweden). MICs of colistin were retested via microtiter plates (SensiTestColistin, Liofilchem). Sensitivity and resistance breakpoints for the antibiotics were determined according to CLSI interpretive criteria [40,41] and, for tigecycline and fusidic acid, according to EUCAST criteria [42]. Escherichia coli ATCC 25922 strain, Pseudomonas aeruginosa ATCC 27853, and Staphylococcus aureus ATCC 29213 and ATCC 25923 were used as quality control strains for susceptibility testing.
The phenotypic detection of the production of extended-spectrum beta-lactamases (ESBL) was performed by the double-disk synergy test (DDST), according to CLSI guidelines [40]. Metallo-beta-lactamases (MBL) and carbapenemases (KPC) were detected phenotypically by (a) the modified Hodge test [40], (b) the combined disk test, with a meropenem (MER) disk alone, a MER disk plus phenyl boronic acid (PBA), a MER disk plus EDTA, and a MER disk plus PBA and EDTA, as described by Tsakris et al. [43], and (c) the NG CARBA 5 immunochromatographic assay, targeting KPC-, NDM-, VIM-, and IMP-type and OXA-48-like carbapenemases, following the manufacturer’s guidelines (data presented at the 29th European Congress of Clinical Microbiology & Infectious Diseases (ECCMID) 2019 [44]). P. aeruginosa strains were tested phenotypically for MBL, either by a combined disk test using the imipenem (IPM) disk, and IPM plus EDTA, as described by Yong et al. [45], or by an IPM-EDTA double-disk synergy test (DDST), as described by Lee et al. [46]. All strains that phenotypically produced more than one or no carbapenemases, the OXA producers, and all those tested with NG CARBA 5, were subject to PCR for blaNDM, blaVIM, blaKPC, and blaOXA-48 genes. They were also examined for the presence of the plasmid-mediated mcr-1 gene for colistin resistance (data presented at 28th and 29th ECCMID 2019 [44,47]).

4.3. Bacterial Pathogens and Antibiotics

The resistance for P. aeruginosa was measured based on the following antibiotics: amikacin, aztreonam, cefepime, ceftazidime, ciprofloxacin, colistin, gentamicin, imipenem, meropenem, doripenem, piperacillin/tazobactam, tobramycin, and levofloxacin. P. aeruginosa strains presented the highest resistance rates to gentamicin (57.97%) and cefepime (56.67%), followed by fluoroquinolones (55.11%) and carbapenems (55.02%) [7].
The resistance for A. baumannii was measured based on the following antibiotics: amikacin, ampicillin/sulbactam, cefepime, cefotaxime, ceftazidime, ciprofloxacin, colistin, gentamicin, imipenem, levofloxacin, meropenem, minocycline, tobramycin, trimethoprim/sulfamethoxazole, tetracycline, and tigecycline. A high resistance rate of over 80% of A. baumannii isolates to most classes of antibiotics was identified, with the lowest resistance rates reported to colistin (53.37%) [7].
The resistance for K. pneumoniae was measured based on the following antibiotics: amikacin, amoxicillin/clavulanic acid, ampicillin/sulbactam, cefepime, cefotaxime, cefoxitin, ceftazidime, cefuroxime, ciprofloxacin, colistin, ertapenem, gentamicin, imipenem, meropenem, piperacillin/tazobactam, tetracycline, tobramycin, trimethoprim/sulfamethoxazole, levofloxacin, moxifloxacin, and tigecycline. The highest resistance rate was reported to older beta lactams/beta lactamase inhibitors (amp/sulb 85.98%), fluoroquinolones (up to 83.04%), carbapenems (up to 81.44%) and third generation cephalosporins (up to 81.61%) [7].
The resistance for Achromobacter xylosoxidans was measured based on the following antibiotics: amikacin, aztreonam, ceftazidime, gentamicin, imipenem, meropenem, minocycline, piperacillin/tazobactam, tobramycin, levofloxacin, and trimethoprim/sulfamethoxazole.
The resistance for Enterobacter aerogenes was measured based on the following antibiotics: amikacin, amoxicillin/clavulanic acid, ampicillin/sulbactam, cefepime, cefotaxime, cefoxitin, ceftazidime, cefuroxime, ciprofloxacin, colistin, ertapenem, gentamicin, imipenem, levofloxacin, meropenem, moxifloxacin, piperacillin/tazobactam, tetracycline, tobramycin, and trimethoprim/sulfamethoxazole.
The resistance for Enterobacter cloacae was measured based on the following antibiotics: amikacin, amoxicillin/clavulanic acid, ampicillin/sulbactam, aztreonam, cefepime, cefotaxime, cefoxitin, ceftazidime, ceftriaxone, cefuroxime, ciprofloxacin, colistin, ertapenem, gentamicin, imipenem, levofloxacin, meropenem, moxifloxacin, piperacillin/tazobactam, tetracycline, tobramycin, and trimethoprim/sulfamethoxazole.
The resistance for E. coli was measured based on the following antibiotics: amikacin, amoxicillin/clavulanic acid, ampicillin, ampicillin/sulbactam, aztreonam, cefepime, cefotaxime, cefoxitin, ceftazidime, ceftriaxone, cefuroxime, ciprofloxacin, colistin, ertapenem, gentamicin, imipenem, levofloxacin, meropenem, moxifloxacin, piperacillin/tazobactam, tetracycline, tobramycin, and trimethoprim/sulfamethoxazole.
The resistance for Proteus mirabilis was measured based on the following antibiotics: amikacin, amoxicillin/clavulanic acid, ampicillin, ampicillin/sulbactam, aztreonam, cefepime, cefotaxime, cefoxitin, ceftazidime, ceftriaxone, cefuroxime, ciprofloxacin, ertapenem, gentamicin, imipenem, levofloxacin, meropenem, moxifloxacin, piperacillin/tazobactam, tobramycin, and trimethoprim/sulfamethoxazole.
The resistance for Stenotrophomonas maltophilia was measured based on the following antibiotics: ceftazidime, levofloxacin, minocycline, and trimethoprim/sulfamethoxazole.
The resistance for Enterococcus faecalis was measured based on the following antibiotics: amoxicillin/clavulanic acid, ampicillin, ciprofloxacin, daptomycin, gentamicin 500, levofloxacin, linezolid, pristinamycin, quinupristin/dalfopristin, rifampin, streptomycin 1000, teicoplanin, tetracycline, and vancomycin.
The resistance for Enterococcus faecium was measured based on the following antibiotics: amoxicillin/clavulanic acid, ampicillin, ciprofloxacin, daptomycin, gentamicin 500, levofloxacin, linezolid, pristinamycin, quinupristin/dalfopristin, rifampin, streptomycin 1000, teicoplanin, tetracycline, and vancomycin.
The resistance for S. aureus was measured based on the following antibiotics: ceftaroline, ciprofloxacin, clindamycin, daptomycin, erythromycin, fusidic acid, gentamicin, levofloxacin, linezolid, oxacillin, penicillin, pristinamycin, quinupristin/dalfopristin, rifampin, teicoplanin, tetracycline, tobramycin, trimethoprim/sulfamethoxazole, and vancomycin. Methicillin-resistance (MRSA) was found in 21.12% of the total S. aureus isolates.
Our research focuses only on the antibiotics mentioned above, since the number of samples for these is adequate to draw reliable conclusions about the models that were examined. The incidence of multidrug-resistant (MDR) or extensively drug-resistant (XDR) bacteria was not examined in the present study. Bacteria were classified as sensitive or resistant to each antibiotic tested. As mentioned in Section 4.2, phenotypic detection of ESBL and KPC production was performed, but these results were not included in the dataset used in the ML models.

5. Conclusions

In this paper, we evaluated a collection of very popular learning classifiers on an ICU antimicrobial susceptibility dataset. The best results are an F-measure of 0.678, achieved with the RIPPER algorithm, and an ROC area of 0.726, achieved with the Multilayer perceptron classifier. Judged by the ROC area, the experimental results indicate that the Multilayer perceptron and J48 (C4.5) algorithms in particular are suitable models for ICU antimicrobial susceptibility datasets. The decision to use one of these techniques as an assistant depends mainly on whether the ICU places a premium on accuracy or on explainability, though there do exist approaches that attempt to bridge these preferences. Given that the algorithms presented use only a few variables, retrieved solely from the Microbiology Department without adjuvant clinical data, the best performances achieved were not high enough to characterize our techniques as widely applicable. Despite the limitations of the study, our primary goal was to take advantage of these data using ML techniques and possibly create an inexpensive ancillary tool to aid the clinician in identifying patients carrying antibiotic-resistant bacteria and to guide proper therapy with greater confidence in situations where there is significant uncertainty and a crucial decision needs to be taken.
In future work, we will focus on enriching our datasets with clinical attributes as well as investigating configurations that would improve the algorithms’ performance.

Author Contributions

Conceptualization, G.F., A.S., K.V., M.M., and D.K.; methodology, A.S., G.F., D.K., M.M., E.L., N.S., C.C., M.L., A.V., K.V., K.A., and S.M.; software, G.F., E.L.; validation, A.S., G.F., D.K., M.M., E.L., N.S., C.C., A.V., and S.M.; formal analysis, A.S., G.F., M.M., E.L., N.S., M.L., C.C., A.V., K.A., and S.M.; investigation, A.S., M.M., G.F., D.K., C.C., M.L., E.L., N.S., A.V., K.A., and S.M.; resources, S.P., G.F., M.M., and K.V.; data curation, G.F., E.L., S.P., and A.V.; writing—original draft preparation, G.F., A.S., D.K., N.S., M.M., M.L., C.C., M.M., A.V., E.L., K.A., and S.M.; writing—review and editing, G.F., D.K., A.S., N.S., M.L., C.C., M.M., A.V., K.A., and S.M.; visualization, G.F. and E.L.; supervision, G.F., A.S., K.V., M.M., and S.P.; project administration, G.F., S.P., A.S., M.M., and K.V. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Acknowledgments

The authors would like to thank the four anonymous reviewers whose comments/suggestions helped improve and clarify this manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Gandra, S.; Barter, D.M.; Laxminarayan, R. Economic burden of antibiotic resistance: How much do we really know? Clin. Microbiol. Infect. 2014, 20, 973–979. [Google Scholar] [CrossRef]
  2. Cassini, A.; Högberg, L.D.; Plachouras, D.; Quattrocchi, A.; Hoxha, A.; Simonsen, G.S.; Colomb-Cotinat, M.; Kretzschmar, M.E.; Devleesschauwer, B.; Cecchini, M.; et al. Burden of AMR Collaborative Group Attributable deaths and disability-adjusted life-years caused by infections with antibiotic-resistant bacteria in the EU and the European Economic Area in 2015: A population-level modelling analysis. Lancet Infect. Dis. 2019, 19, 56–66. [Google Scholar] [CrossRef]
  3. Potron, A.; Poirel, L.; Nordmann, P. Emerging broad-spectrum resistance in Pseudomonas aeruginosa and Acinetobacter baumannii: Mechanisms and epidemiology. Int. J. Antimicrob. Agents 2015, 45, 568–585. [Google Scholar] [CrossRef]
  4. European Centre for Disease Prevention and Control. Antimicrobial Resistance Surveillance in Europe 2015; Annual Report of the European Antimicrobial Resistance Surveillance Network (EARS-Net); ECDC: Stockholm, Sweden, 2017.
  5. Albiger, B.; Glasner, C.; Struelens, M.J.; Grundmann, H.; Monnet, D.L. European Survey of Carbapenemase-Producing Enterobacteriaceae working group Carbapenemase-producing Enterobacteriaceae in Europe: Assessment by national experts from 38 countries. Euro Surveill. 2015, 20, 30062. [Google Scholar] [CrossRef] [PubMed]
  6. Maltezou, H.C.; Kontopidou, F.; Dedoukou, X.; Katerelos, P.; Gourgoulis, G.M.; Tsonou, P.; Maragos, A.; Gargalianos, P.; Gikas, A.; Gogos, C.; et al. Working Group for the National Action Plan to Combat Infections due to Carbapenem-Resistant, Gram-Negative Pathogens in Acute-Care Hospitals in Greece. Action Plan to combat infections due to carbapenem-resistant, Gram-negative pathogens in acute-care hospitals in Greece. J. Glob. Antimicrob. Resist. 2014, 2, 11–16. [Google Scholar] [PubMed]
  7. Feretzakis, G.; Loupelis, E.; Sakagianni, A.; Skarmoutsou, N.; Michelidou, S.; Velentza, A.; Martsoukou, M.; Valakis, K.; Petropoulou, S.; Koutalas, E. A 2-Year Single-Centre Audit on Antibiotic Resistance of Pseudomonas aeruginosa, Acinetobacter baumannii and Klebsiella pneumoniae Strains from an Intensive Care Unit and Other Wards in a General Public Hospital in Greece. Antibiotics 2019, 8, 62. [Google Scholar] [CrossRef] [PubMed]
  8. Feretzakis, G.; Loupelis, E.; Petropoulou, S.; Christopoulos, C.; Lada, M.; Martsoukou, M.; Skarmoutsou, N.; Sakagianni, A.; Michelidou, S.; Velentza, A.; et al. Using Microbiological Data Analysis to Tackle Antibiotic Resistance of Klebsiella Pneumoniae. In Proceedings of the 18th International Conference on Informatics, Management, and Technology in Healthcare (ICIMTH), Athens, Greece, 5–7 July 2019; IOS Press: Amsterdam, The Netherlands, 2019; Volume 262, pp. 180–183. [Google Scholar] [CrossRef]
  9. Sterling, S.; Miller, W.; Pryor, J.; Puskarich, M.; Jones, A. The Impact of Timing of Antibiotics on Outcomes in Severe Sepsis and Septic Shock: A Systematic Review and Meta-Analysis. Crit. Care Med. 2015, 43, 1907–1915. [Google Scholar] [CrossRef]
  10. Sherwin, R.; Winters, M.; Vilke, G.; Wardi, G. Does Early and Appropriate Antibiotic Administration Improve Mortality in Emergency Department Patients with Severe Sepsis or Septic Shock? J. Emerg. Med. 2017, 53, 588–595. [Google Scholar] [CrossRef]
  11. Martínez-Agüero, S.; Mora-Jiménez, I.; Lérida-García, J.; Álvarez-Rodríguez, J.; Soguero-Ruiz, C. Machine Learning Techniques to Identify Antimicrobial Resistance in the Intensive Care Unit. Entropy 2019, 21, 603. [Google Scholar] [CrossRef]
  12. Oonsivilai, M.; Mo, Y.; Luangasanatip, N.; Lubell, Y.; Miliya, T.; Tan, P.; Cooper, B.S. Using machine learning to guide targeted and locally-tailored empiric antibiotic prescribing in a childrens hospital in Cambodia. Wellcome Open Res. 2018, 3, 131. [Google Scholar] [CrossRef]
  13. Revuelta-Zamorano, P.; Sánchez, A.; Rojo-Álvarez, J.; Álvarez Rodríguez, J.; Ramos-López, J.; Soguero-Ruiz, C. Prediction of Healthcare Associated Infections in an Intensive Care Unit Using Machine Learning and Big Data Tools. In Proceedings of the XIV Mediterranean Conference on Medical and Biological Engineering and Computing; Springer: Cham, Switzerland, 2016; pp. 840–845. [Google Scholar]
  14. Martínez-Agüero, S.; Lérida-García, J.; Álvarez Rodríguez, J.; Mora-Jiménez, I.; Soguero-Ruiz, C. Estudio de la evolución temporal de la resistenciaantimicrobiana de gérmenesen la unidad de cuidadosintensivos. In Proceedings of the XXXVI CongresoAnual de la Sociedad Española de IngenieríaBiomédica (CASEIB 2018), Ciudad Real, Spain, 21–23 November 2018. [Google Scholar]
  15. Hall, M.; Frank, E.; Holmes, G.; Pfahringer, B.; Reutemann, P.; Witten, I. The WEKA data mining software. ACM SIGKDD Explor. Newslett. 2009, 11, 10–18. [Google Scholar] [CrossRef]
  16. Smith, T.C.; Frank, E. Introducing Machine Learning Concepts with WEKA. Methods Mol. Biol. Stat. Genomics 2016, 353–378. [Google Scholar] [CrossRef]
  17. Kasperczuk, A.; Dardzińska, A. Comparative Evaluation of the Different Data Mining Techniques Used for the Medical Database. Acta Mech. Autom. 2016, 10, 233–238. [Google Scholar] [CrossRef]
  18. Han, J.; Pei, J.; Yin, Y. Mining frequent patterns without candidate generation. ACM SIGMOD Rec. 2000, 29, 1–12. [Google Scholar] [CrossRef]
  19. Fan, R.E.; Chang, K.W.; Hsieh, C.J.; Wang, X.R.; Lin, C.J. LIBLINEAR: A library for large linear classification. J Mach. Learn. Res. 2008, 9, 1871–1874. [Google Scholar]
  20. Chang, C.C.; Lin, C.J. LIBSVM: A library for support vector machines. ACM Trans. Intell. Syst. Technol. 2011, 2, 1–27. Available online: http://www.csie.ntu.edu.tw/~cjlin/libsvm (accessed on 24 January 2020). [CrossRef]
  21. Boser, B.; Guyon, I.; Vapnik, V. A training algorithm for optimal margin classifiers. In Proceedings of the Fifth Annual Workshop on Computational Learning Theory; Haussler, D., Ed.; ACM Press: New York, NY, USA, 1992. [Google Scholar]
  22. Cortes, C.; Vapnik, V. Support-vector network. Mach. Learn. 1995, 20, 273–297. [Google Scholar] [CrossRef]
  23. Platt, J. Fast Training of Support Vector Machines using Sequential Minimal Optimization. In Advances in Kernel Methods—Support Vector Learning; Schoelkopf, B., Burges, C., Smola, A., Eds.; MIT Press: Cambridge, MA, USA, 1998. [Google Scholar]
  24. Keerthi, S.; Shevade, S.; Bhattacharyya, C.; Murthy, K. Improvements to Platt’s SMO Algorithm for SVM Classifier Design. Neural Comput. 2001, 13, 637–649. [Google Scholar] [CrossRef]
  25. Ramyaa, R.; Hosseini, O.; Krishnan, G.P.; Krishnan, S. Phenotyping Women Based on Dietary Macronutrients, Physical Activity, and Body Weight Using Machine Learning Tools. Nutrients 2019, 11, 1681. [Google Scholar] [CrossRef]
  26. Mitchell, T.M. Machine Learning, 1st ed.; McGraw-Hill Inc.: New York, NY, USA, 1997. [Google Scholar]
  27. Agnar, A.; Enric, P. Case-based reasoning: Foundational issues, methodological variations, and system approaches. AI Commun. 1994, 7, 39–59. [Google Scholar]
  28. Aha, D.; Kibler, D.; Albert, M. Instance-based learning algorithms. Mach. Learn. 1991, 6, 37–66. [Google Scholar] [CrossRef]
  29. Aha, D.; Kibler, D. Noise-tolerant instance-based learning algorithms. In Proceedings of the Eleventh International Joint Conference on Artificial Intelligence (IJCAI 89), Detroit, MI, USA, 20–25 August 1989; pp. 794–799. [Google Scholar]
  30. Quinlan, J.R. C4.5. Programs for Machine Learning; Morgan Kaufmann: San Francisco, CA, USA, 1993. [Google Scholar]
  31. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
  32. Cohen, W. Fast Effective Rule Induction. In Proceedings of the Twelfth International Conference on Machine Learning (ICML95), Tahoe City, California, USA, 9–12 July 1995; Prieditis, A., Russel, S., Eds.; Morgan Kaufmann: San Francisco, CA, USA, 1995; pp. 115–123. [Google Scholar]
  33. Witten, I.H.; Frank, E.; Hall, M.A.; Pal, C.J. Data Mining: Practical Machine Learning Tools and Techniques; Morgan Kaufmann: Amsterdam, The Netherlands, 2017. [Google Scholar]
  34. Moradigaravand, D.; Palm, M.; Farewell, A.; Mustonen, V.; Warringer, J.; Parts, L. Prediction of antibiotic resistance in Escherichia coli from large-scale pan-genome data. PLoS Comput. Biol. 2018, 14, e1006258. [Google Scholar] [CrossRef] [PubMed]
  35. Nguyen, M.; Long, S.W.; Mcdermott, P.F.; Olsen, R.J.; Olson, R.; Stevens, R.L.; Davis, J.J. Using Machine Learning To Predict Antimicrobial MICs and Associated Genomic Features for Nontyphoidal Salmonella. J. Clin. Microbiol. 2018, 57. [Google Scholar] [CrossRef] [PubMed]
  36. Balows, A.; HauslerjR, W.J.; Herrmann, K.L.; Isenberg, H.D.; Shadomy, H.J. Manual of Clinical Microbiology, 5th ed.; American Society for Microbiology: Washington, DC, USA, 1991. [Google Scholar]
  37. Isenberg, H. Antimicrobial susceptibility testing. Clin. Microbiol. Proced. Handb. 2004, 2, 1–5. [Google Scholar]
  38. Murray, P.; Baron, E.J.; Jorgensen, J.; Pfaller, M.; Yolken, R. Manual of Clinical Microbiology, 8th ed.; American Society of Microbiology Press: Washington, DC, USA, 2005. [Google Scholar]
  39. Jorgensen, J.; Pfaller, M.; Carroll, K.; Funke, G.; Landry, M.L.; Richter, S.; Warnock, D. Manual of Clinical Microbiology, 11th ed.; American Society of Microbiology Press: Washington, DC, USA, 2015. [Google Scholar]
  40. Clinical and Laboratory Standards Institute. Performance Standards for Antimicrobial Susceptibility Testing, 26th ed.; CLSI: Wayne, PA, USA, 2016. [Google Scholar]
  41. Clinical and Laboratory Standards Institute. Performance Standards for Antimicrobial Susceptibility Testing, 27th ed.; CLSI: Wayne, PA, USA, 2017. [Google Scholar]
  42. The European Committee on Antimicrobial Susceptibility Testing. Clinical Breakpoints for Bacteria; EUCAST: Copenhagen, Denmark, 2016. [Google Scholar]
  43. Tsakris, A.; Poulou, A.; Pournaras, S.; Voulgari, E.; Vrioni, G.; Themeli-Digalaki, K.; Petropoulou, D.; Sofianou, D. A simple phenotypic method for the differentiation of metallo-β-lactamases and class A KPC carbapenemases in Enterobacteriaceae clinical isolates. J. Antimicrob. Chemother. 2010, 65, 1664–1671. [Google Scholar] [CrossRef]
  44. Skarmoutsou, N.; Adamou, D.; Tryfinopoulou, K.; Xirokosta, P.; Mylona, E.; Giakkoupi, P.; Karadimas, K.; Zervogianni, A.; Martsoukou, M. Performance of NG-Test CARBA 5 immunochromatographic assay for the detection of carbapenemases among multidrug-resistant clinical strains in Greece. In Proceedings of the 29th European Congress of Clinical Microbiology & Infectious Diseases (ECCMID 2019), Amsterdam, The Netherlands, 13–16 April 2019. [Google Scholar]
  45. Yong, D.; Lee, K.; Yum, J.H.; Shin, H.B.; Rossolini, G.M.; Chong, Y. Imipenem-EDTA disk method for differentiation of metallobeta—Lactamase-producing clinical isolates of Pseudomonas spp. And Acinetobacter spp. J. Clinmicrobiol. 2002, 40, 3798–3801. [Google Scholar]
  46. Lee, K.; Lim, Y.S.; Yong, D.; Yum, J.H.; Chong, Y. Evaluation of the Hodge test and the imipenem-EDTA double-disk synergy test for differentiating metallo-beta-lactamase-producing isolates of Pseudomonas spp. and Acinetobacter spp. J. Clinmicrobiol. 2003, 41, 4623–4629. [Google Scholar]
  47. Flountzi, A.; Giakkoupi, P.; Tryfinopoulou, K.; Pappa, O.; Vatopoulos, A.; Martsoukou, M.; Skarmoutsou, N.; Lebessi, E.; Charisiadou, A.E.; Chatzivasileiou, E.; et al. Investigation of Klebsiella pneumoniae clinical isolates from 2016 onwards for the putative presence of the plasmid-mediated mcr-1 gene for colistin resistance. In Proceedings of the 29th European Congress of Clinical Microbiology & Infectious Diseases (ECCMID 2019), Amsterdam, The Netherlands, 13–16 April 2019. [Google Scholar]
Table 1. Simple summary statistics of the dataset.

| | Age (Years) | Gender | Gram Stain | Class |
|---|---|---|---|---|
| Mean | 61.68 | Male (44%) | Positive (16.20%) | Resistant (51.20%) |
| St. Dev. | 19.66 | Female (56%) | Negative (83.80%) | Sensitive (48.80%) |
| Range | 80 | | | |

| Type of Samples | Share |
|---|---|
| Blood | 7.38% |
| Tracheobronchial | 60.90% |
| Urine | 20.40% |
| Peritoneal | 2.01% |
| Tissue | 5.29% |
| Catheters | 3.76% |
| Pleural | 0.26% |
Table 2. Detailed accuracy by class of LIBLINEAR (10-fold cross-validation).

| Measure | TP Rate | FP Rate | Precision | Recall | F-Measure | MCC | ROC Area | PRC Area | Class |
|---|---|---|---|---|---|---|---|---|---|
| | 0.593 | 0.456 | 0.577 | 0.593 | 0.585 | 0.137 | 0.568 | 0.550 | R |
| | 0.544 | 0.407 | 0.560 | 0.544 | 0.552 | 0.137 | 0.568 | 0.527 | S |
| Weighted Avg. | 0.569 | 0.432 | 0.569 | 0.569 | 0.569 | 0.137 | 0.568 | 0.539 | |
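
The F-Measure column in these tables can be recovered from the Precision and Recall columns, and the weighted average from the class proportions. The following is a minimal sketch under stated assumptions, not the authors' WEKA workflow: the function name `f_measure` and the variable names are illustrative, and the class shares are assumed to be those reported in Table 1 (Resistant 51.20%, Sensitive 48.80%).

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Per-class (precision, recall) as reported for LIBLINEAR in Table 2.
reported = {"R": (0.577, 0.593), "S": (0.560, 0.544)}

# Assumption: class shares taken from Table 1 (Resistant 51.20%, Sensitive 48.80%).
class_share = {"R": 0.512, "S": 0.488}

# F-measure for each class, then the share-weighted average across classes.
per_class_f = {label: f_measure(p, r) for label, (p, r) in reported.items()}
weighted_f = sum(class_share[label] * per_class_f[label] for label in per_class_f)

for label, value in per_class_f.items():
    print(f"F-measure ({label}): {value:.3f}")
print(f"Weighted average F-measure: {weighted_f:.3f}")
```

With the Table 2 inputs, the values round to 0.585 (R), 0.552 (S), and 0.569 (weighted average), matching the reported row entries.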
Table 3. Detailed accuracy by class of LIBSVM (10-fold cross-validation).

| Measure | TP Rate | FP Rate | Precision | Recall | F-Measure | MCC | ROC Area | PRC Area | Class |
|---|---|---|---|---|---|---|---|---|---|
| | 0.752 | 0.433 | 0.646 | 0.752 | 0.695 | 0.325 | 0.660 | 0.613 | R |
| | 0.567 | 0.248 | 0.686 | 0.567 | 0.621 | 0.325 | 0.660 | 0.600 | S |
| Weighted Avg. | 0.662 | 0.343 | 0.665 | 0.662 | 0.659 | 0.325 | 0.660 | 0.607 | |
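
For a two-class problem, the Precision and MCC columns follow from the TP rate, the FP rate, and the share of the positive class. The sketch below illustrates this relationship for the Resistant class of the LIBSVM results. It is an illustration under assumptions rather than the authors' procedure: the helper `binary_metrics` is hypothetical, and the Resistant share of 51.20% is assumed from Table 1.

```python
import math


def binary_metrics(tp_rate: float, fp_rate: float, prevalence: float) -> dict:
    """Derive precision, MCC, and the other class's TP rate from rates and prevalence."""
    # Confusion-matrix cells expressed as fractions of all samples.
    tp = prevalence * tp_rate
    fn = prevalence * (1 - tp_rate)
    fp = (1 - prevalence) * fp_rate
    tn = (1 - prevalence) * (1 - fp_rate)

    precision = tp / (tp + fp)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    # In a two-class setting, the TP rate of the other class is 1 - FP rate.
    tp_rate_other_class = tn / (tn + fp)
    return {
        "precision": precision,
        "mcc": mcc,
        "tp_rate_other_class": tp_rate_other_class,
    }


# Resistant-class rates reported for LIBSVM in Table 3.
# Assumption: prevalence of the Resistant class is 51.20%, as in Table 1.
metrics = binary_metrics(tp_rate=0.752, fp_rate=0.433, prevalence=0.512)

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Under these assumptions the derived values round to a precision of 0.646, an MCC of 0.325, and a Sensitive-class TP rate of 0.567, in line with the reported rows. This also shows why MCC is identical for both classes in every table: it is a property of the whole confusion matrix rather than of one class.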
Table 4. Detailed accuracy by class of SMO (10-fold cross-validation).

| Measure | TP Rate | FP Rate | Precision | Recall | F-Measure | MCC | ROC Area | PRC Area | Class |
|---|---|---|---|---|---|---|---|---|---|
| | 0.787 | 0.470 | 0.638 | 0.787 | 0.705 | 0.329 | 0.659 | 0.611 | R |
| | 0.530 | 0.213 | 0.704 | 0.530 | 0.605 | 0.329 | 0.659 | 0.602 | S |
| Weighted Avg. | 0.662 | 0.344 | 0.670 | 0.662 | 0.656 | 0.329 | 0.659 | 0.607 | |
Table 5. Detailed accuracy by class of IB1 (1 nearest neighbor) (10-fold cross-validation).

| Measure | TP Rate | FP Rate | Precision | Recall | F-Measure | MCC | ROC Area | PRC Area | Class |
|---|---|---|---|---|---|---|---|---|---|
| | 0.727 | 0.448 | 0.630 | 0.727 | 0.675 | 0.283 | 0.682 | 0.656 | R |
| | 0.552 | 0.273 | 0.658 | 0.552 | 0.600 | 0.283 | 0.682 | 0.666 | S |
| Weighted Avg. | 0.641 | 0.363 | 0.644 | 0.641 | 0.639 | 0.283 | 0.682 | 0.661 | |
Table 6. Detailed accuracy by class of IB5 (5 nearest neighbors) (10-fold cross-validation).

| Measure | TP Rate | FP Rate | Precision | Recall | F-Measure | MCC | ROC Area | PRC Area | Class |
|---|---|---|---|---|---|---|---|---|---|
| | 0.726 | 0.432 | 0.638 | 0.726 | 0.679 | 0.298 | 0.711 | 0.687 | R |
| | 0.568 | 0.274 | 0.664 | 0.568 | 0.612 | 0.298 | 0.711 | 0.717 | S |
| Weighted Avg. | 0.649 | 0.355 | 0.651 | 0.649 | 0.647 | 0.298 | 0.711 | 0.702 | |
Table 7. Detailed accuracy by class of J48 (10-fold cross-validation).

| Measure | TP Rate | FP Rate | Precision | Recall | F-Measure | MCC | ROC Area | PRC Area | Class |
|---|---|---|---|---|---|---|---|---|---|
| | 0.765 | 0.427 | 0.653 | 0.765 | 0.705 | 0.345 | 0.724 | 0.696 | R |
| | 0.573 | 0.235 | 0.699 | 0.573 | 0.630 | 0.345 | 0.724 | 0.733 | S |
| Weighted Avg. | 0.671 | 0.333 | 0.676 | 0.671 | 0.668 | 0.345 | 0.724 | 0.714 | |
Table 8. Detailed accuracy by class of random forest (10-fold cross-validation).

| Measure | TP Rate | FP Rate | Precision | Recall | F-Measure | MCC | ROC Area | PRC Area | Class |
|---|---|---|---|---|---|---|---|---|---|
| | 0.674 | 0.396 | 0.641 | 0.674 | 0.657 | 0.279 | 0.703 | 0.681 | R |
| | 0.604 | 0.326 | 0.638 | 0.604 | 0.621 | 0.279 | 0.703 | 0.717 | S |
| Weighted Avg. | 0.640 | 0.362 | 0.640 | 0.640 | 0.639 | 0.279 | 0.703 | 0.698 | |
Table 9. Detailed accuracy by class of RIPPER (10-fold cross-validation).

| Measure | TP Rate | FP Rate | Precision | Recall | F-Measure | MCC | ROC Area | PRC Area | Class |
|---|---|---|---|---|---|---|---|---|---|
| | 0.735 | 0.380 | 0.670 | 0.735 | 0.701 | 0.358 | 0.699 | 0.653 | R |
| | 0.620 | 0.265 | 0.691 | 0.620 | 0.653 | 0.358 | 0.699 | 0.694 | S |
| Weighted Avg. | 0.679 | 0.324 | 0.680 | 0.679 | 0.678 | 0.358 | 0.699 | 0.673 | |
Table 10. Detailed accuracy by class of Multilayer perceptron (10-fold cross-validation).

| Measure | TP Rate | FP Rate | Precision | Recall | F-Measure | MCC | ROC Area | PRC Area | Class |
|---|---|---|---|---|---|---|---|---|---|
| | 0.706 | 0.380 | 0.661 | 0.706 | 0.683 | 0.327 | 0.726 | 0.706 | R |
| | 0.620 | 0.294 | 0.668 | 0.620 | 0.643 | 0.327 | 0.726 | 0.743 | S |
| Weighted Avg. | 0.664 | 0.338 | 0.664 | 0.664 | 0.663 | 0.327 | 0.726 | 0.724 | |
Table 11. Weighted average values of F-Measure and Receiver Operating Characteristics (ROC) area for all methods (10-fold cross-validation).

| Technique | F-Measure | ROC Area |
|---|---|---|
| LIBLINEAR | 0.569 | 0.568 |
| LIBSVM | 0.659 | 0.660 |
| SMO | 0.656 | 0.660 |
| kNN-5 | 0.647 | 0.711 |
| J48 | 0.639 | 0.724 |
| Random Forest | 0.639 | 0.703 |
| RIPPER | 0.678 | 0.699 |
| Multilayer perceptron | 0.663 | 0.726 |
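
The comparison summarized in Table 11 can be reproduced by ranking the methods on each metric separately. The snippet below is a small sketch of that selection step; the dictionary simply transcribes the Table 11 values, and the variable names are illustrative rather than taken from the study's software.

```python
# (F-measure, ROC area) per technique, transcribed from Table 11.
results = {
    "LIBLINEAR": (0.569, 0.568),
    "LIBSVM": (0.659, 0.660),
    "SMO": (0.656, 0.660),
    "kNN-5": (0.647, 0.711),
    "J48": (0.639, 0.724),
    "Random Forest": (0.639, 0.703),
    "RIPPER": (0.678, 0.699),
    "Multilayer perceptron": (0.663, 0.726),
}

# Pick the top technique under each criterion independently.
best_by_f = max(results, key=lambda name: results[name][0])
best_by_roc = max(results, key=lambda name: results[name][1])

print(f"Best F-measure: {best_by_f} ({results[best_by_f][0]:.3f})")
print(f"Best ROC area:  {best_by_roc} ({results[best_by_roc][1]:.3f})")

# Full ranking by ROC area, highest first.
for name, (f_score, roc_area) in sorted(
    results.items(), key=lambda item: item[1][1], reverse=True
):
    print(f"{name:<22} F={f_score:.3f}  ROC={roc_area:.3f}")
```

The two criteria select different techniques (RIPPER by F-measure, Multilayer perceptron by ROC area), which is why the choice of metric matters when deciding which model to use as a decision aid.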