Guiding Efficient, Effective, and Patient-Oriented Electrolyte Replacement in Critical Care: An Artificial Intelligence Reinforcement Learning Approach

Both provider- and protocol-driven electrolyte replacement have been linked to the over-prescription of ubiquitous electrolytes. Here, we describe the development and retrospective validation of a data-driven clinical decision support tool that uses reinforcement learning (RL) algorithms to recommend patient-tailored electrolyte replacement policies for ICU patients. We used electronic health records (EHR) data that originated from two institutions (UPHS; MIMIC-IV). The tool uses a set of patient characteristics, such as their physiological and pharmacological state, a pre-defined set of possible repletion actions, and a set of clinical goals to present clinicians with a recommendation for the route and dose of an electrolyte. RL-driven electrolyte repletion substantially reduces the frequency of magnesium and potassium replacements (up to 60%), adjusts the timing of interventions in all three electrolytes considered (potassium, magnesium, and phosphate), and shifts them towards orally administered repletion over intravenous replacement. This shift in recommended treatment limits risk of the potentially harmful effects of over-repletion and implies monetary savings. Overall, the RL-driven electrolyte repletion recommendations reduce excess electrolyte replacements and improve the safety, precision, efficacy, and cost of each electrolyte repletion event, while showing robust performance across patient cohorts and hospital systems.


Introduction
The process of evaluating clinical data in the intensive care unit (ICU) to make diagnostic or therapeutic decisions is highly demanding, repetitive, and often requires over 100 decisions per day on average per provider [1,2]. This approach is almost always reactive and often not patient-centric [3-7]. The high stakes and pace of ICU operations put a strain on providers, leading to the frequent reliance on cognitive shortcuts [2,6-9]. Prior experience and legal or ethical expectations further influence clinical decision making, along with the dynamics between different care providers and the availability of personnel, resources, or procedural constraints [10,11]. The delegation of the decision-making process to standardized protocols is often employed with the hope of improving outcomes

Table 1. Selected 52 clinical features from patient EHRs based on their influence on electrolyte levels. We also included imputed measurements at each 6 h interval for a number of key vitals and labs.

Static: Age, Gender, Weight, Floor/ICU
Vitals: Heart rate, Respiratory rate, Temperature, O2 saturation pulse oximetry (SpO2), Urine output, Non-invasive blood pressure (systolic, diastolic)
Labs

Each hospital visit was divided into 6 h intervals to reflect the frequency with which staff may be reasonably able to react to automated recommendations. Clinically nonviable outliers in measured patient vitals and lab values were filtered out, and the mean of remaining measurements within a given six-hour interval was taken as representative of the value at this time step. Missing values were imputed with the last measurement for up to 48 h and otherwise imputed with the population mean value of each lab or vital sign.
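As an illustration only, the binning and imputation steps described above can be sketched for a single lab or vital sign. The function name, the outlier bounds, and the example values are assumptions made for this sketch and are not taken from the study's code.

```python
from statistics import mean

BIN_HOURS = 6          # interval length stated in the text
MAX_CARRY_HOURS = 48   # carry-forward horizon stated in the text

def bin_and_impute(measurements, n_bins, valid_range, population_mean):
    """Return one value per 6 h interval for a single lab or vital sign.

    measurements: list of (hours_since_admission, value) pairs.
    valid_range: (low, high) bounds outside which a value is treated as
                 clinically nonviable (hypothetical bounds in this sketch).
    """
    lo, hi = valid_range
    buckets = [[] for _ in range(n_bins)]
    for t, v in measurements:
        idx = int(t // BIN_HOURS)
        # Keep only viable values that fall inside the visit window.
        if 0 <= idx < n_bins and lo <= v <= hi:
            buckets[idx].append(v)

    series = []
    last_value, last_idx = None, None
    max_carry_bins = MAX_CARRY_HOURS // BIN_HOURS
    for i, bucket in enumerate(buckets):
        if bucket:
            # Mean of the remaining measurements in this interval.
            last_value, last_idx = mean(bucket), i
            series.append(last_value)
        elif last_value is not None and i - last_idx <= max_carry_bins:
            # Carry the last measurement forward for up to 48 h.
            series.append(last_value)
        else:
            # Otherwise fall back to the population mean.
            series.append(population_mean)
    return series

# Hypothetical potassium readings (hours since admission, mEq/L).
potassium = [(1, 4.0), (3, 4.4), (5, 40.0), (14, 3.6)]
series = bin_and_impute(potassium, n_bins=12, valid_range=(1.0, 10.0),
                        population_mean=4.1)
```

In this example the reading of 40.0 is discarded as nonviable, the second interval is filled by carry-forward, and the last interval falls back to the population mean because more than 48 h have passed since the last measurement.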

Model Framework
The task of electrolyte repletion during patient visits to the ICU was modeled as a Markov decision process (MDP), M = <S, A, P, R, γ> [31]. Over a sequence of discrete time steps at 6 h intervals, we observed the patient in some state in S, chose a treatment action from set A, and observed a stochastic transition to a new patient state (according to probability distribution P). Feedback from the transition was in the form of reward R.
The 6 h interval was chosen to mimic hospital workflow. Our objective was to learn an optimal policy π, mapping from a state in a continuous space S to an action in a discrete set A that maximizes the total discounted reward collected over the patient visit, where discount factor γ determines the relative importance of immediate versus distant rewards. Details of the protocol are included in Appendix A [30].
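To make the role of the discount factor concrete, the total discounted reward for one visit can be computed as below. This is a minimal sketch; the reward values and the choice γ = 0.9 are hypothetical and are not the values used in the study.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over one visit, accumulated from the last step back."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total

# Three 6 h steps with hypothetical rewards (penalties are negative).
visit_return = discounted_return([-1.0, 0.0, -0.5], gamma=0.9)
```

A γ close to 1 weights penalties late in the visit almost as heavily as immediate ones, whereas a small γ makes the policy focus on the next interval.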
In defining the clinical condition of the patient in our model, we incorporated a total of 52 factors based on their relevance to or potential influence on electrolyte homeostasis in the patient (Table 1) [19,20]. We also included the administration of intravenous (IV) and oral (PO) electrolytes, and other potentially relevant medications administered over the past 6 h interval. To define the actionable AI events (action space A), we allowed for dosage rates in line with standard clinical practice (Table 2). For potassium (K), dosing was considered at one of six possible rates: 0-10 mEq/h infused over 1, 2, or 3 h; 10-20 mEq/h over 2, 4, or 6 h, or some combination of both intravenous and oral supplements. Repletion rates and doses were chosen in the same way for magnesium (Mg) and phosphates (P) (Table 2). The AI performance was guided by: (i) a penalty for electrolyte levels above the reference range, (ii) a penalty for electrolyte levels below this range, (iii) the effective cost of IV repletion, and (iv) the effective cost of PO repletion. The AI reward function was a weighted sum of these four conditions relevant to the current patient condition, the immediate action advised by AI, and the next state (Appendix A). The aim of our RL algorithm was to learn a policy that would maximize the cumulative reward or, equivalently, minimize the total accumulated penalties over the course of the patient's admission.

Model Training
Data from the 13,234 hospital visits obtained from the UPHS dataset after applying our exclusion criteria were randomly split into 7000 visits in the training set to learn an optimal repletion policy, and 6164 in the test set to evaluate our learned policy on held-out data. By setting the sampling interval at 6 h and creating one-step transition samples of the form <state, action, reward, next state>, we produced a total of 54,228 samples in the training set for the potassium sub-cohort, 59,775 for magnesium, and 15,863 for phosphate.
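The construction of one-step transition samples from a visit can be sketched as follows. The function name and the toy state and action labels are assumptions for illustration; in the study, states are vectors of patient covariates.

```python
def to_transitions(states, actions, rewards):
    """Turn one visit's trajectory into <state, action, reward, next state> tuples.

    states has one more element than actions and rewards, because the final
    state is only ever used as a next state.
    """
    if not (len(states) == len(actions) + 1 == len(rewards) + 1):
        raise ValueError("expected len(states) == len(actions) + 1 == len(rewards) + 1")
    return [
        (states[t], actions[t], rewards[t], states[t + 1])
        for t in range(len(actions))
    ]

# A hypothetical visit spanning three 6 h time steps.
samples = to_transitions(
    states=["low", "low", "normal"],
    actions=["none", "po"],
    rewards=[-1.0, 0.0],
)
```

Pooling these tuples over all training visits gives the batch of samples on which an offline algorithm is trained.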
Fitted Q-iteration (FQI), a data-efficient algorithm for offline reinforcement learning, was used to learn optimal treatment policies from these sets of patient state transitions [32]. The FQI algorithm learns a Q-value function, which is an estimate of the long-term rewards of each available action at a given patient state from the training data. Then, on our test data, we can use the learned Q-value function to choose the action that maximizes the rewards at a given patient state to identify the optimal treatment policy [33,34].
For each electrolyte repletion task, the learned policy first decides whether to administer a supplement and if so, by what route (oral, intravenous, or both). The second and third steps determine the most appropriate dosage and infusion time for oral or intravenous repletion, respectively. A retrospective off-policy evaluation (OPE) of the learned policy was performed using a frequency analysis of action recommendations, a qualitative analysis of the policy on patient trajectories, and fitted-Q evaluation (FQE), a state-of-the-art approach to estimating the expected accumulated reward of the learned policies [35].
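The idea behind FQE can be sketched in a few lines. This is a simplified illustration, assuming discrete, hashable states and a per-pair mean in place of a real regression model; the state names, rewards, and policy are hypothetical.

```python
from collections import defaultdict

def fitted_q_evaluation(transitions, policy, gamma, n_iters=50):
    """Estimate the Q-values of a fixed policy from logged transitions.

    transitions: list of (state, action, reward, next_state); next_state is
                 None at the end of a visit.
    policy: function mapping a state to the action the evaluated policy takes.
    """
    q = defaultdict(float)  # zero for unseen (state, action) pairs
    for _ in range(n_iters):
        sums, counts = defaultdict(float), defaultdict(int)
        for s, a, r, s_next in transitions:
            # Bootstrap on the evaluated policy's action, not on the best action.
            future = 0.0 if s_next is None else q[(s_next, policy(s_next))]
            sums[(s, a)] += r + gamma * future
            counts[(s, a)] += 1
        # "Regression" step: a per-pair mean stands in for a fitted model.
        q = defaultdict(float, {k: sums[k] / counts[k] for k in sums})
    return q

logged = [
    ("low", "po", -1.0, "normal"),
    ("low", "none", -1.0, "low"),
    ("normal", "none", 0.0, None),
]
target_policy = {"low": "po", "normal": "none"}.get
q_pi = fitted_q_evaluation(logged, target_policy, gamma=0.9)
```

The only difference from policy learning is the bootstrap target: the next-state value is taken at the action chosen by the policy under evaluation, so the resulting Q-values estimate the expected accumulated reward of that policy.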

Validation on MIMIC-IV
We extracted 40,000 adult ICU patients from MIMIC-IV to validate our RL algorithm [36]. The data include deidentified hospital patients admitted to one of the critical care units of the Beth Israel Deaconess Medical Center between 2008 and 2019. We used 40,000 unique critical care visits for our validation. As with the UPHS data, we split the visits into 32,000 for the training set and 8000 for the test set to evaluate our learned policy. After filtering, these data yielded a total of 54,228 samples in the training set for the potassium sub-cohort, 59,775 for magnesium, and 15,863 for phosphate. We also followed a similar imputation protocol when the exact value of a lab or vital was unknown. When training our AI algorithm, we used a set of 63 covariates to represent patient state. Our reward function is identical to the one used on the UPHS dataset, where rewards accumulate when the patient is within the reference range for a given electrolyte.

Financial Modeling
Financial modeling was carried out using the attached workflow, drawing upon prior work (Appendix B) [24]. The salaries were taken from a U.S. job site [37,38]. The prices of the medication were set using the Lexicon [39]. The prices of laboratory tests were obtained from the CMS schedule for the year 2020 [40]. In general, the lowest bracket was applied uniformly where estimates for wages, lab, and salaries were incorporated into the modeling. The time spent on tasks was estimated using observation and staff input.

Patterns in Historical Provider Behavior
In analyzing repletion patterns in terms of the distribution of pre- and post-repletion electrolyte measurements, we found that a large share of replacements (73% for potassium, 88% for magnesium, and 38% for phosphate) were ordered while electrolyte levels were either within or above the reference range (Figure 2). In fact, potassium and magnesium were over-repleted at rates of 4.4% and 1.4%, respectively. Phosphate was rarely over-treated by comparison, with just 0.6% of repletion events occurring above the target phosphate range. In addition, replacement at low electrolyte levels often failed to bring post-repletion values into the reference range (Figure 2).

AI-Driven Repletion Recommendations
We used inverse reinforcement learning (IRL; Appendix A) to estimate the relative weights in the reward function of each of four variables (IV repletion cost, PO repletion cost, abnormally high, and abnormally low electrolyte values) for historical UPHS data in the case of potassium (K) and magnesium (Mg). Surprisingly, we estimated small negative weights on both the cost of IV and the cost of PO repletion driving historical policy (Table 3). We compared this with the same weights chosen for training an AI-driven repletion protocol and demonstrated that this represents a substantial shift in weights relative to historical behavior, suggesting a more cost-aware repletion protocol (Table 3).

Table 3. Weights of four variables driving electrolyte repletion (IV repletion cost, PO cost, abnormally high, and abnormally low electrolyte values) in the historical dataset and after application of reinforcement learning (RL) algorithm showed substantial changes.

Consequently, the learned RL protocol using this IRL-learned reward function led to policies that recommended less frequent repletion in the case of potassium and magnesium, with reductions of 61.7% and 63.9%, respectively (Figure 3). The RL-based system also showed a preference towards orally administered repletion for all three electrolytes considered, with higher doses of oral potassium replacement and higher doses of intravenous repletion for all three electrolytes when this route was chosen by the system. Compared to historical data, instances of intravenous potassium replacement dropped by 75% and oral replacement dropped by 50% (Figure 3).

Our optimal policy recommended repletion only when potassium was below the threshold of the reference range, and intravenous replacement only when the patient was significantly hypokalemic (data not shown). We can study the learned policy for a single patient visit to explain the behavior of the policy. The learned policy recommends fewer replacement interventions when electrolyte level is normal and more frequent repletion when the patient's electrolyte level is low (Figure 4). The AI-driven protocol favored K-PO (oral repletion), recommending K-IV (intravenous repletion) only when potassium levels were far below the reference range, and also tended to recommend repletion more promptly following a hypokalemic event.

In order to quantify how our system compares with the performance of historical behavior with respect to our weighted reward function, we used fitted-Q evaluation (FQE). The Q-value provides a measure of policy effectiveness. Plotting the distribution of values (that is, expected accumulated rewards) for the set of all pairs of patient states and actions in the data, we found that the average reward for the learned RL protocol was higher than that for the historical data in the case of all three electrolyte policies (Figure 5). This difference is especially pronounced in the case of potassium and magnesium, emphasizing the scope of possible improvement in current practice with respect to electrolyte repletion.

Figure 5. Estimated performance of policy for potassium (K), magnesium (Mg), and phosphate (P) measured by the Q-value prediction, which corresponds to the expected total rewards (time saved, money saved, avoidance of near misses, and side effects) during the entire patient admission. For all three electrolyte policies, the mean Q-value prediction of state-action pairs in the test set was higher for the learned RL policy than for clinician behavior observed in the UPHS data. This suggests that RL optimizes the reward function to create a learned policy that is better than clinician behavior.

Expected Outcomes of Implementing AI-Driven Protocol
We compared the repletion events in the historical data for the 6164 patients in our test set with the instances of recommended repletion according to our learned RL policy, accounting for both the shift towards oral repletion and the overall reduction in repletion events. We calculated the potential decrease in the cost of medication over the full five-year period to be from USD 62k to USD 20.5k. The corresponding estimated expenses related to customary lab work were reduced from USD 87.2k to USD 38k. When we included expenses related to the time spent by different healthcare providers with lab and drug expenses, the total expected expenditures from an RL-driven process were reduced from USD 519k to USD 156k, translating into a savings of USD 790 per hospital visit.
Beyond these direct cost savings, the RL system also avoids replacement of electrolytes when the patient is above the reference range, reducing potential harm to the patient, promoting precise electrolyte replacement, and avoiding potential misses and near misses.

Validation of the Protocol
We validated the learned electrolyte repletion policy by testing the policy estimated from the UPHS cohort in EHR data from the MIMIC-IV cohort. The electrolyte replacement patterns in the MIMIC-IV database were similar to those observed in UPHS (Figure 6A). As with the UPHS test data, applying the RL protocol learned from the UPHS cohort to the MIMIC-IV cohort resulted in a shift towards PO dosages and less frequent replacement (Figure 6B). The learned RL protocol, in general, recommends repletion less frequently than reported in the MIMIC-IV dataset, reflecting the lower frequency of repletion in the UPHS data relative to the MIMIC-IV data. Finally, we confirmed that the learned RL protocol uses covariates similarly to suggest optimal actions in the MIMIC-IV dataset as in the UPHS data (Figure 6C).

Discussion
This is the first demonstration of an RL-derived treatment protocol in an ICU setting, intended to provide potentially continuous recommendations for clinician-in-the-loop patient care to address the issue of electrolyte replacements. Our RL algorithm identifies, for the first time, several important variables that guide providers to replete electrolytes. Furthermore, we demonstrated in silico that we can use a reinforcement learning (RL) strategy to create a policy that differs from clinical recommendations and that uses patient characteristics at a given time and a dynamic set of clinical variables to tailor treatment to specific patient needs. Finally, RL performed similarly in datasets from two different institutions, showing equivalent behavior and improvements in clinician policies, and addressing the ever-important problem of AI validation.

The reinforcement learning system described in this paper uses available information from electronic health records of vital signs, lab tests, and administered drugs and procedures in order to estimate a patient-specific, provider-in-loop recommendation protocol for electrolyte repletion at six-hour intervals. This period was chosen as a reasonable time within the workflow of the intensive care unit. Recommendations are presented in an interpretable and hierarchical way: the system first suggests whether or not a repletion is needed, along with the best route for repletion, and then the most appropriate dosage in the event that the clinician chooses to administer a repletion. This is a more controlled system of prescribing electrolyte repletion, reflecting a quantitative data-driven decision-making pathway that caregivers often fail to follow if the decision-making process is provider- or protocol-driven [13].
The RL system provides flexibility in deciding what the clinical priorities should be, adapting them according to the electrolyte considered and to challenging clinical situations, such as chronic renal failure, liver failure, or severe morbidity, or to the workflows of the specific healthcare center [31]. Our approach therefore presents an adaptive framework for the delivery of care capable of minimizing harm and maximizing precision, considering the patient context. Our optimal RL policy was able to recommend electrolyte replacements in a more targeted way [31,32]. The estimated reduction in recommended repletion events in the case of potassium and magnesium allows for considerable savings in the time spent by clinicians assessing electrolyte levels and the costs incurred from unnecessary or repeat orders placed without thorough re-evaluation of clinical need [1,7]. Moreover, the recommendation of electrolyte administration at pre-repletion values above the reference range is rarely if ever observed [16], eliminating potential risk to patients due to over-treatment that was observed in the historical patient data.
In addition, by placing larger penalties on intravenous rather than oral potassium repletion, we were able to arrive at a policy that chooses oral replacement where possible [32,39]. The higher effective cost of IV repletion can be justified in a number of ways: the cost of the prescription itself for intravenous delivery; the provider time taken to initiate and monitor the delivery of the drug; the increased risk of overcorrection when setting the infusion rate; bruising, clotting, discomfort, or infection at the infusion site; and the risk of accidental overdosing [20,39].
It is important to note that the estimates of efficacy presented here are based on retrospective evaluation, which is challenging for AI systems that use reinforcement learning with batch data. In this scenario, we do not have the ground truth as to the best possible actions to learn from, and we cannot collect additional data following our estimated policy, as in reinforcement learning for robotics or games. Furthermore, we are not able to accurately simulate this data, given the complexity of patient health trajectories. As soon as an action is taken in the historical test data that deviates from the optimal learned policy, the patient trajectory under the optimal policy decision and all subsequent treatment decisions are no longer perfectly known [8,28,31].
It can also be challenging in retrospective studies to disentangle potential confounders in the patient attributes used to determine the necessity of repletion, and care is needed to ensure that the drivers of repletion are appropriately interpreted. For example, it was observed that high serum creatinine levels increase the probability of recommending potassium repletion, assuming the patient has experienced kidney failure, resulting in the buildup of creatinine levels, and thus the need for dialysis, which in turn is likely to result in potassium deficiency. This recommendation may not hold if dialysis is not initiated or continued by the care provider. Finally, the system here focused on data between 2010 and 2015; it is possible that there has been a shift in electrolyte testing and ordering practices during or after this timeframe, and further validation is needed to ensure that the recommended repletion policy is robust to this shift in time. The training dataset is also limited to one center. In addition, we limited the dataset to instances where data were complete, resulting in substantial attrition of the dataset. It is unclear whether this strategy provides a more robust treatment policy than using a more sizable but incomplete dataset. Future developments will include the prospective validation of optimal RL policy recommendations by first running real-time side-by-side comparisons of system recommendations with providers' actions (i.e., shadowing providers), and then evaluating the efficacy of bedside policy recommendations in a provider-in-the-loop protocol.
Developing this data-driven decision support tool is one task, but its implementation into a clinical workflow may also encounter several obstacles. Providers may mistrust the automated recommendations, in particular where there is a substantial departure from current practice. This may occur, for instance, when providers are inclined to frequent recommendations of higher doses of PO repletion. In addition, questions of reimbursement, liability, and accountability may arise, and hospital systems need to figure out how to deal with operational and legal consequences of implementation [12,41]. However, the potential gains of thoughtful, well-planned implementation are considerable. Our estimation of the financial benefit is conservative and does not account for other factors that could not be quantified in the data [22,23].
The next step of this project is to develop an easily implemented module that can process data from various healthcare systems, providing further cross-validation to assess the robustness of the algorithm against regional differences and more systemic biases related to practice patterns. The implementation of the RL will be challenging, and one way to design the algorithm is to allow it to advise physicians during patient rounds. Designing the RL to work in a six-hour interval was carried out with that idea in mind. Because the RL algorithm is able to integrate new data into the optimal policy, these adaptive policies are uniquely suited for robust deployment in a variety of environments.
In summary, this work describes an approach to guiding the repletion of electrolytes of patients in the ICU, with the aim of avoiding the need for the patient to undergo prolonged durations of electrolyte imbalance, while minimizing the costs associated with ordering and administering oral and intravenous repletion.

Funding: This work was funded by Helmsley Trust grant AWD1006624, NIH NCI 5U2CCA233195, NIH NHLBI R01 HL133218, and NSF CAREER AWD1005627.

Institutional Review Board Statement:
The study was approved by the Institutional Review Board at the University of Pennsylvania (#823822).

Informed Consent Statement: Not applicable.
Data Availability Statement: The datasets used and/or analyzed during the current study are available from the corresponding authors on reasonable request upon approval from the IRB.

Appendix A

Appendix A.1. Reward Design
The overall reward function can be written as R = w × φ, where φ(s, a, s') is a four-dimensional vector function parameterized by the current state s, immediate action a, and next state s' that formalizes each of the objectives described, and w defines the relative weight of each of these objectives. Penalties for values above and below the reference range are applied independently to allow for asymmetric weighting of the risks posed by hypokalemia when compared with hyperkalemia. A sigmoid function to model penalties on abnormal vitals reflects the clinical importance of a more severe electrolyte imbalance.
Vector functions φ for both magnesium and phosphate are also needed, with elements corresponding to IV repletion cost, PO repletion cost, and abnormally high and abnormally low electrolyte levels.
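A minimal sketch of such a reward is given below. The exact form of the sigmoid penalty, the steepness parameter, the unit costs, the reference range, and the weights are all assumptions made for illustration; the sketch also simplifies φ so that it depends only on the action and on the electrolyte level in the next state.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def phi(action, next_level, ref_low, ref_high, steepness=4.0):
    """Feature vector [IV cost, PO cost, high penalty, low penalty].

    action: dict with optional boolean keys "iv" and "po" (hypothetical encoding).
    next_level: electrolyte level observed in the next state.
    """
    iv_cost = 1.0 if action.get("iv") else 0.0
    po_cost = 1.0 if action.get("po") else 0.0
    # Rescaled sigmoid: zero inside the reference range, rising smoothly
    # towards 1 as the level moves further outside it (assumed form).
    high = max(0.0, 2.0 * sigmoid(steepness * (next_level - ref_high)) - 1.0)
    low = max(0.0, 2.0 * sigmoid(steepness * (ref_low - next_level)) - 1.0)
    return [iv_cost, po_cost, high, low]

def reward(w, features):
    """Weighted sum R = w x phi."""
    return sum(wi * fi for wi, fi in zip(w, features))

# Hypothetical weights: IV costs more than PO; high and low levels are
# penalized separately, which permits asymmetric weighting.
w = [-1.0, -0.2, -5.0, -4.0]
r_in_range = reward(w, phi({"po": True}, 4.0, ref_low=3.5, ref_high=5.0))
r_low = reward(w, phi({}, 2.5, ref_low=3.5, ref_high=5.0))
```

Keeping the high and low penalties as separate elements of φ is what allows the two risks to carry different weights in w.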
In both the UPHS and MIMIC datasets, we used a 75-25 training-test split. We used the two datasets to model this clinical decision-making problem as a Markov decision process (MDP) and used a custom-designed reward function that penalizes states in which the patient is outside the given reference range for an electrolyte. We then used batch FQI to learn an optimal policy and find that our learned Q-table converges (i.e., stabilizes) after 25-50 iterations [30].

Appendix A.2. Fitted Q Iteration (FQI)
The FQI algorithm learns an estimator for value Q of each state-action pair in our MDP, where Q is the expected discounted cumulative reward, starting from the given state and taking the specified action. This algorithm uses a series of regression models, where the target Q-values for the regression at each iteration are obtained by bootstrapping on the estimated Q from the previous regression, and updating based on observed rewards in the current iteration [34,35]. FQI offers flexibility in the use of any regression method to solve the supervised problems at each iteration. We fitted our estimate of Q at each iteration of FQI, using gradient boosting machines (GBMs) [32]. This is an ensemble method in which weaker predictive models, such as decision trees, are built sequentially by training on residual errors, thereby allowing models to learn higher-order terms and more complex interactions amongst features [33].
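The structure of the algorithm can be sketched as follows. To keep the example self-contained, a per-pair mean replaces the gradient boosting machines used in this work, and states are discrete labels rather than covariate vectors; all names, rewards, and transitions are hypothetical.

```python
def fitted_q_iteration(transitions, actions, gamma, fit, n_iters=50):
    """Learn a Q-function from a fixed batch of transitions.

    transitions: list of (state, action, reward, next_state); next_state is
                 None at the end of a visit.
    fit: any regression routine taking (inputs, targets) and returning a
         predictor q(state, action).
    """
    q = lambda s, a: 0.0  # initial estimate
    inputs = [(s, a) for s, a, _, _ in transitions]
    for _ in range(n_iters):
        # Bootstrapped targets: observed reward plus the discounted best
        # next-state value under the previous estimate.
        targets = [
            r + (0.0 if s_next is None
                 else gamma * max(q(s_next, b) for b in actions))
            for _, _, r, s_next in transitions
        ]
        q = fit(inputs, targets)
    return q

def fit_mean_table(inputs, targets):
    """Stand-in regressor: the mean target for each (state, action) pair."""
    sums, counts = {}, {}
    for key, y in zip(inputs, targets):
        sums[key] = sums.get(key, 0.0) + y
        counts[key] = counts.get(key, 0) + 1
    table = {k: sums[k] / counts[k] for k in sums}
    # Unseen pairs default to 0.0, which is optimistic when rewards are negative.
    return lambda s, a: table.get((s, a), 0.0)

def greedy_action(q, state, actions):
    """Action with the highest estimated Q-value in the given state."""
    return max(actions, key=lambda a: q(state, a))

ACTIONS = ["none", "po"]
batch = [
    ("low", "po", -1.0, "normal"),
    ("low", "none", -1.0, "low"),
    ("normal", "none", 0.0, None),
    ("normal", "po", -0.2, None),
]
q_hat = fitted_q_iteration(batch, ACTIONS, gamma=0.9, fit=fit_mean_table)
```

Because `fit` is passed in as an argument, any supervised regression method can be substituted at each iteration, which is the flexibility noted above.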

Appendix A.3. Inverse Reinforcement Learning
Inverse reinforcement learning is the task of extracting the reward function that explains the observed behavior in the data. In this case, it involves determining the value of reward weights w for which R = w × φ gives an optimal policy similar to the policy followed by clinicians in the past. This is typically carried out by arbitrarily choosing initial weights w, solving for a policy that optimizes reward R = w × φ, estimating some representation of the dynamics of this policy, comparing the policy dynamics with the behavior seen in historical data, and updating weights accordingly, then iterating until the learned policy with our weights is acceptably close to past behavior. In this case, we first set w to assign equal priority to all objectives in φ and used the discounted time spent in each state to represent policy dynamics, using this to update w.

Figure A1. The workflow used for estimation of time and costs related to savings after the introduction of RL to a clinical setting as a clinician-in-the-loop decision-making support tool.