Artificial Intelligence Algorithms for Insulin Management and Hypoglycemia Prevention in Hospitalized Patients—A Scoping Review

Faulds, Eileen R.; Rayan, Melanie Natasha; Mlachak, Matthew; Dungan, Kathleen M.; Allen, Ted; Patterson, Emily

doi:10.3390/diabetology7010019

by Eileen R. Faulds ¹,²,*, Melanie Natasha Rayan ³, Matthew Mlachak ⁴, Kathleen M. Dungan ², Ted Allen ⁵ and Emily Patterson ⁶

¹ College of Nursing, The Ohio State University, Columbus, OH 43210, USA

² Division of Endocrinology, Diabetes & Metabolism, College of Medicine, The Ohio State University Wexner Medical Center, Columbus, OH 43210, USA

³ Division of Endocrinology, Diabetes & Metabolism, Monument Health, Rapid City, SD 57701, USA

⁴ Department of Chemistry and Biochemistry, The Ohio State University, Columbus, OH 43210, USA

⁵ Department of Integrated Systems Engineering, The Ohio State University, Columbus, OH 43210, USA

⁶ School of Health and Rehabilitation Science, The Ohio State University, Columbus, OH 43210, USA

* Author to whom correspondence should be addressed.

Diabetology 2026, 7(1), 19; https://doi.org/10.3390/diabetology7010019

Submission received: 24 October 2025 / Revised: 8 December 2025 / Accepted: 4 January 2026 / Published: 12 January 2026

(This article belongs to the Special Issue Diabetes Management in the Hospital: Applications of Artificial Intelligence)

Abstract

Background: Dysglycemia remains a persistent challenge in hospital care. Despite advances in outpatient diabetes technology, inpatient insulin management largely depends on intermittent point-of-care glucose testing, static insulin dosing protocols and rule-based decision support systems. Artificial intelligence (AI) offers potential to transform this care through predictive modeling and adaptive insulin control. Methods: Following Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews (PRISMA-ScR) guidelines, a scoping review was conducted to characterize AI algorithms for insulin dosing and glycemic management in hospitalized patients. An interdisciplinary team of clinicians and engineers reached consensus on AI definitions to ensure inclusion of machine learning, deep learning, and reinforcement learning approaches. A librarian-assisted search of five databases identified 13,768 citations. After screening and consensus review, 26 studies (2006–2025) met the inclusion criteria. Data were extracted on study design, population, AI methods, data inputs, outcomes, and implementation findings. Results: Studies included ICU (N = 13) and general ward (N = 9) patients, including patients with diabetes and stress hyperglycemia. Early randomized trials of model predictive control demonstrated improved mean glucose (5.7–6.2 mmol/L) and time in target range compared with standard care. Later machine learning models achieved strong predictive accuracy (AUROC 0.80–0.96) for glucose forecasting or hypoglycemia risk. Most algorithms used data from Medical Information Mart for Intensive Care (MIMIC) databases; few incorporated continuous glucose monitoring (CGM). Implementation and usability outcomes were seldom reported. Conclusions: Hospital AI-driven models showed strong algorithmic performance but limited clinical validation. Future co-designed, interpretable systems integrating CGM and real-time workflow testing are essential to advance safe, adaptive insulin management in hospital settings.

Keywords: diabetes; artificial intelligence; hospital; inpatient; glucose management; hypoglycemia

1. Introduction

Glycemic management remains one of the most persistent challenges in hospital medicine. More than 30% of hospitalized patients in the U.S. have diabetes, and up to 40% experience hyperglycemia during admission, including those without pre-existing diabetes who develop stress hyperglycemia [1,2]. Dysglycemia, whether hyperglycemia, hypoglycemia, or glucose variability, is strongly linked to adverse outcomes, including infection, renal failure, longer length of stay, intensive care unit (ICU) transfers, cardiovascular events, and mortality [3,4,5,6,7,8,9]. Hypoglycemia alone occurs in nearly one-third of hospitalized patients, with recurrent or severe episodes associated with arrhythmia, stroke, in-hospital falls, and increased short-term mortality [1,2,10]. Glycemic variability further predicts poor outcomes; in ICU populations, fluctuations in glucose are a stronger predictor of mortality than mean glucose values [11]. Potential complications of dysglycemia also carry substantial financial impact, with diabetes-related hospitalizations accounting for nearly one-third of U.S. inpatient medical costs, amounting to over $90 billion annually [12].

Despite decades of evidence linking dysglycemia to poor outcomes, inpatient glucose management continues to rely primarily on intermittent point-of-care (POC) testing and nurse-driven insulin titration and dosing protocols [11,13]. Such protocols are reactive, labor-intensive, and prone to misalignment with meals or insulin delivery, leading to glycemic excursions, increased variability, and potentially preventable complications. Even electronic decision support systems, such as Glucommander, Space GlucoseControl, and EndoTool, remain limited by their reliance on POC testing and narrow adjustment rules, falling short of addressing the dynamic physiology of hospitalized patients [14,15].

In contrast, artificial intelligence (AI) has transformed outpatient diabetes care. AI-driven automated insulin delivery (AID) systems now represent standard of care for many people with type 1 diabetes, increasing time in range (TIR; 4.0–10.0 mmol/L), reducing hypoglycemia, and alleviating patient burden [16]. These systems integrate continuous glucose monitoring (CGM) with predictive algorithms such as model predictive control (MPC), reinforcement learning, and machine learning approaches to individualize insulin dosing [16]. Outpatient AID achieves >70% TIR, yet when applied to hospitalized patients, TIR drops to 52%, highlighting the need for models tailored to the inpatient environment, where insulin sensitivity may change within hours due to illness severity, medications, or nutrition status [17].

The rapid expansion of inpatient CGM adoption has laid the groundwork for AI-based decision support in hospitals [18,19]. CGM data streams create opportunities for predictive algorithms that can move inpatient care from reactive to proactive glucose management.

The aim of this scoping review is to systematically map the current state of AI algorithms developed or applied for insulin dosing in hospitalized patients. Specifically, we seek to characterize the types of AI methods employed, their application to different hospital settings, reported performance, and gaps in evidence to guide future research and clinical translation.

2. Methods

A scoping review was conducted to map and characterize the application of artificial intelligence (AI) algorithms for insulin dosing and glycemic management in hospital settings. The review protocol was developed a priori and followed the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) guidelines. Given the diversity of modeling approaches in this emerging field, the review team included clinical and engineering co-authors to reach consensus on the operational definition of AI methodology and to determine study eligibility. This interdisciplinary consensus process ensured inclusion of all relevant studies employing machine learning, deep learning, or reinforcement learning techniques. Although MPC originates from control science rather than machine learning, the review team chose to include MPC-based insulin algorithms within the scope of the AI review. MPC is the most widely deployed of the advanced classical controllers [20]. It is also a class of optimization-based control methods whose ability to adapt is comparable to that of deep reinforcement learning methods [21]. Several AI reviews have similarly included MPC due to its prospective forecasting, optimization of insulin dosing, and autonomous closed-loop behavior [22,23]. The scoping review protocol is available in Supplementary Material S1 [24].
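To make the forecasting, optimization, and closed-loop behavior attributed to MPC concrete, the following is a minimal Python sketch of a receding-horizon controller. It is purely illustrative: the linear glucose model, the `drift` and `sensitivity` parameters, the target value, and the candidate infusion rates are all hypothetical assumptions, not taken from any algorithm in the included studies, and the code is not a dosing tool.

```python
# Toy receding-horizon (MPC-style) loop. All numbers are hypothetical
# assumptions chosen for illustration; this is not a clinical model.
TARGET = 6.0  # assumed glucose target, mmol/L


def predict(glucose, rate, steps, drift=0.3, sensitivity=0.25):
    """Forecast glucose with a toy linear model: each step glucose rises by
    `drift` and falls by `sensitivity * rate` (both assumed values)."""
    trajectory = []
    for _ in range(steps):
        glucose = glucose + drift - sensitivity * rate
        trajectory.append(glucose)
    return trajectory


def mpc_step(glucose, horizon=4):
    """Choose the candidate rate whose forecast stays closest to TARGET."""
    candidates = [0.5 * i for i in range(13)]  # 0.0 to 6.0 in steps of 0.5

    def cost(rate):
        # Sum of squared deviations from target over the forecast horizon.
        return sum((g - TARGET) ** 2 for g in predict(glucose, rate, horizon))

    return min(candidates, key=cost)


def simulate(start=10.0, steps=12):
    """Closed loop: plan over the horizon, apply one step, then re-plan."""
    glucose, history = start, []
    for _ in range(steps):
        rate = mpc_step(glucose)
        glucose = predict(glucose, rate, 1)[0]
        history.append((rate, glucose))
    return history


history = simulate()
for rate, glucose in history:
    print(f"rate={rate:.1f}  glucose={glucose:.2f} mmol/L")
```

The sketch holds one constant rate over the horizon and searches a small grid, which keeps it short; practical MPC implementations typically optimize a sequence of inputs under explicit safety constraints and re-estimate model parameters as new measurements arrive.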

2.1. Search Strategy

A structured, librarian-assisted search was conducted in PubMed/MEDLINE, Web of Science, CINAHL, Scopus, and Embase. The search strategy was based on the following keywords: (“artificial intelligence” OR “machine learning” OR “deep learning” OR “neural network” OR “reinforcement learning” OR “generative AI” OR “algorithm” OR “feature engineering” OR “time series classification” OR “rare events” OR “SMOTE” OR “XGBoost” OR “logistic regression” OR “interactions”) AND (insulin OR “insulin therapy” OR “insulin dosing” OR “glycemic control” OR “glycemic management” OR “blood glucose” OR “glucose lowering” OR “diabetes treatment” OR “diabetes therapy”) AND (hospital OR inpatient OR ICU OR “critical care” OR “acute care”). Filters were used to limit results to studies published in English. No date restrictions were applied.
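The three-block Boolean structure of this search can be sketched in Python as follows. The term lists are transcribed from the strategy above; the helper names (`quote`, `or_group`, `build_query`) are hypothetical, and the sketch quotes only multi-word phrases, which is an assumption that may differ from the exact syntax submitted to each database.

```python
# Term groups transcribed from the search strategy described above.
AI_TERMS = [
    "artificial intelligence", "machine learning", "deep learning",
    "neural network", "reinforcement learning", "generative AI",
    "algorithm", "feature engineering", "time series classification",
    "rare events", "SMOTE", "XGBoost", "logistic regression", "interactions",
]
INSULIN_TERMS = [
    "insulin", "insulin therapy", "insulin dosing", "glycemic control",
    "glycemic management", "blood glucose", "glucose lowering",
    "diabetes treatment", "diabetes therapy",
]
SETTING_TERMS = ["hospital", "inpatient", "ICU", "critical care", "acute care"]


def quote(term: str) -> str:
    """Wrap multi-word phrases in double quotes so they match as phrases."""
    return f'"{term}"' if " " in term else term


def or_group(terms: list[str]) -> str:
    """Join the terms of one concept block with OR inside parentheses."""
    return "(" + " OR ".join(quote(t) for t in terms) + ")"


def build_query() -> str:
    """Combine the three concept blocks with AND."""
    groups = (AI_TERMS, INSULIN_TERMS, SETTING_TERMS)
    return " AND ".join(or_group(g) for g in groups)


query = build_query()
print(query)
```

Each database has its own field tags and phrase-matching rules, so a string built this way would still need to be adapted per database.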

2.2. Inclusion and Exclusion Criteria

Articles were included if they were published in English, reported original research, and described or evaluated AI algorithms designed for insulin management or hypoglycemia prevention in hospitalized patients. This included those treated in emergency departments and ICUs. Eligible designs included development, feasibility, pilot, randomized or non-randomized controlled trials, implementation, observational, and qualitative studies. Publications were excluded if they were conducted exclusively in outpatient or home settings, focused on development or evaluation of outpatient technologies, or involved use of personally owned AID devices during hospitalization. Studies conducted in neonatal ICUs or outpatient surgical settings were excluded. Articles were excluded if they focused solely on diabetes diagnosis, risk prediction or readmission without an insulin or glycemic management component or if they described non-AI-based tools such as static insulin calculators or standard computerized order sets. Articles without accessible full text, including abstract-only publications (e.g., conference proceedings), were also excluded.

2.3. Data Extraction

Two reviewers independently screened titles and abstracts to determine initial eligibility, followed by full-text review of potentially relevant articles. Discrepancies were resolved through consensus. In total, 13,768 references were imported for screening. After removal of 6203 duplicates (6154 identified by Covidence and 49 manually), 7564 studies underwent title and abstract review, of which 7464 were excluded. Ninety-nine full-text articles were assessed for eligibility, with 73 excluded for the following reasons: full text not available (n = 39), outpatient technology (n = 11), not AI (n = 10), wrong outcomes (n = 4), wrong study design (n = 3), not a study (n = 2), not in English (n = 2), or wrong setting (n = 2). Ultimately, 26 studies were included in the review. The full selection process is presented in the PRISMA flow diagram (Figure 1).
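As a quick arithmetic check on the duplicate and full-text counts reported above, the tallies can be reproduced with a few lines of Python. The variable names are illustrative; the numbers are those stated in the text.

```python
# Counts as reported in the text above.
duplicates = {"Covidence": 6154, "manual": 49}
full_text_assessed = 99
full_text_exclusions = {
    "full text not available": 39,
    "outpatient technology": 11,
    "not AI": 10,
    "wrong outcomes": 4,
    "wrong study design": 3,
    "not a study": 2,
    "not in English": 2,
    "wrong setting": 2,
}

total_duplicates = sum(duplicates.values())
total_excluded = sum(full_text_exclusions.values())
included = full_text_assessed - total_excluded

print(f"duplicates removed: {total_duplicates}")
print(f"full-text exclusions: {total_excluded}")
print(f"studies included: {included}")
```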

Extracted data elements were established a priori in the review protocol and included study ID (author, year, country), setting (ICU, non-ICU, emergency department), population (adult or pediatric; diabetes type; clinical subgroups), AI type (e.g., machine learning, neural networks, reinforcement learning, generative AI), input data (CGM, electronic health records [EHR], laboratory values, vital signs), insulin use case (intravenous infusion [IV], basal dosing, bolus/mealtime dosing, correctional insulin, continuous subcutaneous insulin infusion), reported outcomes (glycemic metrics, hypoglycemia, usability, workflow), study design, limitations, and other relevant notes. Two reviewers independently extracted data from included articles using standardized forms.

3. Results

3.1. Study Characteristics

A total of 26 studies published between 2006 and 2025 were included, encompassing a broad range of study designs, data sources, and hospital settings. Most studies were conducted in high-income countries, with the greatest representation from the United States [25,26,27,28,29,30] and Europe [31,32,33,34,35,36,37]. Additional studies were performed in Asia [38,39,40,41] and the Middle East [42,43]. All study elements appear in Table 1.

Study designs ranged from early randomized controlled trials (RCTs) evaluating rule-based or model predictive control systems [31,32,33,34] to more recent retrospective or prospective cohort studies developing and validating data-driven machine learning models [25,26,36,37,39]. Three studies were in silico simulations using virtual ICU patient datasets [42,43,44]. Only a small subset performed prospective clinical validation; most studies were model development or retrospective validation work.

The studies included five primary AI use cases for inpatient glycemic management. Seven studies focused on IV insulin model development, employing model predictive control or hybrid systems to automate insulin infusion and maintain target glucose levels in ICU settings [31,32,33,34,44,45]. A second group of studies centered on glucose prediction, leveraging machine learning or neural network models to forecast short- or medium-term glucose trajectories using CGM or EHR data [25,38,40,43,46,47,48]. Hypoglycemia prediction models comprised the largest subgroup, aiming to identify patients at imminent or future risk of low glucose events using real-time or retrospective data [26,28,36,37,39,49]. Three studies examined insulin sensitivity modeling, using physiologic or in silico simulations to capture patient-specific metabolic responses and optimize insulin titration [35,42,50]. Finally, two studies investigated subcutaneous insulin dosing optimization through EHR-based or deep learning algorithms that estimated total daily insulin requirements [27,30].

Settings included ICUs [25,27,31,32,34,35,37,38,40,42,43,44,45,46,47,48,49,50] and hospital-wide or general medical/surgical wards [26,27,28,29,33,36,37]. Two studies, conducted at the same institution, recruited participants from a diabetes specialty floor [30,39].

3.2. Study Populations and Inclusion Criteria

Across all studies, sample sizes varied widely, from 20 to 60 patients in early RCTs to more than 60,000 admissions in Mehdizavareh (2025), a large EHR-based machine learning study [28,43]. Study populations were diverse in both clinical characteristics and inclusion thresholds. Approximately one-third of the included studies were prospective in design, while the remainder were retrospective or based on in silico simulations. Many early ICU-based RCTs enrolled mixed medical–surgical populations requiring IV insulin for stress hyperglycemia, rather than specifically targeting individuals with diabetes [31,33,34]. In contrast, several later studies utilizing EHR data explicitly restricted inclusion to patients with type 2 diabetes or those treated with insulin during hospitalization [26,27,30].

The glucose inclusion thresholds also varied considerably, influencing both population composition and clinical applicability. Some studies included all inpatients with blood glucose levels exceeding 6.7 mmol/L [36], a criterion likely to capture a large proportion of non-diabetic patients experiencing transient hyperglycemia, while others used more stringent cutoffs (e.g., >8.3 mmol/L) or required a confirmed diabetes diagnosis [28]. This heterogeneity in inclusion criteria resulted in a broad spectrum of metabolic phenotypes represented across models, from critically ill patients with stress hyperglycemia to ambulatory ward patients with established diabetes.
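Because some of the included studies report glucose in mg/dL rather than mmol/L, a small conversion helper makes the thresholds easier to compare. This is a minimal sketch: the function names are hypothetical, and the factor assumes a glucose molar mass of about 180.16 g/mol (many clinical references round the factor to 18).

```python
# 1 mmol/L of glucose is about 18.016 mg/dL (molar mass ~180.16 g/mol).
MG_DL_PER_MMOL_L = 18.016


def mmol_to_mgdl(value_mmol_l: float) -> float:
    """Convert a glucose concentration from mmol/L to mg/dL."""
    return value_mmol_l * MG_DL_PER_MMOL_L


def mgdl_to_mmol(value_mg_dl: float) -> float:
    """Convert a glucose concentration from mg/dL to mmol/L."""
    return value_mg_dl / MG_DL_PER_MMOL_L


# Inclusion thresholds mentioned above, expressed in mg/dL.
for threshold in (6.7, 8.3):
    print(f"{threshold} mmol/L is about {round(mmol_to_mgdl(threshold))} mg/dL")
```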

Table 1. Study Characteristics, Models, and Outcomes.

Study ID	Country	Aim of Study	Study Design	Setting	Sample	Number of Participants	Inclusion/Exclusion Criteria	AI Type	Input Data	Use Case	Glucose Outcomes	Model Performance Outcomes
Plank, 2006 [31]	Austria, Czech Republic, United Kingdom	To evaluate a fully automated algorithm (MPC) for tight glycemic control in critically ill patients and compare results with routine glucose management protocols across three European ICUs.	Multicenter RCT	ICU	Mean age 59 y; mixed medical–surgical population requiring IV insulin for BG < 7.8 mmol/L.	60 (30 MPC, 30 routine protocol)	Adult patients (18–90 years); undergoing elective cardiac surgery; post-surgery blood glucose ≥ 6.7 mmol/L at ICU admission; with or without established diabetes diagnosis	MPC	Hourly arterial blood glucose; insulin dosage; carbohydrate intake	IV insulin infusion management	TIR 4.4–6.1 mmol/L = 52% vs. 19% (0–24 h, p < 0.01); 65% vs. 25% (24–48 h, p < 0.05); mean BG ≈ 6.5 vs. 7.3 mmol/L; no hypoglycemia events with MPC vs. 2 events in controls.	Automated MPC algorithm using hourly arterial BG inputs; adaptive insulin infusion achieved rapid, stable control across centers; no device failures; hourly sampling required for optimal performance.
Pachler, 2008 [32]	Austria	To compare glucose control in ICU patients using an enhanced model predictive control (eMPC) algorithm with time-variant sampling against a standard glucose management protocol.	RCT	ICU	Intervention group: History of DM 8 (32%); Male 16 (64%); age 61.2 ± 14.0; BMI 28.7 ± 6.6; APACHE II 26.6 ± 3.5. Control group: History of DM 11 (44%); Male 17 (68%); age 59.5 ± 16.1; BMI 27.6 ± 4.6; APACHE II 26.7 ± 5.5	50 ICU patients randomized: 40 (20 per group) included in final analysis	Mechanically ventilated and assumed to require ≥3 days of intensive care, glucose > 6.1 mmol/L or already on insulin therapy	eMPC	Glucose concentration, insulin dosage, carbohydrate content of enteral and parenteral input	IV insulin	Median BG 5.9 mmol/L (eMPC) vs. 7.4 mmol/L (control, p < 0.05); median hyperglycemia 0.4 mmol/L vs. 1.6 mmol/L (p < 0.05); one mild hypoglycemia episode (<2.22 mmol/L) resolved after glucose bolus.	Mean sampling interval 117 ± 34 min vs. 174 ± 27 min (p < 0.001); median insulin rate 3.0 IU/h (IQR 2.0–5.6) vs. 2.3 IU/h (IQR 1.7–4.0). eMPC achieved tighter control with modestly higher workload and insulin use.
Cordingley, 2009 [45]	England and Belgium	To investigate the effectiveness of an eMPC algorithm for IV insulin infusion in critically ill patients compared to standard care over 72 h across two ICUs with different management protocols.	RCT	ICU	34 patients (20 Hospital 1, 14 Hospital 2); Intervention (eMPC): 16 patients (10 Hospital 1, 6 Hospital 2); Control (standard care): 18 patients (10 Hospital 1, 8 Hospital 2); Age (mean ± SD): Hospital 1 eMPC 59 ± 16 vs. control 57 ± 17; Hospital 2 eMPC 67 ± 9 vs. control 63 ± 7; Male (%): Hospital 1 eMPC 60% vs. control 90%; Hospital 2 eMPC 50% vs. control 63%; BMI (mean ± SD): Hospital 1 eMPC 25.4 ± 5.8 vs. control 28.7 ± 5.9; Hospital 2 eMPC 28.0 ± 3.9 vs. control 26.9 ± 3.7; Diabetes: 1 patient with DM2 (Hospital 2 control); APACHE II (median, range): Hospital 1 eMPC 17 (7–28) vs. control 14 (5–26); Hospital 2 eMPC 16 (11–28) vs. control 16 (10–26).	34	Inclusion: Arterial plasma glucose greater than 6.7 mmol/L or already receiving IV insulin, and expected to be receiving mechanical ventilation for ≥72 h from the study initiation Exclusion: Insulin allergy and chronic mental incapacity	eMPC	Weight, arterial plasma glucose concentration, insulin dosing history, carbohydrate intake	IV insulin	Mean time-weighted BG lower with eMPC at Hospital 2 (5.9 vs. 7.1 mmol/L, p < 0.001) and comparable at Hospital 1 (5.7 vs. 5.4 mmol/L). Time in range higher at Hospital 2 (57.7% vs. 23.5%, p < 0.01). No severe hypoglycemia; sampling interval shorter (1.1–1.8 h vs. 1.9–2.5 h).
Amrein, 2010 [33]	Austria	To evaluate the performance of the eMPC algorithm for glycemic control across the full ICU stay in critically ill patients.	Non-randomized experimental study	Non-ICU cardiovascular, infectious, neurologic, gastrointestinal units	Age 69 ± 11; Male: 16 (80%); BMI 27.4 ± 4.5; APACHE II 25.5 ± 5.2; 6	20	Inclusion: ≥5 days ICU treatment; glucose > 6.1 mmol/L or already on insulin therapy. Exclusion: Insulin allergy, presence of ketoacidosis	eMPC	Glucose concentration, insulin dosage, carbohydrate content of enteral and parenteral nutrition	IV insulin infusion/Hypoglycemia prevention and reduction	Mean BG 5.8 ± 0.5 mmol/L; 58% time in target (4.4–6.1 mmol/L); 11% 3.3–4.4 mmol/L; 0.8% 2.2–3.3 mmol/L; 0.02% < 2.2 mmol/L. Three mild hypoglycemia events (0.02 per treatment day), all resolved.	Mean insulin 101 ± 50 IU/day; sampling interval 1.7 ± 0.3 h; 98% compliance with algorithm advice; eMPC advice overruled by a nurse at 2% of time points (42 corrected downwards, 12 corrected upwards); no technical failures; nurses reported good efficacy and manageable workload.
Pappada, 2010 [47]	United States	To demonstrate the use of a predictive model in real-time bedside monitoring to support intelligent insulin therapy recommendations and clinical automation.	Prospective proof-of-concept modeling and real-time validation study using CGM and EHR data from ICU trauma patients.	Trauma ICU	Demographics not provided	6		Neural Network	CGM and EHR	Prediction of glycemic states and management	Prospective proof-of-concept modeling and validation study using CGM data in ICU trauma patients; patient-specific neural network model achieved CEGA 95.1% region A vs. 69.8% for general model (MAD 7.9% vs. 15.9%).
Kopecký, 2013 [34]	Czech Republic	To assess a system combining CGM and an eMPC algorithm for IV insulin therapy in postsurgical patients.	RCT	ICU Postcardiac surgery patients	Intervention: History of DM2 (16.6%); age 68.1 ± 2.2; female 6 (50%); BMI 29.1 ± 0.8; Caucasian 100% Control: History of DM (33.3%); age 67.5 ± 3.3; female: 4(33%); BMI 27.8 ± 1.0; Caucasian 100%	24	Inclusion: Undergoing major elective cardiac surgery Exclusion: Insulin allergy and inability to consent	eMPC	Glucose concentration, insulin dosage, carbohydrate intake (CGM served as input for intervention group)	IV insulin	Mean BG 6.2 ± 0.1 vs. 6.1 ± 0.6 mmol/L (ns); % time in target 46.3 ± 5.5 vs. 46.2 ± 6.5; % below target 13.1 ± 2.6 vs. 15.4 ± 2.4; 0 vs. 2 hypoglycemia ≤ 2.9 mmol/L; time to target 7.6 ± 1.0 h vs. 8.8 ± 5.4 h (ns).	CGM accuracy 97.5% (A + B zones of CEGA); 11 of 12 sensors completed 24 h; ≤1 extra calibration in most cases; no technical failures. eMPC + CGM was feasible and safe for tight glucose control postcardiac surgery.
DeJournett, 2016 [44]	United States	Evaluate the performance of an artificial intelligence-based closed-loop glucose controller through in silico testing.	Non-randomized experimental study	In silico ICU environment	Patients simulated under varying conditions, representing critically ill adults with diverse insulin sensitivity and nutritional states.	80 virtual ICU patient profiles	Simulated patients	Knowledge-based, closed-loop model	Initial inputs: Starting glucose concentration, patient weight, desired glucose range, concentrations of insulin and dextrose solutions Dynamic inputs during simulation: Current glucose level, glucose rate of change, current insulin infusion rate, current IV dextrose infusion rate	IV insulin	In silico AI controller achieved 94.2% time in control, 97.8% in 3.9–7.8 mmol/L, hyperglycemia 2.1%, hypoglycemia 0.09%, CV 11.1%. No severe hypoglycemia (<2.5 mg/dL). Vastly improved over “no control” simulation (TIR 98% vs. 20%).	Forward-chaining knowledge-based AI system using 5–10 min control cycles; 126,000 five-day simulations (~107 million glucose values); 95% of simulations reached target within 2 h; stable control across ranges. Demonstrated safety, high precision, and strong in silico performance vs. no control.
Benyó, 2018 [35]	New Zealand, Hungary, and Belgium	To analyze the insulin sensitivity and model accuracy of the STAR (Stochastic TARgeted Control) glycemic control protocol across various regions.	In silico, retrospective multicenter modeling study using ICU glycemic control data.	In silico ICU environment	Demographics not provided	60	Treated using the STAR protocol	Time-series classification	Blood glucose measurements, insulin administration records, nutrition records	Insulin sensitivity	Median and temporal trajectories of insulin sensitivity (SI) analyzed across three ICU cohorts (Hungary, Belgium, New Zealand). SI patterns differed significantly early in ICU stay, while model-noise (accuracy) parameters did not differ. No direct blood glucose or insulin dose outcomes were reported.
Kim, 2020 [38]	South Korea	To develop a personalized deep learning model using recurrent neural networks to predict blood glucose levels 30 min in advance for hospitalized type 2 diabetes patients.	Prospective cohort study	ICU	Sex: Female 13 (65%); Age: 30–39 years: 3; 40–49 years: 6; 50–59 years: 4; 60–69 years: 7	20	Age ≥ 20 and <70 years, ICU admission with DM; Dexcom G5 CGM for at least 3–7 days during hospitalization	Recurrent neural networks for time-series prediction; tested simple RNN, gated recurrent unit (GRU), and long short-term memory (LSTM)	Dexcom G5 CGM sensor readings	Glucose prediction	30 min ahead prediction RMSE = 21.5 mg/dL, MAPE = 11.1%; 87.9% A + 11.1% B zones on CEGA; 99% predictions clinically acceptable.	GRU > LSTM > RNN; best model = 1 GRU + 2 dense layers (batch 50 + shuffle); ≥50% training data required for personalization; performance stable across subjects.
Pappada, 2020 [46]	UK	To develop a neural-network-based glucose model for predicting future patient glucose levels up to 135 min ahead in ICU settings.	Prospective cohort study	ICU	DM1: 8 (6.3%), DM2: 97 (76.4%), No history of DM: 22 (17.3%); Sex: Female 46 (36%); Women: age 62.1 ± 11.0; BMI 35.3 ± 10.0; Men: age 61.5 ± 10.3; BMI 32.5 ± 8.1.	127	Admitted to ICU with diagnosis of DM or BG > 8.3 mmol/L	Feed-forward ANN with two hidden layers (15 and 10 nodes) trained on CGM data to predict glucose up to 135 min ahead (5 min intervals). Model weights optimized using Levenberg–Marquardt backpropagation.	CGMS iPro2 (Medtronic) recordings from the first 72 h of ICU admission used as ground truth. Forty-one input features included vital signs (heart rate, respiration rate), laboratory results (lactate, creatinine, WBC count), nutrition status (NPO or tube-feeding rate), insulin delivery data (IV and subcutaneous), inotrope use, and POC glucose values, along with current and historical CGM readings (1 h history; 12 prior CGM values).	Glucose prediction	Successfully predicted CGM hypoglycemia (<4.0 mmol/L) at 53.6%, 34.4%, and 0.0% for 30, 60, and 135 min prediction horizons (PHs); no POC hypoglycemia events observed. Successfully predicted CGM hyperglycemia (>10.0 mmol/L) at 94.4%, 90.7%, and 86.2% for 30, 60, and 135 min PHs; POC hyperglycemia predicted at 74.7%, 66.7%, and 59.8%, respectively. CEGA for predicted CGM glucose values 99.6% zones A + B at 135 min; CEGA for POC glucose 99.4% zones A + B at 135 min.	Feed-forward ANN with two hidden layers (15 and 10 nodes) predicted glucose up to 135 min ahead. Validation set (n = 15 patients): Pearson correlation 0.96–0.97 between predicted and reference glucose values. Mean absolute deviation (MAD%) for predicted vs. CGM: 1.0% (5 min) to 10.6% (135 min); predicted vs. POC: 10.2% (5 min) to 15.9% (135 min). Average error between POC and CGM: 10.0%. ANN accuracy declined with longer prediction horizon; strong performance for short-term forecasting.
Ruan, 2020 [36]	UK	To analyze data from inpatients with diabetes admitted to a large university hospital to predict the risk of hypoglycemia through the use of machine learning algorithms.	Retrospective cross-sectional study	Inpatients (hospital-wide, non-ICU-specific)	Mean age 66 ± 18 y, 47% female, 91% type 2 diabetes; median LOS 7 days (IQR 3–13)	17,658 (32,758 admissions)	All adult inpatients with a diagnosis of diabetes admitted to Oxford University Hospitals NHS Foundation Trust between January 2014 and December 2018	Supervised machine learning using XGBoost (selected from 18 ML algorithms evaluated for EHR-based hypoglycemia prediction).	42 structured EHR variables including demographics (age, sex, ethnicity), vitals, labs (BG, creatinine, potassium, HbA1c), medications (insulin, oral agents, corticosteroids), admission characteristics, and prior hypoglycemia history. Temporal glucose trends and variability metrics over the prior 24 h included.	Hypoglycemia risk prediction	Incidence of biochemical hypoglycemia (<4 mmol/L) = 21.5%; clinically significant (<3 mmol/L) = 9.6%.	XGBoost AUROC = 0.96 (for both <4 and <3 mmol/L); precision = 0.88; recall = 0.70. Logistic regression AUROC = 0.75. Model improved further when insulin dose and prior hypoglycemia were added.
Fitzgerald, 2021 [25]	United States	To present a data-driven method for predicting ICU patient response to glycemic control protocols while considering variations in patient care.	Retrospective cohort study	ICU	MIMIC-III database (18,691 admissions; medical, surgical, and cardiac ICUs). Adult patients (≥18 y) with ≥3 BG measurements included. Typical cohort demographics: mean age ≈ 63 y, 56% male, mean BMI ≈ 28 kg/m². Retrospective EHR data used to train and validate a 2 h-ahead BG prediction model.	18,961	Admissions with available blood glucose measurements, EHR data recorded via MetaVision	Gradient-boosted tree machine learning algorithm	EHR	Blood glucose prediction	No direct clinical glucose outcomes: study focused on model-based forecasting of 2 h-ahead BG using ICU EHR data.	CatBoost gradient boosting model achieved MAPE 16.5–16.8%, RMSE ≈ 2.8 mg/dL, 95% interval coverage 93–94%, and CEGA 97% in zones A and B. Accuracy highest for BG 5.5–11.1 mmol/L; lower for hypo/hyperglycemia ranges. Performance consistent across surgical, cardiac, and medical ICU subgroups.
Mathioudakis, 2021 [26]	United States	To develop a machine learning model to predict the risk of iatrogenic hypoglycemia within 24 h of each blood glucose measurement during hospitalization.	Retrospective cohort study	Non-ICU	Age: 66.0 (56.0–75.0); Male: 27,781; BMI 29.0 (24.6–34.6); Race: White: 30,429 (55.3); Black: 17,806 (32.4); Asian: 2595 (4.7); Other: 4148 (7.5); Weight (kg) 83.0 (68.9–100.2); DM1: 1321 (2.4), DM2: 21,660 (39.4), None: 31,203 (56.8), Other: 794 (1.4)	35,000	Inclusion: ≥4 POC BG measurements during hospitalization, received at least 1 subcutaneous insulin dose during hospitalization. Exclusion: LOS < 24 h; missing weight information; treatment with IV insulin or insulin pumps; POC BG obtained while in ICU	Stochastic Gradient Boosting (SGB)	43 clinical predictors of iatrogenic hypoglycemia (unspecified)	Hypoglycemia prediction	Predicted iatrogenic hypoglycemia ≤ 4.0 mmol/L within 24 h after each BG test (3.1% event rate); C = 0.90 internal, 0.86–0.88 external; sensitivity/specificity ≈ 82%; PPV 0.09–0.13; NPV 0.99–1.00; +LR 3.1–4.7; −LR 0.22–0.25.	SGB algorithm using 43 static and time-varying EHR features (basal insulin, BG variability, prior hypoglycemia top predictors); trained on 70% of data, validated internally and externally across five hospitals; false-positive rate 18–27%, false-negative rate 18%; robust discrimination for short-term hypoglycemia risk prediction.
Nguyen, 2021 [27]	United States	To determine whether a machine learning model can predict initial inpatient total daily insulin dose more accurately than existing weight-based guideline dosing.	Retrospective cohort study	ICU and non-ICU	Sex: Female 7497 (44.5%). Weight (kg): 84.1 (24.0)	16,848	Inclusion: Achieved “good” glucose control (≥3 BG measurements within 5.5–10.0 mmol/L on a calendar day without any out-of-range measurements), received SubQ insulin, weight recorded Exclusion: On TPN, PPN, tube feeds, insulin pumps, insulin infusions, or rarely used insulin formulations	Machine learning; Ensemble SuperLearner algorithm (regularized regression, random forest, gradient-boosted trees), Two-stage prediction framework	EHR data including demographics, labs, medications, diagnoses, diet orders, blood glucose measurements, history of basal insulin use	Initial subcutaneous total daily insulin dose prediction; Two-stage: classify low (≤6 units) vs. higher (>6 units) insulin need, and predict specific TDD for higher insulin users	Average time to achieve target glycemic range (5.5–10.0 mmol/L) = 2.2 ± 4.4 days from admission; no direct hypoglycemia data reported.	Two-stage SuperLearner (ensemble regression, random forest, gradient-boosted trees). Stage I AUROC 0.85 vs. 0.57 (weight-only); Stage II MAPE 51% vs. 60% (weight-only) vs. 136–329% (clinical calculator). MAE 12.2 vs. 14 vs. 25.4. Top predictors: weight, prior glucose metrics, diet, creatinine, basal insulin use. Demonstrated robust discrimination and improved accuracy for insulin dosing prediction.
Mantena, 2022 [49]	United States	To develop and validate complex machine learning models predicting hypoglycemia risk using a large, multicenter ICU database.	Retrospective cohort study	ICU	Intervention: Age 62.9 (16.9); Female: 8134 (49.5); African American: 2810 (17.1), Asian: 222 (1.3), Caucasian: 11,511 (70.0), Hispanic: 697 (4.2), Native American: 144 (0.9), Other/unknown: 904 (5.5) Control: Age 63.9 (16.7); Female: 29,440 (44.6); African American: 7690 (11.6), Caucasian: 50,747 (76.9), Hispanic: 2751 (4.2), Other/Unknown: 3016 (4.6)	82,479	Inclusion: ≥2 POC BG readings during their ICU stay (eICU-CRD)	XGBoost	eICU database with information on patient demographics, diagnoses, labs, vitals, medications administered in ICU	Hypoglycemia prediction	19.9% hypoglycemia (<72 mg/dL) across cohort; 38.7% of hypoglycemic patients were non-diabetic; descriptive only, no statistical testing.	XGBoost model AUROC = 0.85; sensitivity = 0.76; specificity = 0.76; precision = 0.44; strong calibration; top predictors included prior hypoglycemia, albumin, creatinine, BG variability, kidney disease, and glucose-lowering therapy; retrospective validation only.
Witte, 2022 [37]	Switzerland	To generate a broadly applicable multiclass classification model for predicting hypoglycemia from patients’ EHR to indicate where adjustments in patient monitoring and therapeutic interventions are required	Retrospective multicenter cohort	Hospitalized, 30% ICU	38,250 adult inpatients (44% women; mean age ≈ 64 years) across six hospitals within the Insel Gruppe network, contributing 63,579 admissions. Approximately 30% were ICU admissions classified as decompensated cases. Median BMI ≈ 27 kg/m²; median hospital stay ≈ 7 days.	38,250	Adults ≥ 18 years with ≥1 lab BG during hospitalization. Eligible if they met ≥1 of the following: DM diagnosis, treatment with any antidiabetic medication, or abnormal glucose levels on lab testing (<4.0 or ≥11.1 mmol/L).	XGBoost	EHR records, patient demographics, medication history, previous glucose events	Predict hypoglycemia	No direct glucose or hypoglycemia rates; retrospective EHR-based prediction of dysglycemia events (median prediction horizon = 7 h for hypo, 4 h for hyper).	Multiclass XGBoost ensemble achieved sensitivity 59% (hypo), 63.6% (hyper), specificity ≈ 94–99%, balanced accuracy ≈ 80%; prediction horizon 4–7 h; demonstrated feasibility of early dysglycemia prediction using routine EHR data.
Fitzgerald, 2023 [28]	United States	To develop and validate a real-time, EHR-based hypoglycemia prediction model integrated with the insulin-ordering process for general inpatients (non-ICU).	Retrospective multicenter cohort	Hospitalized (non-ICU)	Mean age 63 y; 48% female; Race: ≈55% White, 35% Black, remainder other/unknown; Median length of stay: 5 days (IQR 3–9); DM2 ~85%;	45,000 patients (≈60,000 admissions)	Adults ≥ 18 y with ≥1 insulin order and BG measurement; excluded pregnancy, ICU, short stays < 24 h, or incomplete EHR data	Supervised ML ensemble (XGBoost + logistic regression) for real-time hypoglycemia risk prediction	15 EHR variables: age, sex, race, 24 h mean and nadir BG, BG variability (CV), basal/bolus insulin use and dose, creatinine, eGFR, nutrition status (NPO vs. feeding), hospital day, prior hypoglycemia, and steroid/glucose-altering medication use	Hypoglycemia prediction	Predicted hypoglycemia (<4.0 mmol/L) within 24 h of insulin order; 7% event rate.	AUROC 0.88; sensitivity 0.80; specificity 0.84; PPV 0.11; NPV 0.99; calibration Brier score 0.03; externally validated across 4 hospitals
Alkhafaf, 2024 [42]	New Zealand, Hungary, and Belgium	To evaluate a new ANN-based insulin sensitivity (SI) prediction method using in silico simulation	In Silico Validation/Modeling	In silico ICU environment	Demographics not provided	2551 virtual ICU patients derived from the MIMIC-IV dataset	Treated using the STAR protocol	Neural-Network-based Quantile Regression	Blood glucose levels, insulin administration records, and nutrition intake	Insulin sensitivity prediction	In silico simulation over the first 24 h after insulin initiation comparing STAR vs. SPRINT protocols using 2551 virtual ICU patients derived from MIMIC-IV data. STAR achieved lower median BG (5.73 [IQR 5.14–6.43] mmol/L) than SPRINT (6.29 [IQR 5.29–6.83]); mild hypoglycemia (<4.0 mmol/L) more frequent with STAR (85% vs. 49%), while severe (<2.22 mmol/L) rare (2% vs. 0%). No formal significance testing reported.
Gong, 2024 [39]	China	To develop and test multiple machine learning algorithms for predicting nocturnal hypoglycemia in Type 2 diabetes patients.	Retrospective cohort study	Endocrinology and metabolism unit	Intervention: age < 40 years: 67 (11.7), 40–65 years: 239 (41.7), ≥65 years: 257 (46.6); Female: 276 (48.2); Duration of Diabetes <5 years: 359 (62.7) 5–10 years: 39 (6.8) ≥10 years: 175 (30.5). Control: age < 40 years: 385 (11.2), 40–65 years: 1452 (42.2), ≥65 years: 1605 (46.6); Female: 1412 (41.0); Duration of Diabetes <5 years: 1223 (35.5) 5–10 years: 1171 (34.0) ≥10 years: 1048 (30.5)	440	Inclusion: Diagnosis of DM2, aged ≥ 18 years, consented to undergo CGM for ≥24 h. Exclusion: Admission for hypoglycemia, DKA or HHS, infectious diseases, acute coronary syndromes, malignant tumors, anemia or renal failure, deletion of medical records and data duplication	Logistic regression, random forest, light gradient boosting machine	Age, sex, duration of diabetes, use of oral antidiabetic drugs, insulin, creatinine, uric acid, glycated albumin, aspartate aminotransferase, alanine aminotransferase	Hypoglycemia prediction	14.3% nocturnal hypoglycemia (≤3.9 mmol/L, 00:00–06:00 h); lower mean BG, higher LBGI, and TBR vs. non-hypoglycemia group; descriptive group comparisons only.	LightGBM best performer (AUC = 0.869, specificity = 0.802, recall = 0.797, F1 = 0.255); top predictors: prior TBR, LBGI, M value, duration of diabetes, insulin use before bed; well-calibrated model, no external validation.
Szabó, 2024 [50]	New Zealand, Hungary, and Belgium	To propose three AI-based insulin sensitivity prediction methods to enhance prediction accuracy and optimize model parameters for clinical requirements.	Multicenter retrospective modeling and in silico simulation validation study using ICU glycemic control data	ICU	Demographics not provided	2357 virtual ICU patients derived from real STAR protocol data	≥12 h in the ICU treated by the STAR protocol	Deep neural network, Mixture Density Network, Quantile Regression	Insulin and nutrition inputs	Insulin sensitivity prediction	In silico multicenter modeling study comparing three AI approaches (CDN, MDN, QR) using STAR ICU data. MDN achieved the best insulin sensitivity prediction accuracy (I-Score ≈ 0.92–1.05). Simulated mean glucose: 6.02–6.11 mmol/L vs. 6.18 mmol/L (STAR); % in target 4.4–6.1 mmol/L: 90.7% vs. 87.1%. No significant hypoglycemia reported.	Model accuracy for predicting 90% quantile interval of future insulin sensitivity (SI). MDN achieved lowest RMSE (≈0.067), outperforming QR (≈0.071) and CDN (≈0.076). No direct glucose or clinical outcomes reported.
Wright, 2024 [29]	United States	To develop and validate machine learning models that predict inpatient hypoglycemia (<4.0 mmol/L within 24 h) at the time an insulin order is placed, integrating results into clinical decision support.	Retrospective cohort study	Non-ICU	Age (mean): 57; Female: 9006 (43%)	21,052 orders	Inclusion: Age ≥ 18 years, Hospitalization> 24 h with SubQ insulin orders Exclusion: ICU, palliative care units, orders with missing data (blood glucose)	Logistic regression, random forest, extreme gradient boosting (XGBoost)	EHR data including patient characteristics, vitals, diagnoses, labs, medication orders and administrations, diet orders	Hypoglycemia prediction and prevention within 24 h of a new insulin order placed	9% of insulin orders followed by hypoglycemia (< 4.0 mmol/L) within 24 h; descriptive only, no inferential analyses.	Trained on 21,052 insulin orders; logistic regression, random forest, and XGBoost achieved AUCs 0.81, 0.80, and 0.79; sensitivity 0.44–0.49 at PPV 0.30. Key predictors: recent BG trends and insulin dose. Internally validated; no external or prospective validation performed.
Kim, 2024 [48]	South Korea	To develop and test an attention-based model predicting adverse glycemic events 30 min in advance using past glycemic data.	Prospective cohort study	ICU	Sex overall: Female 40 (39%). With hypoglycemia: Age 52.8 ± 12.6; BMI 24.3 ± 5.1. Without hypoglycemia: Age 55.7 ± 14.2; BMI 26.8 ± 4.7	102	Hospital admission with DM2 diagnosis and aged between 20 and 90 years (inclusive of those in ICU)	Deep learning based predictive model using multi-agent reinforcement learning (MARL) for feature selection	CGM, EHR data including insulin administration times, meal intake times	Prediction of adverse glycemic events	Predicted inpatient hypoglycemia (BG < 4.0 mmol/L) within 24 h following insulin orders in adult non-ICU patients. Observed hypoglycemia rate = 9% (1839/21,052 insulin orders). No direct clinical outcomes (e.g., mean glucose or TIR) reported beyond this event rate; data are descriptive only.	Compared logistic regression, random forest, and XGBoost models trained on 21,052 insulin orders from adult non-ICU patients at Vanderbilt University Medical Center (2019). Model discrimination: AUROC = 0.81 (logistic regression), 0.80 (random forest), 0.79 (XGBoost); PPV ≈ 0.30; sensitivity = 0.44–0.49 across models. Most predictive features: last BG value, lowest and average BG in prior 24 h, coefficient of variation in BG, and insulin dose. Models internally validated using 10-fold cross-validation; no external or prospective validation reported.
Mehdizavareh, 2025 [43]	United States	To develop and validate a multi-source irregular time-series transformer model using real-time EHR data to predict blood glucose levels in ICU patients and enable early intervention.	Retrospective cohort study	ICU	Demographics not provided	86,508	Inclusion: ≥6 BG measurements during ICU stay, subsequent readings within 5 min–10 h	Multi-source Irregular Time-Series Transformer	Labs, medications, vital signs, diagnoses (ICD-9/ICD-10 codes), patient demographics, admission information, intake/output records, past medical history, treatment records, infusion data, Glasgow Coma Scale scores, sedation scores	BG prediction (hypoglycemia < 4.0 mmol/L, hyperglycemia > 10.0 mmol/L, euglycemia); Clinical decision support for early intervention	No direct glucose or hypoglycemia event rates reported; outcomes limited to model-based predictions.	Multi-source Irregular Time-Series Transformer (MITST) outperformed Random Forest for predicting ICU hypoglycemia and hyperglycemia. AUROC: 0.915 vs. 0.862 (hypo, +5.3 pp, p < 0.001); 0.909 vs. 0.903 (hyper, +0.6 pp, p < 0.001); macro-average 0.900 vs. 0.883 (+1.7 pp). Sensitivity: 0.841 vs. 0.769 (hypo, +7.2 pp); 0.833 vs. 0.818 (hyper, +1.5 pp). AUPRC 0.247 vs. 0.208; specificity 0.845 vs. 0.829; NPV 0.996 vs. 0.995. Demonstrated improved discrimination and calibration for next-glucose-level classification in ICU patients.
Park, 2025 [40]	Korea	To bridge the gap between individual biomarker predictors and holistic interaction-based approaches by training the CDLD model to predict ICU patient blood glucose levels.	Retrospective cohort study	ICU	Age (mean) 63.2; Female: 2304 (46%)	MIMIC-IV dataset; 5014 patients hospitalized from de-identified dataset; The dataset used in the study/model includes 5001 glucose level measurements from 2551 patients.	At least one abnormal blood glucose reading during hospital stay	Cyclic Dual Latent Discovery (CDLD), a deep learning framework that explicitly models patient–provider interactions to improve prediction of blood glucose levels.	Although the authors describe the full MIMIC-IV dataset, only selected variables (sex, age, peak glucose, and provider ID) were used as model inputs; other EHR modules were not utilized but referenced for data context.	Glucose prediction	No direct glucose or hypoglycemia results; predictions derived from retrospective ICU data only.	CDLD model RMSE = 0.0336 (training), 0.0852 (validation), 0.0898 (test); training loss 0.0037, validation loss 0.0066; demonstrated high predictive accuracy and generalization for discharge glucose prediction using patient–provider interaction modeling.
Symeonidis, 2025 [41]	United States	To develop and validate a deep Q-network (DQN) algorithm for predicting optimal insulin doses and glucose levels in ICU patients to improve glucose control and reduce hypoglycemia risk.	Retrospective cohort study	ICU	Age (mean): 65.5; Female 41.4%	2493	Initial ICU admission, diabetes diagnosis or receiving insulin therapy, at least one type of insulin administered, following SSI protocol exclusively, short/rapid acting insulin only, minimum 10 glucose/insulin entries per patient for model training	Reinforcement learning (primary), Deep learning (DQN with neural network)	EHR, labs, vitals, demographics, glucose values, insulin doses	IV insulin, glucose control (hyperglycemia prevention)	DQN achieved MAE = 12.16 mg/dL, RMSE = 15.42 mg/dL, Time in Range = 90.79% (4.4–6.1 mmol/L) vs. 87.14% with linear regression; p < 0.05; no increase in hypoglycemia reported.	Insulin-dose MAE = 1.99 units, RMSE = 2.27 units; improved 2.5% MAE and 6.6% RMSE vs. linear regression; requires DQN with neural network, experience replay, and k-NN selection.
Ying, 2025 [22]	China	To evaluate whether an AI-based insulin clinical decision support system (NCDSS) for hospitalized Type 2 diabetes patients achieves noninferior glycemic control compared to standard insulin therapy by physicians.	RCT	Endocrinology and metabolism unit	Intervention: Age (mean) 63.5; Female: 30 (40%) Control: Age (mean) 65; Female: 35 (47%)	149 randomized (75 intervention, 74 control)	Inclusion: ≥18 years old, DM2 with A1C 7.0–11.0%, on diet/oral antidiabetic/insulin therapy in last 3 months Exclusion: Acute complications of diabetes, BMI ≥ 45, pregnancy or breastfeeding, severe cardiac, hepatic, or kidney diseases, psychiatric or psychological diseases, severe edema, infections, or peripheral circulation disorders, surgery during hospitalization	Machine learning–driven insulin clinical decision support system (iNCDSS)	Capillary blood glucose measurements, patient clinical characteristics, EHR	Basal insulin, Basal-bolus insulin and pre-mixed insulin	TIR 4.0–10.0 mmol/L = 76.4% vs. 73.6% (p = 0.33, noninferior); mean glucose ≈ 140 vs. 144 mg/dL; no severe hypoglycemia (<40 mg/dL) or ketoacidosis; time < 54 mg/dL = 0%; insulin dose 27 U vs. 30 U (p = 0.01, ns after adjustment).	AI-based iNCDSS provided real-time insulin titration across multiple regimens; 98.9% of AI recommendations accepted; physicians rated clarity 4.6/5, safety 4.4/5; system integrated seamlessly and achieved noninferior glycemic outcomes to expert endocrinologists.

Footnotes: AI = artificial intelligence; AUC = area under the curve; AUROC = area under the receiver operating characteristic; AUPRC = area under the precision–recall curve; BG = blood glucose; BMI = body mass index; CGM = continuous glucose monitoring; CEGA = Clarke error grid analysis; CDLD = cyclic dual latent discovery; CV = coefficient of variation; DM = diabetes mellitus; DM1 = type 1 diabetes mellitus; DM2 = type 2 diabetes mellitus; DQN = deep Q-network; EHR = electronic health record; eMPC = enhanced model predictive control; GRU = gated recurrent unit; ICU = intensive care unit; IQR = interquartile range; IV = intravenous; LSTM = long short-term memory network; MAD = mean absolute deviation; MAPE = mean absolute percentage error; MAE = mean absolute error; MDN = mixture density network; MIMIC = Medical Information Mart for Intensive Care; ML = machine learning; MITST = multi-source irregular time-series transformer; MPC = model predictive control; NCDSS = neural network clinical decision support system; NPV = negative predictive value; POC = point-of-care; PPV = positive predictive value; QR = quantile regression; RNN = recurrent neural network; RMSE = root mean square error; ROC = receiver operating characteristic; SGB = stochastic gradient boosting; SI = insulin sensitivity; STAR = stochastic targeted control; SubQ = subcutaneous; TBR = time below range; TDD = total daily dose; TIR = time in range.

3.3. Models

Performance metrics varied by study design, prediction horizon, and target outcome (e.g., glucose prediction vs. insulin dosing vs. hypoglycemia detection). Across studies reporting model discrimination, Area Under the ROC Curve (AUROC) values ranged from 0.80 to 0.96, with most exceeding 0.85, demonstrating strong predictive capacity for inpatient glucose or insulin outcomes.
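As a reading aid for the discrimination values above, AUROC can be interpreted as the probability that a randomly chosen patient who experienced the event receives a higher risk score than a randomly chosen patient who did not. The following minimal sketch computes it by pairwise comparison. The scores and labels are hypothetical illustrative values, not data from any included study, and the function name `auroc` is an assumption introduced here.

```python
def auroc(scores, labels):
    """Fraction of (positive, negative) pairs in which the positive case
    has the higher score; tied pairs count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predicted risks for 8 patients (1 = hypoglycemia occurred).
scores = [0.9, 0.8, 0.35, 0.6, 0.3, 0.2, 0.1, 0.4]
labels = [1, 1, 1, 0, 0, 0, 0, 0]

print(f"AUROC = {auroc(scores, labels):.2f}")
```

In this toy example, 13 of the 15 positive-negative pairs are ranked correctly. Because AUROC depends only on ranking, it says nothing about calibration or about how many alerts would be false positives at a given threshold.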

Among neural-network-based models, Artificial Neural Networks achieved high short-horizon accuracy, with Pappada (2020) reporting a mean absolute difference (MAD) between predicted and CGM glucose values of 1.0–10.6% across 5–135 min prediction horizons and a Clarke Error Grid Analysis (CEGA) of 99.6% within zones A and B [46]. Similarly, Kim (2020) demonstrated robust classification of hypo-, normo-, and hyperglycemia using a multi-agent reinforcement learning approach, achieving F1 scores of 60.6%, 89.0%, and 89.8%, respectively [38].

Tree-based ensemble models demonstrated comparable performance. Ruan (2020) reported AUROC 0.85 for predicting inpatient hypoglycemia [36], while Mantena (2022) achieved AUROC 0.85, sensitivity 0.76, and specificity 0.76 for hypoglycemia detection using an XGBoost classifier trained on eICU-CRD data [49]. In more complex implementations, Fitzgerald (2021) and Fitzgerald (2023) used gradient boosting and logistic regression ensembles to predict insulin requirements, achieving AUROC 0.83–0.86 across internal and external validations [25,28].

Studies employing reinforcement learning and hybrid architectures demonstrated moderate to high predictive accuracy but varied in validation rigor. Symeonidis (2025) compared multiple RL-derived insulin dosing models, with success rates exceeding 90% and I-scores > 1.0 across three forecast intervals [41]. Mehdizavareh (2025) introduced the MITST-based architecture, improving macro-average AUROC from 0.883 (baseline random forest) to 0.900 and AUPRC from 0.208 to 0.247 for next-glucose-level classification, highlighting the growing strength of transformer models in ICU settings [43].

Only a minority of studies evaluated algorithm calibration or real-time usability. Wright (2024) assessed a clinical decision support model’s positive predictive value (PPV) for hypoglycemia alerts (PPV = 0.30) but emphasized the need for in-practice validation [29]. Across all studies, internal validation was common, while external validation was reported in just five studies [26,28,30,36,37], underscoring the early developmental phase of most algorithms. AI models used across studies and their clinical application appear in Table 2.
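The low PPVs reported for hypoglycemia alerts follow largely from the low event rate rather than from poor discrimination. The sketch below applies Bayes' rule to illustrative inputs taken from the Mathioudakis (2021) row of Table 2 (sensitivity and specificity of about 0.82, event rate 3.1%). Treating these rounded figures as exact inputs is an assumption made here for illustration, and the function name `predictive_values` is hypothetical.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Derive PPV, NPV, and likelihood ratios from sensitivity,
    specificity, and event rate using Bayes' rule."""
    tp = sensitivity * prevalence                # true-positive fraction
    fn = (1 - sensitivity) * prevalence          # false-negative fraction
    fp = (1 - specificity) * (1 - prevalence)    # false-positive fraction
    tn = specificity * (1 - prevalence)          # true-negative fraction
    return {
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "lr_pos": sensitivity / (1 - specificity),
        "lr_neg": (1 - sensitivity) / specificity,
    }


# Rounded values from Table 2 (Mathioudakis, 2021); assumed exact here.
result = predictive_values(sensitivity=0.82, specificity=0.82, prevalence=0.031)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

Under these assumptions the PPV is roughly 0.13 and the NPV roughly 0.99, with likelihood ratios near 4.6 and 0.22, which is broadly in line with the ranges reported in that row. Most alerts from such a model would therefore be false positives even though sensitivity and specificity both exceed 80%.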

Figure 2 illustrates the evolution of computational approaches over time, highlighting the early predominance of MPC-based insulin dosing systems and the subsequent emergence of neural network, tree-based, and reinforcement learning models in the prediction and hypoglycemia-detection literature.

3.4. Data Inputs and Sources

Across the included studies, data inputs varied widely in scope and clinical depth, reflecting both the evolution of available hospital data and the expanding capabilities of AI methods. Early IV insulin and model predictive control studies relied primarily on POC glucose measurements and insulin infusion rates as core inputs, occasionally incorporating basic patient demographics [31,33,34]. Later model development studies integrated multimodal EHR data, including demographics, comorbidities, medications, laboratory values (e.g., serum creatinine, lactate, white blood cell count), and vital signs (heart rate, respiratory rate, and blood pressure) to improve predictive precision [26,28,30,36]. Several models explicitly incorporated nutritional variables such as NPO status, enteral or parenteral feeding rates, and timing of meals, recognizing their influence on inpatient glucose fluctuations [27,46].

CGM data were used as either primary input or ground truth validation in five studies, supporting fine-grained temporal predictions unavailable through routine POC testing [29,40,42,43,46].

Across the included studies, seven distinct data sources were identified, with the Medical Information Mart for Intensive Care (MIMIC) family of critical care databases (Beth Israel Deaconess Medical Center, Boston, MA, USA) being most frequently used. Four studies leveraged the MIMIC-III database either as the primary source of ICU data or for comparative validation, reflecting its continued value as a benchmark for inpatient AI model development [25,38,41,49]. Two more recent studies utilized the updated MIMIC-IV dataset, which includes a broader and more temporally recent ICU cohort [40,42].

3.5. Glucose Outcomes

Among the 26 studies, 12 reported direct glucose or glycemic control outcomes, most often in conjunction with model performance metrics. Early RCTs of rule-based MPC algorithms achieved mean glucose values between 5.7 and 6.2 mmol/L and demonstrated significantly improved time in target range compared to standard protocols [31,33,34]. Hypoglycemia events were rare in these trials (<5%), reflecting the safety advantages of closed-loop control systems during IV insulin infusion.
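For readers less familiar with these outcome measures, the sketch below shows one simple way that mean glucose, percentage of readings in target range, and percentage of readings below a hypoglycemia threshold can be computed from a series of glucose values. It assumes evenly spaced readings, so each value carries equal weight; studies using irregular POC sampling may instead weight by time. The readings are hypothetical, and the function names are assumptions introduced here.

```python
def percent_in_range(readings_mmol_l, low=4.4, high=6.1):
    """Percentage of readings within [low, high] mmol/L (inclusive)."""
    if not readings_mmol_l:
        raise ValueError("no readings supplied")
    in_range = sum(1 for g in readings_mmol_l if low <= g <= high)
    return 100.0 * in_range / len(readings_mmol_l)


def percent_below(readings_mmol_l, threshold=4.0):
    """Percentage of readings strictly below the threshold (mmol/L)."""
    if not readings_mmol_l:
        raise ValueError("no readings supplied")
    below = sum(1 for g in readings_mmol_l if g < threshold)
    return 100.0 * below / len(readings_mmol_l)


# Hypothetical evenly spaced glucose readings in mmol/L.
readings = [5.2, 5.8, 6.4, 4.9, 3.8, 5.5, 6.0, 7.1, 5.1, 4.6]

mean_glucose = sum(readings) / len(readings)
print(f"Mean glucose: {mean_glucose:.2f} mmol/L")
print(f"In range 4.4-6.1 mmol/L: {percent_in_range(readings):.0f}%")
print(f"Below 4.0 mmol/L: {percent_below(readings):.0f}%")
```

The target range is passed as a parameter because the included studies used different targets (for example 4.4–6.1 mmol/L in early tight-control trials and 4.0–10.0 mmol/L in more recent work), which limits direct comparison of time-in-range values across studies.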

Later AI-based models primarily focused on glucose prediction rather than direct intervention, with results expressed as prediction error or CEGA accuracy. Pappada (2020) reported successful prediction of CGM hypoglycemia (<4.0 mmol/L) at 53.6%, 34.4%, and 0% accuracy for 30, 60, and 135 min horizons, respectively [46]. Alkhafaf (2024) validated an in silico MPC model, achieving median glucose of 5.73 mmol/L (IQR 5.14–6.43) with minimal (2%) severe hypoglycemia (<2.22 mmol/L) [42]. In large EHR-based models, predictive discrimination for inpatient hypoglycemia and hyperglycemia remained high (AUROC 0.85–0.92), but translation to improved bedside outcomes has yet to be demonstrated [30,36].

Notably, none of the modern AI studies deployed in real-world clinical environments demonstrated significant improvements in glycemic control or reduction in hypoglycemia rates compared to standard practice, reflecting the current implementation gap between in silico and clinical validation.

3.6. Implementation and Feasibility Outcomes

Only a small number of studies reported usability, workflow integration, or implementation outcomes. Among the early model predictive control trials, bedside feasibility and nursing workload were key considerations; both Plank (2006) and Amrein (2010) reported that automated insulin infusion systems were well tolerated, required minimal manual overrides, and did not increase nursing burden [31,33]. Kopecký (2013) similarly demonstrated safe use of CGM-integrated MPC in a cardiac ICU, with continuous algorithm monitoring feasible for clinical staff [34].

In later AI-driven studies, implementation outcomes primarily focused on decision support integration and alert usability. Wright (2024) evaluated a hypoglycemia alert model within an EHR-based clinical decision support system, achieving a positive predictive value (PPV) of 0.30 and highlighting the need for prospective workflow validation [29]. Fitzgerald (2023) and Ying (2025) discussed model interpretability and clinician acceptance as critical facilitators of adoption, whereas limited generalizability and data dependency were identified as barriers to deployment [28,30]. None of the included studies reported major safety concerns or adverse events attributable to algorithm use.

3.7. Methodological Quality

To summarize methodological variability and quality across included studies, we extracted key design and reporting characteristics relevant to study quality. Table 3 provides an overview of common strengths and limitations observed across the evidence base.

4. Discussion

Across the 26 included studies, reported glycemic outcomes reflected both the methodological evolution of inpatient insulin systems and shifting clinical targets over time. Early rule-based and MPC trials, many of which were randomized and conducted in ICU settings, consistently demonstrated improved time in target glucose range and lower mean glucose compared with standard care [31,33,34]. These studies targeted tight glycemic control (typically 4.4–6.1 mmol/L), creating greater contrast with contemporaneous usual care protocols and thereby amplifying observed efficacy. In contrast, later studies, most of which were retrospective model development or validation efforts, focused primarily on glucose prediction and hypoglycemia detection rather than direct insulin delivery. While these models achieved excellent predictive performance (AUROC > 0.85 across most studies), few reported corresponding improvements in glycemic outcomes, reflecting the current translational gap between predictive capability and real-world clinical benefit.

A key distinction emerges between closed-loop insulin control systems and predictive-only models: the former directly influence glucose trajectories and thus report measurable physiologic outcomes, whereas the latter aim to anticipate or prevent dysglycemia without automated actuation. Notably, models employing artificial neural networks and reinforcement learning demonstrated the highest short-horizon predictive accuracy but were rarely linked to clinical endpoints, underscoring a methodological divide between algorithmic performance and physiologic validation. Recent outpatient-focused literature further illustrates this disconnect. Several state-of-the-art glucose prediction frameworks, including hybrid mechanistic-ML approaches [51], transformer-based multi-resolution forecasting models [52], and other deep learning architectures developed for trend detection and postprandial prediction, demonstrate impressive reductions in short-term prediction error in free-living settings. However, these models are almost uniformly evaluated outside the hospital environment and are not linked to autonomous insulin actuation or clinical outcomes. Collectively, these findings highlight that improvements in model sophistication have not yet translated into equivalent gains in inpatient glycemic control, emphasizing the need for prospective implementation studies that bridge predictive modeling with actionable insulin delivery.

The variability in populations among the represented studies has direct implications for the generalizability and reliability of AI-based glucose management models. Models trained on heterogeneous inpatient cohorts that include both diabetic and non-diabetic individuals may achieve strong performance metrics but risk obscuring subgroup-specific patterns relevant to clinical decision-making. Conversely, models restricted to patients with type 2 diabetes or to those receiving insulin therapy may demonstrate higher within-group accuracy but limited external applicability to broader hospital populations.

Notably, Gong (2024) and Ying (2025) developed and validated their algorithms in cohorts of patients with type 2 diabetes admitted solely for inpatient glucose control, without concomitant acute or critical illness [30,39]. This design reflects a unique clinical context, uncommon in the United States and Europe, where hospitalization may be used for optimization of outpatient glycemic control. As a result, these models may not generalize to typical inpatient populations, in which hyperglycemia typically occurs in the setting of acute medical or surgical conditions, variable nutrition, and multiple concurrent therapies [53].

Moreover, the predominance of single-center retrospective datasets and in silico simulations restricts model adaptability to differing institutional workflows, laboratory practices, and patient demographics. Prospective and multicenter validations were rare, meaning most models have not been tested under real-world variation in care delivery. Together, these factors underscore the importance of future studies designing representative training datasets and incorporating cross-site validation to ensure that AI systems perform reliably across diverse inpatient settings and glucose phenotypes.

Despite rapid progress in AI-driven diabetes management for outpatients, no commercially available inpatient insulin management systems currently use AI. Systems such as Glucommander, Space GlucoseControl, and EndoTool rely on rule-based or proportional–integral–derivative (PID) algorithms with static decision rules rather than adaptive machine learning models [14,15,54]. Computerized insulin decision support systems have been shown to improve glycemic control and standardize insulin delivery, reducing calculation errors and nursing workload [55]. However, their performance remains constrained by dependence on intermittent point-of-care (POC) testing, which introduces inherent delays and variability.

Outcomes from implementation studies illustrate these trade-offs. In a burn ICU evaluation of EndoTool, time in target range (4.4–6.1 mmol/L) improved from 41% to 47%, yet there was no significant reduction in hypoglycemia (<4.4 mmol/L, <3.3 mmol/L, or <2.2 mmol/L) compared with prior standard care [56]. Similarly, across hospital implementations of Glucommander and other rule-based protocols, mean glucose decreased and the proportion of values within the target range increased (71.0% vs. 51.3%), but hypoglycemia <3.3 mmol/L and <2.8 mmol/L was present in 42.9% and 3.9%, respectively [15,57]. Although effective for protocol standardization, these systems lack the continuous learning, adaptability, and individualized precision that define AI. This fundamental limitation, static modeling applied to dynamic physiologic systems, underscores why, despite decades of progress, inpatient glucose management remains largely reactive rather than predictive.
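Because the studies summarized in this review report glucose thresholds in both mmol/L and mg/dL, comparing hypoglycemia definitions across them requires a unit conversion. The sketch below uses a factor of approximately 18.016 mg/dL per mmol/L, derived from the molar mass of glucose (about 180.16 g/mol). The constant and function names are assumptions introduced here for illustration.

```python
# Approximate conversion factor: glucose molar mass (~180.16 g/mol) / 10.
MG_DL_PER_MMOL_L = 18.016


def mg_dl_to_mmol_l(value_mg_dl):
    """Convert a glucose concentration from mg/dL to mmol/L."""
    return value_mg_dl / MG_DL_PER_MMOL_L


def mmol_l_to_mg_dl(value_mmol_l):
    """Convert a glucose concentration from mmol/L to mg/dL."""
    return value_mmol_l * MG_DL_PER_MMOL_L


# Thresholds that appear in mg/dL in the summarized studies.
for mg_dl in (40, 54, 72):
    print(f"{mg_dl} mg/dL is about {mg_dl_to_mmol_l(mg_dl):.1f} mmol/L")
```

With this factor, 40, 54, and 72 mg/dL correspond to about 2.2, 3.0, and 4.0 mmol/L, respectively.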

A clear temporal pattern emerged across the 26 included studies. The early 2000s saw a surge of research in tight glucose control following the landmark study by Van den Berghe et al. (2001), which demonstrated reduced mortality, infections, and length of stay with intensive insulin therapy in surgical ICU patients [3]. Subsequent replication attempts, including the NICE-SUGAR trial (2009), failed to reproduce these benefits and instead reported increased mortality with very tight glucose targets [58]. This reversal reshaped clinical guidelines toward more conservative inpatient targets (7.8–10.0 mmol/L) [11] and led to nearly a decade of diminished AI inpatient research activity between 2010 and 2020. The recent resurgence of studies since 2020 parallels both the emergence of AI accessibility (e.g., deep learning frameworks, open datasets) and the renewed push toward CGM adoption in hospitals, which provides the continuous data streams necessary for AI model development. This historical context is critical in understanding why most studies identified are developmental rather than clinical implementation trials.

The majority of included studies focused on model development and internal validation, with only a small subset performing pilot or RCT testing, often limited in sample size and single-site design. Common architectures included neural networks, reinforcement learning, and hybrid approaches integrating EHR and CGM data. None of these models has yet achieved regulatory clearance or commercial implementation. The current evidence base is therefore best viewed as foundational, reflecting proof-of-concept work rather than clinical readiness. Translation into practice will require external validation across diverse institutions, integration with real-time hospital systems, and human factors research to evaluate clinician trust and workflow fit.

A recurring limitation across studies is dependence on intermittent point-of-care (POC) glucose measurements, rather than CGM data, for model training and performance. Several studies used POC glucose as inputs, with one study using as few as two POC glucose values per day [49]. Intermittent sampling fails to capture glycemic dynamics, limiting predictive accuracy. Continuous, high-resolution glucose data are essential for both model learning and real-time decision support.
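
The effect of sampling density can be illustrated with a toy simulation. The sketch below builds an entirely synthetic 24 h glucose trace containing one brief drop, then compares a 5-minute sampling schedule with four scheduled POC checks. The trace shape, the sampling times, and the choice of 3.3 mmol/L as the hypoglycemia cut-off are illustrative assumptions, not data or methods from any included study.

```python
import math

HYPO_THRESHOLD = 3.3  # mmol/L; assumed cut-off for this illustration


def synthetic_trace(minutes=1440):
    """Hypothetical 24 h glucose trace (mmol/L) at 1-minute resolution."""
    trace = []
    for t in range(minutes):
        baseline = 7.0 + 1.5 * math.sin(2 * math.pi * t / 360)  # slow 6 h swing
        dip = 4.0 * math.exp(-((t - 200) / 20.0) ** 2)  # brief drop near 03:20
        trace.append(baseline - dip)
    return trace


def sample(trace, minute_marks):
    """Return the trace values observed at the given minutes."""
    return [trace[m] for m in minute_marks]


trace = synthetic_trace()
cgm = sample(trace, range(0, 1440, 5))        # CGM-like: every 5 minutes
poc = sample(trace, [360, 720, 1080, 1320])   # POC-like: 06:00, 12:00, 18:00, 22:00

cgm_detects = min(cgm) < HYPO_THRESHOLD
poc_detects = min(poc) < HYPO_THRESHOLD
```

In this constructed case the dense schedule captures the nocturnal drop while the four POC checks do not, which is the kind of missed excursion that leaves a model without the events it is meant to predict.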

This raises an important implementation dependency: AI-based insulin algorithms cannot scale clinically in the inpatient setting, as they have in ambulatory environments, until CGM is routinely deployed in hospitals. Expansion of inpatient CGM therefore represents not only a technological advancement but also a prerequisite for the next generation of hospital AI decision support. From a regulatory perspective, no AI insulin dosing system has yet achieved FDA clearance for inpatient use. The need for real-time model transparency, validation across heterogeneous populations, and cybersecurity oversight presents unique challenges. Furthermore, algorithm interpretability remains a major barrier to clinician trust, especially in high-acuity environments where safety and explainability are paramount [46].

Only a small number of studies included usability or implementation outcomes, and none incorporated formal co-design with end users. This represents a critical gap, as the bedside nurse or other frontline healthcare worker remains central to the operation, oversight, and trustworthiness of any inpatient glucose management system. AI-based insulin or glucose control tools must integrate seamlessly within complex hospital workflows where time pressures, staffing constraints, and alarm fatigue already challenge adoption of new technologies. Evidence from related clinical decision support and alarm design efforts underscores that early end-user involvement is essential to ensure usability, safety, and sustained engagement. Future AI systems for inpatient glycemic management will require iterative design with nursing and interprofessional input to move from high-performing algorithms to practical, trustworthy tools embedded in routine care.

Although formal risk-of-bias assessments are not required for scoping reviews, the included studies exhibited several recurring methodological limitations that influence interpretability. Most studies were retrospective, single-center, or in silico only, with limited prospective or real-time clinical validation. Reporting of clinical outcomes such as hypoglycemia, TIR, or length of stay was inconsistent, and many ML models lacked external validation or calibration assessment. These factors, combined with heterogeneity in data sources, patient populations, and outcome definitions, limit the ability to compare models directly or draw firm conclusions regarding clinical effectiveness. As a result, the current evidence base should be viewed as exploratory, highlighting promising directions rather than definitive clinical recommendations.
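
The difference between discrimination (AUROC) and calibration, noted above as frequently unreported, can be made concrete with a short sketch. It assumes binary hypoglycemia labels and predicted risks. The data are invented, and the Brier score and mean calibration shown here are only two of several possible calibration summaries.

```python
def auroc(labels, scores):
    """Probability that a random positive scores above a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def brier_score(labels, probs):
    """Mean squared difference between predicted risk and outcome."""
    return sum((p - y) ** 2 for y, p in zip(labels, probs)) / len(labels)


def mean_calibration(labels, probs):
    """Mean predicted risk minus observed event rate (0 is ideal on average)."""
    return sum(probs) / len(probs) - sum(labels) / len(labels)


# Hypothetical outcomes and predicted risks, for illustration only.
labels = [0, 0, 0, 0, 1, 0, 1, 1]
probs = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.9]
# Same ranking of patients, but every risk is inflated.
inflated = [0.5 + 0.5 * p for p in probs]
```

Both sets of predictions rank patients identically and so share the same AUROC, yet the inflated set has a worse Brier score and overestimates average risk. This is why a high AUROC alone does not show that predicted risks are usable for setting alert thresholds.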

The field of inpatient AI for glycemic management is advancing rapidly yet remains at an early stage of clinical translation. These results are consistent with contemporary perspectives on artificial intelligence in diabetes technology. As reviewed by Jacobs et al., AI-driven methods are increasingly shaping approaches to automated glycemic control, and the field is moving toward solutions that integrate predictive modeling, clinical interpretability, and workflow-compatible automation [23]. To move beyond proof-of-concept modeling, future inpatient studies must integrate prospective, multicenter validation with interoperable clinical decision support systems capable of functioning across diverse EHR infrastructures. The development of transparent, interpretable models, particularly those co-designed with frontline nursing and medical staff, will be essential to foster clinician trust and safe implementation. Expansion of CGM in hospital settings will further enable real-time model evaluation and closed-loop insulin delivery, while standardized reporting of model performance, usability, and clinical outcomes will enhance comparability across studies. Interdisciplinary collaborations between clinicians, engineers, data scientists, and implementation scientists will be critical to advance AI systems from algorithmic performance to measurable patient and workflow outcomes.

5. Conclusions

This scoping review provides a comprehensive synthesis of artificial intelligence applications for insulin management and glycemic control in hospitalized patients. Over two decades of research demonstrate a clear shift from early rule-based control systems to data-driven models emphasizing glucose prediction and hypoglycemia detection. Despite strong algorithmic performance, clinical implementation remains limited, and few models have demonstrated direct impact on patient outcomes. Moving forward, co-designed, validated, and interpretable AI systems will be central to achieving safe and effective integration of predictive analytics into hospital glycemic care, ultimately improving both patient outcomes and clinician experience.

Supplementary Materials

The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/diabetology7010019/s1, Supplemental Material S1: Scoping Review Protocol.

Author Contributions

Conceptualization, E.R.F. and M.N.R.; methodology and protocol, E.R.F., M.N.R., K.M.D., T.A. and E.P.; abstract and full-text review, E.R.F., M.N.R. and M.M.; writing—original draft preparation, E.R.F. and M.N.R.; writing—review and editing, all authors. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Conflicts of Interest

K.M.D. discloses research support from Insulet, Endogenex, Sequel/Med Tech, and Dexcom; consulting from Eli Lilly and Dexcom; and honorarium from UptoDate, Elsevier, Academy for Continued Healthcare Learning, and Impact Education. E.R.F. discloses research support from Dexcom and Insulet, consulting from Dexcom, and honorarium from Dexcom and Medscape.

References

1. Pasquel, F.J.; Lansang, M.C.; Dhatariya, K.; Umpierrez, G.E. Management of diabetes and hyperglycaemia in the hospital. Lancet Diabetes Endocrinol. 2021, 9, 174–188.
2. Dhatariya, K.; Umpierrez, G.E. Management of Diabetes and Hyperglycemia in Hospitalized Patients. In Endotext; Feingold, K.R., Adler, R.A., Ahmed, S.F., Anawalt, B., Blackman, M.R., Boyce, A., Braverman, L.E., Buse, J.B., Christakis, I., Correa, R., et al., Eds.; MDText.com, Inc.: South Dartmouth, MA, USA, 2000. Available online: https://www.ncbi.nlm.nih.gov/books/NBK279093/ (accessed on 20 October 2024).
3. Van den Berghe, G.; Wouters, P.; Weekers, F.; Verwaest, C.; Bruyninckx, F.; Schetz, M.; Vlasselaers, D.; Ferdinande, P.; Lauwers, P.; Bouillon, R. Intensive insulin therapy in critically ill patients. N. Engl. J. Med. 2001, 345, 1359–1367.
4. American Diabetes Association. 15. Diabetes Care in the Hospital: Standards of Medical Care in Diabetes-2021. Diabetes Care 2021, 44, S211–S220.
5. Clement, S.; Braithwaite, S.S.; Magee, M.F.; Ahmann, A.; Smith, E.P.; Schafer, R.G.; Hirsch, I.B.; on behalf of the Diabetes in Hospitals Writing Committee. Management of diabetes and hyperglycemia in hospitals. Diabetes Care 2004, 27, 553–591.
6. Moghissi, E.S.; Korytkowski, M.T.; DiNardo, M.; Einhorn, D.; Hellman, R.; Hirsch, I.B.; Inzucchi, S.E.; Ismail-Beigi, F.; Kirkman, M.S.; Umpierrez, G.E. American Association of Clinical Endocrinologists and American Diabetes Association consensus statement on inpatient glycemic control. Diabetes Care 2009, 32, 1119–1131.
7. Bogun, M.; Inzucchi, S.E. Inpatient management of diabetes and hyperglycemia. Clin. Ther. 2013, 35, 724–733.
8. Dhatariya, K.; Mustafa, O.G.; Rayman, G. Safe care for people with diabetes in hospital. Clin. Med. 2020, 20, 21–27.
9. Umpierrez, G.E.; Isaacs, S.D.; Bazargan, N.; You, X.; Thaler, L.M.; Kitabchi, A.E. Hyperglycemia: An independent marker of in-hospital mortality in patients with undiagnosed diabetes. J. Clin. Endocrinol. Metab. 2002, 87, 978–982.
10. Galindo, R.J.; Migdal, A.L.; Davis, G.M.; Urrutia, M.A.; Albury, B.; Zambrano, C.; Vellanki, P.; Pasquel, F.J.; Fayfman, M.; Peng, L.; et al. Comparison of the FreeStyle Libre Pro Flash Continuous Glucose Monitoring (CGM) System and Point-of-Care Capillary Glucose Testing in Hospitalized Patients with Type 2 Diabetes Treated with Basal-Bolus Insulin Regimen. Diabetes Care 2020, 43, 2730–2735.
11. American Diabetes Association Professional Practice Committee; ElSayed, N.A.; McCoy, R.G.; Aleppo, G.; Balapattabi, K.; Beverly, E.A.; Early, K.B.; Bruemmer, D.; Echouffo-Tcheugui, J.B.; Ekhlaspour, L.; et al. 16. Diabetes Care in the Hospital: Standards of Care in Diabetes—2025. Diabetes Care 2025, 48, S321–S334.
12. Parker, E.D.; Lin, J.; Mahoney, T.; Ume, N.; Yang, G.; Gabbay, R.A.; ElSayed, N.A.; Bannuru, R.R. Economic Costs of Diabetes in the U.S. in 2022. Diabetes Care 2024, 47, 26–43.
13. Honarmand, K.M.; Sirimaturos, M.P.; Hirshberg, E.L.M.; Bircher, N.G.M.; Agus, M.S.D.M.; Carpenter, D.L.P.-C.; Downs, C.R.; Farrington, E.A.P.; Freire, A.X.M.; Grow, A.; et al. Society of Critical Care Medicine Guidelines on Glycemic Control for Critically Ill Children and Adults 2024. Crit. Care Med. 2024, 52, e161–e181.
14. Wysocki, T.; Taylor, A.; Hough, B.S.; Linscheid, T.R.; Yeates, K.O.; Naglieri, J.A. Deviation from developmentally appropriate self-care autonomy: Association with diabetes outcomes. Diabetes Care 1996, 19, 119–125.
15. Davidson, P.C.; Steed, R.D.; Bode, B.W. Glucommander: A computer-directed intravenous insulin system shown to be safe, simple, and effective in 120,618 h of operation. Diabetes Care 2005, 28, 2418–2423.
16. Iftikhar, M.M.; Saqib, M.M.; Qayyum, S.N.M.; Asmat, R.M.; Mumtaz, H.M.; Rehan, M.M.; Ullah, I.M.; Ud-Din, I.M.; Noori, S.; Khan, M.M.; et al. Artificial intelligence-driven transformations in diabetes care: A comprehensive literature review. Ann. Med. Surg. 2024, 86, 5334–5342.
17. Kudva, Y.C.; Raghinaru, D.; Lum, J.W.; Graham, T.E.; Liljenquist, D.; Spanakis, E.K.; Pasquel, F.J.; Ahmann, A.; Ahn, D.T.; Aleppo, G.; et al. A Randomized Trial of Automated Insulin Delivery in Type 2 Diabetes. N. Engl. J. Med. 2025, 392, 1801–1812.
18. Wallia, A.; Umpierrez, G.E.; Rushakoff, R.J.; Klonoff, D.C.; Rubin, D.J.; Golden, S.H.; Cook, C.B.; Thompson, B.; The DTS Continuous Glucose Monitoring in the Hospital Panel. Consensus Statement on Inpatient Use of Continuous Glucose Monitoring. J. Diabetes Sci. Technol. 2017, 11, 1036–1044.
19. Van Steen, S.C.J.; Rijkenberg, S.; Limpens, J.; Van der Voort, P.H.J.; Hermanides, J.; DeVries, J.H. The Clinical Benefits and Accuracy of Continuous Glucose Monitoring Systems in Critically Ill Patients—A Systematic Scoping Review. Sensors 2017, 17, 146.
20. Yu, Z.; Long, J. Review on advanced model predictive control technologies for high-power converters and industrial drives. Electronics 2024, 13, 4969.
21. Qin, S.; Badgwell, T.A. A survey of industrial model predictive control technology. Control. Eng. Pract. 2003, 11, 733–764.
22. Ying, Z.; Li, X.; Chen, Y. Artificial intelligence in glycemic management for diabetes: Applications, opportunities and challenges. J. Transl. Intern. Med. 2025, 13, 314–317.
23. Jacobs, P.G.; Herrero, P.; Facchinetti, A.; Vehi, J.; Kovatchev, B.; Breton, M.D.; Cinar, A.; Nikita, K.S.; Doyle, F.J.; Bondia, J.; et al. Artificial Intelligence and Machine Learning for Improving Glycemic Control in Diabetes: Best Practices, Pitfalls, and Opportunities. IEEE Rev. Biomed. Eng. 2024, 17, 19–41.
24. Page, M.J.; McKenzie, J.E.; Bossuyt, P.M.; Boutron, I.; Hoffmann, T.C.; Mulrow, C.D.; Shamseer, L.; Tetzlaff, J.M.; Akl, E.A.; Brennan, S.E.; et al. The PRISMA 2020 statement: An updated guideline for reporting systematic reviews. BMJ 2021, 372, n71.
25. Fitzgerald, O.; Perez-Concha, O.; Gallego, B.; Saxena, M.K.; Rudd, L.; Metke-Jimenez, A.; Jorm, L. Incorporating real-world evidence into the development of patient blood glucose prediction algorithms for the ICU. J. Am. Med. Inform. Assoc. 2021, 28, 1642–1650.
26. Mathioudakis, N.N.; Abusamaan, M.S.; Shakarchi, A.F.; Sokolinsky, S.; Fayzullin, S.; McGready, J.; Zilbermint, M.; Saria, S.; Golden, S.H. Development and Validation of a Machine Learning Model to Predict Near-Term Risk of Iatrogenic Hypoglycemia in Hospitalized Patients. JAMA Netw. Open 2021, 4, e2030913.
27. Nguyen, M.; Jankovic, I.; Kalesinskas, L.; Baiocchi, M.; Chen, J.H. Machine learning for initial insulin estimation in hospitalized patients. J. Am. Med. Inform. Assoc. 2021, 28, 2212–2219.
28. Fitzgerald, O.; Perez-Concha, O.; Gallego-Luxan, B.; Metke-Jimenez, A.; Rudd, L.; Jorm, L. Continuous time recurrent neural networks: Overview and benchmarking at forecasting blood glucose in the intensive care unit. J. Biomed. Inform. 2023, 146, 104498.
29. Wright, A.P.; Embi, P.J.; Nelson, S.D.; Smith, J.C.; Turchin, A.; Mize, D.E. Development and Validation of Inpatient Hypoglycemia Models Centered Around the Insulin Ordering Process. J. Diabetes Sci. Technol. 2024, 18, 423–429.
30. Ying, Z.; Fan, Y.; Chen, C. Real-time Artificial Intelligence Assisted Insulin Titration System for Glucose Control in Patients with Type 2 Diabetes: A Randomized Controlled Study. JAMA Netw. Open 2025, 8, e258910.
31. Plank, J.; Blaha, J.; Cordingley, J.; Wilinska, M.E.; Chassin, L.J.; Morgan, C.; Squire, S.; Haluzik, M.; Kremen, J.; Svacina, S.; et al. Multicentric, randomized, controlled trial to evaluate blood glucose control by the model predictive control algorithm versus routine glucose management protocols in intensive care unit patients: Response to Ligtenberg et al. Diabetes Care 2006, 29, 1987–1988.
32. Pachler, C.; Plank, J.; Weinhandl, H.; Chassin, L.J.; Wilinska, M.E.; Kulnik, R.; Kaufmann, P.; Smolle, K.-H.; Pilger, E.; Pieber, T.R.; et al. Tight glycaemic control by an automated algorithm with time-variant sampling in medical ICU patients. Intensiv. Care Med. 2008, 34, 1224–1230.
33. Amrein, K.; Ellmerer, M.; Hovorka, R.; Kachel, N.; Parcz, D.; Korsatko, S.; Smolle, K.; Perl, S.; Bock, G.; Doll, W.; et al. Hospital glucose control: Safe and reliable glycemic control using enhanced model predictive control algorithm in medical intensive care unit patients. Diabetes Technol. Ther. 2010, 12, 405–412.
34. Kopecký, P.; Mráz, M.; Bláha, J.; Lindner, J.; Svačina, Š.; Hovorka, R.; Haluzík, M. The use of continuous glucose monitoring combined with computer-based eMPC algorithm for tight glucose control in cardiosurgical ICU. BioMed Res. Int. 2013, 2013, 186439.
35. Benyó, B.; Palancz, B.; Szlávecz, Á.; Stewart, K.; Homlok, J.; Pretty, C.G.; Chase, J.G. Unsupervised classification-based analysis of the temporal pattern of insulin sensitivity and modelling noise of patient groups under tight glycemic control. IFAC-PapersOnLine 2018, 51, 62–67.
36. Ruan, Y.; Bellot, A.; Moysova, Z.; Tan, G.D.; Lumb, A.; Davies, J.; van der Schaar, M.; Rea, R. Predicting the risk of inpatient hypoglycemia with machine learning using electronic health records. Diabetes Care 2020, 43, 1504–1511.
37. Witte, H.; Nakas, C.; Bally, L.; Leichtle, A.B. Machine Learning Prediction of Hypoglycemia and Hyperglycemia From Electronic Health Records: Algorithm Development and Validation. JMIR Form. Res. 2022, 6, e36176.
38. Kim, D.-Y.; Choi, D.-S.; Kim, J.; Chun, S.W.; Gil, H.-W.; Cho, N.-J.; Kang, A.R.; Woo, J. Developing an Individual Glucose Prediction Model Using Recurrent Neural Network. Sensors 2020, 20, 6460.
39. Gong, C.; Cai, T.; Wang, Y.; Xiong, X.; Zhou, Y.; Zhou, T.; Sun, Q.; Huang, H. Development and Validation of a Nocturnal Hypoglycaemia Risk Model for Patients with Type 2 Diabetes Mellitus. Nurs. Open 2024, 11, e70055.
40. Park, S.; Kim, S.; Rim, D. Cyclic dual latent discovery for improved blood glucose prediction through patient–provider interaction modeling: A prediction study. Ewha Med. J. 2025, 48, e34.
41. Symeonidis, P.; Rizos, E.; Andras, C.; Hairistanidis, S.; Manolopoulos, Y.; Zanker, M. Deep reinforcement learning for personalized insulin dosing and glucose control of hospitalized in ICU patients. Int. J. Data Sci. Anal. 2025, 20, 6841–6854.
42. Alkhafaf, O.S.; Alsultani, A.; Roel, A.N.; Szabó, B.; Pintár, P.; Szlávecz, Á.; Paláncz, B.; Kovács, K.; Chase, J.G.; Benyó, B. In-silico validation of insulin sensitivity prediction by neural network-based quantile regression. IFAC-PapersOnLine 2024, 58, 368–373.
43. Mehdizavareh, H.; Khan, A.; Cichosz, S.L. Enhancing glucose level prediction of ICU patients through hierarchical modeling of irregular time-series. Comput. Struct. Biotechnol. J. 2025, 27, 2898–2914.
44. DeJournett, L.; DeJournett, J. In Silico Testing of an Artificial-Intelligence-Based Artificial Pancreas Designed for Use in the Intensive Care Unit Setting. J. Diabetes Sci. Technol. 2016, 10, 1360–1371.
45. Cordingley, J.J.; Vlasselaers, D.; Dormand, N.C.; Wouters, P.J.; Squire, S.D.; Chassin, L.J.; Wilinska, M.E.; Morgan, C.J.; Hovorka, R.; Berghe, G.V.D. Intensive insulin therapy: Enhanced Model Predictive Control algorithm versus standard care. Intensiv. Care Med. 2009, 35, 123–128.
46. Pappada, S.M.; Owais, M.H.; Cameron, B.D.; Jaume, J.C.; Mavarez-Martinez, A.; Tripathi, R.S.; Papadimos, T.J. An Artificial Neural Network-based Predictive Model to Support Optimization of Inpatient Glycemic Control. Diabetes Technol. Ther. 2020, 22, 383–394.
47. Pappada, S.M.; Borst, M.J.; Cameron, B.D.; Bourey, R.E.; Lather, J.D.; Shipp, D.; Chiricolo, A.; Papadimos, T.J. Development of a neural network model for predicting glucose levels in a surgical critical care setting. Patient Saf. Surg. 2010, 4, 15.
48. Kim, S.-H.; Kim, D.-Y.; Chun, S.-W.; Kim, J.; Woo, J. Impartial feature selection using multi-agent reinforcement learning for adverse glycemic event prediction. Comput. Biol. Med. 2024, 173, 108257.
49. Mantena, S.; Arévalo, A.R.; Maley, J.H.; Vieira, S.M.d.S.; Mateo-Collado, R.; Sousa, J.M.d.C.; Celi, L.A. Predicting hypoglycemia in critically Ill patients using machine learning and electronic health records. J. Clin. Monit. Comput. 2022, 36, 1297–1303.
50. Szabó, B.; Szlávecz, Á.; Paláncz, B.; Alkhafaf, O.S.; Alsultani, A.B.; Kovács, K.; Chase, J.G.; Benyó, B.I. Comparison of three artificial intelligence methods for predicting 90% quantile interval of future insulin sensitivity of intensive care patients. IFAC J. Syst. Control 2024, 30, 100284.
51. Naskinova, I.; Kolev, M.; Karova, D.; Milev, M. Hybrid Stochastic–Machine Learning Framework for Postprandial Glucose Prediction in Type 1 Diabetes. Algorithms 2025, 18, 623.
52. Koca, Ö.A.; Kılıç, V. Trend-weighted multi-resolution transformer for multi-parametric glucose prediction and control. Biomed. Signal Process. Control 2026, 113, 108885.
53. Faulds, E.R. Assessing the Impact of AI in Inpatient Diabetes Management. JAMA Netw. Open 2025, 8, e258924.
54. Salinas, P.D.; Mendez, C.E. Glucose Management Technologies for the Critically Ill. J. Diabetes Sci. Technol. 2019, 13, 682–690.
55. Olinghouse, C. Development of a computerized intravenous insulin application (Auto Cal) at Kaiser Permanente Northwest, integrated into Kaiser Permanente HealthConnect: Impact on safety and nursing workload. Perm. J. 2012, 16, 67–70.
56. Mann, E.A.; Jones, J.A.; Wolf, S.E.; Wade, C.E. Computer decision support software safely improves glycemic control in the burn intensive care unit: A randomized controlled clinical study. J. Burn. Care Res. 2011, 32, 246–255.
57. Newton, C.A.; Smiley, D.; Bode, B.W.; Kitabchi, A.E.; Davidson, P.C.; Jacobs, S.; Steed, R.D.; Stentz, F.; Peng, L.; Mulligan, P.; et al. A comparison study of continuous insulin infusion protocols in the medical intensive care unit: Computer-guided vs. standard column-based algorithms. J. Hosp. Med. 2010, 5, 432–437.
58. NICE-SUGAR Study Investigators; Finfer, S.; Chittock, D.R.; Su, S.Y.; Blair, D.; Foster, D.; Dhingra, V.; Bellomo, R.; Cook, D.; Dodek, P. Intensive versus conventional glucose control in critically ill patients. N. Engl. J. Med. 2009, 360, 1283–1297.

Figure 1. PRISMA Flow Diagram.

Figure 2. Temporal distribution of computational models used for inpatient glycemic management across included studies (2006–2025). Studies are positioned on a timeline according to publication year, with insulin dosing and closed-loop control models displayed above the central axis and glucose prediction or hypoglycemia-detection models displayed below. Dot color denotes model category (MPC, neural networks/deep learning, tree-based machine learning, reinforcement learning, or other AI). Bubble size reflects the relative volume of studies within each category during the corresponding time period. The figure highlights the early concentration of MPC-based insulin dosing trials and the more recent proliferation of data-driven predictive models.

Table 2. Practical Overview of AI Methods for Inpatient Glucose Management.

AI/Algorithm Type	What It Does (Clinically)	Best Suited for	Why It Helps (Strengths)	Watch-Outs (Limitations)	Representative Studies
Model Predictive Control (MPC/eMPC)	Uses a mathematical model of glucose–insulin physiology, with an element of intelligence, to forecast future glucose trends and optimize insulin delivery in real time.	IV insulin dosing (ICU)	Stable, safe glucose control in prospective trials; reduces manual calculations; smooth dosing adjustments	Requires frequent glucose input; does not “learn” from new data (not true AI); still limited by intermittent POC testing	Plank, 2006 [31]; Pachler, 2008 [32]; Amrein, 2010 [33]; Kopecký, 2013 [34]
Feed-forward Artificial Neural Networks (ANN)	Mimics the brain’s neuron layers to find patterns between inputs (vitals, labs, insulin, nutrition) and outputs (future glucose). Predicts short-term glucose levels from complex clinical data.	Short-horizon glucose prediction	Captures nonlinear, multivariable relationships; accurate for near-term glucose prediction	Opaque (“black box”) reasoning; large, diverse data required; struggles with rare hypo events	Pappada, 2020 [46]; Benyó, 2018 [35]
Recurrent Neural Networks (RNN/LSTM)	Processes sequential time-series data (like a continuous glucose record) and “remembers” prior values to predict the next glucose or classify state (hypo/normo/hyper).	Temporal glucose forecasting; pattern recognition	Excellent at using trends; strong short-term predictive accuracy	Can degrade with missing or irregular data; requires dense CGM-like inputs	Kim, 2020 [38]
Gradient Boosting Trees (e.g., XGBoost, CatBoost)	Uses hundreds of small decision trees trained on tabular EHR data (labs, meds, demographics) to predict risk of hypo/hyperglycemia or classify insulin requirements. Each tree learns from prior errors to improve accuracy.	Hypoglycemia risk prediction; insulin requirement classification	High accuracy (AUROC often >0.85); robust to missing EHR data; interpretable importance scores	Model calibration and thresholds required for safe use; may not generalize across hospitals	Ruan, 2020 [36]; Mantena, 2022 [49]; Fitzgerald, 2021 [25], 2023 [28]
Random Forest/Logistic Regression Ensembles	Combines multiple trees or regression models to produce stable probability estimates for glycemic events or insulin dose needs.	Dysglycemia risk stratification	Transparent coefficients and feature ranking; interpretable for clinical teams	May miss subtle nonlinearities; relies on well-labeled EHR data	Nguyen, 2021 [27]; Wright, 2024 [29]
Reinforcement Learning (RL/DQN)	Learns optimal insulin dosing “policies” through trial and error in a simulated environment. Observes the effect of dose decisions on glucose outcomes and adjusts its strategy over time to maximize time in range and minimize hypoglycemia.	Policy discovery for insulin dosing (simulation/ICU)	Can personalize insulin delivery dynamically; learns long-term optimal strategies	Mostly validated in silico; requires safety constraints and real-world testing	Kim, 2024 [48]; Symeonidis, 2025 [41]
Transformers (e.g., MITST)	Uses attention mechanisms to look across all prior glucose and treatment data to weigh which past events matter most for predicting the next glucose level. Think of it as “smart focus” over time.	ICU next-glucose-level classification	State-of-the-art sequence modeling; excels at long-range temporal dependencies	Data-hungry; difficult to interpret clinically; requires high-performance computation	Mehdizavareh, 2025 [43]
Cyclic Dual Latent Discovery (CDLD)	Learns hidden (“latent”) factors that represent both patient physiology and provider decision behavior, then uses those traits to predict outcomes like discharge glucose or variability.	Retrospective glucose prediction; provider–patient interaction modeling	Incorporates provider patterns (e.g., dosing aggressiveness) alongside patient factors; strong predictive accuracy (low RMSE)	No direct decision support yet; limited to retrospective EHR data	Park, 2025 [40]
Quantile Regression/Mixture Density Networks (MDN)	Predicts ranges or probability intervals (e.g., a 90% interval for future insulin sensitivity) rather than a single value. Helps set safer dose bounds by quantifying uncertainty.	Insulin sensitivity forecasting; safety bounds	Communicates prediction confidence; supports conservative dosing	Typically in silico; requires validation with real patient data	Szabó, 2024 [50]; Alkhafaf, 2024 [42]

Footnote: AI = artificial intelligence; ANN = artificial neural network; AUROC = area under the receiver operating characteristic curve; CDLD = cyclic dual latent discovery; CGM = continuous glucose monitoring; DQN = deep Q-network; EHR = electronic health record; eMPC = enhanced model predictive control; ICU = intensive care unit; IV = intravenous; LSTM = long short-term memory network; MDN = mixture density network; MITST = multi-source irregular time-series transformer; MPC = model predictive control; POC = point-of-care; RL = reinforcement learning; RNN = recurrent neural network; RMSE = root mean square error.
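
As an illustration of the quantile-based row in Table 2, the sketch below shows the pinball (quantile) loss commonly used to train quantile regression models, together with a simple check of how often observed values fall inside a predicted interval. It is a generic sketch and not the implementation used in the cited studies. All values are hypothetical and unitless.

```python
def pinball_loss(y_true, y_pred, q):
    """Mean pinball loss for quantile level q, where 0 < q < 1."""
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    total = 0.0
    for y, y_hat in zip(y_true, y_pred):
        diff = y - y_hat
        # Under-prediction is weighted by q, over-prediction by (1 - q).
        total += q * diff if diff >= 0 else (q - 1.0) * diff
    return total / len(y_true)


def interval_coverage(y_true, lower, upper):
    """Fraction of observations that fall inside [lower, upper]."""
    inside = sum(1 for y, lo, hi in zip(y_true, lower, upper) if lo <= y <= hi)
    return inside / len(y_true)


# Hypothetical observations and predictions, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
constant_prediction = [2.0, 2.0, 2.0, 2.0]
loss_q95 = pinball_loss(observed, constant_prediction, 0.95)
loss_q05 = pinball_loss(observed, constant_prediction, 0.05)

lower = [0.5, 1.5, 3.5, 3.0]
upper = [1.5, 2.5, 4.5, 5.0]
coverage = interval_coverage(observed, lower, upper)
```

The asymmetric weighting is what pushes a model trained at q = 0.95 toward an upper bound and one trained at q = 0.05 toward a lower bound. Comparing empirical coverage with the nominal level (for example, 90%) is one way to judge whether such bounds are trustworthy enough to inform conservative dosing.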

Table 3. Summary of Methodological Quality Considerations Across Included Studies.

Dimension	Common Strengths	Common Limitations	Overall Appraisal
Study Design	Inclusion of early RCTs for MPC algorithms; detailed reporting of model development steps	Majority of studies are retrospective, observational, or in silico only; limited prospective clinical validation	Evidence base is dominated by developmental or exploratory studies rather than rigorous clinical trials
Data Sources	Use of diverse datasets including EHR, CGM, and large ICU databases (e.g., MIMIC)	Many studies rely on single-center data; variable glucose measurement density; input features differ widely	Limited generalizability and difficulty comparing across datasets
Validation Approaches	Internal validation commonly performed; some studies include external or multicenter validation	External, multicenter, and real-time validation remain rare; calibration and error analysis often underreported	Predictive performance promising but lacks robust clinical readiness
Outcome Reporting	Many studies provide detailed model performance metrics (AUROC, RMSE, etc.)	Clinical outcomes (e.g., TIR, hypoglycemia, LOS) inconsistently reported; outcome definitions vary	Hard to evaluate clinical meaningfulness or compare across studies
Implementation Considerations	Some MPC and AI-based CDSS studies assess feasibility or workflow integration	Most ML/RL studies do not evaluate usability, clinician acceptance, or workflow burden	Evidence insufficient to judge real-world adoption potential
