Human-in-the-Loop Performance of LLM-Assisted Arterial Blood Gas Interpretation: A Single-Center Retrospective Study
Abstract
1. Introduction
- Determining acidosis or alkalosis via pH measurement.
- Identifying the primary disorder (metabolic or respiratory) based on changes in bicarbonate (HCO3−) and arterial partial pressure of carbon dioxide (PaCO2), respectively.
- Assessing compensatory mechanisms, which typically involve a respiratory response to metabolic disorders and a metabolic response to respiratory disorders.
- In metabolic acidosis, calculating the plasma anion gap to refine etiological assessment.
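The first, second, and fourth steps above can be sketched in a few lines of Python. This is a minimal illustration, not the study's procedure: the reference ranges are those quoted in the prompt in Section 2.3, whereas the anion gap cut-off of 12 mmol/L, the `BloodGas` container, and the function names are assumptions introduced here. Compensation (the third step) is left out, and a mixed disorder with a normal pH would go undetected.

```python
from dataclasses import dataclass

# Reference ranges quoted in the study prompt (Section 2.3).
PH_LOW, PH_HIGH = 7.35, 7.45
PACO2_LOW, PACO2_HIGH = 35.0, 45.0   # mmHg
HCO3_LOW, HCO3_HIGH = 22.0, 26.0     # mmol/L
AG_UPPER_LIMIT = 12.0                # mmol/L; assumed cut-off, not from the study


@dataclass
class BloodGas:
    """Hypothetical container for one arterial blood gas result."""
    ph: float
    paco2: float     # mmHg
    hco3: float      # mmol/L
    sodium: float    # mmol/L
    chloride: float  # mmol/L


def anion_gap(gas: BloodGas) -> float:
    """Plasma anion gap without potassium: Na - (Cl + HCO3)."""
    return gas.sodium - (gas.chloride + gas.hco3)


def primary_disorder(gas: BloodGas) -> str:
    """Read the pH first, then check which variable moved in the same direction."""
    if gas.ph < PH_LOW:
        metabolic = gas.hco3 < HCO3_LOW
        respiratory = gas.paco2 > PACO2_HIGH
        kind = "acidosis"
    elif gas.ph > PH_HIGH:
        metabolic = gas.hco3 > HCO3_HIGH
        respiratory = gas.paco2 < PACO2_LOW
        kind = "alkalosis"
    else:
        return "no disorder evident from pH"
    if metabolic and respiratory:
        return f"mixed metabolic and respiratory {kind}"
    if metabolic:
        return f"metabolic {kind}"
    if respiratory:
        return f"respiratory {kind}"
    return f"{kind} of unclear origin"


def interpret(gas: BloodGas) -> dict:
    """Primary disorder, plus the anion gap when a metabolic acidosis is present."""
    result = {"primary": primary_disorder(gas)}
    if "metabolic" in result["primary"] and "acidosis" in result["primary"]:
        gap = anion_gap(gas)
        result["anion_gap"] = gap
        result["high_anion_gap"] = gap > AG_UPPER_LIMIT
    return result


# Illustrative values only (pH 7.30, PaCO2 28 mmHg, HCO3 12 mmol/L).
print(interpret(BloodGas(ph=7.30, paco2=28, hco3=12, sodium=140, chloride=110)))
```

The anion gap is computed from sodium, chloride, and bicarbonate only, because those are the electrolytes supplied in the study prompt.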
2. Materials and Methods
2.1. Study Design and Patient Population
2.2. Conventional Evaluation
2.3. LLM-Assisted Evaluation
“Identify whether or not there is an acid-base disorder based on the following results. Reference values are shown in parentheses. pH = XX (7.35–7.45), PaCO2 = XX mmHg (35–45 mmHg), HCO3− = XX mmol/L (22–26 mmol/L). The serum electrolyte results are: sodium = XX mmol/L, chloride = XX mmol/L. If an acid-base disorder is identified, determine the primary disorder.”
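As an illustration of how the `XX` placeholders in this prompt could be filled for each sample, the following sketch builds the text from one set of results. The function name `build_prompt` and the number formatting (two decimals for pH, shortest form for the other values) are assumptions; the section does not say how the values were entered.

```python
# Prompt text as quoted above, with the XX placeholders turned into named fields.
PROMPT_TEMPLATE = (
    "Identify whether or not there is an acid-base disorder based on the "
    "following results. Reference values are shown in parentheses. "
    "pH = {ph} (7.35–7.45), PaCO2 = {paco2} mmHg (35–45 mmHg), "
    "HCO3− = {hco3} mmol/L (22–26 mmol/L). The serum electrolyte results "
    "are: sodium = {sodium} mmol/L, chloride = {chloride} mmol/L. If an "
    "acid-base disorder is identified, determine the primary disorder."
)


def build_prompt(ph, paco2, hco3, sodium, chloride):
    """Fill the template for one sample (hypothetical helper, assumed formatting)."""
    return PROMPT_TEMPLATE.format(
        ph=f"{ph:.2f}",
        paco2=f"{paco2:g}",
        hco3=f"{hco3:g}",
        sodium=f"{sodium:g}",
        chloride=f"{chloride:g}",
    )


print(build_prompt(ph=7.30, paco2=28, hco3=12, sodium=140, chloride=110))
```

Keeping the template in one constant means every sample is submitted with identical wording and reference ranges.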
Prompt | LLM-I | LLM-S |
---|---|---|
P2 | Calculate the expected PaCO2 compensation in metabolic acidosis and determine whether an additional respiratory disorder is present. | Calculate the expected PaCO2 compensation in metabolic acidosis. |
P3 | Calculate and interpret the anion gap value. | Calculate the anion gap value. |
P4 | Calculate and interpret the delta/delta ratio using the changes in anion gap and HCO3− values. | Calculate the delta/delta ratio using the changes in anion gap and HCO3− values. |
P5 | Calculate the expected HCO3− compensation in acute/chronic respiratory acidosis and determine whether an additional metabolic disorder is present. | Calculate the expected HCO3− compensation in acute/chronic respiratory acidosis. |
P6 | Calculate the expected PaCO2 compensation in metabolic alkalosis and determine whether an additional respiratory disorder is present. | Calculate the expected PaCO2 compensation in metabolic alkalosis. |
P7 | Calculate the expected HCO3− compensation in acute/chronic respiratory alkalosis and determine whether an additional metabolic disorder is present. | Calculate the expected HCO3− compensation in acute/chronic respiratory alkalosis. |
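The calculations requested in prompts P2, P4, and P5/P7 can be sketched as follows. The formulas are widely taught rules of thumb (Winters' formula for metabolic acidosis, a fixed HCO3− shift per 10 mmHg of PaCO2 change for respiratory disorders, and the delta/delta ratio); this section does not state which formulas or normal values the study used, so every constant and function name below is an assumption.

```python
NORMAL_HCO3 = 24.0   # mmol/L; assumed midpoint of the 22-26 reference range
NORMAL_PACO2 = 40.0  # mmHg; assumed midpoint of the 35-45 reference range
NORMAL_AG = 12.0     # mmol/L; assumed, laboratories differ


def expected_paco2_metabolic_acidosis(hco3):
    """Winters' formula: 1.5 * HCO3 + 8, returned as a +/- 2 mmHg range."""
    center = 1.5 * hco3 + 8.0
    return center - 2.0, center + 2.0


def expected_hco3_respiratory(paco2, change_per_10_mmhg):
    """Expected HCO3 if it shifts by a fixed amount per 10 mmHg of PaCO2 change.

    The slope differs between acute and chronic disorders and between
    sources, so it is passed in rather than hard-coded.
    """
    return NORMAL_HCO3 + change_per_10_mmhg * (paco2 - NORMAL_PACO2) / 10.0


def delta_ratio(anion_gap, hco3):
    """Delta/delta ratio: rise in anion gap divided by fall in HCO3."""
    delta_hco3 = NORMAL_HCO3 - hco3
    if delta_hco3 == 0:
        raise ValueError("HCO3 equals the assumed normal value; ratio undefined")
    return (anion_gap - NORMAL_AG) / delta_hco3


def respiratory_assessment(paco2, expected_range):
    """Compare the measured PaCO2 with the expected compensation range."""
    low, high = expected_range
    if paco2 > high:
        return "additional respiratory acidosis"
    if paco2 < low:
        return "additional respiratory alkalosis"
    return "appropriate compensation"


# Illustrative values: HCO3 12 mmol/L, PaCO2 28 mmHg, anion gap 18.2 mmol/L.
expected = expected_paco2_metabolic_acidosis(12)
print(expected, respiratory_assessment(28, expected))
print(round(delta_ratio(18.2, 12), 2))
# Acute respiratory acidosis, assuming +1 mmol/L HCO3 per 10 mmHg PaCO2 rise.
print(expected_hco3_respiratory(50, change_per_10_mmhg=1.0))
```

In the LLM-S column the model is asked only for these numbers, so a comparison such as `respiratory_assessment` corresponds to the step left to the human reviewer; in the LLM-I column the model performs it as well.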
2.3.1. LLM-Assisted Evaluation with Interpretation (LLM-I)
2.3.2. LLM-Assisted Evaluation with Supervision (LLM-S)
2.4. Statistical Analysis
- Agreement on primary disorder (APD): concordance between the LLM-assisted and the conventional method in identifying the primary acid–base disorder.
- Agreement on primary disorder with detection (APD-a): as APD, but agreement was also counted when the LLM reported the conventionally determined primary disorder as a secondary disorder.
- Agreement on primary and secondary disorders (APSD): concordance between the two methods in identifying the same pair of disorders, regardless of which was labeled primary and which secondary.
- Agreement on the classification of metabolic acidosis (AMA): concordance between the two methods in classifying metabolic acidosis according to the anion gap (AG).
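Each of these endpoints reduces to comparing two label lists, one from the conventional method and one from the LLM-assisted method. The sketch below computes Cohen's κ together with Rk, which the results tables report alongside κ; Rk is taken here to be the K-category correlation coefficient, an assumption because this section does not define it. The function name and the example labels are hypothetical, and confidence intervals are not computed.

```python
from collections import Counter
from math import sqrt


def agreement_stats(reference, predicted):
    """Return (Cohen's kappa, Rk) for two equally long lists of category labels."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(reference)
    correct = sum(r == p for r, p in zip(reference, predicted))

    ref_counts = Counter(reference)   # marginal totals, conventional method
    pred_counts = Counter(predicted)  # marginal totals, LLM-assisted method
    labels = set(ref_counts) | set(pred_counts)
    cross = sum(ref_counts[k] * pred_counts[k] for k in labels)
    ref_sq = sum(c * c for c in ref_counts.values())
    pred_sq = sum(c * c for c in pred_counts.values())

    # Cohen's kappa: observed agreement corrected for chance agreement.
    # Undefined (division by zero) when both lists hold a single identical label.
    observed = correct / n
    expected = cross / (n * n)
    kappa = (observed - expected) / (1 - expected)

    # K-category correlation coefficient (multiclass Matthews correlation).
    rk = (correct * n - cross) / sqrt((n * n - pred_sq) * (n * n - ref_sq))
    return kappa, rk


# Made-up labels for six samples, using the document's abbreviations.
conventional = ["MAc", "MAc", "RAc", "RAk", "MAk", "NoABD"]
llm_assisted = ["MAc", "RAc", "RAc", "RAk", "MAk", "NoABD"]
kappa, rk = agreement_stats(conventional, llm_assisted)
print(f"kappa = {kappa:.2f}, Rk = {rk:.2f}")
```

For APD-a, the LLM label list would first be adjusted so that a primary disorder reported as secondary counts as a match; for APSD, each label would be an unordered pair of disorders.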
3. Results
4. Discussion
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
ABG | Arterial blood gas |
AG | Anion gap |
AMA | Agreement on the classification of metabolic acidosis |
APD | Agreement on primary disorder |
APD-a | Agreement on primary disorder with detection |
APSD | Agreement on primary and secondary disorders |
HCO3− | Bicarbonate |
ICU | Intensive care unit |
IQR | Interquartile range |
LLM-I | LLM-assisted evaluation with interpretation |
LLM-S | LLM-assisted evaluation with supervision |
LLMs | Large language models |
MAc | Metabolic acidosis |
MAk | Metabolic alkalosis |
NPV | Negative predictive value |
NoABD | No acid–base disorder |
PaCO2 | Arterial partial pressure of carbon dioxide |
pH | Potential of hydrogen |
PPV | Positive predictive value |
RAc | Respiratory acidosis |
RAk | Respiratory alkalosis |
SA | Sensitivity analysis |
Se. | Sensitivity |
Sp. | Specificity |
USD | United States dollars |
Δ/Δ | Delta/delta ratio |
Characteristic | NoABD | MAc | RAc | MAk | RAk |
---|---|---|---|---|---|
Female | 14 (35%) | 13 (33%) | 18 (45%) | 21 (52.5%) | 16 (40%) |
Age, years | 50 (38–64) | 45 (36–56) | 53 (38–65) | 54 (42–69) | 47 (41–62) |
pH | 7.40 (7.38–7.42) | 7.30 (7.23–7.33) | 7.31 (7.26–7.34) | 7.49 (7.47–7.52) | 7.48 (7.46–7.51) |
PaCO2, mmHg | 38 (36–40) | 28 (21–31) | 50 (48–53) | 43 (40–46) | 25 (22–28) |
HCO3−, mmol/L | 24 (23–25) | 12 (10–14) | 25 (23–28) | 33 (30–36) | 20 (17–25) |
Anion gap, mmol/L | 8.9 (7.0–11.0) | 18.2 (14.5–21.2) | 9.3 (6.9–10.6) | 8.0 (5.9–9.4) | 10.1 (9.1–11.0) |
Isolated primary disorder frequency | NA | 18 (45%) | 26 (65%) | 8 (20%) | 34 (85%) |
Method | Model | APD (κ, 95% CI) | APD-a (κ, 95% CI) | APSD (κ, 95% CI) | AMA (κ, 95% CI) |
---|---|---|---|---|---|
LLM-I | ChatGPT | κ = 0.91 (0.84–0.98), Rk = 0.91 | κ = 0.96 (0.89–1.0), Rk = 0.96 | κ = 0.65 (0.60–0.70), Rk = 0.67 | κ = 0.55 (0.39–0.72), Rk = 0.58 |
LLM-I | Copilot | κ = 0.95 (0.88–1.0), Rk = 0.95 | κ = 0.98 (0.91–1.0), Rk = 0.98 | κ = 0.61 (0.56–0.66), Rk = 0.63 | κ = 0.48 (0.32–0.63), Rk = 0.52 |
LLM-I | Gemini | κ = 0.88 (0.81–0.95), Rk = 0.89 | κ = 0.94 (0.87–1.0), Rk = 0.94 | κ = 0.62 (0.57–0.67), Rk = 0.63 | κ = 0.76 (0.58–0.95), Rk = 0.77 |
LLM-S | ChatGPT | κ = 0.89 (0.83–0.96), Rk = 0.89 | κ = 0.92 (0.85–0.99), Rk = 0.92 | κ = 0.91 (0.85–0.96), Rk = 0.91 | κ = 0.85 (0.66–1.0), Rk = 0.86 |
LLM-S | Copilot | κ = 0.92 (0.85–0.99), Rk = 0.92 | κ = 0.94 (0.88–1.0), Rk = 0.95 | κ = 0.81 (0.75–0.86), Rk = 0.81 | κ = 0.86 (0.67–1.0), Rk = 0.86 |
LLM-S | Gemini | κ = 0.92 (0.85–0.99), Rk = 0.92 | κ = 0.97 (0.90–1.0), Rk = 0.97 | κ = 0.81 (0.76–0.86), Rk = 0.81 | κ = 0.94 (0.75–1.0), Rk = 0.94 |
Method | Model | APD, pH ≤ 7.30 or pH ≥ 7.50 (κ, 95% CI) | APSD, pH ≤ 7.30 or pH ≥ 7.50 (κ, 95% CI) | APD, severe secondary disorder (κ, 95% CI) | APSD, severe secondary disorder (κ, 95% CI) |
---|---|---|---|---|---|
LLM-I | ChatGPT | κ = 0.94 (0.80–1.0), Rk = 0.94 | κ = 0.71 (0.61–0.82), Rk = 0.72 | κ = 0.95 (0.70–1.0), Rk = 0.95 | κ = 0.91 (0.71–1.0), Rk = 0.92 |
LLM-I | Copilot | κ = 0.96 (0.82–1.0), Rk = 0.96 | κ = 0.60 (0.50–0.70), Rk = 0.61 | κ = 0.95 (0.70–1.0), Rk = 0.95 | κ = 0.68 (0.52–0.84), Rk = 0.73 |
LLM-I | Gemini | κ = 0.90 (0.76–1.0), Rk = 0.91 | κ = 0.70 (0.60–0.80), Rk = 0.71 | κ = 0.74 (0.50–0.98), Rk = 0.74 | κ = 0.78 (0.60–0.97), Rk = 0.80 |
LLM-S | ChatGPT | κ = 0.96 (0.82–1.0), Rk = 0.96 | κ = 1.0 (0.89–1.0), Rk = 1.0 | κ = 0.89 (0.64–1.0), Rk = 0.90 | κ = 1.0 (0.78–1.0), Rk = 1.0 |
LLM-S | Copilot | κ = 0.96 (0.82–1.0), Rk = 0.96 | κ = 0.84 (0.74–0.94), Rk = 0.85 | κ = 0.89 (0.64–1.0), Rk = 0.90 | κ = 0.74 (0.55–0.93), Rk = 0.75 |
LLM-S | Gemini | κ = 0.96 (0.82–1.0), Rk = 0.96 | κ = 0.91 (0.81–1.0), Rk = 0.91 | κ = 0.90 (0.66–1.0), Rk = 0.90 | κ = 0.75 (0.57–0.93), Rk = 0.78 |
Method | Model | Agreement measure | Se. | Sp. | PPV | NPV | Accuracy (95% CI) |
---|---|---|---|---|---|---|---|
LLM-I | ChatGPT | APD | 0.93 | 0.98 | 0.94 | 0.98 | 0.93 (0.89–0.96) |
LLM-I | ChatGPT | APD-a | 0.97 | 0.99 | 0.97 | 0.99 | 0.97 (0.93–0.99) |
LLM-I | ChatGPT | APSD | 0.74 | 0.96 | 0.72 | 0.96 | 0.69 (0.92–0.75) |
LLM-I | ChatGPT | AMA | 0.76 | 0.90 | 0.61 | 0.89 | 0.69 (0.55–0.81) |
LLM-I | Copilot | APD | 0.96 | 0.99 | 0.97 | 0.98 | 0.96 (0.92–0.98) |
LLM-I | Copilot | APD-a | 0.99 | 1.0 | 0.99 | 1.0 | 0.98 (0.96–1.0) |
LLM-I | Copilot | APSD | 0.70 | 0.96 | 0.66 | 0.96 | 0.66 (0.58–0.72) |
LLM-I | Copilot | AMA | 0.52 | 0.88 | 0.59 | 0.87 | 0.63 (0.48–0.76) |
LLM-I | Gemini | APD | 0.91 | 0.98 | 0.92 | 0.98 | 0.91 (0.86–0.94) |
LLM-I | Gemini | APD-a | 0.95 | 0.99 | 0.96 | 0.99 | 0.95 (0.91–0.98) |
LLM-I | Gemini | APSD | 0.70 | 0.96 | 0.67 | 0.96 | 0.62 (0.59–0.73) |
LLM-I | Gemini | AMA | 0.86 | 0.95 | 0.70 | 0.94 | 0.85 (0.72–0.93) |
LLM-S | ChatGPT | APD | 0.92 | 0.98 | 0.93 | 0.98 | 0.92 (0.87–0.95) |
LLM-S | ChatGPT | APD-a | 0.94 | 0.98 | 0.94 | 0.98 | 0.94 (0.89–0.96) |
LLM-S | ChatGPT | APSD | 0.95 | 0.99 | 0.92 | 0.99 | 0.92 (0.87–0.95) |
LLM-S | ChatGPT | AMA | 0.95 | 0.97 | 0.94 | 0.96 | 0.91 (0.79–0.97) |
LLM-S | Copilot | APD | 0.94 | 0.98 | 0.95 | 0.98 | 0.94 (0.89–0.96) |
LLM-S | Copilot | APD-a | 0.96 | 0.99 | 0.96 | 0.99 | 0.96 (0.92–0.98) |
LLM-S | Copilot | APSD | 0.85 | 0.98 | 0.79 | 0.98 | 0.83 (0.77–0.88) |
LLM-S | Copilot | AMA | 0.95 | 0.97 | 0.77 | 0.96 | 0.91 (0.79–0.97) |
LLM-S | Gemini | APD | 0.94 | 0.98 | 0.95 | 0.98 | 0.94 (0.89–0.96) |
LLM-S | Gemini | APD-a | 0.98 | 0.99 | 0.98 | 0.99 | 0.98 (0.94–0.99) |
LLM-S | Gemini | APSD | 0.84 | 0.98 | 0.79 | 0.98 | 0.84 (0.78–0.88) |
LLM-S | Gemini | AMA | 0.94 | 0.99 | 0.83 | 0.99 | 0.96 (0.87–0.99) |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Ayala-De la Cruz, S.; Arenas-Hernández, P.E.; Fernández-Herrera, M.F.; Quiñones-Díaz, R.A.; Llaca-Díaz, J.M.; Díaz-Chuc, E.A.; Robles-Espino, D.G.; San Miguel-Garay, E.A. Human-in-the-Loop Performance of LLM-Assisted Arterial Blood Gas Interpretation: A Single-Center Retrospective Study. J. Clin. Med. 2025, 14, 6676. https://doi.org/10.3390/jcm14186676