Article

Utility of an Artificial Intelligence Language Model for Post-Operative Patient Instructions Following Facial Trauma

by Suresh Mohan 1,2,*, Spenser Souza 2, Shayan Fakurnejad 2 and P. Daniel Knott 2

1 Division of Otolaryngology, Department of Surgery, Yale School of Medicine, 47 College St, Ste 216, New Haven, CT 06510, USA
2 Division of Facial Plastic & Reconstructive Surgery, Department of Otolaryngology, University of California-San Francisco, San Francisco, CA, USA
* Author to whom correspondence should be addressed.
Craniomaxillofac. Trauma Reconstr. 2024, 17(4), 291-294; https://doi.org/10.1177/19433875231222803
Submission received: 1 November 2022 / Revised: 1 December 2022 / Accepted: 1 January 2023 / Published: 16 December 2023

Abstract

Study Design: Qualitative descriptive study. Objective: To evaluate the utility of post-operative instructions (POIs) for facial trauma provided by the language model ChatGPT as compared to those from electronic medical record (EMR) templates and the Google search engine. Methods: POIs for four common facial trauma procedures (mandibular fracture, maxillary fracture, nasal fracture, and facial laceration) were generated by ChatGPT, extracted from EMR templates, or obtained from the Google search engine. The POIs were evaluated by a panel of surgeons using the Patient Education Materials Assessment Tool (PEMAT-P). Results: Across all three sources, PEMAT-P understandability scores ranged from 55% to 94%, actionability scores from 33% to 93%, and procedure-specific scores from 50% to 92%. ChatGPT-generated POIs scored 82% to 88% for understandability, 60% to 80% for actionability, and 50% to 75% for procedure-specific items. Conclusions: ChatGPT has great potential as a useful adjunct in providing post-operative instructions for patients undergoing facial trauma procedures. Detailed and patient-specific prompts are necessary to produce tailored instructions that are accurate, understandable, and actionable.

Introduction

Post-operative instructions (POIs) are critical to patient understanding, self-care, and recovery following surgery for facial trauma. Accurate and comprehensive instructions help ensure patient safety, reduce phone-call burden and complication rates, and optimize final outcomes. However, POIs for facial trauma patients are often generic, inaccurate, or insufficiently detailed. Templated electronic medical record (EMR) POIs and institutional handouts have limited actionability because detailed, patient-specific information is usually omitted.
A persistent challenge with patient-facing medical materials is delivering sufficient detail at an appropriate reading level without overwhelming jargon. Inadequate attention to health literacy may be contributory; prior work has associated low health literacy with worsened post-operative recovery and lower health-related quality of life.[1] Despite a plethora of well-developed, appropriate instructions and literature available to patients, post-operative instructions are commonly under-vetted for reading level and content. Because surgeons want to provide patient-specific detail, they are often left to compose the instructions themselves, typically without critique or peer review for quality.
ChatGPT (Chat Generative Pre-trained Transformer; OpenAI, San Francisco, CA) is a machine learning–based large language model that can interpret and respond to conversational prompts and instructions.[2] A rapidly growing body of work has examined its utility in medical documentation, including post-operative instructions.[3] ChatGPT has the potential to quickly and easily generate patient-specific, detailed, and germane POIs. Herein, we conducted a pilot study to evaluate the utility of post-operative instructions for facial trauma repair provided by the language model ChatGPT as compared to those from the Google search engine and EMR templates.

Methods

This study was IRB-exempt, as no patient information was involved. The study was performed at Zuckerberg San Francisco General Hospital. Design and methodology were adapted from prior work.[3] POIs for four common facial trauma procedures (mandibular fracture, maxillary fracture, nasal fracture, and facial laceration) were generated by ChatGPT, extracted from EMR templates, or obtained from the Google search engine. The prompt template for ChatGPT was: “Provide postoperative instructions at an 8th grade reading level for a patient who just underwent surgical repair of (insert facial trauma).” This reading level was chosen in line with National Institutes of Health guidelines for patient information.[4] The EMR templates were retrieved by procedure name from the built-in database of post-operative instruction sets. The Google search query was “Postoperative instructions after surgical repair of (insert facial trauma),” and the first non-sponsored result for each procedure was selected for review.
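For illustration, a minimal Python sketch of this generation step is shown below, using the study's prompt template. The OpenAI Python client, model name, and API-based access are assumptions added for reproducibility; the paper does not report how ChatGPT was accessed beyond the prompt itself.

```python
# Minimal sketch of the POI-generation step with the study's prompt template.
# The OpenAI client, model name, and API access are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROCEDURES = [
    "mandibular fracture",
    "maxillary fracture",
    "nasal fracture",
    "facial laceration",
]

PROMPT = (
    "Provide postoperative instructions at an 8th grade reading level for a "
    "patient who just underwent surgical repair of {procedure}."
)

def generate_poi(procedure: str) -> str:
    """Return model-generated post-operative instructions for one procedure."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # placeholder model name
        messages=[{"role": "user", "content": PROMPT.format(procedure=procedure)}],
    )
    return response.choices[0].message.content

for procedure in PROCEDURES:
    print(f"--- {procedure} ---\n{generate_poi(procedure)}\n")
```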
The POIs were uniformly formatted with visuals removed to minimize confounding bias, placed in random order, and evaluated by a panel of surgeons specializing in facial plastic surgery. The primary outcome was the Patient Education Materials Assessment Tool for printable materials (PEMAT-P) score, a rigorously validated instrument for evaluating and comparing how easily print patient education materials can be understood and acted upon.[5] The PEMAT-P has two subscores: an understandability score with a maximum of 19 points and an actionability score with a maximum of 7 points; each subscore is reported as the percentage of applicable points earned. Understandability and actionability scores were calculated for each instruction set. Additionally, to assess the specificity and thoroughness of each POI, four salient points for each procedure (Table 1) were determined a priori and used to generate a procedure-specific item score as a secondary outcome.
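As a concrete illustration of the scoring arithmetic, the sketch below computes PEMAT-P subscores as percentages of applicable points, following the published PEMAT scoring instructions (items rated agree = 1 or disagree = 0, with not-applicable items excluded). The item ratings shown are hypothetical, not the study's data.

```python
# Hedged sketch of PEMAT-P subscore computation: each item is rated agree (1)
# or disagree (0); items judged not applicable (None) are excluded, and the
# subscore is the percentage of applicable points earned.
from typing import Optional

def pemat_subscore(item_ratings: list[Optional[int]]) -> float:
    """Percentage score over applicable items (None marks 'not applicable')."""
    applicable = [r for r in item_ratings if r is not None]
    if not applicable:
        raise ValueError("No applicable items to score.")
    return 100.0 * sum(applicable) / len(applicable)

# Hypothetical ratings: 19 understandability items (two not applicable)
# and 7 actionability items.
understandability = [1, 1, 0, 1, None, 1, 1, 1, 0, 1, 1, None, 1, 1, 1, 1, 0, 1, 1]
actionability = [1, 1, 0, 1, 1, 0, 1]

print(f"Understandability: {pemat_subscore(understandability):.1f}%")
print(f"Actionability: {pemat_subscore(actionability):.1f}%")
```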
One-way analysis of variance and Kruskal–Wallis tests were used to compare mean scores between POI sources. Significance was set at P < .05. Analysis was performed in Excel version 16.75.2 (Microsoft).
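The same comparison can be reproduced along the following lines. This sketch applies the two named tests via SciPy to hypothetical per-procedure scores; the values are placeholders, not the study's data, and the authors report using Excel rather than Python.

```python
# Illustrative comparison of scores across the three sources using the tests
# named in the Methods; the score arrays below are hypothetical placeholders.
from scipy.stats import f_oneway, kruskal

# Actionability scores (%) for the four procedures, one list per source.
chatgpt = [60.0, 66.7, 73.3, 80.0]
emr_template = [66.7, 60.0, 73.3, 73.3]
google_search = [33.3, 60.0, 60.0, 73.3]

f_stat, p_anova = f_oneway(chatgpt, emr_template, google_search)
h_stat, p_kw = kruskal(chatgpt, emr_template, google_search)

print(f"One-way ANOVA: F = {f_stat:.2f}, P = {p_anova:.3f}")
print(f"Kruskal-Wallis: H = {h_stat:.2f}, P = {p_kw:.3f}")
```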

Results

Three sets of instructions for each of the four procedures yielded a total of twelve POIs (n = 12). Across all modalities, understandability scores ranged from 54.6% to 93.9%, actionability scores from 33.3% to 93.3%, and procedure-specific scores from 50% to 91.7% (Table 1). ChatGPT understandability scores ranged from 82% to 88%, actionability scores from 60% to 80%, and procedure-specific item scores from 50% to 75%.
Actionability scores were highest for ChatGPT (70%), intermediate for EMR templates (68.3%), and lowest for Google search (56.7%) (P = .04) (Table 2). EMR templates trended toward the highest understandability (87.1%) compared with ChatGPT (83.3%) and Google search (66.7%), though this difference did not reach significance (P = .21). Procedure-specific item scores were highest for Google search (83.3%), above EMR templates (66.7%) and ChatGPT (64.6%) (P = .02).

Discussion

Facial trauma patients, especially vulnerable subsets such as undomiciled, urban poor, and elderly patients, tend to have a higher rate of complications associated with inadequate socioeconomic support, complex medical needs, and limited post-operative compliance.[6] A high rate of leaving against medical advice in the trauma population can further complicate sound post-operative care and follow-up.[7] Informative and individualized post-operative care instructions are therefore particularly critical in the facial trauma population to optimize outcomes.
There is a paucity of literature on the effective provision of high-quality post-operative instructions for facial trauma patients. This pilot study assessed the potential benefits of composing post-operative instructions for common facial trauma procedures with a large language model, as compared with EMR templates and Google search–based POIs. Prior work attempted the same for pediatric otolaryngology procedures and found that, despite modest PEMAT-P scores, ChatGPT held promise for patients and clinicians given its high customizability.[3] In the present study, by contrast, ChatGPT demonstrated the highest actionability and high understandability PEMAT-P scores, but the ChatGPT-generated POIs were lacking in procedure-specific details. Because ChatGPT prompts can easily be altered to request specific details, this shortcoming is readily circumvented. Further, the ability to tailor reading level allows greater flexibility in providing patient-specific POIs. At the very least, ChatGPT offers a ready opportunity to improve the quality, readability, and actionability of institutional, search engine, or EMR templates.
ChatGPT is a powerful tool, yet it has notable limitations, including a tendency to generate realistic-sounding but false answers, an outdated knowledge base, a lack of sourcing for its answers, and unresolved ethical implications. This pilot study was further limited by its small sample size, single-institution setting, and small panel of reviewers. Future work will broaden the spectrum of facial trauma procedures, provide greater detail in ChatGPT prompts, and use an updated version of the large language model.

Conclusions

ChatGPT has great potential to be a useful adjunct in generating post-operative instructions for patients undergoing facial trauma procedures. Detailed and patient-specific prompts are necessary to output tailored instructions that are accurate, understandable, and actionable.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

Conflicts of Interest

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

References

  1. Halleberg Nyman, M.; Nilsson, U.; Dahlberg, K.; Jaensson, M. Association between functional health literacy and postoperative recovery, health care contacts, and health-related quality of life among patients undergoing day surgery: secondary analysis of a randomized clinical trial. JAMA Surg. 2018, 153, 738–745. [Google Scholar] [PubMed]
  2. Brown, T.B.; Mann, B.; Ryder, N.; Subbiah, M.; et al. Language models are few-shot learners. In Proceedings of the 34th Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, BC, Canada, 6–12 December 2020. [Google Scholar]
  3. Ayoub, N.F.; Lee, Y.J.; Grimm, D.; Balakrishnan, K. Comparison between ChatGPT and Google search as sources of postoperative patient instructions. JAMA Otolaryngol Head Neck Surg. 2023, 149, 556–558. [Google Scholar] [PubMed]
  4. National Institutes of Health. MedlinePlus: How to Write Easy-to-Read Health Materials. 2007. https://www.nlm.nih.gov/medlineplus/etr.html.
  5. Shoemaker, S.J.; Wolf, M.S.; Brach, C. Development of the Patient Education Materials Assessment Tool (PEMAT): a new measure of understandability and actionability for print and audiovisual patient information. Patient Educ Counsel. 2014, 96, 395–403. [Google Scholar] [CrossRef] [PubMed]
  6. Schaffer, K.B.; Wang, J.; Nasrallah, F.S.; et al. Disparities in triage and management of the homeless and the elderly trauma patient. Inj Epidemiol. 2020, 7, 39. [Google Scholar] [CrossRef] [PubMed]
  7. Adeyemi, O.J.; Veri, S. Characteristics of trauma patients that leave against medical advice: an eight-year survey analysis using the National Hospital Ambulatory Medical Care Survey, 2009-2016. J Clin Orthop Trauma. 2021, 17, 18–24. [Google Scholar] [PubMed]
Table 1. Understandability, Actionability, and Procedure-Specific Scores for Each Procedure. [Table available as an image in the original publication.]
Table 2. Comparison of ChatGPT, Google Search, and Institution Instructions. [Table available as an image in the original publication.]
