Introduction
Post-operative instructions (POIs) are critical to patient understanding, self-care, and recovery following surgery for facial trauma. Accurate and comprehensive instructions can improve patient safety, reduce phone-call burden and complication rates, and help optimize final outcomes. However, POIs for facial trauma patients are often generic, inaccurate, or insufficiently detailed. Templated electronic medical record (EMR) POIs and institutional handouts have limited actionability because detailed, patient-specific information is usually neglected.
A persistent challenge with medical informational materials is delivering sufficient detail at an appropriate reading level without overwhelming jargon. Lack of consideration of health literacy may be contributory; prior work has shown low health literacy to be associated with worse post-operative recovery and lower health-related quality of life.[1] Despite a plethora of well-developed, appropriate instructions and literature available for patients, post-operative instructions are commonly under-vetted for appropriate reading level and content. Because surgeons want to provide patient-specific detail, they are often left to compose the instructions themselves, typically without critique or peer review for quality.
ChatGPT (Chat Generative Pre-trained Transformer) (OpenAI, San Francisco) is a machine learning–based large language model that can interpret and respond to conversational prompts and instructions.[2] A rapidly growing body of work has examined its utility in medical documentation, including post-operative instructions.[3] ChatGPT has the potential to quickly and easily generate patient-specific, detailed, and germane POIs. Herein, we conducted a pilot study to evaluate the utility of post-operative instructions for facial trauma repair generated by ChatGPT as compared with instructions extracted from the electronic medical record or obtained via the Google search engine.
Methods
This was an IRB-exempt study, as no patient information was involved. The study was performed at Zuckerberg San Francisco General Hospital. Design and methodology were adapted from prior work.[3] POIs for four common facial trauma procedures (mandibular fracture, maxillary fracture, nasal fracture, and facial laceration) were generated by ChatGPT, extracted from EMR templates, or obtained from the Google search engine. The prompt for ChatGPT was as follows: “Provide postoperative instructions at an 8th grade reading level for a patient who just underwent surgical repair of (insert facial trauma).” The 8th-grade reading level was chosen in line with National Institutes of Health guidelines for patient information.[4] The EMR templates were extracted by procedure name from the built-in database of post-operative instruction sets. The Google search prompt was “Postoperative instructions after surgical repair of (insert facial trauma).” The first non-sponsored Google result for each procedure was selected for review.
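As an illustration, filling the fixed prompt template above for each of the four study procedures can be sketched as follows; the procedure list is taken from the Methods, but the function and variable names are purely illustrative and not part of the study:

```python
# Sketch of filling the study's fixed ChatGPT prompt template for each
# procedure. PROCEDURES comes from the Methods; build_prompt is illustrative.

PROCEDURES = [
    "mandibular fracture",
    "maxillary fracture",
    "nasal fracture",
    "facial laceration",
]

def build_prompt(procedure: str) -> str:
    """Insert a procedure name into the fixed prompt template."""
    return (
        "Provide postoperative instructions at an 8th grade reading level "
        f"for a patient who just underwent surgical repair of {procedure}."
    )

prompts = [build_prompt(p) for p in PROCEDURES]
```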
The POIs were uniformly formatted with visuals removed to minimize confounding bias, placed in random order, and evaluated by a panel of surgeons with a focus in facial plastic surgery. The primary outcome was the Patient Education Materials Assessment Tool for Printable Materials (PEMAT-P) score, a rigorously validated instrument for evaluating and comparing how easy print patient education materials are to understand and act upon.[5] The PEMAT-P has two subscores: an understandability score with a maximum of 19 points and an actionability score with a maximum of 7 points. Understandability and actionability scores were calculated for each instruction set. Additionally, to assess the specificity and thoroughness of each POI, four salient points for each procedure (Table 1) were determined a priori and used to generate a procedure-specific item score as a secondary outcome.
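PEMAT subscores are conventionally reported as the percentage of applicable items met (each item rated agree = 1 or disagree = 0, with not-applicable items excluded). A minimal sketch of that calculation, with illustrative function and variable names not taken from the study:

```python
# Minimal sketch of PEMAT-P percentage scoring: each item is rated
# 1 (agree) or 0 (disagree); not-applicable items (None) are excluded;
# the subscore is the percentage of applicable items met.

def pemat_subscore(item_ratings):
    """item_ratings: list of 1, 0, or None (not applicable)."""
    applicable = [r for r in item_ratings if r is not None]
    if not applicable:
        raise ValueError("No applicable items to score")
    return 100.0 * sum(applicable) / len(applicable)

# Example: 16 of the 19 understandability items met -> about 84.2%
understandability = pemat_subscore([1] * 16 + [0] * 3)
```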
One-way analysis of variance and Kruskal–Wallis tests were used to compare mean scores between POI sources. Significance was set at P < .05. Analysis was performed using Excel version 16.75.2 (Microsoft).
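For readers who prefer a scripted analysis, the one-way ANOVA F statistic used to compare scores across the three instruction sources can be sketched with the Python standard library. The score lists below are made-up placeholders (four procedures per modality), not the study's data, and in practice a library routine such as SciPy's `f_oneway` would typically be used instead:

```python
# Stdlib-only sketch of the one-way ANOVA F statistic for comparing mean
# scores across three instruction sources. Scores are illustrative only.
from statistics import mean

def one_way_anova_f(groups):
    """Return the F statistic for a one-way ANOVA across the given groups."""
    all_vals = [x for g in groups for x in g]
    grand = mean(all_vals)
    k = len(groups)          # number of groups
    n = len(all_vals)        # total observations
    # Between-group sum of squares (variation of group means around grand mean)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    # Within-group sum of squares (variation of observations around group means)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Placeholder percentage scores for ChatGPT, EMR template, and Google search
f = one_way_anova_f([[82, 88, 85, 84], [87, 90, 86, 85], [60, 70, 68, 66]])
```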
Results
Three sets of instructions for each of the four procedures yielded a total of twelve POIs (n = 12). Across all modalities, understandability scores ranged from 54.6% to 93.9%, actionability scores from 33.3% to 93.3%, and procedure-specific scores from 50% to 91.7% (Table 1). ChatGPT understandability scores ranged from 82% to 88%, actionability scores from 60% to 80%, and procedure-specific item scores from 50% to 75%.
Actionability scores were highest for ChatGPT (70%), intermediate for the EMR templates (68.3%), and lowest for Google search (56.7%) (P = .04) (Table 2). EMR templates trended toward the highest understandability (87.1%) compared with ChatGPT (83.3%) and Google search (66.7%), though this did not reach significance (P = .21). Procedure-specific item scores were highest for Google search (83.3%), above EMR templates (66.7%) and ChatGPT (64.6%) (P = .02).
Discussion
Facial trauma patients, especially vulnerable subsets such as the undomiciled, urban poor, and elderly, tend to have a higher rate of complications associated with inadequate socioeconomic support, complex medical needs, and limited post-operative compliance.[6] A high rate of leaving against medical advice in the trauma population can further complicate sound post-operative care and follow-up.[7] Therefore, informative and individualized post-operative care instructions are particularly critical in the facial trauma population to optimize outcomes.
There is a paucity of literature on the effective provision of high-quality post-operative instructions for facial trauma patients. This pilot study aimed to assess the potential benefits of composing post-operative instructions for common facial trauma procedures with a large language model, as compared with EMR templates or Google search–based POIs. Prior work attempted the same in pediatric otolaryngology procedures and found that, despite modest PEMAT-P scores, ChatGPT held promise for patients and clinicians given its high customizability.[3] In contrast, in the present study, ChatGPT demonstrated excellent actionability and high understandability PEMAT-P scores, but the ChatGPT-generated POIs were lacking in procedure-specific details. Because ChatGPT prompts can easily be altered to request specific details, this shortcoming is readily circumvented. Further, the ability to tailor reading level allows greater flexibility in providing patient-specific POIs. At the very least, ChatGPT offers a ready opportunity to improve the quality, readability, and actionability of institutional, search engine, or EMR templates.
ChatGPT is a powerful tool, yet it has notable limitations, including the ability to generate realistic-sounding but false answers, a knowledge base that is not up to date, a lack of sourcing for its answers, and unresolved ethical implications. This pilot study was limited by its small sample size, single-institution setting, and small panel of reviewers. Future work will broaden the spectrum of facial trauma procedures, provide greater detail in ChatGPT prompts, and use an updated version of the large language model.
Conclusions
ChatGPT has great potential to be a useful adjunct in generating post-operative instructions for patients undergoing facial trauma procedures. Detailed and patient-specific prompts are necessary to output tailored instructions that are accurate, understandable, and actionable.