This is an early access version; the complete PDF, HTML, and XML versions will be available soon.
Article

From Prompts to Practice: Evaluating ChatGPT, Gemini, and Grok Against Plastic Surgeons in Local Flap Decision-Making

1 Plastic Surgery Unit, Department of Medicine, Surgery and Neuroscience, University of Siena, 53100 Siena, Italy
2 Faculty of Medicine and Surgery, Peninsula Clinical School, Monash University, Melbourne, VIC 3199, Australia
3 Department of Plastic and Reconstructive Surgery, Frankston Hospital, Peninsula Health, Frankston, VIC 3199, Australia
* Author to whom correspondence should be addressed.
Diagnostics 2025, 15(20), 2646; https://doi.org/10.3390/diagnostics15202646 (registering DOI)
Submission received: 28 September 2025 / Revised: 14 October 2025 / Accepted: 15 October 2025 / Published: 20 October 2025

Abstract

Background: Local flaps are a cornerstone of reconstructive plastic surgery for oncological skin defects, ensuring functional recovery and aesthetic integration. Their selection, however, varies with surgeon experience. Generative artificial intelligence has emerged as a potential decision-support tool, although its clinical role remains uncertain. Methods: We evaluated three generative AI platforms (ChatGPT-5 by OpenAI, Grok by xAI, and Gemini by Google DeepMind) in their free-access versions available in September 2025. Ten preoperative photographs of suspected cutaneous neoplastic lesions from diverse facial and limb sites were submitted to each platform in a two-step task: concise description of site, size, and tissue involvement, followed by the single most suitable local flap for reconstruction. Outputs were compared with the unanimous consensus of experienced plastic surgeons. Results: Performance differed across models. ChatGPT-5 consistently described lesion size accurately and achieved complete concordance with surgeons in flap selection. Grok showed intermediate performance, tending to recognise tissue planes better than lesion size and proposing flaps that were often acceptable but not always the preferred choice. Gemini estimated size well, yet was inconsistent for anatomical site, tissue involvement, and flap recommendation. When partially correct answers were considered acceptable, differences narrowed but the overall ranking remained unchanged. Conclusion: Generative AI can support reconstructive reasoning from clinical images with variable reliability. In this series, ChatGPT-5 was the most dependable for local flap planning, suggesting a potential role in education and preliminary decision-making. Larger studies using standardised image acquisition and explicit uncertainty reporting are needed to confirm clinical applicability and safety.
Keywords: plastic surgery; local flaps; reconstructive planning; clinical images; generative AI; large language models; decision support; surgical education
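
The comparison described in the abstract, in which model outputs are scored against surgeon consensus under a strict criterion and again with partially correct answers accepted, can be illustrated with a minimal sketch. The three-level grading scheme, the function name `concordance`, the model labels, and all grades below are hypothetical placeholders for illustration; they are not the study's rubric or data.

```python
# Hypothetical three-level grading; the abstract does not specify the rubric.
CORRECT, PARTIAL, INCORRECT = "correct", "partial", "incorrect"


def concordance(grades, accept_partial=False):
    """Return the fraction of cases graded as concordant with the consensus.

    With accept_partial=False only fully correct answers count (strict);
    with accept_partial=True partially correct answers also count (lenient).
    """
    if not grades:
        raise ValueError("grades must not be empty")
    accepted = {CORRECT, PARTIAL} if accept_partial else {CORRECT}
    return sum(g in accepted for g in grades) / len(grades)


# Made-up grades for two placeholder models over four cases (NOT study data).
example = {
    "model_a": [CORRECT, CORRECT, PARTIAL, CORRECT],
    "model_b": [PARTIAL, INCORRECT, PARTIAL, CORRECT],
}

for name, grades in example.items():
    strict = concordance(grades)
    lenient = concordance(grades, accept_partial=True)
    print(f"{name}: strict={strict:.2f}, lenient={lenient:.2f}")
```

In this toy example the gap between the two placeholder models shrinks under the lenient criterion while their order stays the same, which is the kind of pattern the abstract reports qualitatively.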

Share and Cite

MDPI and ACS Style

Marcaccini, G.; Corradini, L.; Shadid, O.; Seth, I.; Rozen, W.M.; Grimaldi, L.; Cuomo, R. From Prompts to Practice: Evaluating ChatGPT, Gemini, and Grok Against Plastic Surgeons in Local Flap Decision-Making. Diagnostics 2025, 15, 2646. https://doi.org/10.3390/diagnostics15202646

AMA Style

Marcaccini G, Corradini L, Shadid O, Seth I, Rozen WM, Grimaldi L, Cuomo R. From Prompts to Practice: Evaluating ChatGPT, Gemini, and Grok Against Plastic Surgeons in Local Flap Decision-Making. Diagnostics. 2025; 15(20):2646. https://doi.org/10.3390/diagnostics15202646

Chicago/Turabian Style

Marcaccini, Gianluca, Luca Corradini, Omar Shadid, Ishith Seth, Warren M. Rozen, Luca Grimaldi, and Roberto Cuomo. 2025. "From Prompts to Practice: Evaluating ChatGPT, Gemini, and Grok Against Plastic Surgeons in Local Flap Decision-Making" Diagnostics 15, no. 20: 2646. https://doi.org/10.3390/diagnostics15202646

APA Style

Marcaccini, G., Corradini, L., Shadid, O., Seth, I., Rozen, W. M., Grimaldi, L., & Cuomo, R. (2025). From Prompts to Practice: Evaluating ChatGPT, Gemini, and Grok Against Plastic Surgeons in Local Flap Decision-Making. Diagnostics, 15(20), 2646. https://doi.org/10.3390/diagnostics15202646
