Artificial Intelligence in Esophagectomy: A Systematic Review

Aleksiev, Vladimir; Markov, Daniel; Bechev, Kristian; Stanchev, Desislav; Shterev, Filip; Markov, Galabin

doi:10.3390/jcm15062169

Open Access Systematic Review

by Vladimir Aleksiev ^1,2,*, Daniel Markov ^3,4, Kristian Bechev ^5, Desislav Stanchev ^6,7, Filip Shterev ^2,8 and Galabin Markov ^9

^1 Department of Cardiovascular Surgery, Medical University of Plovdiv, 4002 Plovdiv, Bulgaria
^2 Department of Thoracic Surgery, University Hospital “Kaspela”, 4002 Plovdiv, Bulgaria
^3 Department of General and Clinical Pathology, Medical University of Plovdiv, 4002 Plovdiv, Bulgaria
^4 Department of Clinical Pathology, University Multidisciplinary Hospital for Active Treatment “Pulmed”, 4002 Plovdiv, Bulgaria
^5 Department of Neurological Surgery, Pulmed University Hospital, 4000 Plovdiv, Bulgaria
^6 II-nd Department of Internal Diseases, Section of Gastroenterology, Medical University of Plovdiv, 4002 Plovdiv, Bulgaria
^7 Department of Gastroenterology, University Hospital “Kaspela”, 4002 Plovdiv, Bulgaria
^8 I-st Department of Internal Diseases, Section of Pneumology and Phthysiatrics, Medical University of Plovdiv, 4002 Plovdiv, Bulgaria
^9 Faculty of Medicine, Medical University of Plovdiv, 4002 Plovdiv, Bulgaria
^* Author to whom correspondence should be addressed.

J. Clin. Med. 2026, 15(6), 2169; https://doi.org/10.3390/jcm15062169

Submission received: 31 January 2026 / Revised: 5 March 2026 / Accepted: 9 March 2026 / Published: 12 March 2026

(This article belongs to the Special Issue Recent Clinical Advances in Esophageal Surgery)

Abstract

Background: Esophagectomy remains a technically demanding oncologic procedure with substantial morbidity, despite ongoing advances in minimally invasive and robotic techniques. Limitations in intraoperative visualization and anatomical recognition contribute to complications such as nerve injury and bleeding. Artificial intelligence (AI)-based intraoperative video analysis has emerged as a potential adjunct to enhance surgical perception and safety, but its application in esophagectomy has not been comprehensively reviewed. Methods: A systematic review was conducted in accordance with PRISMA guidelines. PubMed, Scopus, and Web of Science were searched without a lower date limit to identify eligible studies published up to January 2026, capturing early and contemporary applications of intraoperative AI in esophagectomy. Human studies involving any surgical approach were included. Data on the AI task, methodology, validation strategy, performance metrics, and reported clinical outcomes was extracted. Risk of bias was assessed using the ROBINS-I tool. Results: Six studies met the inclusion criteria, predominantly evaluating AI-driven analysis of intraoperative video during minimally invasive or robotic esophagectomy. Reported applications included real-time anatomical structure recognition, recurrent laryngeal nerve segmentation, detection of excessive nerve traction, instrument and event recognition, and surgical phase identification. Across studies, AI systems demonstrated performance comparable to expert surgeons for selected tasks and achieved real-time or near–real-time inference. One study reported earlier detection of excessive recurrent laryngeal nerve traction compared to conventional nerve integrity monitoring. However, most studies were retrospective, single-center, and feasibility-focused, with limited external validation and minimal assessment of patient-centered clinical outcomes. 
Conclusions: Artificial intelligence-based intraoperative analysis in esophagectomy is increasingly achievable and may enhance anatomical recognition, intraoperative risk detection, and procedural awareness. Nevertheless, current evidence remains preliminary, heterogeneous, and largely exploratory. Prospective, multicenter studies with standardized reporting and clinically meaningful outcome evaluation are required before routine implementation. Until such data is available, AI should be regarded as a complementary intraoperative tool rather than a standalone clinical decision-making system.

Keywords:

esophagectomy; artificial intelligence; intraoperative video analysis; minimally invasive surgery; robotic surgery

1. Introduction

Despite advances in surgical oncology, esophageal cancer remains the seventh leading cause of cancer death worldwide [1]. It accounts for nearly 500,000 deaths each year, and a similar number of people are diagnosed with the disease annually [2]. Its aggressive biological behavior, tendency for early lymphatic spread, and frequent late presentation contribute to a poor prognosis and a 5-year survival rate of less than 25% [3]. For patients with potentially curable disease, multimodal treatment consisting of neoadjuvant chemoradiotherapy or chemotherapy followed by surgical resection has become the standard of care and offers the best chance for long-term survival [4].

Esophagectomy, however, is among the most complex procedures in surgery. The operation requires meticulous dissection within anatomically confined and highly vascularized regions, including the mediastinum and abdomen, in close proximity to vital structures such as the recurrent laryngeal nerve, thoracic duct, major vessels, and airway. Limited exposure, challenging visualization, and demanding ergonomics contribute to a steep learning curve and a substantial risk of perioperative morbidity, even in experienced centers [5]. As a result, complications such as anastomotic leakage, pulmonary morbidity, and nerve injury remain prevalent and significantly impact postoperative outcomes [6].

The introduction of minimally invasive esophagectomy (MIE), including video-assisted thoracoscopic and laparoscopic approaches, has aimed to mitigate surgical trauma while maintaining oncologic radicality. More recently, robotic-assisted minimally invasive esophagectomy (RAMIE) has further transformed the surgical landscape by offering high-definition three-dimensional visualization, instrument articulation, tremor filtration, and improved surgeon ergonomics [7]. These advantages enhance exposure and precision during complex mediastinal dissection. Despite these technological developments, esophagectomy continues to be associated with substantial complication rates ranging from 20 to 40% across different surgical approaches [8]. This suggests that limitations in intraoperative perception, anatomy identification, and spatial awareness remain key contributors to adverse outcomes.

Artificial intelligence (AI) has emerged as a promising tool to address such limitations. Broadly, AI encompasses computational techniques that enable machines to perform tasks traditionally requiring human intelligence. In healthcare and surgery, AI applications are typically categorized into rule-based systems, machine learning algorithms, and deep learning models. Deep learning approaches often rely on neural networks that extract complex patterns from large datasets [9] (Figure 1). The increasing availability of high-quality surgical video data and digital perioperative information has positioned surgery as a particularly suitable domain for AI-driven innovation.

In surgical practice, AI has been applied to a range of tasks, including intraoperative image and anatomy recognition, surgical phase and workflow analysis, performance assessment, and decision support. In visually intensive procedures such as minimally invasive and robotic esophagectomy, AI-based analysis of operative video holds particular promise for improving anatomical orientation, enhancing safety, and supporting surgeon decision-making [10] (Figure 2). Despite growing interest and an expanding body of literature, the role of AI in esophagectomy has not yet been comprehensively synthesized.

Objective of the Study

The objective of this systematic review is to identify, summarize, and critically appraise the existing clinical and technical evidence on the intraoperative use of artificial intelligence during esophagectomy, with particular focus on the types of AI technologies employed, their intraoperative applications, validation methods, and reported clinical outcomes.

2. Materials and Methods

2.1. Study Design and Reporting Standards

This systematic review was conducted and reported in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2020 statement [11]. The PRISMA checklist is provided in the Supplementary Materials (Figure 3). The review aimed to evaluate the current evidence on the intraoperative application of artificial intelligence in esophagectomy.

2.2. Review Question and PICO Framework

The review question was defined using the PICO framework. The population of interest consisted of adult patients undergoing esophagectomy using any surgical approach, including open, minimally invasive, hybrid, or robotic techniques. The intervention was the intraoperative use of artificial intelligence, including but not limited to machine learning, deep learning, computer vision, and decision-support systems capable of real-time or near–real-time computational processing. Studies with or without a comparator, including comparisons with conventional surgery without AI assistance, were eligible. Outcomes of interest included feasibility, technical accuracy, validation performance, intraoperative decision-making, safety, and reported clinical outcomes.

2.3. Literature Search Strategy

A systematic literature search was performed in PubMed, Scopus, and Web of Science to identify relevant studies reporting intraoperative applications of artificial intelligence in esophagectomy. The search was conducted without a predefined lower date limit to allow comprehensive capture of early intraoperative applications of artificial intelligence during esophagectomy. All eligible studies published up to January 2026 were included. The last search was conducted on 15 January 2026. The search strategy combined terms related to esophagectomy and artificial intelligence, with particular emphasis on intraoperative applications and AI-assisted surgical video analysis. The full search strategies are provided in the Supplementary Materials.

2.4. Eligibility Criteria

Studies were eligible for inclusion if they met the following criteria:

Original studies reporting the intraoperative use of artificial intelligence;
Procedures involving esophagectomy;
Human studies;
Any surgical approach, including open, minimally invasive, or robotic surgery;
Reporting technical performance metrics and/or clinical outcomes.

Studies were excluded if they met any of the following criteria:

Artificial intelligence used exclusively in the preoperative setting (e.g., imaging, staging, prediction of response to neoadjuvant therapy) or postoperative setting (e.g., prediction of complications);
Simulation, phantom, cadaveric, or animal studies;
Conference abstracts without available full text;
Reviews, editorials, commentaries, or letters;
Studies not involving esophageal surgery or focusing on other malignancies such as gastric cancer.

2.5. Study Selection

All records identified through the database searches were exported to a standardized spreadsheet, and duplicates were removed. Two reviewers independently screened titles and abstracts to exclude studies not involving artificial intelligence or not addressing its intraoperative use during esophagectomy. Full-text articles were then independently assessed for eligibility by the same reviewers based on the predefined inclusion and exclusion criteria. Any disagreements were resolved through discussion with a third reviewer. Studies focusing exclusively on preoperative or postoperative AI applications were excluded at this stage.

2.6. Data Extraction

Data was extracted independently by two reviewers into a standardized spreadsheet. Discrepancies were resolved by consensus or consultation with a third reviewer. Extracted variables included study characteristics, surgical approach, type of artificial intelligence, intraoperative application, timing of AI use, training and validation datasets, reported performance metrics, surgeon interaction with AI, influence on intraoperative decision-making, and reported technical and clinical outcomes. Primary outcomes were AI technical performance metrics (e.g., Dice coefficient, IoU, AUC, and F1-score), intraoperative feasibility, and reported clinical outcomes (e.g., recurrent laryngeal nerve (RLN) injury, intraoperative decision impact). Because the AI tasks, datasets, and performance metrics were heterogeneous, no pooled effect measures were calculated and quantitative synthesis (meta-analysis) was not performed. Results are reported using study-specific diagnostic accuracy metrics (e.g., Dice coefficient, AUC, sensitivity, specificity, F1-score). A structured narrative synthesis was conducted, grouping studies by AI application domain (anatomy recognition, nerve protection, workflow analysis, etc.).
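For readers less familiar with the overlap metrics named above, the following minimal sketch shows how the Dice coefficient and IoU are computed for a binary segmentation mask. It is an illustration only and is not taken from any included study; the function names and the toy masks are hypothetical. For binary labels the Dice coefficient is numerically identical to the F1-score.

```python
from typing import Sequence, Tuple


def confusion_counts(pred: Sequence[int], truth: Sequence[int]) -> Tuple[int, int, int]:
    """Return (true positives, false positives, false negatives) for 0/1 labels."""
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same length")
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    return tp, fp, fn


def dice(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Dice coefficient: 2*TP / (2*TP + FP + FN). Equals F1 for binary labels."""
    tp, fp, fn = confusion_counts(pred, truth)
    denom = 2 * tp + fp + fn
    return 1.0 if denom == 0 else 2 * tp / denom  # two empty masks agree fully


def iou(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Intersection over Union: TP / (TP + FP + FN)."""
    tp, fp, fn = confusion_counts(pred, truth)
    denom = tp + fp + fn
    return 1.0 if denom == 0 else tp / denom


# Hypothetical flattened masks (1 = structure pixel, 0 = background).
predicted = [1, 1, 1, 0, 0, 0]
reference = [1, 1, 0, 1, 0, 0]
print(f"Dice = {dice(predicted, reference):.3f}, IoU = {iou(predicted, reference):.3f}")
```

For a single mask pair the two metrics are related by IoU = Dice / (2 − Dice), so a Dice value is always at least as large as the IoU for the same prediction. This identity does not hold for averages across frames, which is one reason Dice and IoU values from different studies cannot be compared directly.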

2.7. Risk of Bias Assessment

Risk of bias was evaluated using the ROBINS-I tool, and reporting quality was assessed with the STARD 2015 checklist. The results are summarized in Table 1. Overall, studies were at moderate to serious risk of bias, primarily due to retrospective, single-center designs, small sample sizes, and limited external validation. Outcome measurement was often unblinded, and confounding factors and data/frame selection methods were inconsistently addressed. Bias related to intervention classification and deviations was generally low to moderate, highlighting the preliminary nature of current evidence and the need for prospective, standardized validation. Due to the small number of heterogeneous studies, formal assessment of publication bias (e.g., funnel plots) was not performed. Formal certainty-of-evidence assessment (e.g., GRADE) was not conducted due to heterogeneity and absence of comparable outcome measures.

2.8. Use of Artificial Intelligence in Figure Creation and Language Editing and Correction

Artificial intelligence-based tools were used to assist in the graphical generation of selected schematic elements within Figure 1 and Figure 2. Specifically, generative design software was employed to create illustrative icons, visual motifs, and background graphical components. The conceptual structure, hierarchy, labeling, scientific content, and overall layout of the figures were designed and defined by the authors. All AI-generated graphical elements were reviewed, modified where necessary, and approved by the authors to ensure scientific accuracy and consistency with the manuscript. No scientific data was generated, altered, or interpreted using AI tools in figure preparation. The use of AI tools did not influence the study design, data extraction, data analysis, or interpretation of results. Artificial intelligence was also used for language editing and correction.

3. Results

The six included studies evaluated artificial intelligence applications for intraoperative video analysis during esophagectomy, focusing on anatomical recognition, nerve protection, traction detection, and workflow analysis.

Table 2 summarizes key findings in the reviewed literature. Collectively, these studies show that contemporary AI models can achieve performance comparable to expert surgeons in selected tasks and operate at computational speeds compatible with intraoperative video streams (i.e., real-time or near–real-time processing). In some settings, these systems were also able to detect potentially injurious events earlier than conventional intraoperative monitoring. In the context of the included studies, “real-time” refers primarily to computational processing speeds compatible with intraoperative video streams rather than prospective deployment during live surgical procedures.

Furube et al. [12] developed a proof-of-concept AI system to detect excessive traction (ET) on the left recurrent laryngeal nerve (RLN) during robot-assisted minimally invasive esophagectomy (RAMIE). The model extended a previously validated anatomical recognition framework and was trained on video frames from 130 RAMIE cases, classifying frames as ET or non-ET and generating a real-time excessive traction risk (ETR) score (0–100%). In an independent evaluation using 10 surgical videos, the system correctly identified 84.4% of ET scenes (38/45). The ETR score correlated with visually assessed traction severity, and in a representative case, ET was detected earlier than changes in nerve integrity monitor (NIM) amplitude, suggesting potential for pre-injury warning.

In a separate study, Sato et al. [16] developed a deep learning-based model for real-time detection of the RLN in thoracoscopic esophagectomy videos. The AI system demonstrated superior RLN localization performance compared with general surgeons and performance approaching that of expert esophageal surgeons. The Dice coefficient for AI-based RLN segmentation was 0.58. Quantitative performance metrics indicated clinically relevant detection accuracy with computational inference speeds compatible with real-time video processing.

Den Boer et al. [15] evaluated a deep learning model for semantic segmentation of key anatomical structures (azygos vein/vena cava, aorta, and lung) in RAMIE videos. Using 1050 annotated frames (850 for training and 200 for testing), the model achieved median Dice coefficients of 0.79 for the azygos/vena cava, 0.74 for the aorta, and 0.89 for lung segmentation, with real-time inference (~39 frames/s). Performance was comparable to expert annotations for some structures, although greater variability was observed for aortic segmentation, highlighting challenges in reference standard definition.

Another study by Furube et al. [13] specifically assessed the diagnostic performance and clinical impact of AI-assisted RLN recognition. A deep learning model trained on 120 RAMIE videos was evaluated on eight external cases. Mean Intersection over Union (IoU) values were 0.40 ± 0.26 for the right RLN and 0.34 ± 0.27 for the left RLN. When AI assistance was provided, surgeons correctly identified the right RLN at the start of lymph node dissection in 81.3% of cases, compared with 46.9% without AI support (p = 0.004). During lymph node dissection, IoU values were significantly higher with AI assistance (0.59 ± 0.18 vs. 0.40 ± 0.29; p = 0.010), indicating improved anatomical delineation.

Brandenburg et al. [14] prospectively investigated the extraction of intraoperative “surgomic” features from robotic esophagectomy videos using machine learning combined with active learning (AL). Across a multicenter dataset of 22 RAMIE videos, 14,004 frames were annotated for ten video-based features related to instruments, bleeding, and anatomical structures. Bayesian ResNet18 models achieved a mean F1-score of 0.75 ± 0.16 across all features, with the highest performance observed for instrument detection. Active learning improved annotation efficiency compared with equidistant sampling, particularly for less frequent instrument classes, while maintaining comparable performance.
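The difference between equidistant sampling and active learning can be sketched as follows. This is a simplified illustration, not the method used by Brandenburg et al.: the acquisition rule shown here (binary predictive entropy) is an assumption, and the function names and example probabilities are hypothetical. Equidistant sampling selects frames at fixed intervals, whereas an uncertainty-based strategy selects the frames on which the current model is least certain.

```python
import math
from typing import List, Sequence


def equidistant_indices(n_frames: int, k: int) -> List[int]:
    """Pick k frame indices spaced evenly across a video of n_frames frames."""
    if not 0 < k <= n_frames:
        raise ValueError("k must be between 1 and n_frames")
    step = n_frames / k
    return [int(i * step) for i in range(k)]


def binary_entropy(p: float) -> float:
    """Entropy (bits) of a binary prediction; largest when p is close to 0.5."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def most_uncertain_indices(probs: Sequence[float], k: int) -> List[int]:
    """Pick the k frames with the highest predictive entropy (assumed rule)."""
    if not 0 < k <= len(probs):
        raise ValueError("k must be between 1 and len(probs)")
    ranked = sorted(range(len(probs)),
                    key=lambda i: binary_entropy(probs[i]), reverse=True)
    return sorted(ranked[:k])


# Hypothetical per-frame probabilities that an instrument is visible.
frame_probs = [0.01, 0.50, 0.99, 0.45, 0.90]
print("equidistant:", equidistant_indices(len(frame_probs), 2))
print("uncertainty-based:", most_uncertain_indices(frame_probs, 2))
```

In this toy example the uncertainty-based rule sends the two ambiguous frames to the annotator and skips frames the model already classifies confidently, which is the general intuition behind reduced annotation burden for rare classes.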

Finally, Takeuchi et al. [17] reported one of the earliest applications of AI for surgical workflow analysis in RAMIE. A deep learning model was trained to recognize surgical phases from annotated videos, achieving an overall accuracy of approximately 84% using cross-validation. Although the system was applied retrospectively and did not influence intraoperative decision-making, the study demonstrated the technical feasibility of automated workflow recognition in esophageal surgery.

Table 3 summarizes the principal intraoperative applications of artificial intelligence (AI) in esophagectomy, highlighting how contemporary AI systems are being integrated into the operating room to enhance surgical perception, workflow awareness, and decision support. The table categorizes AI use according to application domain, primary function, representative intraoperative tasks, and commonly employed computational methods. Collectively, these applications illustrate the expanding role of computer vision and machine learning in real-time anatomy recognition, surgical phase identification, risk pattern detection, instrument tracking, and augmented intraoperative guidance, with the overarching aim of improving surgical safety, efficiency, and outcomes during complex esophageal procedures.

Overall, included studies primarily evaluated feasibility and technical performance, with limited assessment of downstream clinical outcomes. Although a number of studies reported real-time or near–real-time inference speeds, most models were validated retrospectively using previously recorded surgical videos. In these studies, “real-time” generally referred to computational performance compatible with intraoperative video streams, typically approximately 25–40 frames per second or inference times below 50 ms per frame, rather than prospective intraoperative deployment with integrated surgeon feedback during active procedures. Such performance thresholds are considered sufficient to avoid perceptible latency during minimally invasive or robotic surgery. However, achieving these frame rates in retrospective testing does not guarantee seamless integration into live surgical environments, where system latency, hardware constraints, and interface design may influence effective responsiveness. No included study reported randomized or fully prospective clinical implementation assessing outcome impact during routine surgical workflow. Therefore, current evidence primarily reflects technical real-time capability rather than validated real-time clinical integration.
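The relationship between frame rate and per-frame processing time can be made explicit with a short calculation. The sketch below is illustrative only; the function names are hypothetical, and it considers model inference time alone, ignoring the capture, transfer, and display latency that a live system would add.

```python
def frame_budget_ms(fps: float) -> float:
    """Time available per frame, in milliseconds, at a given frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def keeps_up_with_stream(inference_ms: float, fps: float) -> bool:
    """True if one inference finishes within the frame interval of the stream."""
    return inference_ms <= frame_budget_ms(fps)


# Frame rates quoted in the text: 25 to 40 frames per second.
for fps in (25, 40):
    print(f"{fps} frames/s leaves {frame_budget_ms(fps):.1f} ms per frame")
```

A stream at 25 frames per second allows 40 ms per frame and a stream at 40 frames per second allows 25 ms, whereas an inference time of 50 ms corresponds to 20 frames per second. The two thresholds quoted above are therefore close but not equivalent.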

4. Discussion

This systematic review indicates that artificial intelligence-based analysis of intraoperative video during esophagectomy is achievable and may contribute to improved anatomical recognition, risk detection, and procedural understanding [13]. Across multiple studies, AI systems were able to identify critical anatomical structures, detect hazardous mechanical events, and recognize surgical workflow phases in real time or near real time, highlighting the growing potential of AI as an intraoperative support tool in complex esophageal surgery [16].

Several studies focused on protecting the recurrent laryngeal nerve, a structure whose injury is associated with significant postoperative morbidity. Furube et al. extended static anatomical recognition to dynamic risk assessment by developing a real-time AI system capable of detecting excessive traction on the RLN [12]. The reported detection rate of 84.4% and earlier identification of traction compared with nerve integrity monitoring suggest that visually driven AI may enable pre-injury alerts. This represents an important conceptual advance, as the model infers dynamic physiologic risk rather than static anatomy alone. However, the retrospective design, small external test set, and incomplete reporting of reference standards, blinding, and confidence intervals resulted in moderate adherence to the STARD criteria, limiting interpretability and generalizability. Importantly, the effect of such alerts on postoperative RLN palsy rates remains unproven.
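To illustrate why the absence of confidence intervals limits interpretability, the sketch below computes an approximate 95% interval for the reported detection rate, which corresponds to 38 of 45 excessive traction scenes in the study by Furube et al. [12]. The interval is not reported in the source study; the choice of the Wilson score method is an assumption made here for illustration, and the calculation treats scenes as independent observations, which may not hold for scenes drawn from the same video.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 for about 95%)."""
    if total <= 0 or not 0 <= successes <= total:
        raise ValueError("need total > 0 and 0 <= successes <= total")
    p = successes / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half_width, center + half_width


# Counts from the included study: 38 of 45 scenes correctly identified.
low, high = wilson_interval(38, 45)
print(f"detection rate {38 / 45:.1%}, approximate 95% interval {low:.1%} to {high:.1%}")
```

With only 45 scenes the interval is wide, so the point estimate alone gives an incomplete picture of how the system might perform on new cases.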

Complementary work by Sato et al. demonstrated that deep learning-based RLN detection can outperform general surgeons and approach expert-level recognition during thoracoscopic esophagectomy. These findings support the role of AI in reducing surgeon-dependent variability in anatomical identification. Nonetheless, similar methodological limitations were observed, including incomplete reporting of diagnostic accuracy elements and reliance on retrospective, single-center datasets. While improved RLN recognition is a meaningful surrogate marker, future studies must determine whether these gains translate into measurable reductions in nerve injury and associated complications.

Beyond nerve-specific applications, den Boer et al. showed that AI-driven semantic segmentation can accurately identify major thoracic anatomical structures during robot-assisted minimally invasive esophagectomy, achieving real-time inference and Dice coefficients comparable to expert annotations for selected structures [15]. These findings underscore the feasibility of AI-assisted intraoperative guidance. However, performance variability between structures and inter-annotator differences highlight persistent challenges in defining reliable reference standards, a recurring limitation across surgical AI studies.

More broadly, the reviewed literature revealed substantial heterogeneity in evaluation metrics, validation strategies, and reporting quality. Validation cohorts were frequently small, and external or prospective validation was uncommon. Many studies relied on overlap-based metrics such as the Dice coefficient or Intersection over Union, which do not directly capture clinical safety or patient-centered outcomes. Bridging the gap between technical accuracy and clinical utility requires carefully designed prospective studies with predefined patient-centered endpoints. Future trials should evaluate whether AI-assisted anatomy recognition or risk detection translates into measurable reductions in clinically relevant complications, such as recurrent laryngeal nerve palsy, anastomotic leakage, intraoperative blood loss, or operative time. Randomized or controlled prospective study designs comparing AI-assisted versus conventional surgery would provide higher-level evidence regarding efficacy. In addition to complication rates, workflow metrics, surgeon cognitive load, and decision-modification frequency should be assessed to determine how AI integration influences intraoperative behavior. Only through such outcome-driven validation can technical performance metrics be meaningfully linked to improvements in surgical safety and patient outcomes.

An additional concern relates to domain shift. Model performance may deteriorate when applied to data that differ from the original training environment. Most included studies relied on single-center datasets using specific imaging systems and surgical workflows. Variations in lighting, video resolution, tissue characteristics, and surgeon technique may affect robustness. Without multicenter training and out-of-sample validation, generalizability remains uncertain.

Future research should prioritize collaborative data sharing and multicenter validation to ensure that AI models maintain stable performance across diverse surgical environments. Incomplete reporting of blinding, missing data handling, and precision estimates further constrained reproducibility, as reflected by moderate STARD scores across studies. The presence of serious risk of bias in several studies, particularly in the domain of outcome measurement, warrants careful interpretation of reported performance metrics. In many cases, reference annotations were performed retrospectively and without blinding, potentially introducing observer bias and overestimation of model accuracy. Furthermore, the absence of standardized ground-truth definitions for anatomical segmentation may have influenced overlap-based metrics such as Dice coefficient and IoU. These methodological limitations suggest that reported performance values should be interpreted as preliminary technical estimates rather than definitive indicators of clinical effectiveness.

Emerging work in surgomics and workflow analysis expands the scope of intraoperative AI beyond anatomy recognition. Brandenburg et al. demonstrated that active learning can reduce annotation burden while maintaining robust performance in multicenter surgical video datasets, representing a critical step toward scalable AI development [14]. Similarly, Takeuchi et al. established the feasibility of automated surgical phase recognition, providing foundational context-awareness upon which future real-time guidance and decision-support systems may be built [17]. However, both approaches remain primarily technical, with limited linkage to clinical outcomes.

Collectively, the available evidence suggests that AI-based systems have the capacity to augment intraoperative perception during esophagectomy by improving anatomical recognition, detecting procedural risk, and contextualizing surgical workflow [18,19,20]. Nevertheless, substantial methodological and translational barriers remain. Future research should prioritize prospective, multicenter validation; standardized and transparent reporting frameworks (including STARD, TRIPOD-AI, and emerging surgical AI guidelines); and evaluation of clinically meaningful endpoints such as nerve injury, complication rates, and operative safety [21]. Until these gaps are addressed, AI systems in esophagectomy should be viewed as promising adjuncts rather than fully validated clinical tools.

Given the sensitivity of surgical video data and regulatory constraints surrounding patient privacy, federated learning represents a promising strategy for future multicenter collaboration. Federated learning enables model training across institutions without transferring raw patient data, allowing decentralized data utilization while preserving confidentiality. Such approaches may help mitigate domain shift, improve generalizability, and facilitate large-scale validation of intraoperative AI systems in esophagectomy without compromising data security.
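The core aggregation step of federated learning can be sketched in a few lines. The example below assumes federated averaging (FedAvg), one common scheme in which each institution trains locally and shares only model parameters, which a coordinator combines in proportion to local dataset size. None of the included studies used this approach; the function name, parameter vectors, and frame counts are hypothetical.

```python
from typing import List, Sequence


def federated_average(site_params: Sequence[Sequence[float]],
                      site_sizes: Sequence[int]) -> List[float]:
    """Combine per-site parameter vectors, weighted by each site's sample count.

    Only parameters and counts are exchanged; raw video frames stay on site.
    """
    if not site_params or len(site_params) != len(site_sizes):
        raise ValueError("need one parameter vector and one size per site")
    n_params = len(site_params[0])
    if any(len(p) != n_params for p in site_params):
        raise ValueError("all sites must share the same model shape")
    total = sum(site_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    return [
        sum(params[j] * size for params, size in zip(site_params, site_sizes)) / total
        for j in range(n_params)
    ]


# Hypothetical example: two hospitals with 1000 and 3000 annotated frames.
hospital_a = [1.0, 2.0]
hospital_b = [3.0, 4.0]
print(federated_average([hospital_a, hospital_b], [1000, 3000]))
```

In practice this step is repeated over many training rounds, and additional safeguards such as secure aggregation are typically needed, because shared parameters can still leak information about the underlying data.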

Bearing in mind the early-phase and heterogeneous nature of all included studies, overall certainty of evidence is considered low to moderate.

5. Limitations

This review has several limitations. First, the available evidence is limited by the early-stage nature of research in this field, with most included studies being retrospective, single-center investigations with small validation cohorts. Second, substantial heterogeneity in artificial intelligence methodologies, datasets, outcome definitions, and reporting standards prevented quantitative synthesis and limited cross-study comparability. Third, the reliance on institution-specific datasets may restrict generalizability across surgical environments. Finally, although comprehensive search strategies were used, the possibility of publication bias cannot be excluded. These limitations highlight the need for larger prospective multicenter studies with standardized reporting and outcome evaluation.

6. Conclusions

Artificial intelligence-based intraoperative video analysis in esophagectomy represents an emerging approach that may enhance anatomical recognition, risk detection, and procedural understanding. Current evidence indicates that AI can support identification of critical structures such as the recurrent laryngeal nerve, detect hazardous intraoperative events, and contextualize surgical workflow in real time. However, available studies are largely proof-of-concept, retrospective, and single-center, with limited independent validation and scarce assessment of clinical outcomes. Before routine clinical implementation, prospective, multicenter studies evaluating patient-centered endpoints and standardized reporting frameworks are required. In the interim, AI should be considered a complementary intraoperative tool with potential to augment, rather than replace, surgeon expertise.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/jcm15062169/s1, Table S1: PRISMA Checklist; Table S2: Assessing Risk of Bias and reporting quality for the final study selection; Search strategy; List of articles from each database; Spreadsheet after article screening.

Author Contributions

Conceptualization, V.A. and D.M.; methodology, D.S.; software, G.M.; validation, D.M., V.A. and K.B.; formal analysis, D.S.; investigation, V.A., D.M. and G.M.; resources, K.B.; data curation, K.B.; writing—original draft preparation, V.A.; writing—review and editing, F.S.; visualization, G.M.; supervision, D.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data supporting the findings of this systematic review, including the search strategy and screening documentation, are available in the PROSPERO registry under registration number CRD420261295799.

Acknowledgments

During the preparation of this manuscript, the authors used artificial intelligence-based tools for image creation, as noted in the main text, and for spelling and language correction. The authors reviewed, edited, and verified all AI-assisted content and take full responsibility for the accuracy and integrity of the work.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

AI	Artificial intelligence
MIE	Minimally invasive esophagectomy
RAMIE	Robot-assisted minimally invasive esophagectomy
ET	Excessive traction
RLN	Recurrent laryngeal nerve
NIM	Nerve integrity monitor
CNN	Convolutional neural network
RNN	Recurrent neural network
LSTM	Long short-term memory
ML	Machine learning
AUC	Area under the curve
IoU	Intersection over Union

References

1. Teng, Y.; Xia, C.; Cao, M.; Yang, F.; Yan, X.; He, S.; Cao, M.; Zhang, S.; Li, Q.; Tan, N.; et al. Esophageal cancer global burden profiles, trends, and contributors. Cancer Biol. Med. 2024, 21, 656–666.
2. Lu, S.; Li, K.; Wang, K.; Liu, G.; Han, Y.; Peng, L.; Chen, L.; Leng, X. Global trends of esophageal cancer among individuals over 60 years: An epidemiological analysis from 1990 to 2050 based on the global burden of disease study 1990–2021. Oncol. Rev. 2025, 19, 1616080.
3. Then, E.O.; Lopez, M.; Saleem, S.; Gayam, V.; Sunkara, T.; Culliford, A.; Gaduputi, V. Esophageal Cancer: An Updated Surveillance Epidemiology and End Results Database Analysis. World J. Oncol. 2020, 11, 55–64.
4. van Heijl, M.; van Lanschot, J.J.; Koppert, L.B.; van Berge Henegouwen, M.I.; Muller, K.; Steyerberg, E.W.; van Dekken, H.; Wijnhoven, B.P.; Tilanus, H.W.; Richel, D.J.; et al. Neoadjuvant chemoradiation followed by surgery versus surgery alone for patients with adenocarcinoma or squamous cell carcinoma of the esophagus (CROSS). BMC Surg. 2008, 8, 21.
5. Achim, F.; Otsuka, K.; Yamashita, T.; Asagoe, Y.; Kurita, D.; Constantin, A.; Constantinoiu, S.; Mohssen, A.; Rosianu, C.; Rotariu, A.; et al. Advances in Minimally Invasive Esophagectomy-An Overview of Recent Developments and a Novel Classification of Innovations in Treatment of Thoracic Esophageal Cancer. Medicina 2025, 61, 2176.
6. Edmondson, J.; Hunter, J.; Bakis, G.; O’Connor, A.; Wood, S.; Qureshi, A.P. Understanding Post-Esophagectomy Complications and Their Management: The Early Complications. J. Clin. Med. 2023, 12, 7622.
7. Kingma, B.F.; de Maat, M.F.G.; van der Horst, S.; van der Sluis, P.C.; Ruurda, J.P.; van Hillegersberg, R. Robot-assisted minimally invasive esophagectomy (RAMIE) improves perioperative outcomes: A review. J. Thorac. Dis. 2019, 11, S735–S742.
8. Chowdappa, R.; Dharanikota, A.; Arjunan, R.; Althaf, S.; Premalata, C.S.; Ranganath, N. Operative Outcomes of Minimally Invasive Esophagectomy versus Open Esophagectomy for Resectable Esophageal Cancer. South Asian J. Cancer 2021, 10, 230–235.
9. Rabbani, S.A.; El-Tanani, M.; Sharma, S.; Rabbani, S.S.; El-Tanani, Y.; Kumar, R.; Saini, M. Generative Artificial Intelligence in Healthcare: Applications, Implementation Challenges, and Future Directions. BioMedInformatics 2025, 5, 37.
10. Kenig, N.; Monton Echeverria, J.; Muntaner Vives, A. Artificial Intelligence in Surgery: A Systematic Review of Use and Validation. J. Clin. Med. 2024, 13, 7108.
11. Page, M.J.; McKenzie, J.E.; Bossuyt, P.M.; Boutron, I.; Hoffmann, T.C.; Mulrow, C.D.; Shamseer, L.; Tetzlaff, J.M.; Akl, E.A.; Brennan, S.E.; et al. The PRISMA 2020 statement: An updated guideline for reporting systematic reviews. BMJ 2021, 372, n71.
12. Furube, T.; Takeuchi, M.; Kawakubo, H.; Noma, K.; Maeda, N.; Daiko, H.; Ishiyama, K.; Otsuka, K.; Kishimoto, Y.; Koyanagi, K.; et al. Impact of Artificial Intelligence on the Timing of Recurrent Laryngeal Nerve Recognition during Robot-Assisted Minimally Invasive Esophagectomy. Ann. Surg. Oncol. 2025, 32, 6366–6373.
13. Furube, T.; Takeuchi, M.; Kawakubo, H.; Noma, K.; Maeda, N.; Daiko, H.; Ishiyama, K.; Otsuka, K.; Sato, Y.; Koyanagi, K.; et al. Usefulness of an Artificial Intelligence Model in Recognizing Recurrent Laryngeal Nerves During Robot-Assisted Minimally Invasive Esophagectomy. Ann. Surg. Oncol. 2024, 31, 9344–9351.
14. Brandenburg, J.M.; Schulze, A.; Jenke, A.C.; Bhasker, N.; Bleser, N.; Junger, D.; Stern, A.; Rivoir, D.; Naderi, H.; Fritz-Kebede, F.; et al. Surgical workflow analysis for Surgomics and context-aware assistance in robot-assisted minimally invasive esophagectomy (RAMIE): A retrospective, single-arm, multicenter annotation and machine learning study. Eur. J. Surg. Oncol. 2026, 52, 111174.
15. den Boer, R.B.; Jaspers, T.J.M.; de Jongh, C.; Pluim, J.P.W.; van der Sommen, F.; Boers, T.; van Hillegersberg, R.; Van Eijnatten, M.A.J.M.; Ruurda, J.P. Deep learning-based recognition of key anatomical structures during robot-assisted minimally invasive esophagectomy. Surg. Endosc. 2023, 37, 5164–5175.
16. Sato, K.; Fujita, T.; Matsuzaki, H.; Takeshita, N.; Fujiwara, H.; Mitsunaga, S.; Kojima, T.; Mori, K.; Daiko, H. Real-time detection of the recurrent laryngeal nerve in thoracoscopic esophagectomy using artificial intelligence. Surg. Endosc. 2022, 36, 5531–5539; Erratum in Surg. Endosc. 2022, 36, 9483. https://doi.org/10.1007/s00464-022-09705-w.
17. Takeuchi, M.; Kawakubo, H.; Saito, K.; Maeda, Y.; Matsuda, S.; Fukuda, K.; Nakamura, R.; Kitagawa, Y. Automated Surgical-Phase Recognition for Robot-Assisted Minimally Invasive Esophagectomy Using Artificial Intelligence. Ann. Surg. Oncol. 2022, 29, 6847–6855.
18. Mela, E.; Tsapralis, D.; Papaconstantinou, D.; Sakarellos, P.; Vergadis, C.; Klontzas, M.E.; Rouvelas, I.; Tzortzakakis, A.; Schizas, D. Current Role of Artificial Intelligence in the Management of Esophageal Cancer. J. Clin. Med. 2025, 14, 1845.
19. Lopes, S.; Mascarenhas, M.; Fonseca, J.; Fernandes, M.G.O.; Leite-Moreira, A.F. Unveiling the Algorithm: The Role of Explainable Artificial Intelligence in Modern Surgery. Healthcare 2025, 13, 3208.
20. Hong, Y.; Wang, H.; Zhang, Q.; Zhang, P.; Cheng, K.; Cao, G.; Zhang, R.; Chen, B. Machine Learning and Deep Learning Hybrid Approach Based on Muscle Imaging Features for Diagnosis of Esophageal Cancer. Diagnostics 2025, 15, 1730.
21. Leivaditis, V.; Maniatopoulos, A.A.; Lausberg, H.; Mulita, F.; Papatriantafyllou, A.; Liolis, E.; Beltsios, E.; Adamou, A.; Kontodimopoulos, N.; Dahm, M. Artificial Intelligence in Thoracic Surgery: A Review Bridging Innovation and Clinical Practice for the Next Generation of Surgical Care. J. Clin. Med. 2025, 14, 2729.

Figure 1. Conceptual hierarchy of artificial intelligence, illustrating the relationship between broad AI, machine learning, deep learning, and generative AI models.

Figure 2. Conceptual framework of technologies enabling artificial intelligence integration during esophagectomy, including real-time segmentation, instrument tracking, and decision support.

Figure 3. PRISMA flowchart outlining the systematic search process.

Table 1. Assessing Risk of Bias and reporting quality for the final study selection.

Title	Furube et al. (2025) [12]	Furube et al. (2024) [13]	Brandenburg et al. (2023) [14]	den Boer et al. (2023) [15]	Sato et al. (2022) [16]	Takeuchi et al. (2022) [17]
Bias due to confounding	serious	moderate	moderate	moderate	moderate	moderate
Bias in selection of participants	moderate	moderate	moderate	moderate	moderate	moderate
Bias in classification of intervention	moderate	low	low	low	low	low
Bias due to deviations from intended intervention	low	low	low	low	low	low
Bias due to missing data	moderate	moderate	moderate	moderate	moderate	low
Bias in measurement of outcomes	serious	serious	serious	serious	moderate	moderate
Overall risk of bias	serious	serious	moderate	moderate	moderate	moderate
STARD 2015 Checklist Assessment	63% (17/27)	59% (16/27)	67% (18/27)	60% (18/30)	70% (19/27)	68% (17/25)

Table 2. Summary of published studies evaluating intraoperative artificial intelligence applications in esophagectomy, including AI task focus and key performance metrics.

Study (Year)	AI Task	Key Performance Metrics	Model Architecture/Framework
Furube et al. (2025) [12]	Real-time detection of excessive traction on RLN	Correct detection of unintended nerve traction: 84.4%; Excessive traction risk (ETR) score correlated with traction degree; AI detected ET earlier than NIM signal in representative case (pre-injury potential)	Deep learning-based convolutional neural network (CNN) built upon previously developed anatomical recognition framework; frame-level binary classification model for excessive traction detection with real-time risk score generation
Furube et al. (2024) [13]	RLN segmentation and recognition metrics	AUC: 0.92 (left), 0.88 (right); Dice: 0.72; Sensitivity: 0.86; Specificity: 0.89; IoU: 0.40 ± 0.26 (right), 0.34 ± 0.27 (left); AI assistance improved surgeons' RLN recognition rates and IoU	CNN-based semantic segmentation model for RLN recognition (deep learning segmentation framework; supervised training on annotated RAMIE frames)
Brandenburg et al. (2023) [14]	Surgomic feature recognition (Active Learning)	Mean F1-score: 0.75 ± 0.16 (all features); instrument detection F1: 0.80 ± 0.17; inter-rater agreement κ > 0.82; AL improved rare instrument sample selection and performance vs. EQS	Bayesian ResNet18 backbone combined with Active Learning (AL) framework for surgomic feature classification
den Boer et al. (2023) [15]	Anatomical structure segmentation (Bayesian NN)	Median Dice: 0.79 (azygos/vena cava), 0.74 (aorta), 0.89 (lung); algorithm comparable to expert annotations; inference time ~0.026 s/frame (39 Hz)	Bayesian convolutional neural network for semantic segmentation (uncertainty-aware deep learning architecture)
Sato et al. (2022) [16]	Recurrent laryngeal nerve (RLN) segmentation	Dice coefficient: 0.58 (AI) vs. 0.62 (expert) vs. 0.47 (general surgeons); AI performance superior to general surgeons (p = 0.019)	U-Net-based deep learning semantic segmentation model for RLN detection
Takeuchi et al. (2022) [17]	Surgical phase recognition	Overall accuracy: 84%; precision: ~0.84; per-phase recall: 58–93%	CNN-based model combined with temporal sequence modeling (CNN + LSTM) for surgical phase recognition
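
Several entries in Table 2 report segmentation overlap as a Dice coefficient or as Intersection over Union (IoU). As a reading aid, the following minimal sketch shows how the two metrics are conventionally computed for a pair of binary masks. It is illustrative only and does not reproduce the evaluation code of any included study; the function name `dice_and_iou`, the flat 0/1 mask representation, the toy masks, and the convention of scoring two empty masks as perfect agreement are all assumptions made for this example.

```python
def dice_and_iou(pred, truth):
    """Return (Dice, IoU) for two equally sized binary masks.

    Masks are flat sequences of 0/1 values (hypothetical representation
    chosen for this sketch; real pipelines typically use image arrays).
    """
    if len(pred) != len(truth):
        raise ValueError("masks must contain the same number of pixels")

    # Pixels labelled positive in both masks
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    pred_area = sum(1 for p in pred if p)
    truth_area = sum(1 for t in truth if t)
    union = pred_area + truth_area - intersection

    if union == 0:
        # Both masks empty: scored as perfect agreement (assumed convention)
        return 1.0, 1.0

    dice = 2 * intersection / (pred_area + truth_area)
    iou = intersection / union
    return dice, iou


# Toy masks: predicted and reference regions overlap on 2 of 4 labelled pixels
predicted = [1, 1, 1, 0, 0, 0, 0, 0]
reference = [0, 1, 1, 1, 0, 0, 0, 0]
dice, iou = dice_and_iou(predicted, reference)
print(f"Dice = {dice:.3f}, IoU = {iou:.3f}")
```

For a single pair of masks the two metrics are linked by Dice = 2·IoU / (1 + IoU), so Dice is never lower than IoU. Values averaged over many frames, as reported in the studies, need not satisfy this identity exactly.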

Table 3. Intraoperative applications of artificial intelligence in esophagectomy, categorized by application domain, primary function, representative intraoperative use, and typical AI methods.

AI Application Domain	Primary Function	Representative Intraoperative Use	Typical AI Methods
Anatomy detection and recognition	Identification and delineation of critical structures	Real-time recognition of RLN, aorta, azygos vein, lung during dissection	CNN-based deep learning, semantic segmentation, transfer learning
Surgical phase recognition	Temporal classification of procedural steps	Automated identification of operative phases to provide contextual awareness	Deep learning with CNNs and temporal models (e.g., RNN/LSTM)
Pattern and event detection	Detection of predefined intraoperative risk patterns	Identification of excessive traction, bleeding, smoke, or unsafe tissue handling	Deep learning classification models, CNNs, active learning
Instrument detection and tracking	Recognition and localization of surgical tools	Real-time tracking of instruments to infer surgical intent and motion	Computer vision, CNN-based object detection
Intraoperative guidance and decision support	Augmented cognition and risk mitigation	Visual overlays, alerts for nerve traction, anatomy highlighting	AI-augmented computer vision, augmented reality, ML-based risk models

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Aleksiev, V.; Markov, D.; Bechev, K.; Stanchev, D.; Shterev, F.; Markov, G. Artificial Intelligence in Esophagectomy: A Systematic Review. J. Clin. Med. 2026, 15, 2169. https://doi.org/10.3390/jcm15062169

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
