Setting the Research Agenda for Clinical Artificial Intelligence in Pancreatic Adenocarcinoma Imaging

Simple Summary
Pancreatic ductal adenocarcinoma (PDAC) is one of the deadliest cancers worldwide, associated with a 98% loss of life expectancy and a 30% increase in disability-adjusted life years. Image-based artificial intelligence (AI) can help improve outcomes for PDAC given that current clinical guidelines are non-uniform and lack evidence-based consensus. However, research on image-based AI for PDAC is too scattered and lacking in sufficient quality to be incorporated into clinical workflows. In this review, an international, multi-disciplinary team of the world’s leading experts in pancreatic cancer breaks down the patient pathway and pinpoints the current clinical touchpoints in each stage. The available PDAC imaging AI literature addressing each pathway stage is then rigorously analyzed, and current performance and pitfalls are identified in a comprehensive overview. Finally, the future research agenda for clinically relevant, image-driven AI in PDAC is proposed.

Abstract
Pancreatic ductal adenocarcinoma (PDAC), estimated to become the second leading cause of cancer deaths in western societies by 2030, was flagged as a neglected cancer by the European Commission and the United States Congress. Due to lack of investment in research and development, combined with a complex and aggressive tumour biology, PDAC overall survival has not significantly improved in the past decades. Cross-sectional imaging and histopathology play a crucial role throughout the patient pathway. However, current clinical guidelines for diagnostic workup, patient stratification, treatment response assessment, and follow-up are non-uniform and lack evidence-based consensus. Artificial Intelligence (AI) can leverage multimodal data to improve patient outcomes, but PDAC AI research is too scattered and lacking in quality to be incorporated into clinical workflows.
This review describes the patient pathway and derives touchpoints for image-based AI research in collaboration with a multi-disciplinary, multi-institutional expert panel. The literature exploring AI to address these touchpoints is thoroughly retrieved and analysed to identify the existing trends and knowledge gaps. The results show absence of multi-institutional, well-curated datasets, an essential building block for robust AI applications. Furthermore, most research is unimodal, does not use state-of-the-art AI techniques, and lacks reliable ground truth. Based on this, the future research agenda for clinically relevant, image-driven AI in PDAC is proposed.


Introduction
Pancreatic cancer is one of the deadliest cancers worldwide, with a 5-year survival rate of less than 5% [1]. Pancreatic ductal adenocarcinoma (PDAC), the most common and aggressive type of pancreatic cancer, has become a medical emergency in the past decades. PDAC tumours present highly aggressive behaviour, leading to a 98% loss of life expectancy [...]. The specific steps of the PDAC patient pathway are illustrated in Figure 1, and the clinical touchpoints for potential AI development in each step are explored in the following sections.
Figure 1. [...] Below, the vertical boxes show the actions/guidelines for PDAC used in each step. The width of the streams represents the proportion of patients that go through each branch of the pathway, and the colours of the streams represent the number of AI publications found on that topic. Rx: resection; nCTx: neoadjuvant chemo(radio)therapy; aCTx: adjuvant/induction therapy; Px: palliative care.

Detection
Timely detection is crucial to improve PDAC patients' outcomes, as the 5-year survival increases from only 3% in metastatic patients to 42% when the tumour is still confined to the primary site [14]. According to the Japan Pancreatic Cancer Registry, patients in the earliest disease stage show a survival rate as high as 80.4% but account for only 0.8% of cases [15]. Due to the low incidence of PDAC, defining and screening groups at risk is a vital step to improve patient outcome. Research on risk factors, new screening protocols, and non-invasive tumour biomarkers is on the rise, but so far there are no validated biomarkers or tools for early detection. Therefore, screening is still not part of the PDAC patient pathway as it is cost-prohibitive with current technology. The most used modality for PDAC detection is multi-phase contrast-enhanced CT (CECT). However, early PDAC detection on CECT remains challenging, as lesions are small (size <2 cm), present poorly defined margins, and are more often iso-attenuating [5,16]. Radiologists' sensitivity at detecting lesions with size smaller than 2 cm on CECT has been reported to be as low as 58% [5,16]. Contrast-enhanced MRI is highly effective at detecting tumours that are poorly visible on CECT, but is not yet routinely implemented in the clinic [17]. EUS is a widely accepted modality for the diagnosis of PDAC.
Early detection can be facilitated by the timely identification of secondary imaging signs predictive of PDAC, such as main pancreatic duct cut-off or dilation, parenchymal atrophy, and irregular pancreatic contour [5,18]. These signs are often visible on CECT scans 18 to 12 months prior to clinical diagnosis, but the reported radiologists' sensitivity for their timely detection is only 44%, limiting the chances of early action [18].

Diagnosis
PDAC symptoms are mostly non-specific in early disease stages, and as lesional appearances are heterogeneous on CECT, patients are often initially misdiagnosed with other, more common abdominal diseases with similar symptomatology (e.g., gallbladder diseases, acute or chronic pancreatitis, duodenal cancer) [18,19]. Initially misdiagnosed patients are reported to present higher rates of abdominal pain, weight loss, and acute pancreatitis than correctly diagnosed patients, and are at a higher risk of advanced disease [19]. EUS is also a widely accepted imaging modality for the diagnosis of PDAC, especially for lesions less than 2-3 cm in size, in which it reaches superior sensitivity compared to CT [20]. Furthermore, EUS has a high negative predictive value and can be used to reliably exclude pancreatic cancer [20]. Histopathology assessment is the current gold standard for PDAC diagnosis confirmation and is usually based on EUS fine-needle cytology or biopsy. Nevertheless, the morphological distinction of PDAC from other lesions on small biopsies or cytology samples can be challenging, especially given the minimal amount of lesional material that is often contained in these samples [21].

Staging
Following histopathology diagnosis, the most used method for PDAC staging is the TNM classification by the American Joint Committee on Cancer (AJCC). The local tumour extent (T stage), the dissemination to the regional lymph nodes (N stage), and the metastatic spread to distant sites (M stage) are used to stratify patients, determine their prognosis, and indicate treatment and monitoring strategy [19]. Nevertheless, the TNM classification's predictiveness for overall survival (OS) is not reliable [21]. A 2018 multicentre study aiming to validate the AJCC TNM 8th edition in a cohort of 1525 patients receiving pancreatoduodenectomy reported a concordance index of 0.57 (95% CI, 0.55-0.60) for OS prediction [22].
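To make the reported concordance index concrete, the following is a minimal sketch of how Harrell's C can be computed for a staging variable against OS. It is an illustration only and not the method of the cited study: the function name, argument names, and the toy data in the usage note are assumptions. A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, so 0.57 is only slightly better than chance.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C: share of comparable patient pairs ranked correctly.

    times       -- follow-up time per patient (e.g., months)
    events      -- 1 if death was observed, 0 if the patient was censored
    risk_scores -- higher value means worse predicted prognosis (e.g., stage)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that patient `a` has the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the times differ and the shorter
        # time is an observed death (not a censored observation).
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0   # higher risk died earlier: correct order
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5   # tied risk counts as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

For example, with hypothetical follow-up times `[5, 10, 15, 20]`, event flags `[1, 1, 0, 1]`, and stages `[3, 2, 2, 1]`, five pairs are comparable and the function returns 0.9.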

Treatment
The most common treatment options for PDAC are resection and chemo(radio)therapy, in particular using FOLFIRINOX and gemcitabine-abraxane [2]. Surgical resection (Rx) is the only option for potential long-term survival, but as can be seen in Figure 1 is only suitable for a minority (10-15%) of patients (stages I, II). Most patients are diagnosed in later disease stages (III, IV) where Rx is no longer possible due to metastasis or extensive vessel involvement [23]. Imaging assessment of tumour-vascular contact primarily determines eligibility for Rx, but there are no widely accepted, evidence-based guidelines for the appropriate tumour resectability criteria [5,24]. As a result, the 5-year survival rate of resected PDAC patients is only 30-58%, with 69-75% of patients relapsing within two years [1,25].
As illustrated in Figure 1, most patients receive chemo(radio)therapy at some point during treatment [4]. Neo-adjuvant chemo(radio)therapy (nCTx) intends to optimise surgical outcome in patients with resectable disease, while adjuvant/induction chemo(radio)therapy (aCTx) is used to downstage unresectable patients. After aCTx, patients may become resectable and undergo Rx or be referred to palliative care (Px), which is intended to suppress disease-related pain and lengthen the patient's life. Although most patients experience chemotherapy-induced toxicity and efficacy is often limited by biological resistance, a priori prediction of chemotherapy response is still not possible in current clinical work-up [26,27].

Treatment Monitoring
Following curative resection, histopathology analysis of the resected specimen is performed to confirm the diagnosis of PDAC and to map the extent of disease. This includes the assessment of lymph node metastases (LNM), tumour permeation along lymphatics/blood vessels, and the clearance to the resection margins (resection margin status) [28]. Nevertheless, the prognostic value of these parameters is still controversial, with several studies reporting no significant relationship to survival [28][29][30]. The main reasons for the low predictive power of histopathology findings are the lack of standardised evaluation, consensus definitions, and reporting approaches [31,32].
In patients undergoing chemo(radio)therapy, imaging is critical for determining therapeutic response and selection of the next treatment approach, as acquiring a biopsy could lead to an increase in inflammation [32]. The Response Evaluation Criteria in Solid Tumours (RECIST) 1.1 (2009) is the current standard to evaluate response to chemo(radio)therapy. These are purely morphological criteria that quantitatively track tumour burden changes based on alterations to the lesions' size. Although RECIST shows some success in monitoring response based on metastases assessment, it is ineffective when considering the primary tumour, as PDAC lesions present poorly defined borders and significant heterogeneity in regression/progression patterns [32]. Furthermore, chemo(radio)therapy often results in necrotic, fibrous, or inflammatory changes, which translate into an apparent enlargement of the lesion in CT/MRI scans that can be misinterpreted as tumour progression [32].
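The size-based logic of RECIST 1.1 for target lesions can be sketched as follows. This is a simplified illustration, not a clinical implementation: the function name and arguments are hypothetical, only the sum of target-lesion diameters is considered, and new lesions, non-target lesions, and the lymph-node short-axis rule are ignored.

```python
def recist_target_response(baseline_sum_mm, nadir_sum_mm, current_sum_mm):
    """Classify target-lesion response from sums of diameters (in mm).

    baseline_sum_mm -- sum of target-lesion diameters before treatment
    nadir_sum_mm    -- smallest sum recorded so far (including baseline)
    current_sum_mm  -- sum at the current assessment
    """
    if baseline_sum_mm <= 0:
        raise ValueError("baseline sum of diameters must be positive")

    # Complete response: all target lesions have disappeared.
    if current_sum_mm == 0:
        return "CR"

    # Progressive disease: at least a 20% increase over the nadir
    # AND an absolute increase of at least 5 mm.
    increase = current_sum_mm - nadir_sum_mm
    relative_ok = nadir_sum_mm == 0 or increase / nadir_sum_mm >= 0.20
    if relative_ok and increase >= 5:
        return "PD"

    # Partial response: at least a 30% decrease relative to baseline.
    if (baseline_sum_mm - current_sum_mm) / baseline_sum_mm >= 0.30:
        return "PR"

    # Otherwise neither sufficient shrinkage nor sufficient growth.
    return "SD"
```

Under this rule, a lesion that measures 9 mm larger after therapy because of fibrosis or inflammation (for example, 40 mm growing to 49 mm) would be labelled progressive disease, which is the kind of misinterpretation described above.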
Current histopathological tumour regression grade (TRG) systems for PDAC are based on a semiquantitative evaluation of the destruction of cancer cells, the amount of residual viable cancer, or the extent of fibrosis induced by treatment. However, current TRG systems are based on imprecise, difficult-to-apply criteria, and a standardised and widely accepted grading system for the histological evaluation of TRG in pancreatic cancer has not yet been established [9,33,34]. These factors make RECIST and histopathology TRG insufficient for predicting local oncological response in PDAC patients [31,32].

Materials and Methods
Searches were conducted on PubMed, Web of Science, Cochrane, and Embase on 14 September 2021 and updated on 25 January 2022. Additional information about the search strategy can be found in Appendix A and Table A1. Articles were included for evaluation if patient information was available, cohort size was larger than 20 patients, AI was developed to predict a given outcome related to PDAC, and the proposed AI model used imaging (CT, MRI, EUS, PET-CT, whole-slide images (WSI)) as input. Articles were excluded if the research used non-human subjects, did not show any performance, did not report how results were validated, or used the same cohort for training and reporting of the results.
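The inclusion and exclusion criteria above can be expressed as a simple screening rule. The sketch below is hypothetical: the record structure and all field names are assumptions made for illustration, and the actual screening in this review was performed manually by the reviewers.

```python
from dataclasses import dataclass

# Imaging inputs accepted by the inclusion criteria.
ACCEPTED_MODALITIES = {"CT", "MRI", "EUS", "PET-CT", "WSI"}


@dataclass
class StudyRecord:
    """Hypothetical per-article screening record (field names are assumed)."""
    has_patient_info: bool
    cohort_size: int
    predicts_pdac_outcome: bool
    input_modalities: frozenset
    human_subjects: bool = True
    reports_performance: bool = True
    reports_validation: bool = True
    same_cohort_for_training_and_reporting: bool = False


def is_eligible(r: StudyRecord) -> bool:
    """Return True if the record meets all inclusion and no exclusion criteria."""
    included = (
        r.has_patient_info
        and r.cohort_size > 20          # strictly larger than 20 patients
        and r.predicts_pdac_outcome
        and bool(set(r.input_modalities) & ACCEPTED_MODALITIES)
    )
    excluded = (
        not r.human_subjects
        or not r.reports_performance
        or not r.reports_validation
        or r.same_cohort_for_training_and_reporting
    )
    return included and not excluded
```

For instance, `is_eligible(StudyRecord(True, 120, True, frozenset({"CT"})))` evaluates to `True`, whereas the same record with a cohort of exactly 20 patients is rejected.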

Results
A total of 2322 records were retrieved from the electronic databases, and 1076 articles remained after duplicate removal. Titles and abstracts were reviewed on the basis of the inclusion criteria, and 95 articles were eligible for full-text screening. Finally, a total of 69 studies fulfilled the inclusion criteria and were considered for analysis. The flowchart for the inclusion of studies is shown in Figure 2.

Detection
Eleven articles addressed AI for automated PDAC detection (Table 1). Only three articles stratified the results based on tumour size, reporting model performance for the subgroup of lesions with sizes smaller than 2 cm [35][36][37]. Two papers (Alves et al. (2022) and Wang et al. (2021)) reported the results for both lesion detection and localization, and only one paper proposed a fully automatic approach (Alves et al. (2022)) [35,38]. The study by Liu et al. (2020) was the only one comparing AI performance to radiologists based on the analysis of radiology reports, but no reader study was conducted [37]. As shown in Table 1, only three studies externally tested the proposed models, and four articles used internal cross-validation without a separate testing set [35][36][37][38][39][40][41].

Diagnosis
Eighteen papers explored AI for differential PDAC diagnosis (Table 2). The majority of papers (14/18) focused on radiology imaging, mostly (13/14) regarding binary classification between PDAC and another type of lesion, with only one paper tackling multiclass classification [46]. Three publications focused on AI for the histopathological diagnosis of PDAC. [...] (2021) were the first to utilise DL to automatically identify different anatomical tissue structures and diseases on WSI [8,47,48]. AI validation is limited. Only three studies externally tested the proposed models, while nine papers had internal cross-validation without a separate testing set [8,46,49].

Staging
Thirteen AI papers covered staging (Table 3). Only one publication considered histopathological data. Two articles (An et al. (2021) and Chaddad et al. (2020)) used DL, with the remaining majority using radiomics [63,64]. Most papers considered surrogate end points (histological grade of differentiation, presence of LNM, etc.) as ground truth for model development, with only one considering OS. The study from Chaddad et al. (2021) divided patients into short- and long-term survivors with a set threshold [64]. Only two papers used an external dataset to validate their performance [65,66].

Treatment
Twenty-two studies used pre-treatment imaging to predict treatment response, with the majority of studies (17/22) focusing on patients diagnosed with resectable disease (Table 4). Eleven studies expressed treatment response by predicting OS, of which two (Healy et al. and Zhang et al.) validated the performance in an external cohort [79,80]. Six articles used deep learning (three with the same cohort), with the remaining 16 using radiomics.

Treatment Monitoring
We found two publications regarding treatment evaluation and no publications for follow-up. The study by Janssen et al. (2021) takes a step in the direction of more objective and reproducible TRG systems for patients undergoing nCTx by automatically segmenting relevant structures on WSI of resection specimens [100]. The authors used a cohort of 64 specimens and achieved F1-scores of 0.86 ± 0.09, 0.74 ± 0.12, and 0.86 ± 0.07 for the segmentation of tumour, normal ducts, and remaining non-tumour epithelium, respectively. Nasief et al. (2019) proposed an AI model based on delta radiomics from daily longitudinal scans to predict response to neoadjuvant chemoradiation therapy [101]. This study included 90 patients, divided into good and poor responders based on a modified Ryan Scheme for histopathology-based TRG, and the model achieved an AUC of 0.98 in the independent test set (40 patients) [101].
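For readers less familiar with the segmentation metric quoted above, the sketch below shows one common way an F1-score is computed for a single binary mask and then summarised as mean ± standard deviation over specimens. It is an illustration under assumptions: whether the cited study computed F1 per pixel and per specimen is not stated here, and the function names are hypothetical.

```python
from statistics import mean, stdev


def f1_score_binary(pred, truth):
    """F1 (equivalently, Dice) between two flattened binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of elements")
    tp = sum(1 for p, t in zip(pred, truth) if p and t)        # true positives
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)    # false positives
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)    # false negatives
    denom = 2 * tp + fp + fn
    # Convention (an assumption): two empty masks agree perfectly.
    return 2 * tp / denom if denom else 1.0


def summarise(scores):
    """Return (mean, sample standard deviation) of per-specimen scores."""
    return mean(scores), stdev(scores)
```

A prediction `[1, 1, 0, 0, 1]` against a reference `[1, 0, 0, 1, 1]` has two true positives, one false positive, and one false negative, giving an F1 of 4/6.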

Discussion
Clinically relevant AI is developed to assist, replace, or go beyond clinicians' knowledge on solving problems that affect patient outcomes. AI can significantly impact healthcare by leveraging big data, especially in neglected diseases such as PDAC, but it is essential that research is performed at high-quality standards and focuses on clinical validity, utility, and usability [10].
There are critical steps in the PDAC patient pathway where clinical guidelines are still lacking. In this review, such moments and subsequent opportunities for AI research are identified in consensus by a consortium of radiologists, pathologists, and AI experts from multiple international institutions. We propose that for radiology and pathology AI to advance PDAC care, future research should focus on early diagnosis, data-driven tumour characterisation, survival-based patient staging, treatment response prediction, and monitoring.
Early detection, arguably the most pressing issue in PDAC management, is closely linked to identifying small lesions and secondary anatomical signs [102]. However, our results show this is still not considered in AI-based detection research, as there are no studies on pre-diagnostic detection of secondary signs, and most studies do not disaggregate performance based on tumour size/stage. Additionally, there is a lack of research on lesion localization and a general absence of well-curated datasets, with positive and negative cases being retrieved from completely different populations, which does not reflect the clinical landscape and can introduce bias. For AI to improve PDAC detection, it is crucial to acquire and make publicly available well-curated, multimodal datasets that contain a significant proportion of small (<2 cm or even <1 cm) tumours, which should be treated as a subgroup of interest when reporting model performance.
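Reporting small tumours as a subgroup of interest amounts to disaggregating sensitivity by lesion size. A minimal sketch is given below; the function name, the input format, and the group labels are assumptions for illustration, with the 2 cm threshold taken from the text.

```python
def sensitivity_by_size(cases, threshold_cm=2.0):
    """Sensitivity for small and large tumours, plus the overall value.

    cases -- iterable of (tumour_size_cm, detected) tuples for
             PDAC-positive cases only; `detected` is True or False.
    Returns a dict with keys "small", "large", and "overall"; a group
    with no cases maps to None.
    """
    counts = {"small": [0, 0], "large": [0, 0]}   # [detected, total]
    for size_cm, detected in cases:
        group = "small" if size_cm < threshold_cm else "large"
        counts[group][1] += 1
        if detected:
            counts[group][0] += 1

    def ratio(hit, total):
        return hit / total if total else None

    result = {g: ratio(hit, total) for g, (hit, total) in counts.items()}
    all_hit = sum(hit for hit, _ in counts.values())
    all_total = sum(total for _, total in counts.values())
    result["overall"] = ratio(all_hit, all_total)
    return result
```

With hypothetical cases in which one of three sub-2 cm tumours and all three larger tumours are detected, the overall sensitivity is 4/6 while the small-tumour sensitivity is only 1/3, a difference that an aggregate figure alone would hide.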
Current research separates detection, which is defined as distinction between PDAC patients and healthy controls, from differential diagnosis, defined as distinction between PDAC and other types of pancreatic lesions. Only one study developed AI for simultaneous detection and characterisation of pancreatic lesions on CECT [46]. The remaining publications focused on binary distinction between PDAC and one other malignancy, limiting the proposed models' clinical use. Furthermore, it is important to consider that PDAC diagnosis currently relies on high-quality, adequate imaging with multi-phasic scanning protocols, which may not be widely available due to resource limitations. In the future, research should strive towards a single-use case for radiology-based AI in PDAC diagnosis that includes both the detection of a lesion and its correct classification among a variety of pancreatic diseases in accessible, standard-of-care imaging. The current priority is the curation of large datasets with representative percentages of each lesion type and the integration of different imaging modalities that offer complementary information regarding lesion characterisation.
Research in AI for histopathological PDAC diagnosis is scarce. Only three publications were found to address this topic. While histopathology is considered the gold standard for confirming PDAC diagnosis, it is a time-consuming process that suffers from non-uniform implementation in clinical practice and interobserver variability. Developing powerful AI models for histopathological PDAC diagnosis is fundamental to advance AI research at all steps of the patient pathway. Such models would optimise clinical workflows and empower the generation of reliable ground truth, which could be employed to develop AI with other (non-invasive) modalities, in a timely and cost-effective manner.
AI for PDAC staging lacks a solid reference standard. TNM staging and histopathological grade do not correlate sufficiently with OS and suffer from inter-reader variability. Yet, most AI publications (12/13) focused on grade differentiation and LNM prediction. Only one study considered OS as the outcome, dividing patients into short- and long-term survival based on a threshold derived from the development cohort [64]. In the absence of an international consensus that relates surrogate endpoints to survival, AI research using clinically obtained low- and high-grade differentiation and predicting LNM is not clinically relevant. Future AI research should focus on discovering new data-driven staging biomarkers that relate histopathology and imaging to OS.
AI research for treatment response prediction disproportionately focuses on post-surgery patient outcome, with most included papers considering only resectable patients. Given that 80-85% of patients are diagnosed with unresectable disease, AI research on prediction of response to resection will have a minor impact on improving overall PDAC patients' outcomes [103]. Instead, research efforts should focus on later disease stages, predicting response to (neo-)adjuvant/palliative chemo(radio)therapy. While most papers addressing treatment response considered survival as the outcome measure, there were also publications that aimed at predicting resection margin status and histopathology-based treatment effect. AI should not focus on these endpoints as they do not accurately reflect whether a patient responds to treatment. There were no publications considering multiple treatment options, with all studies focusing on the prediction of response to a single treatment regime. Future AI research should consider multiple treatment options for a given patient, providing the most favourable suggestion based on survival as the outcome measure.
AI research for treatment monitoring is lagging behind, as to date only two publications considered post-treatment imaging to evaluate response. The results from the segmentation network approach of Janssen et al. (2021) are promising, but they were not validated externally, and further research is necessary to integrate this AI tool into a reliable TRG system for neoadjuvant chemotherapy [100]. Nasief et al. (2019) used longitudinal scans to monitor chemoradiotherapy response, but the authors considered the histopathology-based treatment response as ground truth [101]. Clinically relevant AI applications should directly predict OS and recurrence from large, well-curated radiology and pathology datasets. Additionally, AI algorithms for treatment monitoring should strive to assist clinicians by indicating the best action at a given time-point, such as timely termination of treatment to prevent unnecessary comorbidities, selecting re-staging time points, adjusting the treatment regime, or choosing the optimal schedule for long-term patient follow-up.
Overall, four main research agenda topics emerge from this comprehensive literature review for clinical image-based AI in PDAC (Table 5). First, there is an urgent need for more, good-quality data. Large, well-curated, multi-institutional private and public PDAC datasets are essential for AI development and testing. This allows deep neural networks to extract powerful predictive and diagnostic biomarkers that generalise well in multiple cohorts. Second, easily accessible radiomics AI still dominates the field, with comparatively little work on much more powerful deep learning CNNs. As data availability and quality increase, research should focus on developing models that are entirely and exclusively data-driven. Third, the entire research field needs to globally shift to using better-quality ground truths that represent actual clinical endpoints (such as OS and disease-free survival) as the gold standard for model development. Clinical guideline parameters such as TNM staging, histopathology-based tumour response scores, margin status, and RECIST are hardly predictive of patient outcomes and should not be considered a valid outcome for AI model development. AI in PDAC should improve current clinical workflow rather than replicate/automate existing ineffective practices. Finally, the realm of multimodal AI for PDAC remains unexplored. In a complex and heterogeneous disease such as PDAC, combining information from imaging, histopathology, genetics, and clinical records is crucial to discovering meaningful patterns in the data and building robust prediction models. AI-based imaging biomarkers that stratify PDAC phenotypes predictive of outcome can support individualised care, for instance through the development of pharmacogenomic treatment regimes.

- To acquire more, good quality data coming from large, well-curated, multi-institutional private and public PDAC datasets
- To switch focus towards state-of-the-art, entirely data-driven deep learning models
- To use better quality ground truths that represent actual clinical endpoints such as overall survival and disease-free survival as the gold standard for model development
- To investigate the use of multimodal AI, combining information from imaging, histopathology, genetics and clinical records

Conclusions
In conclusion, the future of AI in PDAC lies in addressing the relevant clinical questions, establishing multi-institutional collaborations for the curation of large-scale datasets, and integrating multiple data modalities. By putting forward these issues in the context of current image-based AI literature for PDAC, we hope to help advance meaningful research that will ultimately translate into the improvement of PDAC outcomes, by helping to select the best treatments, for the right patients, at the right time.

Web of Science
(TS = (("Pancreatic Neoplasms" OR "Carcinoma, Pancreatic Ductal" OR "Pancreatic Intraductal Neoplasms" OR (Pancrea* AND (Neoplasm* OR cancer* OR Carcinoma* OR Adenocarcinoma*)) OR PDAC))) AND TS = (("Artificial Intelligence" OR AI OR CNN OR Convnet OR "Deep Learning" OR "Machine learning" OR "Neural network*" OR pathomic* OR radiomic* OR "supervised Learning" OR "Transfer Learning" OR Unet OR "unsupervised Learning"))
Two independent reviewers (M.S. and N.A.) screened the titles and subsequently reviewed all full-text articles based on preselected variables to compare results. A third independent expert reviewer (P.V.) reviewed the full-text articles for digital pathology imaging. Conflicting evaluations were resolved by discussions between the three reviewers.
Inclusion criteria and corresponding number of records excluded at each step (n) were the following:
Full-read criteria:
(1) Study was published in a peer-reviewed journal and was not an abstract, review paper, conference paper, commentary, editorial, or not available. (n = 293)
(2) Study considered patients clinically diagnosed with pancreatic cancer (pancreatic ductal adenocarcinoma or non-specified type of pancreatic cancer). (n = 343)
(3) Study used AI to predict an outcome related to one of the following tasks (n = 289):
• Detection: determine the presence or absence of PDAC in an input image with or without localization of the tumours.
A data selection template was designed to evaluate research characteristics and performance. All variables were evaluated independently by the three reviewers (M.S., N.A., and P.V.).
The template included the following fields: •