Assessing PD-L1 Expression Status Using Radiomic Features from Contrast-Enhanced Breast MRI in Breast Cancer Patients: Initial Results

Simple Summary To our knowledge, this is the first study assessing radiomics coupled with machine learning from MRI-derived features to predict PD-L1 expression status in biopsy-proven triple negative breast cancers and comparing the performance of this approach with the performance of qualitative assessment by two radiologists. This pilot study shows that radiomics analysis coupled with machine learning of DCE-MRI is a promising approach to derive prognostic and predictive information and to select patients who could benefit from anti-PD-1/PD-L1 treatment. This technique could also be used to monitor PD-L1 expression, as it can vary over time and between different regions of the tumor, thus avoiding repeated biopsies. Abstract The purpose of this retrospective study was to assess whether radiomics analysis coupled with machine learning (ML) based on standard-of-care dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) can predict PD-L1 expression status in patients with triple negative breast cancer, and to compare the performance of this approach with radiologist review. Patients with biopsy-proven triple negative breast cancer who underwent pre-treatment breast MRI and whose PD-L1 status was available were included. Following 3D tumor segmentation and extraction of radiomic features, radiomic features with significant differences between PD-L1+ and PD-L1− patients were determined, and a final predictive model to predict PD-L1 status was developed using a coarse decision tree and five-fold cross-validation. Separately, all lesions were qualitatively assessed by two radiologists independently according to the BI-RADS lexicon. Of 62 women (mean age 47, range 31–81), 27 had PD-L1− tumors and 35 had PD-L1+ tumors. The final radiomics model to predict PD-L1 status utilized three MRI parameters, i.e., variance (FO), run length variance (RLM), and large zone low grey level emphasis (LZLGLE), for a sensitivity of 90.7%, specificity of 85.1%, and diagnostic accuracy of 88.2%. There were no significant associations between qualitative assessed DCE-MRI imaging features and PD-L1 status. Thus, radiomics analysis coupled with ML based on standard-of-care DCE-MRI is a promising approach to derive prognostic and predictive information and to select patients who could benefit from anti-PD-1/PD-L1 treatment.


Introduction
In the last decade, immunotherapy has emerged as a key player in the field of oncology, with encouraging results seen in cancers such as melanoma, renal cell carcinoma, nonsmall-cell lung cancer, and bladder cancer [1][2][3]. The identification of immunity-related characteristics has also been extended to breast cancer, wherein the presence of tumorinfiltrating lymphocytes (TILs) in breast tumors has been associated with increased survival and complete response rates after neoadjuvant chemotherapy [4][5][6][7].
The programmed cell death 1 receptor/programmed cell death 1 ligand 1 (PD-1/PD-L1) pathway is responsible for maintaining peripheral tolerance and regulating inflammation. An active antitumoral immune response present in the tumor microenvironment can be regulated via negative feedback by the overexpression of PD-L1. PD-L1 is a transmembrane protein that interacts with PD-1 expressed on activated T cells, causing the inhibition of T cell proliferation, cytokine production, and cytolytic activity, consequently leading to the functional inactivation of T cells [8][9][10].
While tumoral overexpression of PD-L1 has a negative influence on survival and treatment response by decreasing the anti-tumoral inflammatory response, the expression of PD-L1 on immune cells and the re-activation of the host's antitumor immune response through immunotherapy targeting PD-L1 can improve patient outcomes. Breast cancer subtypes, such as triple negative and HER2-positive breast cancers, can overexpress PD-L1 on either breast cancer cells or on TILs. In patients with triple negative breast cancer in particular, the expression of PD-L1 mainly occurs on tumor-infiltrating immune cells rather than on tumor cells [11,12]. Recent trials assessing immunotherapy via PD-L1 blockade in patients with triple negative breast cancer, with the aim of re-activating the host's antitumor immune response, have shown promising results [13][14][15]. Several other studies have reported that PD-L1 polymorphisms not only significantly influence the breast cancer stage, but also the effectiveness of chemotherapy and overall and disease-free survival after tumor resection [11,14,16].
PD-L1 status is currently assessed using Food and Drug Administration (FDA)approved companion diagnostic immunohistochemical (IHC) assays. While IHC is the gold standard, it has several limitations. First, inter-test heterogeneity in PDL-1 IHC assessment has been reported due to the differences in the type of antibody used, scoring algorithms, and cut-off points [16]. Second, the expression of PD-1 and PD-L1 can vary over time and between different regions of interest [17]. Third, IHC staining results are usually derived from samples from biopsy or surgical specimens, which may only be a snapshot of a potentially heterogenous tumor [18,19]. Therefore, the development of imaging biomarkers that can enable non-invasive and repeatable assessment of PD-L1 expression derived from the entire tumor is desirable.
High-throughput computational approaches such as advanced machine learning (ML) that can extract numerous characteristics from images of the entire tumor, whether or not they are detectable by the human eye, and that correlate these characteristics with molecular profiles, have heralded a new field of research termed "radiomics". Radiomic biomarkers have not only demonstrated strong prognostic performance in large cohorts of patients across several cancer types but have also been associated with underlying mutation and gene-expression patterns [20,21]. Radiomic biomarkers can be used for multiple purposes, such as treatment planning [22][23][24][25][26], risk assessment [27,28], and outcome prediction [29][30][31][32][33].
Initial results involving radiomics based on positron emission tomography/computed tomography (PET/CT) or CT imaging to predict PD-L1 expression status in non-small cell lung cancer, oesophageal cancer, and pancreatic cancer have shown promising results [34][35][36]. More recently, there has been a rapid rise in interest in the assessment of PD-L1 expression in breast cancer, especially in aggressive subtypes such as triple negative breast cancer [37,38], where PD-L1 is expressed in up to 20% of cases [12] and may thus potentially serve as a target for PD-1/PD-L1 pathway-directed immunotherapy. Dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) is the most sensitive and accurate test for breast cancer diagnosis, characterization, and response assessment. Artificial intelligence (AI)-enhanced DCE-MRI has shown potential to further improve molecular breast cancer subtyping, treatment planning, and treatment monitoring [39]. To date, the potential of radiomics analysis coupled with ML based on DCE-MRI for the prediction of PD-L1 expression status in triple negative breast cancer has not been explored. Thus, the aim of this study was to assess whether radiomics analysis coupled with ML based on standard-of-care DCE-MRI can predict PD-L1 expression status in patients with triple negative breast cancer, and to compare the performance of this approach with radiologist review.

Patient Selection
This was an institutional review board-approved and Health Insurance Portability and Accountability Act-compliant retrospective study for which the need for written informed consent was waived. The study sample consisted of patients with biopsy-proven breast cancer with PD-L1 status assessed using the Ventana PD-L1 SP142 assay who underwent state-of-the-art multiparametric MRI with DCE and T2-weighted imaging using a dedicated breast coil, either at our institution or elsewhere. Breast biopsy was either performed with a vacuum-assisted device or in the form of a spring-loaded core needle biopsy using either ultrasound, stereotactic, or MRI guidance. Multiple samples were taken for each biopsy (usually 4-5 for core needle biopsy and 6-9 for vacuum-assisted biopsy).
Of 156 consecutive patients with PD-L1 SP142 assay assessment, 91 patients were excluded because they did not have a preoperative MRI. Of 65 patients with IHC-proven PD-L1 status and preoperative MRI, three patients were further excluded: one for significant motion between pre-contrast and post-contrast data, one whose MRI protocol was incomplete (missing pre-contrast phase), and one whose study had pre-and post-contrast phases acquired at different matrix sizes. Of the 62 patients included in the final analysis, 39 (63%) had MRI studies performed at our institution and 23 (37%) had studies performed elsewhere.

Breast Imaging Acquisition
Breast MRI examinations were performed on either a 1.5 T (32/62 patients, 52%) or a 3 T scanner (30/62 patients, 48%) using an 8-channel or 16-channel dedicated surface breast coil. Patients underwent a state-of-the-art breast multiparametric MRI protocol in line with international guidelines [40][41][42]. The imaging protocol is summarized in Table 1.

Qualitative Imaging Assessment
Two fellowship-trained breast radiologists (R1: RLG, and R2: JR) with six and seven years of experience, respectively, interpreted the MR images independently, blinded to IHC results.
On post-contrast T1-weighted images, lesion depth (anterior, middle, or posterior depth) was recorded for each lesion, as this has been shown to be correlated with malignancy [43]. Qualitative image assessment was performed according to the Breast Imaging-Reporting and Data System (BI-RADS) lexicon; specifically, lesion shape, margin, and internal enhancement characteristics were assessed for mass lesions, and distribution and type of enhancement were assessed for non-mass enhancements [42]. Lesion size was measured on cranio-caudal, antero-posterior, and latero-medial planes. On T2-weighted images, signal intensity (hypo-, iso-, hyperintense) was recorded. Background parenchymal enhancement and fibroglandular tissue of the contralateral breast were also assessed using maximum intensity projection images and non-fat saturated T1-weighted images, respectively. Peritumoral oedema and paraseptal oedema (considered as diffuse trabecular oedema) were assessed on T2-weighted images as present or absent. Skin invasion was assessed on contrast enhanced T1-weighted images. The axilla was assessed for pathologically enlarged lymph nodes with suspicious morphology including the loss of fatty hilum on T1 non-fat saturated imaging.

Histopathologic Assessment of PD-L1 Expression Status
The VENTANA PD-L1 (SP142) IHC assay was used for the assessment of PD-L1 on formalin-fixed, paraffin-embedded tissue sections from the primary breast cancer or from soft tissue/chest wall recurrence in three patients with a history of ipsilateral mastectomy with reconstruction. PD-L1 positivity was defined as PD-L1 staining in tumor-infiltrating immune cells occupying ≥1% of the tumor area [44] (Figure 1).

Radiomics Analysis
Digital Imaging and Communications in Medicine (DICOM) images from DCE-MRI and non-contrast T1-weighted imaging were loaded into the open-source image processing tool OsiriX. 3D image segmentation was performed to include the entire enhancing biopsy-proven malignant lesion using the ITK-SNAP software (License: GNU General Public License. 2004) [45]. Semi-automatic tumor segmentation was performed using the soft thresholding mode with the soft binary threshold function, using user-supplied upper and lower threshold values selected by R1 [46]. Since 3D segmentation of the whole tumor was performed, previously biopsied areas were by default included in the whole-tumor ROI. All 3D segmentations were first delineated automatically by means of threshold or clustering, and consequently manually corrected by R1 (Figure 2). Radiomic features extracted from the tumor region of interest (ROI) were calculated automatically.

Radiomics Analysis
Digital Imaging and Communications in Medicine (DICOM) images from DCE-MRI and non-contrast T1-weighted imaging were loaded into the open-source image processing tool OsiriX. 3D image segmentation was performed to include the entire enhancing biopsyproven malignant lesion using the ITK-SNAP software (License: GNU General Public License. 2004) [45]. Semi-automatic tumor segmentation was performed using the soft thresholding mode with the soft binary threshold function, using user-supplied upper and lower threshold values selected by R1 [46]. Since 3D segmentation of the whole tumor was performed, previously biopsied areas were by default included in the whole-tumor ROI. All 3D segmentations were first delineated automatically by means of threshold or clustering, and consequently manually corrected by R1 (Figure 2). Radiomic features extracted from the tumor region of interest (ROI) were calculated automatically.

Radiomics Analysis
Digital Imaging and Communications in Medicine (DICOM) images from DCE-MRI and non-contrast T1-weighted imaging were loaded into the open-source image processing tool OsiriX. 3D image segmentation was performed to include the entire enhancing biopsy-proven malignant lesion using the ITK-SNAP software (License: GNU General Public License. 2004) [45]. Semi-automatic tumor segmentation was performed using the soft thresholding mode with the soft binary threshold function, using user-supplied upper and lower threshold values selected by R1 [46]. Since 3D segmentation of the whole tumor was performed, previously biopsied areas were by default included in the whole-tumor ROI. All 3D segmentations were first delineated automatically by means of threshold or clustering, and consequently manually corrected by R1 (Figure 2). Radiomic features extracted from the tumor region of interest (ROI) were calculated automatically. Relative enhancement maps (% increase in signal from Preto Post-Contrast) were calculated as: Pre Contrast where the post-contrast data were taken as the phase nearest to 2 min post-contrast. Enhancement maps were then reduced to 32 grey levels prior to radiomic feature calculations. Radiomic features were calculated using CERR [47]. One hundred and one features were calculated in six classes (22 first-order, 26 based on grey-level cooccurrence matrices, 16 based on run length matrices, 16 based on size zone matrices, 16 based on neighborhood grey level dependence matrices, and 5 based on neighborhood grey tone difference matrices) (Supplementary Table S1).
To account for potential differences in enhancement characteristics between 1.5 T and 3.0 T scans, ComBat harmonization [48] was performed on the radiomic features to remove site effects (in this study, site effects were due to the two different magnetic field strengths). ComBat harmonization is well-established and has been shown to perform well at removing unwanted inter-site variations [49]. In our study, parametric prior distributions were assumed, and the batch size was 2 (reflecting the two different magnetic field strengths of 1.5 T and 3 T). ComBat uses an empirical Bayes framework to improve the variance of additive and multiplicative site-dependent effects. It estimates a statistical distribution for the parameters of interest by assuming that the data share the same common distribution. Therefore, in our study, information from all cases was used to inform the statistical properties of field strength effects. Harmonization is successful if it removes the noise associated with field strength while maintaining the desired biological variation.
Univariable analysis was then employed to determine features with significant differences between the two groups. The number of features forwarded to model development was then reduced by utilizing correlation analysis. For any pair of parameters that were highly positively correlated (r > 0.9) or highly negatively correlated (r < −0.9), the parameter with the lowest area under the receiver operating curve (AUROC) was removed. A predictive model was then developed in MATLAB (MathWorks, Natick, MA, USA) using a coarse decision tree and five-fold cross-validation. A coarse decision tree was employed since decision trees are known to be easy to read and interpret. The maximum number of Cancers 2021, 13, 6273 6 of 13 splits employed was limited to four to reduce the possibility of overparameterization. This process was then repeated 1000 times to provide final diagnostic metrics.

Statistical Analysis
Statistical analysis was conducted using SAS (version 9.4, SAS Institute, Cary, NC, USA). Continuous variables were summarized using means (±standard deviation) and medians (range); categorical variables were summarized using proportions. Univariable analysis using the Chi-square test or Fisher's exact test was performed to assess associations between the qualitatively assessed imaging parameters with PD-L1 expression status (positive vs. negative). The Wilcoxon rank-sum test was used to assess associations between continuous variables and PD-L1 expression status. p-values < 0.05 were considered significant. To determine inter-reader agreement, weighted Cohen's κ was used to assess ordinal parameters, while simple Cohen's κ was used to assess the inter-reader agreement for nominal parameters.

Radiomics Analysis and ML
At univariable analysis, only five (out of 101) radiomic features were significantly different between PD-L1− and PD-L1+ patients: variance (from the first order class), run length variance (from the run length matrix class), large zone emphasis (from the size zone matrix class), large zone low grey level emphasis (from the size zone matrix class), and zone level variance (from the size zone matrix class) ( Table 2). Table 3 demonstrates strong correlations between large zone emphasis, large zone low grey level emphasis, and zone level variance; thus, utilizing univariable area under the receiver operating curve (AUROC) values, only large zone low grey level emphasis was retained from large zone emphasis, large zone low grey level emphasis, and zone level variance. The final coarse decision tree model therefore included three parameters: variance (from the first order class), run length variance (from the run length matrix class), and large zone low grey level emphasis (from the size zone matrix class). The final model reached a sensitivity of 90.7%, specificity of 85.1%, positive predictive value of 88.8%, negative predictive value of 87.8%, and accuracy in prediction of PD-L1 status of 88.2% (Table 4). By extracting the three stated radiomics features after identical pre-processing and employing the developed decision tree, we were able to assess the validity of the final model.

Quantitative Assessment Using BI-RADS
No significant associations were found between any qualitatively assessed lesion parameters and PD-L1 expression status. Table 5 shows the results from univariable analysis according to qualitative assessments by the two radiologists (R1 and R2), with p-values ranging from 0.08 to >0.9. Examples are reported in Figure 3. Supplementary  Table S2 shows the inter-reader agreement between R1 and R2. For margin assessment, while there was agreement for 44/62 patients, the κ value was only 0.24.

Discussion
In this feasibility study, we examined whether radiomics analysis coupled with ML based on standard-of-care pre-treatment DCE-MRI can predict PD-L1 status as assessed with PD-L1 SP142 in patients with triple negative breast cancer, and we compared this approach with qualitative radiologist assessment. Results show that DCE-MRI-based ra-

Discussion
In this feasibility study, we examined whether radiomics analysis coupled with ML based on standard-of-care pre-treatment DCE-MRI can predict PD-L1 status as assessed with PD-L1 SP142 in patients with triple negative breast cancer, and we compared this approach with qualitative radiologist assessment. Results show that DCE-MRI-based radiomics analysis coupled with ML was promising; the final predictive model yielded an accuracy in the prediction of PD-L1 status of 88.2%, sensitivity of 90.7, specificity of 85.1, PPV of 88.8, and NPV of 87.8. On the other hand, no significant associations were found between any of the qualitatively assessed features on DCE-MRI and PD-L1 expression (p-values ranging from 0.08 to >0.9).
In the pre-treatment setting, the non-invasive and accurate assessment of PD-L1+ TILs can be valuable to provide predictive information with respect to the amenability of triple negative breast cancer to cytotoxic chemotherapy. Several trials have also demonstrated promising results in breast cancer treatment with immunotherapy targeting PD-L1 blockade [13][14][15]. In addition, PD-L1 status was promising to predict prognosis, as PD-L1+ triple negative breast cancers were associated with significantly better progression-free survival compared with PD-L1− cancers (progression-free survival of 7.2 months vs. 5.5 months).
Currently, PD-L1 status is assessed using FDA-approved companion diagnostic IHC assays. However, it remains desirable to develop imaging biomarkers that can be used to assess PD-L1 expression of the whole tumor non-invasively and at repeated timepoints. Radiomics analysis is non-invasive, relatively fast and easy to perform, and potentially more easily standardized than IHC assays [21]. A robust, validated radiomic-based signature may be used in the future to predict PD-L1 status rapidly after a routine clinical imaging examination. Radiomics assessment can also be repeatedly performed throughout the duration of treatment, without requiring additional invasive testing procedures to obtain tumor tissue from surgery or biopsy, thereby reflecting dynamic PD-L1 expression for treatment stratification and for monitoring the therapeutic efficacy of checkpoint inhibitors to potentially instruct drug replacement and discontinuation. For this reason, we aimed to assess the PD-L1 expression rate through a radiomics-based model in this study.
Recently, a radiomics-based biomarker (Rad score) was shown to predict the presence of tumor-infiltrating CD8 +TILs in hepatocellular carcinoma and to identify potential candidates for immunotherapy [50]. A study by Wen et al. [34] showed moderate performance for PD-L1 prediction in 220 patients with oesophageal cancer using PET/CT data, with an AUC of 0.692 for the validation set. A study by Jiang at al. [35] in a larger cohort of 399 patients with stage I-IV non-small cell lung cancer assessed ML models from CT-, PET-, and combined PET/CT-derived features to predict PD-L1 assessment, showing that CT-derived features achieved the highest predicting efficacy in discriminating PD-L1 expression level higher than either 1% or 50%. As for predictive models based on PET-derived features, the prediction of outcomes was better than a random guess but significantly worse than CT-derived predicting models. When combined with CT-derived features, PET-derived features showed little improvement in distinguishing PD-L1 expression level over 1%, resulting in AUCs of 0.86, 0.62, and 0.85, respectively, for CT, PET, and PET/CT-derived features. With regards to the prediction of strong PD-L1 expression levels (higher than 50%), the AUC score was improved to 0.91, 0.75, and 0.88 for features derived from CT, PET, and PET/CT, respectively. A study by Iwatate et al. [36] in 107 patients with pancreatic cancer showed that PD-L1 predictive models built with imaging features extracted from CT scans demonstrated an AUC of 0.683, sensitivity of 0.417, and specificity of 0.93. Lastly, a study by Yoon et al. [51], whose aim was to assess PD-L1 expression in 153 patients with advanced stage lung adenocarcinoma, found that quantitative CT radiomic features helped predict PD-L1 expression, whereas none of the qualitative imaging findings were associated with PD-L1 expression. The AUC of the Rad-score used to predict PD-L1 expression was 0.667. The results from our study and from others highlight the necessity for the implementation of AI-enhanced imaging to derive better clinical value and to shape the way we care for our patients towards the goal of achieving precision medicine in breast cancer.
Our results show that radiomics analysis and ML can quantify tumor phenotypical differences and provide non-invasive predictive information with regards to PD-L1 expression. When compared with qualitative assessment performed by two experienced radiologists, radiomics coupled with ML was superior to predict PD-L1 positive or negative status. Inter-reader agreement between the two radiologists was low, probably due to the low number of lesions with circumscribed or spiculated margins (seven for R1 and eight for R2) as compared with lesions with irregular margins that were predominant. Our results also show overall better performance of our ML model in the prediction of PD-L1 status compared with previous studies, probably due to the fact that radiomics features were extracted from MRI images rather than PET/CT or CT images in our study. Breast MRI is the most sensitive modality for breast cancer detection [52] and provides not only excellent morphologic information due to higher tissue contrast but also functional information related to vascularization with dynamic imaging; moreover, it is probably the imaging modality for which data for AI studies on breast cancer is most commonly available [39].
This study has several limitations. First, its retrospective nature and relatively small patient cohort precluded a separate validation cohort. However, this is the first imaging study to address this question of the feasibility of radiomics based on pre-treatment standard-of-care DCE-MRI to predict PD-L1 status in patients with triple negative breast cancer. Second, PD-L1 status was assessed using only one PD-L1 IHC assay, namely PD-L1 SP142. Therefore, the results from this study, albeit promising, must be validated in larger prospective studies with different available PD-L1 IHC assays. Another limitation is that ROIs for the radiomics analysis were obtained once. However, it should be noted that ROIs were acquired with semi-automatic segmentation using the soft thresholding mode in ITK-SNAP (license: GNU General Public License. 2004) [45] and consequently they were only manually corrected by R1 if necessary, which rendered this analysis less operator-dependent.

Conclusions
In conclusion, this feasibility study demonstrated that radiomics analysis coupled with ML based on standard-or-care DCC-MRI can predict PD-L1 status, and thus may be used as a possible alternative to IHC assays to identify patients who could benefit from anti-PD-1/PD-L1 treatment. Meanwhile, the use of standard, qualitatively assessed BI-RADS descriptive characteristics was not predictive of PD-L1 status.