Development of High-Resolution Dedicated PET-Based Radiomics Machine Learning Model to Predict Axillary Lymph Node Status in Early-Stage Breast Cancer

Simple Summary
Accurate clinical axillary evaluation plays an important role in the diagnosis and treatment planning of breast cancer (BC). This study aimed to develop a machine learning model that integrates dedicated breast PET and clinical characteristics to predict axillary lymph node status non-invasively in cT1-2N0-1M0 BC. The integrated model distinguished pN0 from pN1 with an AUC of 0.94. We achieved a negative predictive value (NPV) of 96.88% in the cN0 subgroup and a positive predictive value (PPV) of 92.73% in the cN1 subgroup. Higher true positive and true negative rates could help delineate clinical subgroups and enable more precise treatment for patients with early-stage BC.
Abstract
Purpose of the Report: Accurate clinical axillary evaluation plays an important role in the diagnosis and treatment planning of early-stage breast cancer (BC). This study aimed to develop a scalable, non-invasive, and robust machine learning model for predicting pathological node status using dedicated PET integrated with clinical characteristics in early-stage BC. Materials and Methods: A total of 420 BC patients confirmed by postoperative pathology were retrospectively screened. 18F-fluorodeoxyglucose (18F-FDG) Mammi-PET, ultrasound, physical examination, Lymph-PET, and clinical characteristics were analyzed. Least absolute shrinkage and selection operator (LASSO) regression analysis was used to develop the prediction models. The area under the receiver operating characteristic (ROC) curve (AUC) and the DeLong test were used to evaluate and compare model performance. The clinical utility of the models was determined via decision curve analysis (DCA). A nomogram was then developed from the model with the best predictive efficiency and clinical utility and was validated using calibration plots. Results: A total of 290 patients were enrolled in this study. The integrated model achieved an AUC of 0.94 (95% confidence interval (CI), 0.91–0.97) in the training set (n = 203) and 0.93 (95% CI, 0.88–0.99) in the validation set (n = 87) (both p < 0.05). In the clinical N0 subgroup, the negative predictive value reached 96.88%, and in the clinical N1 subgroup, the positive predictive value reached 92.73%. Conclusions: A machine learning integrated model can greatly improve the true positive and true negative rates of identifying clinical axillary lymph node status in early-stage BC.


Introduction
The axillary lymph node (ALN) is the first station of lymphatic drainage from the breast [1]. Sentinel lymph node excision biopsy (SLNB) and surgical axillary lymph node dissection (ALND) are the gold standards for diagnosing pathological node status (pNx) in early-stage breast cancer (BC). However, both methods are invasive and carry risks of pain, numbness, and lymphedema. In addition, a previous study showed that two-thirds of cN0 patients were found to be pN0 after SLNB [2], indicating that about two-thirds of these patients with early-stage BC were overtreated. Improving the true positive and true negative rates of clinical ALN evaluation could therefore stratify patients as negative (axillary surgery may be omitted), positive (ALND may be performed), or uncertain (SLNB may be performed); however, such stratification may require an independent and objective tool to delineate subgroups and guide precise treatment [3]. 18F-fluorodeoxyglucose (18F-FDG) positron emission tomography (PET) can provide comprehensive functional information about tumors, such as heterogeneity, metabolism, aggressiveness, and proliferation. Newer dedicated PET (D-PET), including dedicated breast PET (Mammi-PET) and dedicated axillary lymph node PET (Lymph-PET), is an advanced screening method with higher spatial resolution than whole-body PET/computed tomography (WB-PET/CT). Because WB-PET/CT is performed with the patient supine, the breast volume collapses and images are blurred by respiratory motion [4,5]. In contrast, Mammi-PET uses a single ring detector that translates axially over the length of the breast, and the prone position enables full-breast volume imaging without breast compression [6]. The Lymph-PET device contains movable, opposed double-planar detectors whose field of view covers the axilla, allowing precise detection of hot lesions.
Compared with tissue-based biomarker testing, algorithm-based medical imaging features have inherent advantages: they are available in real time, non-invasive, independent of sampling bias, and not limited to the portion of tissue tested [7]. PET radiomics provides a complementary tool for extracting high-dimensional, valuable data, such as tumor heterogeneity and shape, from images; these data may be used alone or in combination with demographic, histologic, or proteomic data for clinical problem solving [7].
This study aimed to develop a scalable, non-invasive, and robust machine learning model for predicting pNx using D-PET radiomics integrated with clinical characteristics in early-stage BC. We further validated the model in the cN0 and cN1 subgroups, evaluating the negative predictive value (NPV) for cN0 patients and the positive predictive value (PPV) for cN1 patients.

Patients
This prospective study was approved by the Institutional Ethics Committee. Informed consent was obtained from each patient before participation. We enrolled women (age ≥18 years) with newly diagnosed, histologically confirmed, unilateral invasive cT1-2N0-1M0 BC. Both the primary tumor and ALN status were assessed using ultrasound (US). Tumor staging was based on the eighth edition of the American Joint Committee on Cancer (AJCC) staging manual [8]. The pNx was based on microscopic assessment of at least one lymph node sampled by fine-needle aspiration (FNA), SLNB, or ALND, and the clinical N category (cNx) was based on physical examination (PE) and US. cN0 was defined as no regional lymph node metastases detected on PE and US, whereas cN1 was defined as metastases to movable ipsilateral level I, II axillary lymph node(s) [8]. The patient recruitment process, exclusion criteria, and study workflow are presented in Figure 1. Patients were randomly divided into two sets at a ratio of 7:3: (1) a training set, on which the best-fitting prediction models were built and tested, and (2) an internal validation set, on which performance and goodness of fit were assessed. This study conformed to the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) reporting guideline [9].
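A minimal sketch of the 7:3 random split follows. It assumes a simple unstratified random permutation, because the text does not say whether the split was stratified by pN status; the function name and seed value are hypothetical.

```python
import numpy as np

def split_train_validation(patient_ids, train_fraction=0.7, seed=2021):
    """Randomly partition patient IDs into training and validation sets.

    Assumption: unstratified random permutation; the seed is illustrative only.
    """
    ids = np.asarray(patient_ids)
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(ids))
    n_train = int(round(train_fraction * len(ids)))
    return ids[order[:n_train]], ids[order[n_train:]]

# Usage: 290 enrolled patients -> 203 training, 87 validation
train_ids, valid_ids = split_train_validation(np.arange(290))
```

With 290 patients this yields 203 and 87 patients, matching the reported set sizes. If the split was in fact stratified by outcome, scikit-learn's `train_test_split(..., stratify=y)` would be the analogous call.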

Conventional Examination and Evaluation
All patients underwent routine PE, US, D-PET, and core needle biopsy to diagnose invasive primary carcinoma of the breast. PE of the lymph nodes was considered positive if an enlarged ALN was found on inspection and palpation, and negative if no ALN could be palpated. US findings were considered positive if an ALN (of any size) was detected and negative otherwise. The primary lesion was evaluated using US, and its long and short diameters were recorded; its site was also recorded as central, medial, lateral, or diffuse.

Surgery Procedure and Pathological Evaluation
To determine hormone receptor (HR) status, human epidermal growth factor receptor 2 (HER2) expression, and the Ki-67 proliferative index, immunohistochemistry (IHC) was performed on hematoxylin and eosin-stained sections of primary tumor tissue obtained by core needle biopsy. The cutoff for estrogen receptor (ER) and progesterone receptor (PR) positivity was set at 10% of tumor cells with positive nuclear staining. HER2 status was considered positive if the IHC score was 3+, or if an IHC score of 2+ was confirmed by fluorescence in situ hybridization (FISH). FISH positivity was defined as a HER2 copy number >6.0 or a HER2/CEP17 (chromosome enumeration probe 17) ratio >2.0 [10].
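The receptor rules above can be written as a small decision function. This sketch encodes only the thresholds stated in the text, and the function names are hypothetical. Treating exactly 10% as positive is an assumption, because the text says only that the cutoff was "established at 10%". Current ASCO/CAP guidance also defines more nuanced ISH groups than this two-threshold rule.

```python
def hormone_receptor_positive(percent_positive_nuclei, cutoff=10.0):
    """ER/PR call from the percentage of tumor cells with nuclear staining.

    Assumption: a value equal to the cutoff counts as positive.
    """
    return percent_positive_nuclei >= cutoff

def her2_positive(ihc_score, fish_copy_number=None, fish_ratio=None):
    """HER2 call: IHC 3+, or IHC 2+ confirmed by FISH (copy number >6.0 or ratio >2.0)."""
    if ihc_score == 3:
        return True
    if ihc_score == 2:
        if fish_copy_number is None and fish_ratio is None:
            raise ValueError("IHC 2+ requires a FISH result")
        by_copy = fish_copy_number is not None and fish_copy_number > 6.0
        by_ratio = fish_ratio is not None and fish_ratio > 2.0
        return by_copy or by_ratio
    return False  # IHC 0 or 1+
```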
For cN0 patients, ALN status was assessed pathologically using SLNB alone or SLNB plus ALND. For cN1 patients, ALN status was confirmed pathologically using FNA: if FNA was negative, SLNB was performed; if FNA was positive, neoadjuvant therapy was initiated. For patients with no more than two positive lymph nodes on SLNB, the decision whether to perform axillary dissection depended on the type of operation (breast-conserving therapy or mastectomy) and individual pathological characteristics. Based on pathological examination, nodal metastases were classified as macro-metastases (>2.0 mm), micro-metastases (0.2-2.0 mm), or isolated tumor cells (ITCs, <0.2 mm) according to the tumor-node-metastasis staging system. Both ITCs and micro-metastases were considered negative in the final statistical analysis.
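For modeling, each patient's nodal findings reduce to a binary label. The sketch below uses the size boundaries exactly as written (>2.0 mm macro-metastasis, 0.2-2.0 mm micro-metastasis, <0.2 mm ITC). It labels a patient pN1 only when at least one macro-metastasis is present, because ITCs and micro-metastases were counted as negative. The helper names are hypothetical.

```python
def classify_deposit(size_mm):
    """Size category of one nodal tumor deposit, using the boundaries stated in the text."""
    if size_mm > 2.0:
        return "macro"
    if size_mm >= 0.2:
        return "micro"
    return "ITC"

def pathological_label(deposit_sizes_mm):
    """1 (pN1) if any macro-metastasis is present, else 0 (pN0).

    ITCs and micro-metastases are treated as negative, as in the final analysis.
    """
    return int(any(classify_deposit(s) == "macro" for s in deposit_sizes_mm))
```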

D-PET Examination and Evaluation
All patients were instructed to fast for at least 4 h before the procedure and were injected with 110-130 MBq of 18F-FDG. Blood glucose levels were confirmed to be <10 mmol/L in all patients. After a 60-min resting (tracer distribution) period, Mammi-PET (Oncovision, Valencia, Spain) and Lymph-PET were performed sequentially.
For Mammi-PET acquisition, the patient lay prone on the imaging table with the breast hanging freely through an aperture in the table. The total acquisition time was 5-10 min per breast, depending on breast length. After bilateral breast imaging was completed, both axillary regions were scanned using Lymph-PET. For Lymph-PET, the patient sat comfortably on a fixed chair with the upper arm raised and supported by a dedicated bracket. The total acquisition time was 3 min per axilla.
Lymph-PET images were evaluated and the single-voxel maximum standardized uptake value (SUVmax) was quantified using the commercial software Medical Image Merge (MIM; version 6.5.4; MIM Software Inc., Beachwood, OH, USA), a professional image processing package certified by the United States Food and Drug Administration. Two nuclear medicine physicians, each with 10 years of experience in PET/CT and blinded to all study-related information except tumor laterality, analyzed the images independently. An elliptical region of interest (ROI) was manually delineated, and 18F-FDG uptake (SUVmax) within the ROI was calculated. When multiple lymph nodes were detected, the highest SUVmax was used as the study value. The SUVmax cutoff for lymph nodes on Lymph-PET was set at 0.27 according to a previous study [11]: SUVmax ≥0.27 was considered positive and SUVmax <0.27 negative.
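For reference, the sketch below computes a body-weight SUV with the injected activity decay-corrected to scan time. It then applies the patient-level rule described above: the highest nodal SUVmax is compared with the 0.27 cutoff. This illustrates the standard definition under stated assumptions and is not MIM's implementation. The assumptions are a tissue density of 1 g/mL, the fluorine-18 half-life of 109.77 min, and treating a patient with no detected node as negative. The function names are hypothetical. The 0.27 cutoff comes from [11] and is specific to that device and calibration.

```python
import math

F18_HALF_LIFE_MIN = 109.77  # physical half-life of fluorine-18, in minutes

def suv_bw(activity_bq_per_ml, injected_bq, weight_kg, minutes_since_injection):
    """Body-weight SUV with the injected dose decay-corrected to scan time.

    Assumes tissue density of 1 g/mL, so body weight in g stands in for volume in mL.
    """
    decay = math.exp(-math.log(2) * minutes_since_injection / F18_HALF_LIFE_MIN)
    dose_at_scan = injected_bq * decay
    return activity_bq_per_ml / (dose_at_scan / (weight_kg * 1000.0))

def lymph_pet_positive(node_suvmax_values, cutoff=0.27):
    """Patient-level call: the highest nodal SUVmax compared with the cutoff.

    Assumption: a patient with no detected node is treated as negative.
    """
    if not node_suvmax_values:
        return False
    return max(node_suvmax_values) >= cutoff
```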

Tumor Segmentation
Tumors were visualized and segmented on the Mammi-PET images independently by the same two nuclear medicine physicians using PET Edge in the MIM software. PET Edge is a gradient-based semi-automatic contouring algorithm that uses the maximum spatial gradient to detect the boundary between tumor and normal tissue, and it is reported to be independent of the reconstruction algorithm, imaging technique, and sphere diameter. It has been shown to be more accurate, consistent, and robust for contouring tumor volumes on PET images than methods based on visual judgment or SUV thresholds [12].
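PET Edge is proprietary MIM software, and its implementation is not described here. The sketch below illustrates only the general principle named in the text: the boundary is placed where the spatial gradient is steepest. As a simplifying assumption, it works on a single 1-D intensity profile that starts at the hottest voxel. A 3-D contouring tool would repeat a search of this kind along many directions. The function name is hypothetical.

```python
import numpy as np

def edge_along_profile(profile, spacing_mm=1.0):
    """Distance (mm) from the profile start to the steepest intensity drop.

    Simplified illustration of gradient-based edge detection; not PET Edge itself.
    """
    profile = np.asarray(profile, dtype=float)
    grad = np.gradient(profile, spacing_mm)  # central differences in the interior
    peak = int(np.argmax(profile))           # start the search at the hottest voxel
    # Steepest descent outward from the peak = most negative gradient.
    idx = peak + int(np.argmin(grad[peak:]))
    return idx * spacing_mm
```

Unlike a fixed SUV threshold, the edge location does not shift when the overall uptake level is scaled: multiplying the profile by a constant leaves the position of the steepest drop unchanged.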

Quantitative Radiomics Feature Extraction
Quantitative radiomics features (n = 851) were extracted from tumor images using the PyRadiomics package and the 3D Slicer image computing platform [13,14]. All features except shape features were extracted from both the original segmented images and the wavelet-filtered images; shape features are independent of intensity values and were therefore extracted from the original images only. Feature extraction and feature definitions followed the Image Biomarker Standardization Initiative (IBSI) [15].
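The 851-feature total is consistent with PyRadiomics run on the Original and Wavelet image types using its seven default 3-D feature classes. That gives 107 features on the original image plus 93 intensity-based features on each of the 8 wavelet decompositions. The sketch below checks that arithmetic and shows one way to configure the extractor. The text does not report the extraction settings (bin width, resampling, wavelet options), so PyRadiomics defaults are assumed. `extract_features` is a hypothetical helper, and PyRadiomics is imported only when it is called.

```python
# Per-class feature counts under PyRadiomics defaults (these match the class counts reported in the Results).
ORIGINAL_COUNTS = {"shape": 14, "firstorder": 18, "glcm": 24, "gldm": 14,
                   "glrlm": 16, "glszm": 16, "ngtdm": 5}
N_WAVELET_DECOMPOSITIONS = 8  # LLL, LLH, LHL, LHH, HLL, HLH, HHL, HHH

def expected_feature_count():
    """Shape features are computed once; intensity features once per image type."""
    intensity = sum(v for k, v in ORIGINAL_COUNTS.items() if k != "shape")
    return sum(ORIGINAL_COUNTS.values()) + N_WAVELET_DECOMPOSITIONS * intensity

def extract_features(image_path, mask_path):
    """Hypothetical helper: Original + Wavelet images, seven feature classes, default settings."""
    from radiomics import featureextractor  # imported lazily; requires pyradiomics
    extractor = featureextractor.RadiomicsFeatureExtractor()
    extractor.disableAllImageTypes()
    extractor.enableImageTypes(Original={}, Wavelet={})
    extractor.disableAllFeatures()
    for feature_class in ORIGINAL_COUNTS:
        extractor.enableFeatureClassByName(feature_class)
    result = extractor.execute(image_path, mask_path)
    # Drop the diagnostic entries and keep only the feature values.
    return {k: v for k, v in result.items() if not k.startswith("diagnostics_")}
```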

Model Development and Validation
LASSO logistic regression with 10-fold cross-validation, using the minimum-error criterion to choose the penalty, was performed to select optimal features for predicting ALN status in the training set [16,17]. Prediction models were then developed by multivariable logistic regression with variable selection based on Akaike's information criterion (AIC). The models were applied to differentiate pN0 from pN1 patients, and a prediction score (Pre-score) was calculated for each patient as the linear combination of the selected non-zero features weighted by their coefficients.
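The study performed these steps in R. The sketch below is a Python analogue under stated assumptions. It uses scikit-learn's `LogisticRegressionCV` with an L1 penalty and log-loss scoring as a stand-in for a cross-validated lambda.min choice in an R implementation such as glmnet. It then runs backward elimination on the AIC and computes the Pre-score as the fitted linear predictor. The following are assumptions: backward elimination rather than R's bidirectional `step()`, the penalty grid, standardized inputs, and LASSO keeping at least one feature.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression, LogisticRegressionCV
from sklearn.metrics import log_loss

def lasso_select(X, y, n_folds=10, seed=0):
    """L1-penalised logistic regression tuned by k-fold CV on log loss.

    Keeps the C with the best mean CV score (analogue of lambda.min) and
    returns the column indices with non-zero coefficients.
    """
    lasso = LogisticRegressionCV(Cs=50, cv=n_folds, penalty="l1", solver="liblinear",
                                 scoring="neg_log_loss", max_iter=5000, random_state=seed)
    lasso.fit(X, y)
    return np.flatnonzero(lasso.coef_[0])

def _fit(X, y):
    # A very weak penalty (large C) approximates an unpenalised maximum-likelihood fit.
    return LogisticRegression(C=1e6, max_iter=5000).fit(X, y)

def aic(X, y):
    """AIC = 2k - 2 log L, with k = number of features + intercept."""
    model = _fit(X, y)
    neg_log_lik = log_loss(y, model.predict_proba(X)[:, 1], normalize=False)
    return 2 * (X.shape[1] + 1) + 2 * neg_log_lik

def backward_aic(X, y, candidates):
    """Drop one feature at a time while doing so lowers the AIC."""
    selected = list(candidates)
    best = aic(X[:, selected], y)
    while len(selected) > 1:
        score, drop = min((aic(X[:, [c for c in selected if c != d]], y), d) for d in selected)
        if score >= best:
            break
        best, selected = score, [c for c in selected if c != drop]
    return selected

def pre_score(X, y, selected):
    """Pre-score: intercept + sum(coefficient * feature) over the kept features."""
    return _fit(X[:, selected], y).decision_function(X[:, selected])
```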
Model performance was evaluated using receiver operating characteristic (ROC) analysis, and the models were compared using the DeLong test in both the training and validation sets. The area under the curve (AUC) with its 95% confidence interval (CI), sensitivity, specificity, accuracy, PPV, and NPV were calculated to assess model performance.
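The sketch below computes the threshold-based metrics listed above. It also implements DeLong's nonparametric comparison of two correlated AUCs from its structural components, in an O(m x n) form that is adequate at this cohort size. It is a generic implementation written for illustration, not the authors' code; in R, the pROC package offers the same test.

```python
import math
import numpy as np

def threshold_metrics(y, prob, cutoff):
    """Sensitivity, specificity, accuracy, PPV and NPV at a probability cutoff."""
    y = np.asarray(y).astype(bool)
    pred = np.asarray(prob) >= cutoff
    tp, tn = np.sum(pred & y), np.sum(~pred & ~y)
    fp, fn = np.sum(pred & ~y), np.sum(~pred & y)
    return {"sensitivity": tp / (tp + fn), "specificity": tn / (tn + fp),
            "accuracy": (tp + tn) / y.size, "ppv": tp / (tp + fp), "npv": tn / (tn + fn)}

def _delong(y, scores):
    """AUCs and DeLong covariance matrix for several score vectors on the same patients."""
    y = np.asarray(y).astype(bool)
    m, n = int(y.sum()), int((~y).sum())
    aucs, v10, v01 = [], [], []
    for s in scores:
        s = np.asarray(s, dtype=float)
        diff = s[y][:, None] - s[~y][None, :]   # positives x negatives
        psi = (diff > 0) + 0.5 * (diff == 0)     # Mann-Whitney kernel
        aucs.append(psi.mean())
        v10.append(psi.mean(axis=1))             # structural components of positives
        v01.append(psi.mean(axis=0))             # structural components of negatives
    cov = np.atleast_2d(np.cov(v10)) / m + np.atleast_2d(np.cov(v01)) / n
    return np.array(aucs), cov

def auc_ci(y, score, z=1.959964):
    """AUC with a DeLong-variance 95% CI, clipped to [0, 1]."""
    aucs, cov = _delong(y, [score])
    se = math.sqrt(cov[0, 0])
    return aucs[0], max(0.0, aucs[0] - z * se), min(1.0, aucs[0] + z * se)

def delong_test(y, score_a, score_b):
    """Two-sided DeLong test for the difference between two correlated AUCs."""
    aucs, cov = _delong(y, [score_a, score_b])
    var = cov[0, 0] + cov[1, 1] - 2 * cov[0, 1]
    z = (aucs[0] - aucs[1]) / math.sqrt(var)
    return aucs[0], aucs[1], math.erfc(abs(z) / math.sqrt(2))
```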
The clinical utility of the models was determined and compared using decision curve analysis (DCA) and clinical impact curves (CIC). DCA quantified the net benefit for patients across a range of threshold probabilities in the cohort. The CIC estimated, for each risk threshold, the number of patients who would be classified as high risk by the Combined Model and the proportion of true positives among them [18].
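In DCA, net benefit at a threshold probability pt is TP/N - FP/N x pt/(1 - pt). It is compared against treating all patients and treating none; the treat-none strategy has a net benefit of 0. The sketch below implements that formula and the two counts plotted in a clinical impact curve, scaled to 1000 patients as in the Results. The function names are hypothetical, and counting a risk exactly at the threshold as high risk (">=") is an assumption.

```python
import numpy as np

def net_benefit(y, prob, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold."""
    y = np.asarray(y).astype(bool)
    treat = np.asarray(prob) >= threshold
    n = y.size
    tp = np.sum(treat & y)
    fp = np.sum(treat & ~y)
    return tp / n - fp / n * threshold / (1 - threshold)

def net_benefit_treat_all(y, threshold):
    """Reference strategy: treat everyone."""
    prevalence = np.mean(np.asarray(y).astype(bool))
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)

def clinical_impact(y, prob, threshold, population=1000):
    """Numbers classified high risk and true positives, scaled to `population` patients."""
    y = np.asarray(y).astype(bool)
    high_risk = np.asarray(prob) >= threshold
    scale = population / y.size
    return high_risk.sum() * scale, (high_risk & y).sum() * scale
```

A decision curve is then drawn by evaluating these functions over a grid such as `np.linspace(0.01, 0.99, 99)`.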

Nomogram Development and Validation
We developed an individualized nomogram in the training set, based on the prediction model with the highest AUC and clinical utility, to provide a visual, quantitative tool for predicting ALN status in patients with early-stage BC [19]. Calibration curves, reflecting the agreement between the nomogram-predicted probability and the observed probability, were plotted using 1000 bootstrap resamples in both the training set and the validation set.
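A simplified bootstrap calibration curve is sketched below. Patients are grouped into deciles of predicted probability, and the mean predicted and observed ALNM rates are compared within each bin. Bootstrap resamples then give a percentile band for each bin. This is a stand-in based on assumptions, since the text does not specify the method. R tools such as `rms::calibrate` instead produce an optimism-corrected smooth curve.

```python
import numpy as np

def _bin_means(values, bins, n_bins):
    """Mean of `values` within each bin; NaN for empty bins."""
    counts = np.bincount(bins, minlength=n_bins)
    sums = np.bincount(bins, weights=values, minlength=n_bins)
    return np.where(counts > 0, sums / np.maximum(counts, 1), np.nan)

def calibration_curve_bootstrap(y, prob, n_bins=10, n_boot=1000, seed=0):
    """Binned calibration (mean predicted vs observed) with bootstrap 95% bands."""
    y = np.asarray(y, dtype=float)
    prob = np.asarray(prob, dtype=float)
    edges = np.quantile(prob, np.linspace(0, 1, n_bins + 1))
    bins = np.digitize(prob, edges[1:-1])          # bin index 0..n_bins-1
    predicted = _bin_means(prob, bins, n_bins)
    observed = _bin_means(y, bins, n_bins)
    rng = np.random.default_rng(seed)
    boot = np.empty((n_boot, n_bins))
    for b in range(n_boot):
        idx = rng.integers(0, y.size, y.size)     # resample patients with replacement
        boot[b] = _bin_means(y[idx], bins[idx], n_bins)
    lower, upper = np.nanpercentile(boot, [2.5, 97.5], axis=0)
    return predicted, observed, lower, upper
```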

Statistical Analysis
Univariate and multivariate analyses were performed using R software (version 4.0, http://www.r-project.org, accessed on 15 June 2021). Categorical variables were compared between the two groups using Fisher's exact test or the χ2 test, and continuous variables using the independent t-test or the Mann-Whitney U test. A two-sided p < 0.05 indicated statistical significance. Intraclass correlation coefficients (ICCs) were used to evaluate the intra- and inter-observer reproducibility of the radiomics features; an ICC >0.75 indicated good reliability.
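The text does not state which ICC form was used. The sketch below assumes two common single-rater forms of Shrout and Fleiss: ICC(2,1), which is two-way random with absolute agreement, and ICC(3,1), which is two-way consistency. It computes them from an n-patients by k-readers matrix of one radiomics feature. Under the stated criterion, features with ICC >0.75 would be retained.

```python
import numpy as np

def icc(ratings, form="agreement"):
    """Single-rater ICC from an (n_subjects, k_raters) matrix.

    form="agreement":   ICC(2,1), two-way random effects, absolute agreement.
    form="consistency": ICC(3,1), two-way mixed effects, consistency.
    """
    Y = np.asarray(ratings, dtype=float)
    n, k = Y.shape
    grand = Y.mean()
    row = Y.mean(axis=1, keepdims=True)   # per-subject means
    col = Y.mean(axis=0, keepdims=True)   # per-rater means
    msr = k * np.sum((row - grand) ** 2) / (n - 1)
    msc = n * np.sum((col - grand) ** 2) / (k - 1)
    mse = np.sum((Y - row - col + grand) ** 2) / ((n - 1) * (k - 1))
    if form == "agreement":
        return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
    if form == "consistency":
        return (msr - mse) / (msr + (k - 1) * mse)
    raise ValueError("form must be 'agreement' or 'consistency'")
```

Consider ratings [[1, 2], [2, 3], [3, 4], [4, 5]], where the second reader is always one unit higher. They give ICC(2,1) = 10/13 ≈ 0.77 but ICC(3,1) = 1.0. Only the agreement form penalizes a systematic offset between readers.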

Demographic and Clinicopathological Characteristics
We screened 420 patients with cT1-2N0-1M0 BC between 9 September 2019 and 30 September 2020. A total of 290 women (mean age, 50.46 ± 10.58 years; range, 28-78 years) with invasive lesions (invasive ductal carcinoma, 283; invasive lobular carcinoma, 7) were finally enrolled. Demographic and clinicopathologic characteristics were compared between the pN0 and pN1 groups separately in the training (n = 203, 70.00%) and validation (n = 87, 30.00%) sets to identify potential diagnostic biomarkers of ALN status (Table 1). In the univariate analysis, primary tumor site, location, ER level, and HER2 status did not differ significantly between the pN0 and pN1 groups (p > 0.05). Lymph-PET, US, and PE findings were significantly associated with ALN status in both the training and validation sets (p < 0.05).
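The univariate comparisons can be sketched with SciPy. The text names the tests but not how the authors chose between them, so the selection rules here are assumptions. Fisher's exact test is used for 2 x 2 tables with any expected count below 5; otherwise the χ2 test is used. The t-test is used when both groups pass a Shapiro-Wilk normality check at alpha = 0.05; otherwise the Mann-Whitney U test is used.

```python
import numpy as np
from scipy.stats import chi2_contingency, fisher_exact, mannwhitneyu, shapiro, ttest_ind

def compare_categorical(table):
    """Chi-square test, or Fisher's exact test for sparse 2 x 2 tables."""
    table = np.asarray(table)
    _, p_chi2, _, expected = chi2_contingency(table)
    if table.shape == (2, 2) and (expected < 5).any():
        _, p_fisher = fisher_exact(table)
        return "fisher", p_fisher
    return "chi2", p_chi2

def compare_continuous(a, b, alpha=0.05):
    """Independent t-test if both groups look normal, otherwise Mann-Whitney U."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    _, p_a = shapiro(a)
    _, p_b = shapiro(b)
    if p_a > alpha and p_b > alpha:
        _, p = ttest_ind(a, b)
        return "t-test", p
    _, p = mannwhitneyu(a, b, alternative="two-sided")
    return "mann-whitney", p
```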

Feature Selection and Model Development
A total of 851 radiomics features were extracted from the tumor regions with increased 18F-FDG uptake segmented by the two nuclear medicine physicians. They comprised shape features (n = 14); first-order statistics (n = 18); texture features (n = 75), namely 24 gray level co-occurrence matrix (GLCM), 14 gray level dependence matrix (GLDM), 16 gray level run length matrix (GLRLM), 16 gray level size zone matrix (GLSZM), and 5 neighboring gray tone difference matrix (NGTDM) features; and wavelet features obtained from high-pass (H) and low-pass (L) filters applied in the x, y, and z directions (n = 744). All features are described in detail in Table S1. LASSO regression initially selected 7 of the 12 clinicopathologic markers, 34 of the 851 radiomics features, and 19 of the 864 combined features (Lymph-PET finding, 1; clinicopathologic markers, 12; radiomics features, 851) (Figure 2). Subsequently, multivariable regression with the AIC retained the 4 most valuable clinicopathologic markers, 10 Mammi-PET radiomics parameters, and 11 combined features. Three independent prediction models were developed from these to differentiate pN0 from pN1 patients in the training set. For all prediction models, pN1 patients generally had higher Pre-scores, calculated using the respective model formulas, than pN0 patients (p < 0.05) (Figure 3, Table 2).
According to the DeLong test, the Combined Model, comprising six clinicopathologic factors and five Mammi-PET radiomics parameters, showed the highest AUC, accuracy, and NPV among the three models in both the training set (AUC: 0.94, accuracy: 87.68%, NPV: 84.85%, p < 0.05) and the validation set (AUC: 0.93, accuracy: 87.36%, NPV: 93.18%, p < 0.05). Detailed statistics on the models' performance in discriminating pN0 from pN1 patients are summarized in Table 3, and the corresponding ROC curves are shown in Figure 4a,b.
DCA showed that the Combined Model was the most reliable and valuable tool for predicting ALN status when the threshold probability of ALN metastasis (ALNM) exceeded 10% (Figure 4c). The CIC of the Combined Model showed risk stratification for a population of 1000 people at each threshold probability. It gave the estimated number of people who would be classified as high risk for ALNM and the number of true positive cases among them (Figure 4d). We further validated the Combined Model in the total dataset. Among the 290 patients, 107 (36.90%) were cN0 and the remaining 183 (63.10%) were cN1 based on baseline clinical and imaging data. The Combined Model identified pN0 among cN0 patients with an AUC of 0.83 and an NPV of 96.88%, and identified pN1 among cN1 patients with an AUC of 0.90 and a PPV of 92.73% (Figure 4e,f, Table 4).

Nomogram Development and Validation
Based on these results, we developed an individualized nomogram from the Combined Model's risk features for visualization (Figure 5a), from which the risk probability of ALNM can be read directly for each patient. The optimal threshold for discriminating between pN0 and pN1 was 0.59. The calibration curves demonstrated good agreement between the nomogram-predicted probability of ALNM and the observed ALN metastasis in both the training and validation sets (Figure 5b,c).
Figure 5. (a) Nomogram based on the Combined Model; in an example patient with early-stage BC, the nomogram-predicted ALNM probability was 0.806, and the patient was confirmed as pN1 at surgery. Calibration curves of the nomogram in the training set (b) and validation set (c). *p < 0.05, **p < 0.01, ***p < 0.001. The x-axis represents the nomogram-predicted probability of ALNM, and the y-axis represents the actual ALNM rate. The solid line is the ideal reference line, on which the predicted ALN status corresponds exactly to the actual outcome. The short-dashed line represents the apparent prediction of the nomogram, and the long-dashed line represents the ideal estimation. The calibration curves show that the actual probability corresponded closely to the nomogram prediction.
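A nomogram is a graphical form of the logistic model. Each predictor's contribution (coefficient x value) is rescaled to points, and the points are summed. The total maps back to the linear predictor and then to a probability. The sketch below follows a common convention in which the predictor with the widest range of effect spans 0-100 points, and it applies the reported 0.59 threshold. The function names are hypothetical. The Combined Model's actual coefficients are not reproduced here, and the coefficients in the example below are made up for illustration.

```python
import math

CUTOFF = 0.59  # optimal pN0/pN1 threshold reported in the text

def _max_span(coefs, ranges):
    # Widest range of effect |beta| * (hi - lo) across predictors; it spans 100 points.
    return max(abs(b) * (ranges[k][1] - ranges[k][0]) for k, b in coefs.items())

def nomogram_points(coefs, ranges, values):
    """Points per predictor, measured from each predictor's lowest possible contribution."""
    span = _max_span(coefs, ranges)
    points = {}
    for k, beta in coefs.items():
        lo, hi = ranges[k]
        floor = min(beta * lo, beta * hi)
        points[k] = 100.0 * (beta * values[k] - floor) / span
    return points

def points_to_probability(total_points, intercept, coefs, ranges):
    """Invert the point scale to the linear predictor, then apply the logistic function."""
    span = _max_span(coefs, ranges)
    floor_sum = sum(min(b * ranges[k][0], b * ranges[k][1]) for k, b in coefs.items())
    linear_predictor = intercept + floor_sum + total_points * span / 100.0
    return 1.0 / (1.0 + math.exp(-linear_predictor))
```

Take hypothetical coefficients {"a": 2.0, "b": -1.0}, each predictor ranging over [0, 1], and an intercept of -0.5. A patient with a = 1 and b = 0 scores 100 + 50 = 150 points. That corresponds to a linear predictor of 1.5 and a probability of about 0.82, which is above the 0.59 threshold.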

Discussion
Routine assessment of the primary lesion and ALN status includes US, mammography (MMG), and magnetic resonance imaging (MRI). Although US remains one of the key tools for ALN assessment, it is limited by its subjectivity [20]. A systematic review of US showed significant variation among institutions, with overall sensitivity ranging from 26% to 76% and specificity from 88% to 98% [21]. A more recent meta-analysis of 21 studies found that US assessment of abnormal nodes had a median sensitivity of 64% and a median specificity of 82% [22]. Although MMG is suitable for examining breast tissue, it is not considered reliable for ALN evaluation because part of the axillary area may not be visible on routine MMG [23]. MRI is used to assess newly diagnosed BC and to examine the response to neoadjuvant treatment, but it may also provide insufficient imaging of the axillary region [24]. Thus, routine non-invasive techniques have limited accuracy for distinguishing N0 from N1 disease in early-stage BC.
Recently, WB-PET-based radiomics models have shown promising results in predicting occult lymph node metastasis in cN0 tumors, including lung cancer, cervical cancer, and esophageal adenocarcinoma [25-27]. Hasan et al. analyzed textural features on WB-PET/CT images of 124 patients with breast cancer and showed that the gray-level zone length matrix (GLZLM) could predict ALN status, but with an AUC of only 0.64 [28]. Similarly, Bong [29] analyzed the primary tumors of 100 patients with invasive ductal BC using a WB-PET/CT-based radiomics model to predict ALN status. The AUC, sensitivity, specificity, and accuracy of that Radiomics Model in predicting ALN metastasis were 0.890, 90.9%, 71.4%, and 80%, respectively. These studies indicate that 18F-FDG PET-based radiomics model-derived biomarkers can enable lymph node assessment that is non-invasive, repeatable, and independent of sampling bias. Moreover, the newer dedicated breast PET has demonstrated higher spatial resolution and uptake sensitivity for lesions [30], which benefits radiomics analysis in three ways. First, in WB-PET, limited spatial resolution and partial volume effects make small lesions challenging. Owing to the coarser post-reconstruction voxel size (4 × 4 × 4 mm) and the small size of breast tumors relative to the total field of view, the primary tumor in WB-PET comprises a relatively small fraction of the total voxel volume. Hence, Hatt et al. limited their analysis to metabolically active volumes >3 cm3 [31]. In contrast, D-PET has a finer in-tumor resolution (1 × 1 × 1 mm post-reconstruction), which lowered the tumor volume threshold to 0.064 cm3 [32]. Consequently, D-PET showed accuracy comparable to MRI and improved sensitivity compared with WB-PET for quantifying primary lesions [33].
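The voxel arithmetic behind the two volume thresholds can be made explicit. The count below assumes isotropic voxels of the stated sizes; it is our illustration, not a calculation reported in [31] or [32].

```latex
\begin{aligned}
V_{\text{WB voxel}} &= 4 \times 4 \times 4~\text{mm}^3 = 64~\text{mm}^3 = 0.064~\text{cm}^3,\\
V_{\text{D-PET voxel}} &= 1 \times 1 \times 1~\text{mm}^3 = 0.001~\text{cm}^3,\\
\frac{3~\text{cm}^3}{V_{\text{WB voxel}}} &= \frac{3000~\text{mm}^3}{64~\text{mm}^3} \approx 47~\text{voxels},
\qquad
\frac{0.064~\text{cm}^3}{V_{\text{D-PET voxel}}} = 64~\text{voxels}.
\end{aligned}
```

The arithmetic suggests that the 0.064 cm3 D-PET threshold equals the volume of a single WB-PET voxel. Even so, it spans 64 D-PET voxels, which is more than the roughly 47 WB-PET voxels contained in a 3 cm3 volume.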
Second, the overall 18F-FDG uptake values (SUVmax, SUVmean, and SUVpeak) in lesions were higher with D-PET than with WB-PET. This indicates that D-PET yields images with a higher signal-to-noise ratio and can detect less metabolically active lesions. Finally, the improved spatial resolution highlights the spatial heterogeneity within the primary breast tumor. The qualitative differences in spatial and signal-intensity heterogeneity observed on D-PET may be largely driven by its higher voxel resolution and tumor tissue fraction. In a comparative study, spatial heterogeneity features differed significantly between D-PET and WB-PET [6]. Precise quantification of tumor heterogeneity may allow accurate prediction of ALN metastasis. In summary, high-resolution D-PET can detect smaller and less metabolically active lesions, making radiomics analysis more feasible and reliable.
In earlier studies, the AUC for predicting ALNM ranged from 0.90 to 0.92 with MRI-based radiomics [34,35] and from 0.89 to 0.90 with US [36,37]. A retrospective study analyzed US features of 1328 patients with cT1-2N0 BC and established nomograms for ALNM prediction; the AUC was 0.802 for the prediction model and 0.73 in the external validation group [38]. Another study established an ALNM prediction model using deep learning algorithms based on US images and achieved an AUC of 0.805 [39]. Other studies also attempted non-invasive prediction using clinical features, pathological type, molecular subtype, and radiological data, but their AUCs were only about 0.74 to 0.83 [40-42]. Notably, our integrated model enables an objective and unbiased assessment and may help in the clinical stratification of lesions for better treatment planning. Avoiding invasive ALN assessment could be beneficial in the cN0 subgroup (n = 107), and there the NPV was 96.88%. This indicates that the algorithm may be able to identify patients in whom axillary surgery can be avoided. To our knowledge, no previously reported non-invasive method has achieved such a high NPV. Meanwhile, in the cN1 subgroup (n = 183), where ALNs were assessed using US and PE, the main concern is identifying truly positive nodes, and the integrated model achieved a high PPV of 92.73%. These encouraging results suggest that a machine learning integrated model based on radiomics could independently predict lymph node status. Such a model might modify clinical decisions or affect patient outcomes 'over and above' conventional approaches.
This study had some limitations. First, it was conducted at a single center and had a retrospective design, which could have introduced selection bias. Second, we excluded patients with multifocal breast lesions or bilateral disease because it was difficult to determine which lesion led to ALN metastasis. Third, although internal validation was performed in the validation cohort, validation in an external cohort is required to evaluate the transferability of the radiomics model. In addition, further controlled prospective studies are needed to refine the predictive accuracy of this integrated model.

Conclusions
In this study, we developed a machine learning integrated model based on 18F-FDG Mammi-PET radiomics, US, PE, Lymph-PET, and clinical characteristics for non-invasively identifying the pNx of ALNs in early-stage BC (cT1-2N0-1M0). The AUC was 0.94 (95% CI, 0.91–0.97) in the training set and 0.93 (95% CI, 0.88–0.99) in the validation set. Using the integrated model, we achieved an NPV of 96.88% in the cN0 subgroup and a PPV of 92.73% in the cN1 subgroup. Such a model can greatly improve the true positive and true negative rates of identifying ALNM, helping to delineate clinical subgroups and deliver precise treatment to patients with early-stage BC.