CANTO-RT: One of the Largest Prospective Multicenter Cohort of Early Breast Cancer Patients Treated with Radiotherapy including Full DICOM RT Data

Simple Summary
Radiation therapy (RT) is one of the cornerstones of the local treatment of breast cancer (BC). Toxicity factors related to RT and their consequences are poorly understood because DICOM data are limited, as are analyses of contouring, dose distribution and RT technique. This manuscript describes the methodology used and provides the first characterization of the study population and RT data in CANTO-RT (CANcer TOxicities RadioTherapy). To our knowledge, our study is the largest available prospective multicenter cohort of early breast cancer with full DICOM RT data files (CT, RT Structure, RT Dose, RT Plan). This study addresses concerns about toxicity factors related to radiotherapy and their consequences, and aims to identify predictors of the development and persistence of long-term toxicities in breast cancer patients. Further long-term projects (heart, lung, skin, fatigue) and follow-up are ongoing.

Abstract
This article describes the methodology used and provides a characterization of the study population in CANTO-RT (CANcer TOxicities RadioTherapy). CANTO (NCT01993498) is a prospective clinical cohort study including patients with stage I-III BC from 26 French cancer centers. Patients matching all CANTO inclusion and exclusion criteria who received RT in one of the 10 top recruiting CANTO centers were selected. Individual full DICOM RT files were collected, pseudo-anonymized, structured and analyzed on the CANTO-RT/UNITRAD web platform. CANTO-RT included 3875 BC patients with a median follow-up of 64 months. Among the 3797 patients with unilateral RT, 3065 (80.4%) had breast-conserving surgery, and 2712 (71.5%) had sentinel node surgery. Tumor bed boost was delivered in 2658 patients (68.5%) and lymph node RT in 1356 patients (35%), including the internal mammary chain in 844 patients (21.8%). Most patients (3691, 95.3%) were treated with 3D conformal RT.
Target volumes, organs at risk contours and dose/volume histograms were extracted after quality-control procedures. CANTO-RT is one of the largest early BC prospective cohorts with full individual clinical, biological, imaging and DICOM RT data available. It is a valuable resource for the identification and validation of clinical and dosimetric predictive factors of RT and multimodal treatment-related toxicities.


Introduction
Breast cancer (BC) is the leading cancer in women throughout the world, with 2.3 million new cases diagnosed and 685,000 related deaths in 2020 [1]. Efforts over the last two decades to reduce breast cancer mortality have focused on early detection and treatment [2]. About 80% of breast cancer patients can expect long-term disease-free survival. In industrialized countries, about 5 million women live with a history of breast cancer and are at risk of long-term treatment-related toxicity. The post-cancer period is therefore an important part of their lives [3]. Reducing treatment-related toxicities has become a priority in the management of breast cancer patients. Radiation therapy (RT) is one of the cornerstones of the local treatment of BC. Various meta-analyses with long-term follow-up have demonstrated an overall survival benefit from RT [4,5]. However, toxicity factors related to RT and their consequences are poorly understood because DICOM (Digital Imaging and Communications in Medicine) data are limited, as are analyses of contouring (target and organ-at-risk volumes), dose distribution, RT technique and quality, which involves precise calculation and delivery of the planned dose. This understanding is nevertheless essential to characterize radiation-induced toxicities and to identify the factors predictive of their occurrence. CANcer TOxicities (CANTO; NCT01993498, UNICANCER 0140/1103, 2011-A01095-36; 'study of chronic toxicity of treatment of patients with localized breast cancer') is a multicenter prospective cohort study with the primary objective of identifying factors predictive of chronic toxicity in patients treated for stage I-III breast cancer [6].
Within CANTO, detailed RT data were collected for a subset of patients forming CANTO-RT (CANcer TOxicities RadioTherapy), a large multicenter prospective cohort of early BC patients treated with RT that aims to identify predictors of the development and persistence of long-term toxicities. In this paper, we describe the methodology used to collect RT data (full DICOM) and to ensure RT data quality control, and we provide a first characterization of the study population and RT data in CANTO-RT.

Study Design
CANTO (NCT01993498) is a French prospective longitudinal multicenter cohort study designed to evaluate chronic toxicities in patients treated for non-metastatic BC, diagnosed and enrolled between 2012 and 2018 in 26 French centers. The CANTO study procedures, which comply with French national regulatory requirements, good clinical practice guidelines and the European General Data Protection Regulation (GDPR), have been described previously by Vaz et al. [6]. This study, sponsored by Unicancer, enrolled 12,012 patients. At the database lock of December 2020, data from 2012 to 2017 were obtained, corresponding to 9599 patients.

Study Population
Patients matching all CANTO inclusion and exclusion criteria who received RT in one of the 10 top recruiting CANTO centers, had a minimum follow-up of 3 years, and were still in follow-up at the time of the database lock were selected for CANTO-RT (Figure 1). Patients included were followed for 10 years as part of the study, with a minimum of 36 months of follow-up. CANTO-RT patients met the following inclusion criteria: female patients aged 18 years and over covered by the national social security system, with histologically proven non-metastatic invasive BC (cT0-3, cN0-3) without previous cancer treatment. Conventional or hypofractionated RT was prescribed according to local standard-of-care. Eligible patients had breast/chest wall +/− lymph node RT with curative intent.
Cancers 2023, 15, 751
Figure 1. CANTO-RT Flowchart. RT: Radiation Therapy. * Inclusion criteria in CANTO: Female, 18 years of age and older, with infiltrating breast cancer diagnosed by cytology or histology, Tumor cT0-3, cN0-3, M0 before any treatment including surgery for breast cancer, patient fluent in French, free and informed consent for additional biological samples. † Inclusion criteria in CANTO-RT: among the top 10 CANTO recruiting centers for transferring RT files to Aquilab, CT/RT in the same center + part of the selected centers, follow-up >3 years. ** Exclusion criteria in CANTO: Metastatic breast cancer; local recurrence of breast cancer; previous cancer within 5 years prior to cohort entry other than basal cell skin cancer or in situ cervical epithelioma; blood transfusion within the last 6 months; persons deprived of liberty or under guardianship (including curatorship).

Data Collection
Patient and multimodal treatment characteristics, as well as paraclinical parameters including blood chemistry, examinations and toxicity data, were collected prospectively (Figure 2) and were the same as described in [6]. Patients were assessed at diagnosis (baseline) and at 3-6 (M0), 12 (M12), 36 (M36) and 60 (M60) months after completion of chemotherapy or RT, whichever came last. In this study, radiotherapy data were exported in standardized Digital Imaging and Communications in Medicine (DICOM) format by each investigating hospital to the UNITRAD online platform hosted by AQUILAB Onco Place™, a company with health-data-hosting authorization. All data were automatically pseudo-anonymized and converted to homogeneous naming. The organizational structure was previously described [6], and a summary of the data collection is presented in Figure 2.

In CANTO-RT, individual full DICOM RT data (CT, RT Structure, RT Dose, RT Plan) were collected, pseudo-anonymized, structured and analyzed on the CANTO-RT/UNITRAD web platform using AQUILAB Onco Place™ and its Analytics Dose module (Figure 3). In the Analytics Dose module, RT data were extracted, filtered and grouped according to sets of constraints by volume (mean dose, median dose, DX%: dose covering X% of the volume, expressed in Gy; VX Gy: volume receiving at least X Gy, expressed in %; near-min dose; near-max dose). From the platform RT data, we collected the treated side (right, left, bilateral), the presence or absence of a tumor bed boost, the lymph node levels treated (none, level 1 to 4, interpectoral, internal mammary chain), the technique (3D, IMRT: intensity-modulated radiotherapy), and the start and end dates of RT. The list of target volumes and organs at risk was harmonized so that each volume had a homogeneous name during extractions and analyses, as follows: CTVp_breast (Clinical Target Volume primary), CTVp_tumorbed, CTVp_thoracicwall, CTVn_interpectoralis (Clinical Target Volume nodal), CTVn_IMN (Internal Mammary Nodal), CTVn_L1, CTVn_L2, CTVn_L3, CTVn_L4, CTVn_Ltot, Heart, left anterior descending (LAD) coronary, Lung_right, Lung_left, Lungs, Humeral Head, Controlateral Breast, External, Spinal_cord, Thyroid, BrachialPlexus, and Esophagus.
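The harmonization of structure names can be illustrated with a short Python sketch. It is only an illustration of the idea, not the procedure implemented on the platform: the local labels in `ALIASES` and the helper names `normalize` and `harmonize` are hypothetical, and only the harmonized names on the right-hand side come from the list given in the text.

```python
import re
from typing import Optional

# Hypothetical local labels mapped to harmonized names.
# Only the right-hand values are taken from the harmonized list in the text;
# the left-hand aliases are invented for illustration.
ALIASES = {
    "heart": "Heart",
    "coeur": "Heart",
    "lung_l": "Lung_left",
    "poumon_g": "Lung_left",
    "lung_r": "Lung_right",
    "poumon_d": "Lung_right",
    "ctv_breast": "CTVp_breast",
    "ctv_sein": "CTVp_breast",
    "ctv_boost": "CTVp_tumorbed",
    "ctv_cmi": "CTVn_IMN",
}


def normalize(label: str) -> str:
    """Lowercase a label and collapse spaces, hyphens, dots and underscores."""
    return re.sub(r"[\s\-_.]+", "_", label.strip().lower())


def harmonize(label: str) -> Optional[str]:
    """Return the harmonized name, or None when the label is unknown."""
    return ALIASES.get(normalize(label))


if __name__ == "__main__":
    for raw in ["Lung L", "COEUR", " Poumon-G ", "Bolus"]:
        print(repr(raw), "->", harmonize(raw))
```

Returning `None` for unknown labels, rather than guessing, leaves unmatched structures visible for manual review.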
Characteristics of the patients (age, medical history, clinical examination and concomitant treatments), tumors (including TNM, histology, HER2, estrogen and progesterone receptors), paraclinical examinations (blood/plasma tests, bone densitometry, echocardiography or myocardial scintigraphy in case of treatment with anthracyclines/trastuzumab/RT to the left breast and/or internal mammary chain), type of breast surgery (lumpectomy, total mastectomy) and lymph node surgery (sentinel node, axillary dissection), chemotherapy, targeted anti-HER2 therapies and endocrine therapy were recorded from the CANTO data.
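The dose/volume metrics listed above (mean dose, median dose, DX%, VX Gy, near-min and near-max dose) can be illustrated with a minimal Python sketch. It assumes that the dose in each voxel of a structure is available as a flat list and that all voxels have the same volume. The function names are hypothetical, and taking near-min and near-max as D98% and D2% follows the ICRU Report 83 convention, which is an assumption here because the text does not define them. This is not the Analytics Dose module implementation.

```python
import math
from statistics import mean, median


def d_x(doses, x_percent):
    """DX%: dose (Gy) covering x_percent of the volume, i.e. the minimum
    dose received by the hottest x_percent of equal-volume voxels."""
    ranked = sorted(doses, reverse=True)
    k = max(1, math.ceil(len(ranked) * x_percent / 100))
    return ranked[k - 1]


def v_x(doses, x_gy):
    """VX Gy: percentage of the volume receiving at least x_gy."""
    return 100.0 * sum(d >= x_gy for d in doses) / len(doses)


def dvh_summary(doses):
    """Collect the metrics named in the text for one structure."""
    return {
        "mean": mean(doses),
        "median": median(doses),
        "D95%": d_x(doses, 95),
        "near_min_D98%": d_x(doses, 98),  # assumed ICRU 83 convention
        "near_max_D2%": d_x(doses, 2),    # assumed ICRU 83 convention
        "V20Gy": v_x(doses, 20.0),
    }


if __name__ == "__main__":
    # Invented voxel doses (Gy) for illustration only.
    example = [48.5, 49.0, 49.8, 50.1, 50.3, 50.6, 51.0, 47.2, 50.0, 49.5]
    for name, value in dvh_summary(example).items():
        print(name, round(value, 2))
```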

Data Management and Quality Control
Quality control of clinical data compared the RT-related data available on Aquilab Onco Place™ (laterality, type of breast and lymph node surgery) with the December 2020 database lock of the CANTO CRF (Case Report Form). All inconsistencies were corrected by the participating centers after the files were reopened on the Aquilab Onco Place™ database, before dose extraction. Quality control of dosimetric data was performed after a first extraction of Dmean and D95% of the CTVp_Breast or Chestwall volume for all patients with a delineated CTV. Dose inconsistencies were identified and investigated by manually opening the dosimetry to understand their origin. A dose well below the usually prescribed 50 Gy could indicate severe hypofractionation (used for the partial breast irradiation protocols NCT01024582 and NCT01247233) or a dosimetry offset on the planning CT scan (patient error or DICOM error).
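The dosimetric screening step described above can be sketched as a simple rule that flags plans for manual review. The 50 Gy reference comes from the text; the two thresholds and the function name `flag_for_review` are hypothetical choices made for illustration and are not the criteria used by the authors.

```python
REFERENCE_DOSE_GY = 50.0    # usual prescription mentioned in the text
DMEAN_TOLERANCE = 0.10      # hypothetical: flag if Dmean deviates by more than 10%
MIN_D95_OVER_DMEAN = 0.80   # hypothetical: flag if D95% is below 80% of Dmean


def flag_for_review(dmean_gy, d95_gy):
    """Return the reasons why a CTV dose extraction should be opened manually."""
    reasons = []
    deviation = abs(dmean_gy - REFERENCE_DOSE_GY) / REFERENCE_DOSE_GY
    if deviation > DMEAN_TOLERANCE:
        # Could be hypofractionation or an offset; only manual review can tell.
        reasons.append("Dmean far from reference dose")
    if dmean_gy > 0 and d95_gy / dmean_gy < MIN_D95_OVER_DMEAN:
        reasons.append("D95% low relative to Dmean")
    return reasons


if __name__ == "__main__":
    # Invented (Dmean, D95%) pairs in Gy.
    for dmean, d95 in [(50.2, 47.5), (40.5, 38.8), (12.0, 3.0)]:
        print(dmean, d95, flag_for_review(dmean, d95))
```

A flag is only a prompt to inspect the plan: as the text notes, a low dose may reflect a legitimate hypofractionated protocol as well as an error.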

Statistical Analysis
We described the characteristics and RT data available in CANTO-RT using central tendency parameters (mean, median), dispersion parameters (interquartile range (IQR), standard deviation (SD) and range) for quantitative variables, and frequencies (%) for categorical variables (Table S1: List of main variables). All analyses were conducted using SAS (Statistical Analysis System), version 9.4.
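For readers who do not use SAS, the descriptive summaries named above can be sketched in Python with the standard library. This is not the authors' SAS code. The function names and sample values are invented for illustration, and quartile definitions differ between software packages, so IQR values may not match SAS output exactly.

```python
from collections import Counter
from statistics import mean, median, quantiles, stdev


def describe_quantitative(values):
    """Mean, median, SD, IQR bounds and range for a quantitative variable."""
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points
    return {
        "mean": mean(values),
        "median": median(values),
        "sd": stdev(values),
        "iqr": (q1, q3),
        "range": (min(values), max(values)),
    }


def describe_categorical(values):
    """Count and percentage for each category."""
    total = len(values)
    return {k: (n, 100.0 * n / total) for k, n in Counter(values).items()}


if __name__ == "__main__":
    ages = [40, 50, 60, 70, 80]               # invented values
    sides = ["left", "left", "right", "left"]  # invented values
    print(describe_quantitative(ages))
    print(describe_categorical(sides))
```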

Summary of RT Data Available
An overview of the comprehensive CANTO-RT RT data, in terms of target volumes and organs at risk (OAR) available for dose extraction, is provided in Table 3.

Discussion
CANTO-RT is one of the largest prospective multicenter cohorts of early breast cancer patients treated with RT including full DICOM RT and standardized longitudinal data. The CANTO-RT tumor characteristics were consistent with known contemporary epidemiology [7]. In our cohort, 3D conformal irradiation was the most commonly used technique, whereas the use of IMRT was limited during this period. The proportion of IMRT varies by center, and the uptake of this technology remains unevenly spread across Europe [8]. Our series shows that OAR are not delineated in the same proportions depending on the treated side. For example, the heart was delineated more often for left-sided (90.1%) than for right-sided (59.5%) treatments, which probably reflects concern that the mean cardiac dose is much higher for irradiation of a left-sided than of a right-sided breast cancer [9]. However, depending on the anatomy, the dose to the heart is not null even when treating right-sided BC, especially when the internal mammary nodal chain (IMN) is irradiated [10]. We have also shown heterogeneity of practice in the delineation rate of the treated clinical target volumes (CTV), which varied from 52% to 91% of cases. The absence of delineation of a treated CTV did not allow target volume coverage to be properly assessed. As expected, the tumor bed CTV had the highest rate of delineation (91%), whereas the chest wall CTV had the lowest.
CANTO-RT has several strengths. First, it is one of the largest prospective multicenter BC cohorts with full DICOM RT data ever published, with a centralized database available on a single platform (Aquilab™) with innovative tools (Analytics). Second, CANTO-RT followed standard methodological quality criteria for observational studies [11,12]. The patient population has well-described inclusion and exclusion criteria, treatment information and patient-reported outcomes were reported using standardized CRFs, and the length of observation is sufficient to capture treatment-related toxicity. Third, electronic transfer of DICOM data and quality control methods optimized the quality of the RT data available, avoiding manual transcription of complex values from an RT technical file. Thus, CANTO-RT reports on RT data available in one of the largest databases in the world with individual full DICOM RT files (CT, RT Structure, RT Dose, RT Plan) and with contemporary RT techniques. Initiatives to centralize large-scale RT information exist in some countries, but not over a long duration, owing to the technological challenges imposed by the volume of these data. The REQUITE cohort recruited 4400 patients and is one of the largest multicenter cohorts of cancer patients treated with RT with standardized longitudinal data collection, but it mixes several tumor sites and is not specific to breast cancer (2057 patients) [13]. Other BC studies are retrospective (case-control studies) and used outdated RT techniques, with a reconstructed mean heart dose (MHD) derived from two-dimensional (2D) data using typical anatomy rather than individual CT-based information [14,15]. Unlike CANTO-RT, these studies are based on dosimetric estimates that are too imprecise to improve the assessment of the benefit/risk balance of RT in personalized medicine. In most trials, RT is recorded only as yes/no.
However, to evaluate toxicity, volumes, doses, fractionation and techniques must be taken into account. Breast cancer treatments are multimodal, and analyses integrating the different treatment parameters are important to better understand the toxicities specific to each treatment and the links between them.
We acknowledge some limitations. First, radiation therapy practices have already changed. Large, prospective, randomized phase III trials have demonstrated that, compared with conventionally fractionated regimens for early-stage breast cancer, hypofractionated treatment results in equivalent tumor control and improved acute and late toxicity and breast cosmesis [16,17,18]. Hypofractionated whole-breast irradiation has become the new standard of care for breast conservation therapy; the preferred regimen is 40 Gy in 15 fractions. Caution should be taken when comparing trends in dose across calendar years, since the change of fractionation regimens (from 50 Gy/25 to 40 Gy/15 and today 26 Gy/5 in some cases) will by itself lead to a reduction in physical dose. In addition, fractionation is unspecified for a significant proportion of RT courses (24.5%) because the breast or chest wall CTV was missing, so that the dose could not be extracted and the fractionation could not be deduced. Other practices were changing during the inclusion period, e.g., tumor bed boost delivery, which is less often prescribed in patients older than 50 or 60 [19], and IMRT techniques, which are more often used nowadays as they have shown similar results in locoregional tumor control but superior planning target volume coverage [20]. Second, the sub-group of patients selected for CANTO-RT was restricted to the top 10 recruiting centers as a convenience sample, which could have introduced bias. Lastly, there are biases inherent in the delineation of OAR and target volumes during RT treatment planning: missing volumes and variability between institutions and observers [21]. The guidelines for radiation therapy for early BC remain heterogeneous [22,23,24,25,26]. CANTO-RT could be a tool for comparing practices, and such international databases would be desirable in the future.
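The caution about comparing physical doses across fractionation regimens can be made concrete with the equivalent dose in 2 Gy fractions (EQD2) from the linear-quadratic model, EQD2 = D × (d + α/β) / (2 + α/β), where D is the total dose and d the dose per fraction. The sketch below is an illustration, not an analysis from this study: the α/β value of 3.5 Gy is an assumed, illustrative parameter, and the function name `eqd2` is hypothetical.

```python
ALPHA_BETA_GY = 3.5  # assumed illustrative alpha/beta ratio, not from the study


def eqd2(total_dose_gy, n_fractions, alpha_beta_gy=ALPHA_BETA_GY):
    """Equivalent dose in 2 Gy fractions under the linear-quadratic model."""
    dose_per_fraction = total_dose_gy / n_fractions
    return total_dose_gy * (dose_per_fraction + alpha_beta_gy) / (2.0 + alpha_beta_gy)


if __name__ == "__main__":
    # Regimens named in the text: 50 Gy/25, 40 Gy/15 and 26 Gy/5.
    for total, n in [(50, 25), (40, 15), (26, 5)]:
        print(f"{total} Gy / {n} fractions -> EQD2 = {eqd2(total, n):.1f} Gy")
```

Under this assumption, the gap between regimens is smaller in EQD2 than in physical dose, which is why trends in physical dose alone can be misleading.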
This database will allow analysis of the dose-effect relationship for the radiation received by the organs of women in the CANTO-RT cohort, with a possible correlation to the toxicities graded during their prospective follow-up. Several projects are ongoing, such as heart, skin and lung toxicity analyses. CANTO-RT will seek to improve knowledge of the relationship between RT toxicities and systemic treatments, and of the role of potential modifiers of this dose-response relationship such as chemotherapy and hormonal therapy. Other objectives could include the use of statistics and artificial intelligence (machine, deep and/or reinforcement learning) combined with dosimetry reconstruction approaches to supplement the dosimetric data of the CANTO-RT database in collaborative projects. This cohort, with its large amount of clinical, paraclinical, biological and RT data, will help improve the knowledge needed to develop personalized medicine for BC patients.

Conclusions
We successfully established CANTO-RT, a prospective cohort of 3875 early breast cancer patients with full individual clinical and DICOM RT data available, which shows substantial heterogeneity in the volumes contoured. CANTO-RT is a valuable resource, open for collaborative projects, for the identification and validation of clinical and dosimetric predictive factors of RT-related toxicities. Further long-term projects and follow-up are ongoing, and we hope to expand the collection of RT data.