Methods of Esthetic Assessment after Adjuvant Whole-Breast Radiotherapy in Breast Cancer Patients: Evaluation of the BCCT.core Software and Patients’ and Physicians’ Assessment from the Randomized IMRT-MC2 Trial

Simple Summary
To validate the BCCT.core software, the present analysis compares the esthetics assessment by the software in relation to patients’ and physicians’ rating in breast cancer patients after surgery and adjuvant radiotherapy. Agreement rates of the different assessments and their correlation with breast asymmetry indices were evaluated. The assessments of the software and the physicians were significantly correlated with all asymmetry indices, while for patients’ self-assessment, this general correlation was first seen after 2 years. Only a slight agreement between the BCCT.core software and the physicians’ or patients’ assessment was seen, while a moderate and substantial agreement was detected between the physicians’ and the patients’ assessments. The BCCT.core software is a reliable tool to measure asymmetries, but may not sufficiently evaluate the esthetic outcome as perceived by patients. It may be more appropriate for a long-term follow-up, when symmetry seems to increase in importance.

Abstract
The present analysis compares the esthetics assessment by the BCCT.core software in relation to patients’ and physicians’ ratings, based on the IMRT-MC2 trial. Within this trial, breast cancer patients received breast-conserving surgery (BCS) and adjuvant radiotherapy. At the baseline, 6 weeks, and 2 years after radiotherapy, photos of the breasts were assessed by the software and patients’ and physicians’ assessments were performed. Agreement rates of the assessments and their correlation with breast asymmetry indices were evaluated. The assessments of the software and the physicians were significantly correlated with asymmetry indices. Before and 6 weeks after radiotherapy, the patients’ self-assessment was only correlated with the lower breast contour (LBC) and upward nipple retraction (UNR), while after 2 years, there was also a correlation with other indices. Only a slight agreement between the BCCT.core software and the physicians’ or patients’ assessment was seen, while a moderate and substantial agreement was detected between the physicians’ and the patients’ assessment after 6 weeks and 2 years, respectively. The BCCT.core software is a reliable tool to measure asymmetries, but may not sufficiently evaluate the esthetic outcome as perceived by patients. It may be more appropriate for a long-term follow-up, when symmetry appears to increase in importance.


Introduction
Due to the early detection of breast cancer, increasing rates of breast conservation, and improvements in long-term survival, the esthetic outcome of breast-conserving therapy is coming more and more into focus for both caregivers and caretakers. In terms of radiation oncology, the use of boost irradiation is a major determinant of esthetic outcome [1][2][3]. Thus, for the evaluation of new radiotherapy techniques, the esthetic outcome is undoubtedly a highly important outcome parameter. Different methods are reported in the literature to measure the esthetic outcome: subjective approaches, such as patients' self-assessment and third-party assessment performed by a single physician or a panel of physicians, and objective techniques, primarily measuring symmetry [4][5][6]. Subjective panel evaluation and self-assessment continue to be the most commonly used methods for evaluating esthetic outcome [7]. To improve the reproducibility and comparability of subjective esthetics assessments, the standardized four-point Harvard Scale was introduced by Harris et al., and broadly used by the research community [8]. Nevertheless, in breast cancer therapy, the reproducibility of subjective assessments of esthetic outcome still remains limited [9,10]. Furthermore, a subjective evaluation performed by a panel of experts is an expensive, difficult, and time-consuming procedure [10]. Therefore, a reliable, highly reproducible, and simple method for esthetic assessment is urgently needed for future research approaches. The BCCT.core (Breast Cancer Conservative Treatment. Cosmetic Result) software is a cost-free, semi-automated, and easy-to-use tool, which provides a highly reliable and reproducible assessment of esthetic outcome [11][12][13][14]. In previous studies, the validity of the BCCT.core software was tested by comparing the results of the program to a subjective panel assessment [11,12,15]. 
Furthermore, the agreement of the software results with the patient perspective was tested in comparison to the BCTOS (Breast Conservative Treatment Outcome Scale) Aesthetic Status [13]. However, the number of cases tested in these trials was rather small, ranging from 30 to 128 patients. In the present study, the esthetic outcome data of the prospective, two-armed, randomized phase III IMRT-MC2 trial were used for an in-depth validation of the BCCT.core software with a larger number of cases. For this purpose, the software results were analyzed in comparison to physicians' assessments and patients' self-assessment of the esthetic outcome.

Results
The IMRT-MC2 trial randomized 502 patients to the control arm (3-D-CRT-seqB) and the experimental arm (IMRT-SIB) in a 1:1 ratio. For a total of 472 patients, 433 patients, and 378 patients, a complete assessment of esthetic outcome was available at the baseline, at 6 weeks, and 2 years after radiotherapy, respectively. These patients were eligible for the present analysis. For the three time points, the characteristics of patients, tumors, and treatments are summarized in Table 1. At the baseline, the median age of the participants was 55 years. Most patients presented with T1 stage disease (74%), and were staged pN0 (80%). In most cases, the tumor was located in the upper outer quadrant of the breast (61%).

Correlation between Breast Asymmetry Indices and Overall Esthetic Scores
The correlations between different breast asymmetry indices and overall esthetic outcomes of the BCCT.core software, as well as between breast asymmetry indices and patients' or physicians' assessment scores, at the baseline, 6 weeks, and two years after radiotherapy are shown in Table 2.
For the overall esthetic score of the BCCT.core software, a significant correlation was seen with all asymmetry indices at all time points. The highest Pearson correlation score was detected for the lower breast contour (LBC), which indicates the difference between the level of the inferior breast contours (Pearson coefficient: 0.669, 0.554, and 0.688 for the three time points, respectively; p < 0.001).

Table 2. Correlations between breast asymmetry indices and overall esthetic outcomes of BCCT.core, and between breast asymmetry indices and patients' or physicians' assessment scores, at the baseline, 6 weeks, and 2 years after radiotherapy (four-point scale: excellent-good-fair-poor).

A Pearson correlation score was used to determine the correlation between breast asymmetry indices and overall cosmetic outcomes of BCCT.core, as well as between breast asymmetry indices and patients' or physicians' assessment scores at the baseline, 6 weeks, and 2 years after radiotherapy. A two-sided significance test was used to compute the statistical significance. Statistically significant p-values (α ≤ 0.05) are presented in bold. Abbreviations: n: number of valid assessments; pBRA: breast retraction assessment; LBC: lower breast contour; UNR: upward nipple retraction; BCE: breast compliance evaluation; BCD: breast contour difference; BAD: breast area difference; BOD: breast overlap difference.
At the baseline, 6 weeks, and two years after radiotherapy, the overall physicians' assessment of esthetic outcome also significantly correlated with all breast asymmetry indices. For physicians' assessments of esthetics, the Pearson correlation coefficients were lower than the correlation coefficients seen for the cosmetic score of the BCCT.core software, indicating a stronger correlation of asymmetry indices with the score of the BCCT.core software.
Only for LBC and upward nipple retraction (UNR) was a significant correlation with the patients' self-assessment of esthetics seen at all time points. However, two years after radiotherapy, all asymmetry indices were significantly correlated with the patients' self-assessment.
When the four-point scale of esthetics was dichotomized into a two-point scale, no correlation between breast asymmetry indices and patients' self-assessment of esthetics was seen for the baseline and 6 weeks after radiotherapy time points. Otherwise, no differences were seen between the analysis of the four-point and the two-point scale (Table 3).

Table 3. Correlations between breast asymmetry indices and overall esthetic outcomes of BCCT.core, and between breast asymmetry indices and patients' or physicians' assessment scores, at the baseline, 6 weeks, and 2 years after radiotherapy (two-point scale: excellent/good-fair/poor).

A Pearson correlation score was used to determine correlation between breast asymmetry indices and overall cosmetic outcomes of BCCT.core, as well as between breast asymmetry indices and patients' or physicians' assessment scores, at the baseline, 6 weeks, and 2 years after radiotherapy. A two-sided significance test was used to compute the statistical significance. Statistically significant p-values (α ≤ 0.05) are presented in bold. Abbreviations: N: number of valid assessments; pBRA: breast retraction assessment; LBC: lower breast contour; UNR: upward nipple retraction; BCE: breast compliance evaluation; BCD: breast contour difference; BAD: breast area difference; BOD: breast overlap difference.

Agreement of BCCT.Core Software Results with Patients' and Physicians' Assessment Scores
The agreement of the BCCT.core software results for esthetics with patients' or physicians' assessment scores, as well as the agreement of patients' assessment scores with physicians' assessment scores, is shown in Table 4 for the baseline, 6 weeks, and two years after radiotherapy.

Table 4. Agreement of BCCT.core software results with patients' or physicians' assessment scores, and agreement of patients' assessment scores with physicians' assessment scores, at the baseline, 6 weeks, and 2 years after radiotherapy (four-point scale: excellent-good-fair-poor).

At the baseline, there was only a slight agreement between the BCCT.core software results and patients' self-assessment scores (weighted Kappa (wk) = 0.109; p = 0.003), as well as between patients' and physicians' assessment scores (wk = 0.100; p < 0.001). At 6 weeks and 2 years after radiotherapy, only a slight agreement was seen between the BCCT.core esthetic results and physicians' assessment scores (wk = 0.084; p = 0.001 and wk = 0.138; p < 0.001, respectively). Between the BCCT.core results and patients' self-assessment two years after therapy, there was also only slight agreement (wk = 0.111; p = 0.002). In contrast, moderate and substantial agreement was seen between patients' and physicians' assessment scores at 6 weeks (wk = 0.572; p < 0.001) and two years (wk = 0.625; p < 0.001) after radiotherapy, respectively.
In general, agreement increased with time since therapy, with the highest agreement rates seen after two years and the lowest at the baseline. After dichotomizing the four-point scale of esthetics, there was no substantial change in the results of the analysis. However, the level of agreement was generally lower (Table 5).
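The verbal agreement categories used in this section follow the Kappa bands given in the Statistical Methods. As a small illustrative sketch (the function name `agreement_label` is hypothetical, and treating values between 0 and 0.01 as "slight" is an assumption, since the stated bands leave that gap open), the reported weighted Kappa values can be mapped to their categories as follows:

```python
def agreement_label(kappa: float) -> str:
    """Map a (weighted) Kappa coefficient to the verbal bands from the Statistical Methods."""
    if kappa <= 0:
        return "poor"
    if kappa <= 0.20:  # assumption: values in (0, 0.01) are also treated as "slight"
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    if kappa < 1.0:
        return "almost perfect"
    return "perfect"


# Weighted Kappa values as reported in the text (four-point scale).
reported = {
    "BCCT.core vs. patients, baseline": 0.109,
    "patients vs. physicians, baseline": 0.100,
    "BCCT.core vs. physicians, 6 weeks": 0.084,
    "BCCT.core vs. physicians, 2 years": 0.138,
    "BCCT.core vs. patients, 2 years": 0.111,
    "patients vs. physicians, 6 weeks": 0.572,
    "patients vs. physicians, 2 years": 0.625,
}

for comparison, wk in reported.items():
    print(f"{comparison}: wk = {wk:.3f} -> {agreement_label(wk)}")
```

Under these bands, the value of 0.572 falls into the moderate range and 0.625 into the substantial range, which matches the wording used for the 6-week and 2-year comparisons of patients' and physicians' scores.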

Discussion
As the overall esthetic score of the BCCT.core software is primarily based on asymmetry measurements, a strong correlation with breast asymmetry indices was detected in the present analysis at all time points. This effect was expected, and was strongest for LBC, which is in line with results of Yu et al., also reporting the highest Pearson correlation score for this asymmetry index [16]. Physicians' assessment of esthetics also significantly correlated with breast asymmetry indices. Although the effect was not as strong as for the BCCT.core software, it demonstrates that physicians' assessments also appear to be predominantly influenced by asymmetry. This is in line with results of Lyngholm et al., who analyzed late morbidity, esthetic outcome, and body image of 214 patients from the Danish Breast Cancer Cooperative Group after breast-conserving therapy. This study demonstrated that breast asymmetry measured by BRA was the only factor that correlated with physicians' assessments of esthetic outcome [17].
For patients' self-assessment, a significant correlation with asymmetry indices at the baseline and 6 weeks after radiotherapy was only seen for LBC and UNR in the present study. Other asymmetry scores, such as the pBRA, significantly correlated with patients' assessments only after two years. Presumably, immediately after surgery and radiotherapy, factors other than asymmetry, such as scars, fibrosis, hyperpigmentation, or even pain, may have a stronger influence on patients' esthetic self-assessment. Then, as time passes, the symmetry of the breast may rise again in importance. This effect may also explain the stabilization of the esthetic outcome 3 years after therapy, as found in most previous studies [18][19][20][21][22]. As an uneven lower breast contour and nipple position, measured with LBC and UNR, are the most eye-catching parameters of asymmetry, it may be reasonable that a correlation of these factors and patients' esthetics assessments was detected earlier after treatment. Only a weak correlation of patients' esthetics rating with asymmetry indices, as seen in the present work, was also observed by Yu et al.; at a median follow-up of two years after breast-conserving therapy, the authors evaluated the esthetic outcome rated by the 51 patients themselves, and did not find any significant correlation with breast asymmetry measurements [16]. Some authors argue that assessment of esthetics with a four-point scale is less appropriate for patients, and that these global rating categories, when used by patients themselves, may only reflect perceived differences between treated and untreated breasts [23,24]. In an analysis of Patterson et al., 94% of patients rated the esthetic outcome after breast-conserving therapy as good or excellent, although 50% of patients reported relevant differences between the treated and the untreated breast [25]. However, rating by physicians may be considered to be fairly subjective as well.
The present analysis indicates a stronger agreement of patients' self-assessment with physicians' assessments of esthetic outcome (moderate and substantial agreement 6 weeks and 2 years after therapy, respectively) than with the assessment by the BCCT.core software (only slight agreement). This is in line with similar agreement rates detected in previous studies; in the study of Heil et al., which included 128 patients, a slight to fair agreement was seen between the BCCT.core results and patients' assessments of esthetics, tested with the BCTOS Aesthetic Status questionnaire [13]. A study of Yu et al., including 51 patients, also demonstrated only a slight agreement between BCCT.core software results and patients' self-assessment 2 years after whole-breast radiotherapy [16].
In the current analysis, the observed effect of a stronger agreement with physicians' assessments than with the BCCT.core results may be due to a more holistic and less asymmetry-based evaluation of the breast by the physicians, taking into account other relevant factors, such as scars and pigmentation. Representative examples of agreement and disagreement of the software results with patients' and physicians' assessment scores are depicted in Figure 1. However, two years after radiotherapy, an increase in agreement rates, as compared to patients' self-assessment, was detected for both the physicians' assessment and the assessment by the software. This observed increase in agreement levels for the software results with patients' self-assessment may be due to a rising importance of symmetry for the patients with time. Therefore, the BCCT.core software appears to be a better measurement method for long-term follow-up of the esthetic outcome, rather than evaluating the esthetic results immediately after therapy.
When comparing our findings to the results of the previous studies of Haloua et al., Cardoso et al., and Heil et al., we could not confirm similarly high rates of agreement between physicians' assessment and BCCT.core software results (reported Kappa: 0.27 to 0.64) [11,12,15,26]. This may partly be explained by the fact that, in those studies, the physicians' rating was performed by a panel, rather than by a single physician, as was the case in the present prospective phase III trial. For the current analysis, physicians' assessment was performed by live observation and palpation, as recommended by previous authors [27], to detect outcome variables such as edema, fibrosis, and telangiectasia more easily. In this setting, a panel evaluation was not feasible. Although intra-rater agreement in third-party assessment of the esthetic outcome is described to be substantial in the literature [28], a suboptimal intra-rater agreement may have weakened the present analysis. Unfortunately, intra-rater agreement was not tested in the present work. Moreover, the rating physician was one of the treating radiation oncologists. This may have influenced the physicians' assessment towards a more positive esthetic outcome.
A strength of the present study, in addition to the large number of patients assessed, is the use and comparison of all three methods of esthetics assessment (subjective, objective, and self-assessment) as recommended in the literature [7]. In particular, the results of patients' self-assessment provide a highly relevant reference for the present analysis, as well as for future studies. Furthermore, in the current analysis, the application of the standardized four-point scale of Harris et al., in combination with photographic examples [8,18], as recommended by Vrieling et al. [24], makes it easy to compare our results to those of other authors.
Cancers 2022, 14

According to our analysis, the BCCT.core software is an excellent tool for reliably measuring asymmetry, which appears to be a strong influencing factor for physicians' assessment of esthetics. However, in the first weeks after breast-conserving therapy, patients' self-assessment appears to depend less on asymmetry indices. Patients' self-assessment is certainly the most relevant method to measure the esthetic outcome, even if this is not objective. Nevertheless, the reliability of patients' assessment and its use for validation purposes is controversially discussed in the literature, as subjective assessments may not necessarily agree with the objective scores, and have very low reproducibility values [11,13,24,29]. In the literature, the application of both a qualitative and a quantitative method of measuring the esthetic outcome is recommended, especially when skin changes such as scars are relevant, since a single assessment that covers all of this complexity will probably never exist [18,24].

Description of Analyzed Patients
The present investigation was based on the esthetic outcome of participants of the prospective, two-armed, multi-center, randomized phase III IMRT-MC2 trial at the baseline, as well as 6 weeks and 2 years after whole-breast irradiation (WBI) [30,31]. Detailed information about target volume delineation, dose prescription, dose constraints, and treatment planning has been published previously [30,31].
All participants of the IMRT-MC2 trial with a complete assessment of esthetic outcome were included in the present analysis. All patients of the present analysis gave their written informed consent to participate in the study, and met the following criteria: all patients had to have an indication for adjuvant WBI with boost irradiation to the former tumor bed. Patients needed to be aged ≥ 18 years and < 70 years, or aged ≥ 70 years with one of the following risk factors: multifocal disease, tumor stage ≥ T2, extensive intraductal component, lymphangiosis and resection margins ≤ 3 mm. Further requirements were a Karnofsky Performance Score > 70%, no metastatic disease (M0), no previous radiotherapy of the same breast or thorax, no other malignancies in the previous 5 years, and no pregnancy [30].

Software Analysis of Esthetic Outcome Using BCCT.Core Software
Each patient of the present analysis received a standardized photographic documentation of the breasts shortly before radiotherapy (baseline), as well as 6 weeks and 2 years after irradiation. Frontal photographs of both breasts, excluding the face, were taken using the same procedure for every woman: standing position with hanging arms, equal standardized exposure to light, equal distance of 2.5 m from camera to patient, and equal background in blue color. The sternal notch and a point 25 cm below it were marked with red dots to allow for the correction of the magnification of the photographs. For each patient and for every single time point, these photographs were analyzed using the BCCT.core software (Breast Research Group, Porto, Portugal) to calculate breast asymmetry indices and an overall score of the esthetic result. This software was developed to summarize all objective symmetry measurements ever described in one single tool [5,6,7,9,15,18,32]. After importing a standardized photograph into the software, predetermined points are designated by the user, and the breast contour is semi-automatically delineated (Figure 2). In a following step, the BCCT.core software calculates different asymmetry indices, including breast volume, skin color, and scars. Finally, an algorithm combines these indices in an overall esthetic result, displayed on a four-point scale (1 = excellent, 2 = good, 3 = fair, 4 = poor). All measurements and calculations were performed by the same investigator (TF), who was blinded regarding time point and randomization.
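The role of the two red reference dots can be illustrated with a short sketch. It assumes that both marker positions and the lowest point of each breast contour are available as pixel coordinates; the function names and example coordinates are hypothetical, and the internal calculations of the BCCT.core software may well differ from this simplified version:

```python
import math

# Distance between the sternal notch marker and the lower marker, as stated in the text.
MARKER_DISTANCE_CM = 25.0


def cm_per_pixel(notch_px, lower_marker_px):
    """Scale factor derived from the two marker positions, given as (x, y) pixel coordinates."""
    distance_px = math.dist(notch_px, lower_marker_px)
    if distance_px == 0:
        raise ValueError("marker positions must differ")
    return MARKER_DISTANCE_CM / distance_px


def vertical_difference_cm(left_lowest_y_px, right_lowest_y_px, scale):
    """Vertical offset between the lowest contour points of both breasts, in cm.

    This mirrors the idea behind the lower breast contour (LBC) index, i.e. the
    difference between the levels of the inferior breast contours.
    """
    return abs(left_lowest_y_px - right_lowest_y_px) * scale


# Hypothetical example: the markers are 500 pixels apart in the photograph.
scale = cm_per_pixel((512, 200), (512, 700))
offset = vertical_difference_cm(980, 940, scale)
print(f"scale: {scale:.3f} cm/pixel, contour offset: {offset:.1f} cm")
```

Because every measurement is converted with the scale of its own photograph, small differences in zoom or camera distance between sessions do not distort the comparison across time points.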

Figure 2. After the user has adjusted the red dots to the most medial and lateral point of the breast outline, the BCCT.core software is able to auto adjust the breast outline and to calculate several asymmetry indices, as well as an overall score (four-point scale: excellent-good-fair-poor).

Patients' and Physicians' Assessment of Esthetic Outcome
Shortly before radiotherapy (baseline), as well as 6 weeks and 2 years after irradiation, the esthetic outcome was assessed by both the treating physician and the patient using the Radiation Therapy Oncology Group/Harvard Scale, comparing the treated breast to the untreated one. The esthetic outcome was scored according to a four-point scale: excellent (no visible treatment sequelae at first sight), good (minimal changes), fair (the treated breast is different but not seriously distorted), or poor (the treated breast is seriously distorted) [8]. The rating physicians were all radiation oncologists, and had at least 2 years' experience in the field.

Statistical Methods
For the agreement analyses of patients' and physicians' subjective scoring and the objective software scoring of esthetic outcome, absolute agreement rates (a), Kappa (k), and weighted Kappa (wk) statistics were used [33,34]. The agreement analysis was performed separately for the 3 different times of assessment: baseline (before radiotherapy), 6 weeks, and 2 years after radiotherapy. To interpret the Kappa and weighted Kappa coefficients, we used the definition recommended by Seigel et al.: ≤0 indicates poor agreement, 0.01-0.20 slight agreement, 0.21-0.40 fair agreement, 0.41-0.60 moderate agreement, 0.61-0.80 substantial agreement, 0.81-0.99 almost perfect agreement, and 1.00 perfect agreement [35]. Using absolute agreement rates (a), Kappa (k), and weighted Kappa (wk), we tested agreement of the esthetic scores (four-point scale) of the BCCT.core software and the patients' subjective scoring, of the BCCT.core software and the physicians' subjective scoring, and of the patients' and the physicians' subjective scoring. In a second step, the four-point scale of the patients', physicians', and software assessment was dichotomized into "excellent/good" and "fair/poor" esthetic outcomes, and the statistical analysis was repeated for this two-point scale.
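The agreement statistics described above can be sketched in a few lines. This is a minimal illustration with hypothetical ratings, assuming linear disagreement weights for the weighted Kappa (the weighting scheme is not stated in the text); the function names are illustrative and do not reproduce the SPSS procedures used in the study:

```python
from collections import Counter


def absolute_agreement(rater_a, rater_b):
    """Share of cases in which both raters chose the same category."""
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)


def cohen_kappa(rater_a, rater_b, categories=(1, 2, 3, 4), weighted=False):
    """Cohen's Kappa for two raters on an ordinal scale.

    With weighted=True, linear disagreement weights |i - j| / (k - 1) are used
    (an assumption; other schemes such as quadratic weights are also common).
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    k = len(categories)
    index = {category: i for i, category in enumerate(categories)}

    observed = Counter((index[a], index[b]) for a, b in zip(rater_a, rater_b))
    marginal_a = Counter(index[a] for a in rater_a)
    marginal_b = Counter(index[b] for b in rater_b)

    def weight(i, j):
        if weighted:
            return abs(i - j) / (k - 1)
        return 0.0 if i == j else 1.0

    cells = [(i, j) for i in range(k) for j in range(k)]
    disagreement_observed = sum(weight(i, j) * observed[(i, j)] / n for i, j in cells)
    disagreement_expected = sum(
        weight(i, j) * marginal_a[i] * marginal_b[j] / n**2 for i, j in cells
    )
    if disagreement_expected == 0:
        raise ValueError("Kappa is undefined when both raters use one identical category")
    return 1.0 - disagreement_observed / disagreement_expected


# Hypothetical ratings on the four-point scale (1 = excellent ... 4 = poor).
software = [1, 2, 2, 3, 3, 4, 2, 1, 3, 2]
patients = [1, 1, 2, 2, 3, 3, 2, 1, 2, 2]

print("a  =", absolute_agreement(software, patients))
print("k  =", round(cohen_kappa(software, patients), 3))
print("wk =", round(cohen_kappa(software, patients, weighted=True), 3))
```

The weighted variant penalizes a disagreement of one category (for example, excellent versus good) less than a disagreement of three categories (excellent versus poor), which is why it is usually preferred for ordinal scales such as the one used here.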
For the correlation analyses of different breast asymmetry indices and subjective patients', subjective physicians', and objective software scores for overall esthetic outcome (four-point scale), a Pearson correlation coefficient was used [36]. The correlation analysis was performed separately for the three different times of assessment, as mentioned above. Again, the four-point scale of the esthetic outcome was dichotomized into "excellent/good" and "fair/poor", and the correlation analysis was repeated for the two-point scale.
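A minimal sketch of this correlation analysis is given below. The index values and scores are hypothetical, the function names are illustrative, and the significance test (performed with SPSS in the study) is not reproduced here:

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def dichotomize(score):
    """Collapse the four-point scale: excellent/good (1, 2) -> 0, fair/poor (3, 4) -> 1."""
    return 0 if score <= 2 else 1


# Hypothetical asymmetry index values (e.g. LBC in cm) and esthetic scores (1 = excellent ... 4 = poor).
lbc = [0.2, 0.5, 1.1, 1.8, 2.4, 3.0]
scores = [1, 1, 2, 2, 3, 4]

r_four_point = pearson_r(lbc, scores)
r_two_point = pearson_r(lbc, [dichotomize(s) for s in scores])
print(f"four-point scale: r = {r_four_point:.3f}")
print(f"two-point scale:  r = {r_two_point:.3f}")
```

Dichotomizing discards the distinction within each half of the scale, so the coefficient on the two-point scale can differ noticeably from the four-point result, especially when most ratings fall into one of the two groups.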
Data analyses were performed with the IBM Statistical Package for Social Sciences software, version 19 (SPSS Inc., Chicago, IL), to calculate Kappa, weighted Kappa, the Pearson coefficient, and p-values. A two-sided significance test was used to compute the statistical significance. The level of statistical significance was set at p ≤ 0.05.

Conclusions
The presented data demonstrate that an assessment by the BCCT.core software alone may not be sufficient to evaluate the esthetic outcome in the way that it is perceived by the patients. The BCCT.core software is a good and reliable tool to measure objective asymmetries, but it should be complemented by physicians' assessment and patients' self-assessment to take the subjective perception of esthetics into account. The BCCT.core software may be more appropriate for measuring the esthetic outcome in a long-term follow-up, when symmetry appears to become more important for patients.

Funding:
The IMRT-MC2-trial was funded by the German Aerospace Center (DLR)/Federal Ministry of Education and Research (BMBF) (01ZP0504). The funding source had no role in the study design, collection, analysis, and interpretation of data. We acknowledge financial support by the Medical Faculty of Heidelberg University within the physician scientist program.

Institutional Review Board Statement:
The study was conducted in accordance with the Declaration of Helsinki, and approved by the Ethics committee of Heidelberg University (S-041/2009).

Informed Consent Statement:
Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
All data presented in this article are available on request to the editor or the reviewers. Open Access publishing is supported.