Patient-reported outcome measurement compared with professional judgment of cosmetic results after breast-conserving therapy

Background: In the present study, we set out to compare patient-reported outcomes with professional judgment about cosmesis after breast-conserving therapy (bct) and to evaluate which items (position of the nipple, color, scar, size, shape, and firmness) correlate best with subjective outcome.

Methods: Dutch patients treated with bct between 2008 and 2009 were analyzed. Exclusion criteria were prior amputation or bct of the contralateral breast, metastatic disease, local recurrence, or any prior cosmetic breast surgery. Structured questionnaires and standardized six-view photographs were obtained with a minimum of 3 years’ follow-up. Cosmetic outcome was judged by the patients and, based on photographs, by 5 different medical professionals using 3 different scoring systems: the Harvard scale, the Sneeuw questionnaire, and a numeric rating scale. Agreement was scored using the intraclass correlation coefficient (icc). The association between items of the Sneeuw questionnaire and a fair–poor Harvard score was estimated using logistic regression analysis.

Results: The study included 108 female patients (age: 40–91 years). Based on the Harvard scale, agreement on cosmetic outcome between the professionals was good (icc: 0.78). In contrast, agreement between professionals as a group compared with the patients was found to be fair to moderate (icc range: 0.38–0.50). The items “size” and “shape” were identified as the strongest determinants of cosmetic outcome.

Conclusions: Cosmetic outcome was scored differently by patients and professionals. Agreement was greater between the professionals than between the patients and the professionals as a group. In general, size and shape were the most prominent items on which cosmetic outcome was judged by patients and professionals alike.


INTRODUCTION
Breast-conserving therapy (bct) has become a widely used treatment for patients with breast cancer or ductal carcinoma in situ. The principle of bct is wide local tumour excision and postoperative whole-breast irradiation. The goals of bct are to achieve optimal local tumour control and a good cosmetic outcome 1 . However, breast deformation will likely occur after this dual treatment modality, particularly during the first 3 post-treatment years 2,3 .
Because of the ongoing surgical and radiotherapeutic developments in bct and the increasing use of oncoplastic surgery, it is important to continue evaluating cosmetic results not only shortly after the primary treatment, but also in the years thereafter. Although the oncologic results of bct are routinely being assessed during long-term follow-up, evaluation of cosmetic results is not yet a standard part of the treatment and follow-up process.
Cosmetic results can be evaluated using various methods. The Harvard scale, which compares an overall cosmetic impression of the treated breast with the untreated breast, is the most widely used validated subjective scoring system, categorizing results as excellent, good, fair, or poor 4,5 . Using that scale, "good" or "excellent" cosmetic outcomes have been reported in 71%-93% of patients 6-10 . Alternatively, a more detailed questionnaire developed by Sneeuw et al. 11 addresses the surgical scar; breast size, shape, firmness, and color; and nipple position.
The primary aim of the present study was to compare patient satisfaction about the cosmetic outcome after bct with professional judgment. The secondary objective was to evaluate which items of the Sneeuw questionnaire correlated best with the scores resulting from use of the Harvard scale.

METHODS
In this cross-sectional study, patients and professionals used the Harvard scale, the Sneeuw questionnaire, and a numeric rating scale to rate the cosmetic results of bct.

Ethics Approval
Review by the Medical Ethical Review Board of Máxima Medical Center [mmc (10 July 2012)] confirmed that the study did not require formal ethics approval. Data were collected anonymously and were treated according to current applicable Dutch law and in accordance with the Declaration of Helsinki 12 .

Patients
Patients treated for invasive breast cancer or ductal carcinoma in situ by bct at the breast clinic of mmc in 2008 and 2009 were included in the study. Patients with a prior amputation or bct of the contralateral breast, metastatic disease at presentation or during follow-up, local recurrence, or any prior plastic surgery to the breast were excluded. Between September and October 2012, 3-4 years after their bct, patients were asked to participate in the study.

Photographs and Questionnaires
After obtaining informed consent, patients received questionnaires about the cosmetic outcome after bct, and medical photographs were taken of both breasts. Standardized 6-view photographs (frontal with arms up, frontal with arms at sides, left and right lateral, and left and right lateral oblique) were obtained, omitting the patient's face. All photographs were saved anonymously into the mmc's digital program Vision Review Web Client.
Five medical professionals (2 breast surgeons, 1 breast radiologist, 1 plastic surgeon, and 1 breast oncology nurse) were asked to participate and to judge the cosmetic outcomes based on the photographs. Patients did not view their photographs, but provided scores based on their own opinion. Patients and professionals were both asked to evaluate the cosmetic outcome using 3 different scoring systems:
- The Harvard scale: Four outcomes are possible. An "excellent" cosmetic result score is assigned when the treated breast is nearly identical to the untreated breast [Figure 1(A-F)]. A "good" cosmetic score is assigned when the treated breast is slightly different from the untreated breast [Figure 2(A-F)]. A "fair" score indicates that the treated breast is clearly different from the untreated one, although not seriously distorted [Figure 3(A-F)]. A "poor" score is assigned when the treated breast is seriously distorted [Figure 4(A-F)].
- The Sneeuw et al. questionnaire: The appearances of the treated and untreated breasts are compared based on the surgical scar; breast size, shape, firmness, and color; and nipple position. Four answers are possible for each item: no difference, small difference, moderate difference, or large difference.
- A numeric rating scale: Cosmesis of the treated breast is scored from 1 ("very poor") to 10 ("excellent").

Statistical Analysis
The Explora software application [NVZ Plus, Utrecht, Netherlands (https://www.nvz-ziekenhuizen.nl/trainingen/explora/)] was used to collect and process the results of the questionnaires. Data were analyzed using the IBM SPSS Statistics software application (version 22: IBM, Armonk, NY, U.S.A.). Data are expressed as means with standard deviation and medians with range, as appropriate.
Rating agreement between the professionals, and between the patients and the professionals as a group, is expressed as intraclass correlation coefficients with corresponding 95% confidence intervals (cis), using the 2-way mixed model in IBM SPSS Statistics. A logistic regression analysis was performed to identify the items in the Sneeuw questionnaire that were associated with "poor" results on the Harvard scale by assessor type (professional or patient). For that purpose, the Harvard scale was dichotomized into excellent/good ("good") and fair/poor ("poor"). The Likert-scale score for each Sneeuw questionnaire item was included in the regression model as a continuous predictor. The corresponding odds ratios and 95% cis were calculated. Statistical significance was accepted at a 2-sided p value less than 0.05.
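As an illustration of the agreement statistic described above, the following minimal Python sketch computes a single-measures, consistency-type intraclass correlation coefficient from a two-way layout (subjects in rows, raters in columns). The source states only that the 2-way mixed model in IBM SPSS Statistics was used; the consistency, single-measures form chosen here, the function name `icc_consistency_single`, and the demonstration scores are assumptions made for illustration and are not the study's analysis or data.

```python
def icc_consistency_single(ratings):
    """Two-way mixed-effects ICC, consistency type, single rater.

    ratings: list of rows, one row per subject, one score per rater.
    The ICC type is an assumption; the paper does not specify it.
    """
    n = len(ratings)        # number of subjects
    k = len(ratings[0])     # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares without replication
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)


# Hypothetical scores: 6 subjects (rows) rated by 4 raters (columns)
demo_scores = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
demo_icc = icc_consistency_single(demo_scores)
print(round(demo_icc, 2))  # 0.71
```

Because the consistency form ignores systematic differences between raters, a rater who scores every patient one category lower than a colleague would still yield perfect agreement; an absolute-agreement form would penalize that offset.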

RESULTS

General Information
Of 279 patients consecutively treated by bct during the study period, 196 were eligible for the study, and 112 (40%) agreed to participate. A reason for nonparticipation in the study was given by 60 patients (Table i).

Outcome
Using the Harvard scale, 62% of patients classified their cosmetic outcome as good or excellent. The average judgment of good or excellent by the professionals was 56%. On the Harvard scale, agreement between the professionals was substantial, with an intraclass correlation coefficient of 0.78 (95% ci: 0.72 to 0.83). Agreement between the professionals as a group and the patients was fair to moderate (intraclass correlation coefficients: 0.38-0.50; Table iii).
Table iv illustrates the agreement between the professionals and the patients for the most relevant items of the Sneeuw questionnaire on cosmetic outcome. In particular, fair correlation between the professionals and the patients was observed for the items "size" and "shape." Table v shows odds ratios for the most relevant items of the Sneeuw questionnaire compared with outcomes defined on the Harvard scale as "poor." Generally, a poor rating for "size" and "shape" on the Sneeuw questionnaire was strongly associated with a poor outcome rating on the Harvard scale. Figure 5 illustrates the correlations of the numeric ratings by individual patients with the average ratings by the 5 professionals. Among the lower outcome ratings (<7), scores given by the professionals were slightly higher than those given by the patients. Among the higher ratings (>7), the opposite was observed: scores given by the patients were slightly better than those given by the professionals.

FIGURE 1 (A-F) Example of a patient who scored her outcome as "excellent." The mean Harvard score as judged by professionals was also "excellent."
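The odds ratios in Table v come from a logistic regression of the dichotomized Harvard score on a questionnaire item entered as a continuous predictor. The sketch below, in plain Python, shows one way such an odds ratio and its Wald 95% confidence interval could be obtained for a single item. The function names, the Newton-Raphson fitting routine, and the example scores are hypothetical; they do not reproduce the SPSS procedure or the study data.

```python
import math


def dichotomize(harvard_score):
    """Map the four Harvard categories to 0 (excellent/good) or 1 (fair/poor)."""
    return 0 if harvard_score in ("excellent", "good") else 1


def odds_ratio_per_point(item_scores, poor_flags, iterations=25):
    """Fit logit P(poor) = b0 + b1 * item_score by Newton-Raphson.

    Returns (odds ratio, lower 95% limit, upper 95% limit) for a
    1-point increase in the item score. No safeguard is included for
    sparse or perfectly separated data, where the fit does not exist.
    """
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(item_scores, poor_flags):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p            # gradient, intercept
            g1 += (y - p) * x      # gradient, slope
            h00 += w               # information matrix entries
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    # Wald standard error of b1, taken from the information matrix
    # evaluated in the final iteration (converged by then in practice)
    se_b1 = math.sqrt(h00 / det)
    return (math.exp(b1),
            math.exp(b1 - 1.96 * se_b1),
            math.exp(b1 + 1.96 * se_b1))


# Hypothetical ratings for 10 patients on one item (1 = no difference,
# 2 = small difference) with their Harvard scores
demo_item = [1, 1, 1, 1, 1, 2, 2, 2, 2, 2]
demo_harvard = ["excellent", "good", "good", "good", "fair",
                "good", "good", "fair", "fair", "poor"]
demo_flags = [dichotomize(h) for h in demo_harvard]
demo_or, demo_lo, demo_hi = odds_ratio_per_point(demo_item, demo_flags)
print(round(demo_or, 2))  # 6.0
```

With so few hypothetical patients, the interval around the odds ratio is very wide, which mirrors the concern about small reference groups and broad confidence intervals.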

DISCUSSION
In the present study, we compared professional judgment and patient satisfaction with the patient's cosmetic outcome after bct. Further, we evaluated how the items of the Sneeuw questionnaire best correlated with scores on the Harvard scaling system.
The present study shows that the judgments of cosmetic outcome after bct by various professionals are quite comparable and accord with one another. However, agreement on outcomes between patients and the professionals as a group was only moderate. Based on a numeric rating scale, patients give higher ratings when they consider the outcome good or excellent, but lower ratings when they consider the outcome moderate or poor. Patients and professionals both regarded size and shape of the treated breast as the most important aspects when judging cosmetic outcome.
The literature shows that the cosmetic effects of bct can be evaluated in various ways. Cardoso et al. 5 discussed current methods of esthetic evaluation after bct and the lack of a "gold standard." Furthermore, their review provided a set of recommendations that could be used as guidance for the esthetic evaluation of bct. They concluded that patient self-evaluation, observer evaluation, patient digital photographs, and timing are all components that should be evaluated when scoring the esthetic outcome of bct. On the other hand, the Harvard scale has been reported to be the scale most widely used for such clinical evaluation 4,5 . In the present study, use of all of the foregoing methods was analyzed.

FIGURE 2 (A-F) Example of a patient who scored her outcome as "good." The mean Harvard score as judged by professionals was also "good."
Objective measurement of cosmetic outcome with specific software, as performed by Cardoso et al. 13 and Yu et al. 14 , is considered to be the most accurate evaluation, but of asymmetry only. In the present study, we chose to use subjective tools rather than an objective tool to examine cosmetic outcome after bct because we were interested in more aspects than asymmetry alone. Furthermore, we were interested in studying the grading of cosmetic results by the patients and the professionals, and any correlations. In our opinion, such an evaluation could only be accomplished using subjective tools.

FIGURE 3 (A-F) Example of a patient who scored her outcome as "fair." The mean Harvard score as judged by professionals was also "fair."
The resulting data show that cosmetic outcome by the Harvard scale, when scored by patients, was excellent or good in 62% of evaluations, and that approximately 70% of the patients gave the outcome a score of 7 or more on a numeric scale. Those data closely accord with results in other studies, which reported excellent or good outcomes in 57%-88% of patients, at a median follow-up in the range of 2-5 years 8,11,14-21 .
The original study by Sneeuw et al. 11 showed good correlation between professional assessments of cosmetic results, in accord with our results. However, in the present study, agreement about outcome between the patients and the professionals as a group appeared to be lower, and only moderate. We found only a few reports of this specific comparison in the literature. One study reported results of assessments by patients and by oncologists, showing low agreement between them, as observed in the present series 6 .

FIGURE 4 (A-F) Example of a patient who scored her outcome as "poor." The mean Harvard score as judged by professionals was also "poor."
Our study has some limitations. First is the possible selection bias, given that 60 patients refused participation for specific reasons, and some hundred or more did not participate for unknown reasons. Next, the cross-sectional nature of the design might result in a heterogeneous study population because of variation in the time elapsed since surgery. One of the recommendations in the review by Cardoso et al. 5 involved the importance of standardizing the timing of image capture. In our study, cosmetic outcomes were evaluated after at least 3 years of follow-up, because progression of fibrosis is known to play a part in cosmesis. Images should be acquired before any treatment and at 1 year after radiotherapy. Ideally, images should be repeated at 5 and 10 years' follow-up. Obtaining long-term follow-up pictures as often as possible is important, because the esthetic result continues to change over time 2,3 . Another possible flaw in the study is the low number of patients included as a reference group for the items in the logistic regression analysis. Consequently, the corresponding 95% cis are relatively broad, and thus the level of the association for items with a "poor" result on the Harvard scale might have low reliability.

CONCLUSIONS
Scores for cosmetic outcome after bct differ between patients and professionals. Agreement between the professionals is greater than agreement between the patients and the professionals as a group. Size and shape are the most prominent items on which cosmetic outcome is judged. Further research is desirable to develop a good, simple tool to score the cosmetic outcome of bct more extensively during the entire follow-up period.
Notably, a discussion of the patient's expectations of the cosmetic effects of bct, before treatment, is important. As a result of the present research, we have built the Harvard scale and the numeric rating scale into our electronic patient file to evaluate cosmetic outcome scores by patients and professionals during the follow-up process. By taking that step, discussion about the cosmetic effects of bct will most probably become easier.

CONFLICT OF INTEREST DISCLOSURES
We have read and understood Current Oncology's policy on disclosing conflicts of interest, and we declare that we have none.

Table footnotes: Unable to calculate because of low numbers. (c) Calculated to estimate the relative risk of a poor Harvard score (that is, the fair and poor ratings combined) with a higher rating on the 6-item questionnaire. OR = odds ratio; CI = confidence interval.