Combined performance of physical examination, mammography, and ultrasonography for breast cancer screening among Chinese women: a follow-up study

ABSTRACT

Objective
We aimed to determine which combination of physical examination (pe), mammography (mam), and ultrasonography (us) would optimize breast cancer detection in China.

Methods
We conducted a trial of screening with pe, mam, and us among Chinese women 25 years of age and older. All initial screenings using the three modalities were completed within 30 days of each other, and subjects were followed approximately 1 year later. The performances of the three screening methods used alone, in parallel, or in series were compared. Data were analyzed using exact confidence intervals (cis) and the McNemar test.

Results
Between March 2009 and July 2011, 3028 eligible women completed all study examinations. At a mean follow-up of 1.3 years, 33 breast cancers were identified in the study population. Mammography detected 28 cancers; us, 24 cancers; and pe, 22 cancers. During the follow-up period, 2 false-negative cases occurred clinically. The highest sensitivity for breast cancer screening (93.9%) was achieved by paralleling mam with us, but came at the cost of a higher recall rate (12.15%).
Using us alone at the first stage, followed by mam when indicated, offered high specificity (99.4%) and the lowest recall rate (1.82%), which were not reached at the expense of sensitivity (84.8%). Used in series, us and mam achieved a sensitivity similar to that for the same modalities used in parallel (McNemar p > 0.05).

INTRODUCTION
Breast cancer is by far the most common female cancer (21.6/100,000) and one of the primary causes of death (5.7/100,000) among Chinese women, with an estimated 169,452 new cases and 44,908 deaths in 2008 1. The total annual number of new cases in regions with limited health resources continues to increase 1,2.
The causes of breast cancer remain unclear. Early detection is believed to be the best means of reducing mortality and improving postoperative quality of life for patients with breast cancer 3. Mammography (mam) is the only evidence-based screening technology available for that purpose, as verified by randomized clinical trials in Western countries 4. Although organizations have released conflicting recommendations on the risks and benefits of mam in clinical and societal contexts 5, many Western countries have successfully implemented mass mam screening for breast cancer 6. However, evidence to justify population-based breast cancer screening by mam for women in China is currently lacking. To date, no randomized controlled trials of mam screening have been conducted among Chinese women, and evidence that such screening would reduce breast cancer mortality in China is insufficient.
One disadvantage of mam is its limitations in detecting small lumps in dense breast tissue, because the dense parenchyma in women before menopause can obscure tumour shadows 7,8. Dense parenchyma is common in Chinese women 9. Additionally, mass mam screening in China would likely not be cost effective. Wong and colleagues 10 developed a Markov model to simulate mam screening, breast cancer diagnosis, and treatment in a hypothetical cohort of Chinese women and reported that, compared with no screening, at least US$61,600 would be spent for each quality-adjusted life year, which is significantly higher than the standard of US$50,000 per quality-adjusted life year accepted by the World Health Organization 11. As a result, mass mam screening has not been recommended in China, especially in regions with limited health resources.
Apart from mam, physical examination (pe) and ultrasonography (us) are the techniques most commonly used to diagnose breast cancer. So far, no studies have proved a survival benefit for screening by pe, whether performed by physicians or patients, in Asian countries 12-14. Ultrasonography has been used widely in China because it is safe, cheap, and convenient, and it detects lumps more sensitively in dense breast tissue 15. However, although us has been suggested both in the West and in Asia as an adjunct tool to elevate the cancer detection rate of mam or pe alone 7,8,16,17, its utility as a population-based breast cancer screening method is doubtful because of its high rates of both false positives and false negatives 18. To avoid such problems, combined screening modalities, including parallel and series approaches, are currently used in Asian countries. In developed regions (Japan, for example), screening mam paralleled with us is recommended for breast cancer control 16; in developing regions (for instance, China), mass screenings begin with pe, which is followed with imaging techniques (mam or us) if suspicious symptoms are found 19.
However, the screening performances of various combination approaches (parallel and series) of pe, mam, and us for breast cancer have not been compared in a Chinese population. Therefore, to determine the combination that optimizes breast cancer detection in China and other Asian countries with limited health resources, we conducted a blind comparison trial with a 1-year follow-up.

Study Subjects
Study subjects consisted of an organized screening population (orsp: healthy physical examinees recruited from the Qingyang community in Chengdu by local government) and an opportunistic screening population (opsp: outpatients recruited from the Chengdu Women's and Children's Central Hospital). Women who were 25 years of age or older, who had resided in local communities for more than 3 years, and who were willing to participate in the screening were eligible for the study. Pregnant or lactating women were asked to defer their participation. Women with existing untreated malignancies, known metastatic disease, or psychiatric conditions that precluded fully informed consent were excluded from the study.

Study Protocol and Screening Procedure
The study protocol was reviewed and approved by the Institutional Review Board of Sichuan University, and written informed consent was obtained from all study participants. The trial consisted of two rounds of breast cancer screening. At the primary screening, all eligible women underwent pe, mam, and us within a period of 30 days. One year later, a second screening with pe was performed to detect false-negative cases.

Mammography
All mam examinations were performed using a Giotto mam unit (Giotto 2000: I.M.S. Internazionale Medico Scientifica, Bologna, Italy) and a dedicated cassette (Fujifilm, Tokyo, Japan). Conventional 4-view film or screen mammograms (mediolateral-oblique and craniocaudal views) were obtained and read independently by 2 radiologists who had more than 10 years of clinical experience in breast imaging. Magnified spot views were added if necessary. The readers were blinded to the results of the other screening examinations. Diagnoses by mam were coded according to the Breast Imaging Reporting and Data System (bi-rads) categories: 1, negative; 2, benign findings; 3, probably benign findings, short-term follow-up recommended; 4, suspicious abnormality, biopsy recommended; and 5, highly suggestive of malignancy.

Ultrasonography
An experienced physician who was blinded to the results of the other screening examinations used a color Doppler us system with a 12-MHz transducer (Siemens G60S: Siemens Medical Solutions, Malvern, PA, U.S.A.) to systematically examine the entire breast and regional lymphatic areas. The examiner had more than 10 years of clinical experience in breast imaging. The us images were also interpreted using the bi-rads categories.

Physical Examination
Clinical pe of the breasts and regional lymphatic areas (including the lateral and medial borders and axilla) was performed by a surgeon who had more than 20 years of clinical experience in breast diseases. The examiner was blinded to the results of the mam and us examinations. The pe findings were classified into 5 categories: 1, negative; 2, benign findings; 3, probably benign findings; 4, suspicious abnormality; and 5, highly suggestive of malignancy 17.

Breast Biopsies
Women in diagnostic categories 3-5 after the first screening were recalled to undergo the next examination. The diagnoses of women who were rated category 3 by any screening technique were compared with the results of the other two examinations to determine whether short-term follow-up (repeat examination after 6 months) or biopsy was indicated. Women who were rated category 4 or 5 by any of the screening techniques were recommended to undergo a biopsy. Fine-needle aspiration cytology, core-needle biopsy, or surgical biopsy was selected according to the patient's condition. Not all solid masses were subjected to biopsy.

Follow-Up and Validation of Examination Diagnoses
Women with normal or benign results were followed up by telephone interview and a second round of screening with pe. For each subject, the final diagnosis was reached by either histology (mostly for positive outcomes) or follow-up (mostly for negative outcomes). If no evidence of breast cancer was found at the primary screening, but breast cancer was identified clinically (for example, it became palpable) during the 1-year follow-up or at the second screening, we reviewed the follow-up data to determine whether it constituted a false-negative case or a new tumour.

Clinical Management
All women with breast cancers detected in the present study were transferred to West China Hospital or Sichuan Cancer Hospital and treated according to the criteria of the National Comprehensive Cancer Network.

Screening Modalities
We designed 11 different screening modalities (single, in parallel, and in series) and tested them statistically using data from the 3 initial examinations and the follow-up. For the single and parallel modalities, we regarded women who were rated category 4 or 5 by any screening method as positive cases and recommended that they undergo biopsy; for series modalities, we regarded women who were rated category 3 or higher at the first stage of screening as suspicious cases, and we assumed that they would undergo the next stage of screening. Diagnostic categories 1-3 for each screening modality were defined as negative results; categories 4 and 5 were defined as positive results.
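The parallel and series decision rules described above can be sketched in a few lines of Python. This is a minimal illustration, not the study's analysis code: the function names are hypothetical, and the rule that the second-stage category decides the final result in a series modality is an assumption, because the text states only that women rated category 3 or higher at the first stage proceed to the next stage.

```python
from typing import Iterable

POSITIVE_MIN = 4    # categories 4 and 5 are positive results (from the text)
SUSPICIOUS_MIN = 3  # category 3 or higher at stage 1 proceeds to stage 2 (from the text)


def parallel_positive(categories: Iterable[int]) -> bool:
    """Parallel rule: positive if any method rates the woman category 4 or 5."""
    return any(category >= POSITIVE_MIN for category in categories)


def series_positive(first_stage: int, second_stage: int) -> bool:
    """Series rule: only suspicious first-stage results reach the second stage."""
    if first_stage < SUSPICIOUS_MIN:
        return False  # not recalled, so counted as a negative screen
    # Assumption: the second-stage category determines the final result.
    return second_stage >= POSITIVE_MIN


# Example: us rated category 3 and mam rated category 4.
print(parallel_positive([4, 3]))   # mam in parallel with us
print(series_positive(3, 4))       # us first, then mam when indicated
```

Under these rules, a series screen can be positive only if the first-stage result was at least category 3, which is why series approaches recall fewer women than parallel approaches.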

Statistical Analysis
Data were input into EpiData 3.1 (EpiData Association, Odense, Denmark) with dual-entry verification, consistency, and logic error checking. Statistical analyses were performed using the SPSS software application (version 18.0: SPSS, Chicago, IL, U.S.A.), with p < 0.05 as the threshold of significance. Descriptive analyses used analysis of variance and the chi-square test to estimate the statistical differences in the risk factors for breast cancer of the women of various demographics undergoing the screening trial. The tumour sizes, lymph node statuses, and TNM stages of breast cancers detected in screening populations were compared by chi-square test. Using the results of biopsies and 1-year follow-ups as the "gold standard," the number of true positives, true negatives, false positives, and false negatives for each screening method or each combination (parallel or series) was determined. The cancer detection rate and the sensitivity, specificity, positive predictive value, negative predictive value, and accuracy of each screening modality for breast cancer were also calculated. McNemar tests were performed to compare the statistical differences in sensitivity and specificity between screening modalities for subgroups of women identified by age, body mass index, and breast density.
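As a worked illustration of the performance measures listed above, the following sketch computes them from a 2 x 2 table of counts, using the standard definitions. The function name is hypothetical. In the example, the true-positive and false-negative counts (28 and 5) follow from the 28 of 33 cancers detected by mam; the false-positive and true-negative counts are placeholders chosen only to make the example run and are not taken from the study.

```python
def screening_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard diagnostic accuracy measures from a 2 x 2 table."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),            # positive predictive value
        "npv": tn / (tn + fn),            # negative predictive value
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# tp and fn reflect mam detecting 28 of 33 cancers (from the text);
# fp and tn are hypothetical placeholders, not study data.
metrics = screening_metrics(tp=28, fp=50, tn=2945, fn=5)
print(f"sensitivity = {metrics['sensitivity']:.1%}")
```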

General Characteristics of Study Subjects
Between March 2009 and July 2011, 3028 eligible women (orsp and opsp) underwent 2 rounds of breast cancer screening, and all were subsequently followed for more than 1 year (mean observation period: 1.3 years). Compared with subjects from the opsp, those from the orsp were younger and more likely to have a lower body mass index and dense breasts. The proportions of postmenopausal women and of women with histories of benign breast disease were higher in the opsp group than in the orsp group (Table i).
Among the 3028 eligible women, 33 with breast cancer (1.1%) were identified, including 18 (54.5%) with early-stage disease (TNM 0, i, and ii). Another 840 women (27.7%) were found to have benign breast disease. Compared with the women having benign or normal results, the women with cancer were significantly older and had a higher mean body mass index. Compared with women in the benign and normal groups, those in the group found to have cancer were more likely to have fatty breasts (bi-rads density category 1) and to be postmenopausal. Women found to have benign breast disease were particularly liable to relapse (Table i).

Breast Cancers Detected by the Various Screening Methods
Compared with breast cancers detected in the opsp, cancers detected in the orsp were mostly identified in dense breasts (bi-rads density categories 2-4) and were significantly smaller and at earlier TNM stages (Table ii). Of the 33 breast cancers, 31 (93.9%) were detected by primary screening. The other 2 were identified clinically during the follow-up period and were considered false negatives, because those women had undergone 3 examinations during the first round of screening, each with a negative result. At the second round of screening in October 2010, 1 woman with a pathology diagnosis of carcinoma in situ (TNM 0) was found; we regarded that finding as a new case and excluded it from the statistical analysis.

Performance Characteristics of Screening Modalities
Our study analyzed 11 screening modalities, including single modalities, parallel modalities, and series modalities (Table iii). Among the single modalities, mam identified the fewest suspicious cases (n = 105) and detected the most cancers (n = 28). More category 3 results were diagnosed by pe and us, which could not discern benign from malignant lumps. The sensitivity of mam for breast malignancy was 84.8%, which was higher than that for us (72.7%) or pe (66.7%). However, the specificity of mam (98.4%) was lower than that for us (99.4%) or pe (99.2%).
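With only 33 cancers, each sensitivity estimate carries wide uncertainty, which is why exact confidence intervals are appropriate. The sketch below computes a Clopper-Pearson interval in plain Python by bisection on the binomial distribution. The choice of Clopper-Pearson is an assumption: the article says "exact" but does not name the method, and the function names are hypothetical.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect(func, lo: float = 0.0, hi: float = 1.0, iters: int = 60) -> float:
    """Root of an increasing function that changes sign on [lo, hi]."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if func(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_ci(k: int, n: int, alpha: float = 0.05) -> tuple:
    """Clopper-Pearson interval for k successes in n trials."""
    # Lower limit: the p at which P(X >= k) equals alpha / 2.
    lower = 0.0 if k == 0 else _bisect(
        lambda p: (1 - binom_cdf(k - 1, n, p)) - alpha / 2)
    # Upper limit: the p at which P(X <= k) equals alpha / 2.
    upper = 1.0 if k == n else _bisect(
        lambda p: alpha / 2 - binom_cdf(k, n, p))
    return lower, upper


# Sensitivity of mam: 28 of 33 cancers detected (from the text).
low, high = exact_ci(28, 33)
print(f"95% exact ci: {low:.3f} to {high:.3f}")
```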
Compared with the single modalities, the 3 parallel modality combinations were more sensitive and detected more cancers, but their specificities were lower and more suspicious cases were recalled. The 4 series modality combinations had the highest specificities and lowest recall numbers, but their sensitivities and cancer detection rates were lower. Modality 6 (mam in parallel with us) identified the most cancer cases (31 of 33) and had the highest sensitivity (93.9%), but at the cost of a higher recall rate (12.15%). Modality 9 (us alone at the start, and then mam when indicated) offered high specificity (99.4%) and the lowest recall rate (1.82%); it was also comparatively sensitive (84.8%). The sensitivity (84.8%) and cancer detection rate (0.92%) of modality 9 were higher than those of modality 3 (us alone) and equivalent to those of modality 2 (mam alone). Modality 11, a combination of pe, mam, and us, did not improve screening performance compared with modality 9 (Table iii).
We observed no significant difference in sensitivity between modality 6 and modality 9 for any subgroup of women identified by age, body mass index, or breast density. Modality 9 had a higher specificity than modality 6 in most subgroups (Table iv).
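The McNemar test compares two modalities applied to the same women by looking only at discordant pairs, that is, cases that one modality called positive and the other called negative. The sketch below implements the exact (binomial) form of the test. The function name is hypothetical, and the discordant counts in the example are illustrative: they assume that every cancer found by modality 9 was also found by modality 6, which would leave 3 cancers found by modality 6 only (31 versus 28 detections). The article does not report the discordant counts.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p value.

    b: pairs positive by the first modality only
    c: pairs positive by the second modality only
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(b, c)
    # Under the null hypothesis, discordant pairs split as Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, 2 * tail)


# Illustrative discordant counts (an assumption, see the text above).
p_value = mcnemar_exact(b=3, c=0)
print(f"p = {p_value}")  # p = 0.25
```

With so few discordant pairs, the test has little power, so a nonsignificant result should be read as an absence of evidence for a difference rather than as proof of equivalence.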

DISCUSSION
Although China has a relatively low incidence of breast cancer (21.6/100,000), the incidence has recently climbed rapidly, as has the proportion of young women diagnosed with breast cancer 1,2. The purpose of screening is to detect cancer at an early stage (while the tumour is small and lymph nodes are not yet involved), thereby improving postoperative quality of life, reducing mortality, and lightening the financial burden of prolonged cancer treatment. However, to date, no guidelines have been set for early breast cancer detection in China.
The screening strategies of Western countries may not be suitable for China because of the many differences between Asian and Caucasian women with respect to the physiology of mammary glands (breast size and density) and the clinical characteristics of breast cancers (peak age incidence) 9. The strategies of wealthier countries, including developed Asian countries, may also be unsuitable for China because of differences in health care resources. Thus, we conducted a prospective blind trial to determine the screening modality or combination of modalities that optimizes breast cancer detection in China.
Our study identified 33 cancers, with 28 being detected by mam, 24 by us, and 22 by pe. Our results show that mam is more sensitive than us or pe among Chinese women, although no method found all of the cancers, which is consistent with previously published studies in Asian women 20,21. More early-stage cancers with microcalcifications, small size, and no lymph node involvement were identified by mam, which is consistent with other evidence that has made mam the method of choice for breast cancer screening in Western countries 3. However, 3 cancers in dense breasts were detected by us but not by mam in our trial. Physical examination did not identify cancers missed by mam. Ultrasonography has proven effective in detecting clinically and mammographically occult cancers in the dense parenchyma of women before menopause 7,8,16,17, a common circumstance among Chinese women with breast cancer. Ultrasonography is also a useful tool for differentiating cystic lesions from solid tumours in the breast, potentially lowering the rate of unnecessary biopsies for benign lesions by up to 25% 22. However, compared with mam, the sensitivity of us depends more on the operator's experience and the equipment used 21,22.
On the other hand, the recall rate with mam was lower than those with us and pe. The poor capacity of us or pe to differentiate benign from malignant tumours led to higher recall and biopsy rates, which brought about overdiagnosis and overtreatment. False-positive outcomes may affect the well-being and behavior of women and may also lead to greater expense 23. Thus, the use of us or pe alone as a screening method for the general population is not practical at present because of inadequate sensitivity and unsatisfactory false-positive rates.
We investigated whether a combination of screening methods might be superior to any single modality. We analyzed 8 combinations of screening modalities, including parallel and series approaches, to determine which combination achieves both high sensitivity and high specificity among Chinese women. Compared with the single modalities, the parallel combinations increased sensitivity and decreased specificity, and the series modes decreased sensitivity and increased specificity. Of the combined modalities, the highest sensitivity and highest cancer detection rate were achieved by mam in parallel with us (modality 6), a result that accords with results from previous studies in Asia 16,17,24. It is worthwhile to note that us alone at the first stage, followed by mam when indicated (modality 9), yields a similar cancer detection rate and a lower recall rate. In modality 9, women with category 3 us results ("probably benign") entered the next stage of screening by mam to determine whether a lump was malignant, reducing the false-positive and false-negative diagnoses. Women who were rated bi-rads category 3 or higher by the given screening modality were recalled for further examination (either short-term follow-up and re-examination, or biopsy). The screening accuracies (sensitivity and specificity) of both modality 6 (mam and us in parallel) and modality 9 (us followed by mam when indicated) were acceptable. However, accuracy is not the only relevant factor in making a modality practical and beneficial for nationwide mass cancer screening. Other factors that need to be considered include the modality's likely effects on breast cancer mortality, financial cost, population selection, technical disparities, appropriate intervals between screenings, and optimal start age for screening. Reducing breast cancer mortality is the most important index of cancer screening, but it takes several decades to show true effectiveness.

With respect to screening cost, mam in China costs about 240 yuan (approximately US$36.00), while us costs about 90 yuan (approximately US$13.50). Thus, modality 6 would be the most expensive of the screening modalities tested and could likely be implemented only in a few developed regions in China. The screening cost of modality 9 would be lower than that of mam alone (modality 2), and we expect that it would result in better screening compliance because us requires less exposure to radiation than mam and is available at more hospitals 25. However, economic evaluations are needed to further investigate whether modality 9 would be suitable as a national breast cancer screening strategy for a general population or for high-risk populations in developing or underdeveloped regions in China and other Asian countries.
Our trial included both orsp and opsp. More early-stage cancers of small size and with an absence of lymph node involvement were detected in the orsp, but the overall breast cancer detection rate was higher in the opsp. Further evaluations are needed to determine which screening mode is more cost effective.
The strengths of the present study include the use of blinding to avoid examiner bias, and the fact that screening performances were calculated on the basis of either histology or follow-up results, which revealed the true performances of the various screening modalities. However, the study also has several limitations. First, it was conducted at a single site, which might have resulted in selection bias. Large-sample and multicentre clinical studies, or a meta-analysis, are needed to verify the general applicability of our findings. Second, the proportion of early-stage cancers detected in our study was small, which may have lessened the power of the study to compare the performances of various screening modalities. Third, some subgroup analyses were based on small sample sizes, which limited their statistical power.
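The cost comparison above can be made concrete with a small calculation of the imaging cost per woman screened. This is a rough sketch under stated assumptions: it counts only the quoted unit prices of mam and us, ignores pe, biopsy, and follow-up costs, and treats the fraction of women referred from us to mam in modality 9 as a hypothetical parameter, because that fraction is not reported here. The function names are illustrative.

```python
MAM_COST_YUAN = 240  # approximate unit cost of mam (from the text)
US_COST_YUAN = 90    # approximate unit cost of us (from the text)


def cost_parallel() -> float:
    """Modality 6: every woman receives both mam and us."""
    return MAM_COST_YUAN + US_COST_YUAN


def cost_series(fraction_to_mam: float) -> float:
    """Modality 9: every woman receives us; only a fraction go on to mam."""
    return US_COST_YUAN + fraction_to_mam * MAM_COST_YUAN


def break_even_fraction() -> float:
    """Referral fraction at which modality 9 costs the same as mam alone."""
    return (MAM_COST_YUAN - US_COST_YUAN) / MAM_COST_YUAN


print(cost_parallel())                 # 330 yuan per woman
print(cost_series(0.10))               # hypothetical 10% referral to mam
print(break_even_fraction())           # 0.625
```

Under these assumptions, modality 9 remains cheaper per woman than mam alone as long as fewer than 62.5% of women are referred on to mam.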

CONCLUSIONS
Screening with us alone at the first stage, followed by mam when indicated, may optimize breast cancer detection in regions of China with limited health resources. This finding may also be applicable to other developing East Asian countries. However, analyses of cost effectiveness and survival benefits are needed to more clearly address whether such a strategy would ultimately reduce mortality in a cost-effective way.

b Calculated as true positives divided by the sum of true positives and false negatives.
c Calculated as true negatives divided by the sum of true negatives and false positives.
d Calculated as true positives divided by the sum of true positives and false positives.
e Calculated as true negatives divided by the sum of true negatives and false negatives.
f Calculated as the sum of the true positives and true negatives divided by all detections.
ci = confidence interval; pe = physical examination; mam = mammography; us = ultrasonography.

Current Oncology, Volume 19, Supplement 2, July 2012. Copyright © 2012 Multimed Inc. Following publication in Current Oncology, the full text of each article is available immediately and archived in PubMed Central (PMC).

table i
a With standard deviation.
b By analysis of variance.
c p < 0.05 (that is, a statistically significant difference between the subgroups).
d By chi-square test.
e By Fisher exact test.
bi-rads = Breast Imaging Reporting and Data System (categories: 1, negative; 2, benign findings; 3, probably benign findings, short-term follow-up recommended; 4, suspicious abnormality, biopsy recommended; and 5, highly suggestive of malignancy).
Of the 33 breast cancers, 22 were recognized by pe, 28 by mam, and 24 by us. The mam modality recognized more breast cancers with small size (<2 cm), negative lymph nodes, and early TNM stage (0, i, or ii). One cancer at TNM stage i was detected only by mam, and not by us or pe. Compared with mam, us showed a poor capacity to detect cancers with negative lymph nodes and at early TNM stages. However, us detected 3 cancers in dense breasts that went undetected by mam and pe. The pe modality detected no cases that were missed by mam, and it missed more cancers within dense breasts. All tumours identified by pe were at least 1 cm in diameter (Table ii).

table ii
Diagnostic yields for 33 breast cancers by screening population and screening method
a Proportion of specified screening population within each subgroup.
b Detection rate for the specified screening method within each subgroup.
c p < 0.05 (indicates a statistically significant difference between the various screening populations for that subgroup).
pe = physical examination; mam = mammography; us = ultrasonography; bi-rads = Breast Imaging Reporting and Data System (categories: 1, negative; 2, benign findings; 3, probably benign findings, short-term follow-up recommended; 4, suspicious abnormality, biopsy recommended; and 5, highly suggestive of malignancy).


table iv
Sensitivity and specificity of mammography (mam) paralleled with ultrasonography (us) compared with us followed by mam in various subgroups among 3028 women with 33 confirmed breast cancer cases
a By McNemar test, p < 0.05 for modality 6 compared with modality 9 (statistically significant difference).
b Not computed because of a small sample number.
Modality 6 = mam paralleled with us; Modality 9 = us alone at the first stage, followed by mam when indicated.