Visual Intratumor Heterogeneity and Breast Tumor Progression

Simple Summary
This study investigates the role of visual intratumor heterogeneity (ITH) in breast cancer progression. By analyzing histologic images from the Carolina Breast Cancer Study (CBCS) and the Cancer Genome Atlas Breast Invasive Carcinoma (TCGA-BRCA) data using advanced image processing and machine learning techniques, we developed a measure of tumor heterogeneity based on visual features. Our findings indicate that tumors with low visual heterogeneity exhibited a higher risk of recurrence and were more likely to come from patients whose tumors were comprised of only one subclone or had a TP53 mutation. Conversely, high visual heterogeneity was correlated with a more favorable prognosis. These results suggest that visual heterogeneity provides complementary information to molecular markers. A comprehensive understanding of both the visual and molecular aspects of heterogeneity has the potential to offer novel insights for treatment strategies.

Abstract
High intratumoral heterogeneity is thought to be a poor prognostic indicator. However, the source of heterogeneity may also be important, as genomic heterogeneity is not always reflected in histologic or ‘visual’ heterogeneity. We aimed to develop a predictor of histologic heterogeneity and evaluate its association with outcomes and molecular heterogeneity. We used VGG16 to train an image classifier to identify unique, patient-specific visual features in 1655 breast tumors (5907 core images) from the Carolina Breast Cancer Study (CBCS). Extracted features for images, as well as the epithelial and stromal image components, were hierarchically clustered, and visual heterogeneity was defined as a greater distance between images from the same patient. We assessed the association between visual heterogeneity, clinical features, and DNA-based molecular heterogeneity using generalized linear models, and we used Cox models to estimate the association between visual heterogeneity and tumor recurrence.
Basal-like and ER-negative tumors were more likely to have low visual heterogeneity, as were the tumors from younger and Black women. Less heterogeneous tumors had a higher risk of recurrence (hazard ratio = 1.62, 95% confidence interval = 1.22–2.16), and were more likely to come from patients whose tumors were comprised of only one subclone or had a TP53 mutation. Associations were similar regardless of whether the image was based on stroma, epithelium, or both. Histologic heterogeneity adds complementary information to commonly used molecular indicators, with low heterogeneity predicting worse outcomes. Future work integrating multiple sources of heterogeneity may provide a more comprehensive understanding of tumor progression.


Introduction
Breast tumor evolution has been described using an evolutionary model, wherein successive selective sweeps result in more aggressive phenotypes [1,2]. Some have hypothesized that heterogeneity across a tumor mass would offer a selective advantage, with some clones having 'fitness' to resist cell death or to increase proliferation. Accordingly, at least one previous study has suggested that intratumor heterogeneity (as measured by genetic markers or ER status) is associated with poor prognosis [3,4]. However, the scope and timing of observed heterogeneity may also have significance for outcomes. Early in cancer development, heterogeneity provides a wide range of phenotypes that permit evolutionary selection [5]; later, heterogeneity potentially preserves therapy-sensitive cells and is the basis for 'adaptive therapy' regimens [6].
In addition to temporal dynamics, heterogeneity can occur on multiple scales (e.g., genomic, epigenetic, microenvironmental). For instance, different tumor cell populations may carry distinct mutations and copy number alterations [7], while the same mutation can give rise to disparate phenotypes depending on the cell of origin [8]. Assessing heterogeneity at multiple levels is therefore important for understanding potential effects on cancer behavior and outcome. Previous research has primarily focused on measurement of heterogeneity at the cellular and genomic levels, in large part because spatial resolution cannot be resolved from bulk sequencing [9,10]. Single-cell and spatially resolved transcriptomic studies provide an alternative, and have been used to map phenomena such as immune cell infiltration [11,12], evolution of copy number instability [13], and cell-cell interactions within tumors [14], but they remain technically challenging and difficult to apply in large samples. Thus, there remains a need for additional tools that can measure broader, tissue-level scales of heterogeneity from routinely collected tumor data, such as hematoxylin and eosin (H&E)-stained tissue microarray (TMA) core images. In this analysis, we aimed to (1) develop a flexible, visual (as opposed to molecular) indicator of tumor heterogeneity and (2) explore associations between histologic heterogeneity, clinical characteristics, and outcomes. Additionally, most studies of heterogeneity have focused on genetic or molecular markers of heterogeneity, and few studies have evaluated the histologic/visual heterogeneity of tumors as a function of stage and other prognostic indicators. Histologic heterogeneity reflects a different scale of tumor evolution relative to molecular assessments, providing an indication of the spatial distribution of evolving phenotypes, and it may therefore add value to other assessments of heterogeneity.
To evaluate histologic heterogeneity, we studied breast cancer hematoxylin and eosin (H&E)-stained tissue microarray (TMA) core images using a population-based sample of incident invasive breast cancers. Using data from the Carolina Breast Cancer Study (CBCS), a population-based study representing the range of tumor stage and molecular subtype in North Carolina and oversampling young women and Black women to ensure representation of these two understudied groups, we evaluated multiple TMA images from each patient (n = 3 cores per patient on average, n = 1655 patients) to identify histologic heterogeneity between cores. Using computer vision tools, we quantitatively characterized visual intratumor heterogeneity (visual ITH) using a Normalized Merge Level (NML) score that reflects dissimilarity across image features. High NML scores can be interpreted as a low level of visual similarity within a single patient relative to the level of similarity between different patients. A binary variable, visual ITH group (homogeneous vs. heterogeneous), was created based on the NML score to evaluate how visual ITH related to clinical variables, such as recurrence, ER status, and tumor grade, and molecular indicators of heterogeneity, such as number of subclones.

Study Population
The study population of the Carolina Breast Cancer Study Phase 3 (CBCS3) was sampled using Rapid Case Ascertainment with oversampling for Black women and women under the age of 50, both of whom tend to be under-represented in breast cancer research data. All cases (aged 20-74) were diagnosed with first, primary invasive breast cancer between 2008 and 2013. The study population and methods have been described in detail elsewhere [15,16,17,18,19,20]. The current analytic sample was restricted to cases with at least two pre-treatment H&E-stained 1 mm cores on tissue microarrays (TMA), with most cases having three TMA cores (n = 3 cores per case, on average). A total of 1655 unique patients with 5907 TMA cores in CBCS3 were included. All study procedures were approved by the University of North Carolina (UNC) School of Medicine Institutional Review Board, and patients provided written informed consent.
Recurrence data were available for CBCS3. Recurrence-free survival (RFS) was defined as the time from the date of diagnosis to the first local, regional, or distant recurrent breast cancer, and recurrences were verified through medical record review. Follow-up was complete through October 2019, with at least a 5-year follow-up for all women. Among the 1655 participants, 171 recurrences were identified, with missing recurrence observations for 363 participants.
We also applied visual ITH analysis to the Cancer Genome Atlas Breast Invasive Carcinoma (TCGA-BRCA) data, which have been described in previous studies [21,22]. A total of 1102 patients who had both whole-slide images and clinical data available were included in the analysis. For each patient, a breast pathologist reviewed the corresponding whole-slide image and identified 2 to 8 'virtual cores', each of which is approximately the same size as a CBCS TMA core. The analysis therefore involved 1102 patients with 6556 virtual TMA cores. All TCGA samples were processed under the approval of the respective Institutional Review Boards, and patients provided written informed consent. The results shown here are in whole or part based upon data generated by the TCGA Research Network: https://www.cancer.gov/tcga (accessed on 4 February 2022).

Color Normalization and Image Segmentation
Color normalization [23] (see Figure 1A) was first applied to standardize the stain color intensity of the TMA core images because many factors (e.g., different patient populations, different scanners, different staining techniques) can cause artifacts that are undesirable in building a generalizable machine learning model. The appearance of the tissue samples can vary greatly without normalization, making it difficult for a model trained on one cohort to generalize to others.
An image segmentation method [24] was then applied using QuPath 0.3.2 [25] to classify each pixel of a TMA core into one of three classes: epithelium, stroma, or background. Image segmentation enables the study of visual characteristics of different tissue types (especially epithelium and stroma). Adipocyte tissue was combined with background as one class when applying segmentation. The segmentation resulted in three sets of images for each core: (1) the original image with background removed; (2) an epithelial region of interest image; and (3) a stroma-only region of interest with epithelium excluded. Examples of the three types of images we used in this study are shown in Figure 1B. To avoid confusion, when we refer to the tumor, we mean the epithelium-enriched but stroma- and immune-inclusive components of cellularity. When we refer to the epithelium, we are specifically indicating those components of the tumor that are epithelial in morphology.

Figure 1. (A) The first step in this pipeline is color normalization, which uses the method proposed in [23] to remove undesirable variations in the core images. (B) The normalized core images were then segmented [24] into three types of core images: original, epithelium, and stroma. (C) Since the core images are large, processing them directly is not feasible. Each core image was divided into smaller patches of 200 × 200 pixels in this step. (D,E) Patches with more than 90% background and patches with artifacts were removed to keep only informative patches. (F) A convolutional neural network (VGG16 [26]) was used to extract 512-dimensional feature vectors from the patches. (G) Feature vectors of all patches of a core were averaged to obtain one feature vector for each core image.

Core Feature Extraction
Convolutional Neural Networks (CNNs) are widely applied to extract image features from medical images and have been successful in many downstream tasks compared to cell-by-cell morphology features [27,28,29,30,31,32]. However, directly applying CNN models to core images is infeasible due to their large size (2600 × 2600 pixels on average per core). Thus, we split the cores into smaller patches of 200 × 200 pixels (Figure 1C). Patches with more than 90% background and patches with artifacts (e.g., bubbles or tissue folding; see Figure 1C) were removed (Figure 1D,E). The remaining patches were input into a CNN model, VGG16 [26], to extract feature vectors. The VGG16 model was pre-trained on a large image dataset, ImageNet [33]. Each patch was summarized as one 512-dimensional feature vector by the VGG16 model, and the feature vector of each core was generated by averaging the feature vectors of all the patches of the core (Figure 1F,G). Therefore, each core image was represented by a 512-dimensional feature vector after the feature extraction process. The same procedure was applied to all three types of core images (original, epithelium, and stroma) to extract core feature vectors.
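The patch-level steps above (tiling, background filtering, and averaging) can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the `extract` callable is a hypothetical stand-in for the VGG16 feature extractor, partial patches at the image edge are simply dropped (the text does not say how they were handled), and artifact removal is omitted.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Mask = Sequence[Sequence[bool]]  # True marks a background pixel


def patch_origins(height: int, width: int, size: int = 200) -> List[Tuple[int, int]]:
    """Top-left corners of non-overlapping size x size patches.

    Assumption: partial patches at the right and bottom edges are dropped.
    """
    return [(r, c)
            for r in range(0, height - size + 1, size)
            for c in range(0, width - size + 1, size)]


def background_fraction(mask: Mask, row: int, col: int, size: int) -> float:
    """Share of background pixels inside one patch."""
    background = sum(sum(1 for px in mask[r][col:col + size] if px)
                     for r in range(row, row + size))
    return background / (size * size)


def core_feature(mask: Mask,
                 extract: Callable[[int, int], Vector],
                 size: int = 200,
                 max_background: float = 0.90) -> Vector:
    """Average the feature vectors of all informative patches of one core.

    `extract(row, col)` is a hypothetical hook that returns the feature
    vector of the patch whose top-left corner is (row, col).
    """
    height, width = len(mask), len(mask[0])
    kept = [(r, c) for r, c in patch_origins(height, width, size)
            if background_fraction(mask, r, c, size) <= max_background]
    if not kept:
        raise ValueError("no informative patches left in this core")
    vectors = [extract(r, c) for r, c in kept]
    # Element-wise mean over patches gives one vector per core.
    return [sum(column) / len(vectors) for column in zip(*vectors)]
```

In a real pipeline `extract` would crop the patch from the normalized image and pass it through the pre-trained network; here it is left abstract so the sketch stays independent of any deep learning library.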

Feature Clustering
To assess visual ITH, i.e., the degree to which the tumor was visually dissimilar across different spatially sampled regions, we applied a hierarchical clustering method [34] to the extracted core features and set the total number of clusters to the number of patients in the study population (n = 1655). Each core was assigned a cluster label by the clustering method based on the similarity of core features. We used Euclidean distance with Ward's linkage for the analysis [35].
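For reference, Ward's linkage with Euclidean distance merges, at each step, the pair of clusters whose union gives the smallest increase in the within-cluster sum of squares. The notation below ($A$, $B$, $\mu_A$, $x_i$) is introduced here for illustration and does not appear in the original text; $x_i$ stands for the 512-dimensional feature vector of core $i$.

```latex
\Delta(A, B) \;=\; \frac{|A|\,|B|}{|A| + |B|}\,\bigl\lVert \mu_A - \mu_B \bigr\rVert_2^{2},
\qquad
\mu_A \;=\; \frac{1}{|A|} \sum_{i \in A} x_i
```

Under this criterion, cores with similar feature vectors are joined early in the tree, which is the property the visual ITH score relies on.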
Patients were defined as having low visual ITH if their core features were similar to one another (low within-person variability) but different from those of other patients (high between-person variability). If a patient's tumor is visually heterogeneous, the clustering will not aggregate the cores from that patient, and the cores will show random clustering patterns instead. These patients had high visual ITH.

Measure of Visual Intratumor Heterogeneity
To quantify visual ITH across cores for a patient, we developed a Normalized Merge Level (NML) score. The score is based on the hierarchical clustering result, which can be viewed as a tree, with each leaf node representing one core and the root node representing one big cluster that contains all the cores. The clustering result is a specific level of the tree that has 1655 (the number of patients in the CBCS3 data) clusters. If a patient has homogeneous histologic features, the patient's cores will merge at a low level of the tree, indicating that the core features of the patient are highly similar. However, if the tumor of a patient is visually heterogeneous, the cores will merge at a relatively high level of the tree. Therefore, the merge level of a patient, which is defined as the level of the tree where all cores of a patient merge, can be a measure of visual ITH, with a high merge level value representing heterogeneity and a low value indicating homogeneity (see a toy example in Figure 2).

Figure 2. Toy example of the merge level (axis label: Core ID). For Subject 1, with core IDs {2, 5, 8}, the merge level is 4.

Because the number of cores differs between patients, it is important to address how the merge level varies with the number of cores. Patients with more cores tend to have larger merge levels, while patients with fewer cores tend to have smaller ones, which biases the merge level as a measure of visual ITH. To alleviate the bias, we normalized the merge level by dividing it by the number of cores of the patient, which we refer to here as the Normalized Merge Level (NML). Based on the clustering tree, the NML score of each patient can be calculated to represent the patient's visual ITH score, with higher scores representing more heterogeneity.
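The NML calculation can be sketched from a clustering tree as follows. The text does not spell out how tree levels are counted, so this sketch makes a labeled assumption: leaves sit at level 0 and each merged node sits one level above its higher child. The merge list follows the convention of SciPy's `linkage` output (leaves are numbered 0 to n-1 and the i-th merge creates node n+i); the function name and inputs are illustrative only.

```python
from typing import Dict, Hashable, List, Sequence, Set, Tuple


def normalized_merge_levels(n_cores: int,
                            merges: Sequence[Tuple[int, int]],
                            patient_of: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Merge level of each patient divided by that patient's number of cores.

    merges[i] = (a, b) joins nodes a and b into the new node n_cores + i.
    patient_of[k] is the patient label of core k.
    Assumption: level(leaf) = 0 and level(node) = 1 + max(level of children).
    """
    members: Dict[int, Set[int]] = {i: {i} for i in range(n_cores)}
    level: Dict[int, int] = {i: 0 for i in range(n_cores)}

    cores_of: Dict[Hashable, Set[int]] = {}
    for core, patient in enumerate(patient_of):
        cores_of.setdefault(patient, set()).add(core)

    merge_level: Dict[Hashable, int] = {}
    for step, (a, b) in enumerate(merges):
        node = n_cores + step
        members[node] = members.pop(a) | members.pop(b)
        level[node] = 1 + max(level[a], level[b])
        # Record the first node that contains every core of a patient.
        # (A plain scan over patients; fine for a sketch, slow at scale.)
        for patient, cores in cores_of.items():
            if patient not in merge_level and cores <= members[node]:
                merge_level[patient] = level[node]

    return {p: merge_level[p] / len(cores_of[p]) for p in merge_level}


# Toy tree: patient "A" has cores 0 and 1, patient "B" has cores 2, 3 and 4.
toy_merges = [(0, 1), (2, 3), (5, 4), (7, 6)]
toy_scores = normalized_merge_levels(5, toy_merges, ["A", "A", "B", "B", "B"])
```

In the toy tree, the two cores of patient "A" join at the first level, whereas the cores of patient "B" only come together at the root, so "B" receives the higher score.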
Since each core includes a different representation of different tissue types (Figure 1B), the clustering method was applied separately to original images as well as QuPath-segmented epithelial and stroma components to calculate three visual ITH NML scores for each patient. These visual ITH scores were the basis for our visual ITH group variable, a binary variable based on visual ITH score.

Measure of Molecular Heterogeneity
DNA from 344 formalin-fixed paraffin-embedded tumor cores and paired normal blood samples (where available) was isolated and sequenced using a custom Agilent panel, targeting exons of 1200 genes (UNCSeq) at an average of 400× depth. Paired-end FASTQ files were generated from runs on Illumina sequencers (NovaSeq, HiSeq, NextSeq, or MiSeq-Nano) using standard protocols at UNC, and then aligned to the human reference genome 38 (GRCh38) with BWA-MEM. After sorting and indexing, we realigned paired tumor and normal BAMs with ABRA2 [36] and calculated somatic variants with Mutect2 [37], Strelka2 [38], and Cadabra [36]. We removed any variants covered by fewer than 15 total reads, supported by <5 alternate reads, or with a variant allele frequency <15%, as the probability of false positive calls is higher for such variants. We also calculated allele-specific copy number with CNVkit [39]. Aligned BAM files from normal samples were used to generate a pooled reference of expected sequencing coverage across target regions; then, we compared this with tumor coverage to estimate copy number. Change-points were detected using circular binary segmentation. Tumors with a high standard deviation (>2) and median absolute deviation (>1) in read coverage were removed from the copy number analyses (N = 33), leaving 344 tumors profiled for mutations and 311 for copy number.
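The three variant filters described above can be expressed as a small predicate. This is an illustrative sketch: the `Variant` record and its field names are hypothetical, and total depth is assumed here to be the sum of reference and alternate reads.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Variant:
    """Hypothetical minimal record for one somatic call."""
    name: str
    ref_reads: int
    alt_reads: int

    @property
    def depth(self) -> int:
        # Assumption: total reads = reference reads + alternate reads.
        return self.ref_reads + self.alt_reads

    @property
    def vaf(self) -> float:
        """Variant allele frequency; 0.0 when the site has no coverage."""
        return self.alt_reads / self.depth if self.depth else 0.0


def passes_filters(v: Variant,
                   min_depth: int = 15,
                   min_alt: int = 5,
                   min_vaf: float = 0.15) -> bool:
    """Keep a variant only if it clears all three thresholds from the text."""
    return v.depth >= min_depth and v.alt_reads >= min_alt and v.vaf >= min_vaf


def filter_variants(variants: Iterable[Variant]) -> List[Variant]:
    """Drop low-coverage, low-support, and low-frequency calls."""
    return [v for v in variants if passes_filters(v)]
```

A call with 20 reference and 10 alternate reads would be kept, while one with 90 reference and 10 alternate reads would be dropped because its allele frequency is 10%.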
We hypothesized that tumors with low visual ITH would also show low molecular ITH, defined as being comprised of fewer tumor subclones. Therefore, we estimated tumor subclone number and composition using PyClone-VI [40], a Bayesian algorithm which clusters mutations into distinct subclonal populations according to variant allele frequency and allele-specific copy number. After mapping each mutation to the overlapping copy number segment from the same sample, we clustered mutations with PyClone-VI and extracted the total number of clusters observed in each sample. Because few samples had more than two subclones, we dichotomized clonality as 'one subclone' or 'more than one subclone'. In addition, given our interest in the TP53 gene as a key driver of tumor heterogeneity and clonal expansion, we categorized tumors as 'TP53 mutation' versus 'no TP53 mutation'.

Statistical Analysis
The patients in this study population were divided into two groups based on the visual ITH score: low visual ITH (homogeneous) versus high visual ITH (heterogeneous). Kernel density estimation was applied to estimate the distribution of the visual ITH scores, and then a cutoff was selected to divide the scores into two groups. If there was an obvious bimodal pattern, the cutoff was selected as the value that separates the two modes. If not, the median was selected as the cutoff. For CBCS3, an obvious bimodal pattern was observed, and the cutoff was set at 3.75. For TCGA-BRCA, no such pattern was observed, so the median cutoff of 1.28 was used. The relationship between the visual ITH group of a patient and their clinical variables was analyzed using a generalized linear model. Three different binarized visual ITH variables were created based on the three types of core images (original, epithelium, and stroma) and assessed with the following statistical analyses.
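The cutoff rule can be sketched with a Gaussian kernel density estimate evaluated on a grid. The text describes the bimodal pattern as "obvious", which suggests visual judgment; the rule used below (exactly two local maxima of the estimated density, with the cutoff placed at the lowest point between them) and the user-supplied bandwidth are assumptions made for illustration.

```python
import math
from statistics import median
from typing import List, Sequence


def gaussian_kde(scores: Sequence[float], grid: Sequence[float], bandwidth: float) -> List[float]:
    """Gaussian kernel density estimate of `scores`, evaluated at each grid point."""
    norm = len(scores) * bandwidth * math.sqrt(2 * math.pi)
    return [sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in scores) / norm
            for x in grid]


def choose_cutoff(scores: Sequence[float], bandwidth: float, grid_size: int = 512) -> float:
    """Antimode between two density modes if bimodal, otherwise the median.

    Assumption: "bimodal" means the gridded density has exactly two
    strict local maxima at the chosen bandwidth.
    """
    lo = min(scores) - 3 * bandwidth
    hi = max(scores) + 3 * bandwidth
    step = (hi - lo) / (grid_size - 1)
    grid = [lo + i * step for i in range(grid_size)]
    density = gaussian_kde(scores, grid, bandwidth)

    modes = [i for i in range(1, grid_size - 1)
             if density[i - 1] < density[i] > density[i + 1]]
    if len(modes) == 2:
        left, right = modes
        valley = min(range(left, right + 1), key=density.__getitem__)
        return grid[valley]
    return median(scores)


def visual_ith_group(score: float, cutoff: float) -> str:
    """Binary visual ITH group: scores above the cutoff are heterogeneous."""
    return "heterogeneous" if score > cutoff else "homogeneous"
```

The number of detected modes depends on the bandwidth, so in practice the density plot would still be inspected before accepting the cutoff. How scores exactly equal to the cutoff were grouped is not stated in the text; the sketch assigns them to the homogeneous group.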
Generalized linear models were used to calculate Relative Frequency Differences (RFDs) and corresponding 95% confidence intervals (CIs) as measures of association between visual ITH groups and variables of interest. RFDs were estimated based on a generalized linear model with binomial distribution and identity link, and were interpretable as the percentage difference in heterogeneity between index and referent groups. The following variables were studied in association with visual ITH group: age at diagnosis (≥50, <50), race (self-reported Black, non-Black (>98% white)), tumor grade (low, intermediate, high), PAM50 risk of recurrence (ROR) (low, medium, high), tumor size (≤2 cm, >2 cm), PAM50 intrinsic breast cancer subtype (Luminal A, Luminal B, HER2-enriched, basal-like, normal-like) [41], ER status (negative, positive), node status (negative, positive), and previously identified immune subtypes [42]. Multivariate models were adjusted either for age and race or for node status, tumor size, ER status, and tumor grade. Kaplan-Meier curves were used to compare mean time to recurrence between the visual ITH groups. Hazard ratios (HRs) and 95% CIs were calculated using Cox proportional hazard models. The assumption of proportionality was assessed via the Wald p-value. Associations between visual ITH and molecular indicators (subclone number and TP53 mutation) were also assessed but, given the smaller number of patients with available DNA data, we used logistic regression to ensure model convergence.
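For a single binary covariate with no adjustment, the estimate from a binomial model with identity link reduces to a difference between two proportions, which makes the RFD easy to illustrate. The sketch below covers only that unadjusted case with a Wald interval; adjusted RFDs require fitting the full model, and the counts in the example are invented for illustration.

```python
import math
from typing import Tuple


def relative_frequency_difference(events_index: int, n_index: int,
                                  events_ref: int, n_ref: int,
                                  z: float = 1.96) -> Tuple[float, float, float]:
    """Unadjusted RFD (index minus referent) with a Wald confidence interval.

    Returns (rfd, lower, upper) on the proportion scale; multiply by 100
    to express the result in percentage points.
    """
    p_index = events_index / n_index
    p_ref = events_ref / n_ref
    rfd = p_index - p_ref
    se = math.sqrt(p_index * (1 - p_index) / n_index + p_ref * (1 - p_ref) / n_ref)
    return rfd, rfd - z * se, rfd + z * se


# Hypothetical counts: 60 of 200 in the index group versus 45 of 300 in the referent group.
rfd, lower, upper = relative_frequency_difference(60, 200, 45, 300)
```

An interval that excludes zero corresponds to a difference that is significant at the 5% level in this unadjusted setting.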
To evaluate the generalizability of this approach to a different dataset, we calculated the visual ITH scores for patients using TCGA-BRCA data. The scores were calculated based on histopathology for 1102 patients, with 6556 representative core TMA images. Visual ITH scores based on original TMA images were generated for statistical analysis to compare with the corresponding results based on CBCS3 data. Molecular heterogeneity was again defined based on the number of tumor subclones (one subclone vs. more than one; with all mutations from whole-exome sequencing clustered in PyClone-VI) and TP53 mutation status (mutation vs. no mutation). All statistical analyses were performed in R version 4.1.1.

Visual ITH, Patient, and Tumor Characteristics
We calculated visual ITH scores based on histopathology for 1655 patients, with 5907 representative TMA images from CBCS3. We extracted image features from these core images three times, once for the original image and separately for masked images that contained only epithelial or stromal compartments. The visual ITH score used tree-cluster distance, adjusted for number of cores per patient. Examples of participant cores with high visual ITH scores (heterogeneous group) and low visual ITH scores (homogeneous group) are shown in Figure 3. There is visual similarity between cores of the same patient in the left panel, while there is no obvious similarity on the right. We found that in original images, 729 patients (44%) had visually heterogeneous tumors, with similar results for epithelium (45%) and a slightly higher frequency of visual ITH for stroma (51%).
Visual ITH from original images was evaluated in association with patient age at diagnosis, race, tumor grade, ROR, PAM50, ER, and recurrence in CBCS3 (Figure 4). The frequencies, percentages, Relative Frequency Differences (RFDs), and corresponding 95% CIs were calculated based on models without adjustment (reduced), after adjusting for age and race (Adjust 1), and after adjusting for tumor grade, ER status, node status, and tumor size (Adjust 2). Low visual ITH score was associated with younger age, Black race, high tumor grade, high ROR, basal-like PAM50 type, negative ER status, and recurrence. In the reduced model, these associations were significant, though some were attenuated after adjusting for age and race and after adjusting for tumor grade, ER status, node status, and tumor size. The associations were mostly significant in the models adjusted for race and age (Adjust 1), while the patterns were mostly not significant in the models further adjusted for grade, ER, node, and size (Adjust 2), suggesting that visual ITH is a correlate of these factors and is not independent. We performed sensitivity analyses to see if the results differed by the components of core images (stroma and epithelium). Visual ITH based on epithelium and stroma had a similar pattern of association to that of original-image visual ITH (i.e., low visual ITH was associated with younger age, Black race, high tumor grade, high ROR, basal-like PAM50 type, negative ER status, and recurrence; Table 1). Some associations were not significant when restricted to epithelium or stroma features, but the magnitude and direction of the associations were unchanged. Adjusting for age and race did not substantially change the results (Table 2).
Applying the same analysis to TCGA-BRCA, we found that a higher proportion of participants (894; 71%) had high visual ITH. Visual ITH was evaluated in association with patient age at diagnosis, race, ROR, PAM50, ER, and recurrence (Table 3). The patterns of association were similar to CBCS3, with low visual ITH associated with younger age, Black race, high ROR, basal-like PAM50 type, negative ER status, and recurrence; however, the associations with binarized recurrence in TCGA were not significant. We also studied the association between visual ITH and immune subtype (low vs. high) in both the CBCS3 and TCGA-BRCA data. For CBCS3, the quiet and innate classes were combined to form the low group, while the adaptive class constituted the high group. We observed that visual intratumor homogeneity was associated with a higher adaptive immune response in CBCS3; however, this association was not observed in TCGA-BRCA. The results are presented in Table 3. The inconsistent results could be due to the fact that TCGA-BRCA has no obvious immune quiet samples, as the tumors were much more advanced. Therefore, the detected immune response in CBCS3 could result from a contrast between the quiet group and the adaptive group.

Visual ITH and Recurrence-Free Survival
Comparing yes/no recurrence does not account for time to recurrence or loss to follow-up, and we therefore also used time-to-event analyses to evaluate the relationship between visual ITH and recurrence. The CBCS3 identified 171 recurrences during the first 5 years of follow-up in the study population. We assessed associations between visual ITH and recurrence using both Kaplan-Meier analyses and Cox proportional hazards models, without including treatment as a covariate. We found that lower visual ITH was associated with recurrence (HR = 1.62, 95% CI = 1.22-2.16). The association remained significant when restricting to the epithelial portions of images only, suggesting that higher visual ITH predicted better recurrence-free survival (RFS) (Figure 5). A similar survival analysis was performed for TCGA-BRCA using the progression-free interval. Although the trend was in the same direction, with visually homogeneous tumors being more adverse, the association was not significant. The Kaplan-Meier curves are shown in Figure A1 in Appendix A.
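To make the time-to-event idea concrete, the Kaplan-Meier (product-limit) estimator can be written in a few lines. This is a generic sketch of the estimator, not the study's analysis code (which was run in R); the input names are illustrative, and censored participants are those with an event flag of zero.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Product-limit estimate of recurrence-free survival.

    times[i]  : follow-up time of participant i
    events[i] : 1 if a recurrence was observed at times[i], 0 if censored
    Returns (time, survival probability) at each distinct event time.
    """
    survival = 1.0
    curve: List[Tuple[float, float]] = []
    event_times = sorted({t for t, e in zip(times, events) if e})
    for t in event_times:
        # Everyone still under observation just before t is at risk.
        at_risk = sum(1 for u in times if u >= t)
        failed = sum(1 for u, e in zip(times, events) if e and u == t)
        survival *= 1 - failed / at_risk
        curve.append((t, survival))
    return curve
```

Computing one such curve per visual ITH group gives the kind of comparison shown in Figure 5; censored participants contribute to the risk set until they leave follow-up, which is why this approach handles loss to follow-up better than a yes/no comparison.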
Table 3. Comparisons of associations between visual intratumor heterogeneity and clinical characteristics in TCGA-BRCA and CBCS3. Relative Frequency Differences (RFDs) and 95% CIs for patient age, race, ROR, PAM50 type, ER status, immune infiltration class, and recurrence across visual intratumor homogeneous and heterogeneous groups were calculated using generalized linear models. For both datasets, the results were generated based on reduced models without adjustment. Referent groups for each individual model are indicated in the figure, and sample size (N) and percentages are listed. Referent group: visual intratumor heterogeneous group for all models.

Visual ITH, Genomic Instability, and Clonality by DNA Sequencing
To determine whether visual indicators of ITH reflected molecular differences, we assessed associations between visual ITH, number of tumor subclones (clonal vs. multiclonal), and TP53 mutation status (Table 4). Low visual ITH was positively associated with tumor clonality in the original images (OR = 1.54, 95% CI = 0.98-2.45) and stromal segmented images (OR = 1.60, 95% CI = 1.01-2.53). The association between epithelial-based visual ITH and clonality was null. Low visual ITH also showed a positive association with TP53 mutation status for original (OR = 1.98, 95% CI = 1.26-3.13) and stromal (OR = 1.72, 95% CI = 1.10-2.70) components. Associations tended to be in the same direction, but were attenuated and non-significant when restricting to the epithelial compartment only, suggesting that the combination of epithelial and stromal features is an important contributor. A possible explanation for the association between good prognosis and stromal-based heterogeneity is that less aggressive tumors have higher proportions of stromal tissue [43], leading to greater potential to observe stromal histologic heterogeneity. In TCGA, where virtual cores were placed in epithelial regions, low-visual-ITH tumors were more likely clonal and TP53-mutated, though both associations were weak and null, as in the estimates for CBCS epithelial regions.
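For a single binary predictor, the odds ratio and Wald interval from logistic regression coincide with the familiar 2 × 2 table calculation, which can serve as a quick check on estimates of this kind. The sketch below assumes an unadjusted comparison and uses invented counts; it does not reproduce the numbers in Table 4.

```python
import math
from typing import Tuple


def odds_ratio(a: int, b: int, c: int, d: int, z: float = 1.96) -> Tuple[float, float, float]:
    """Odds ratio with a Wald confidence interval from a 2 x 2 table.

    a: low visual ITH, one subclone       b: low visual ITH, more than one
    c: high visual ITH, one subclone      d: high visual ITH, more than one
    All four cells must be non-zero.
    """
    estimate = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper


# Hypothetical counts, for illustration only.
estimate, lower, upper = odds_ratio(40, 60, 30, 70)
```

An interval whose lower bound sits just below 1, as for the original-image clonality estimate reported above, indicates a positive but not statistically significant association at the 5% level.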

Discussion
Tumor evolution over time is an important feature influencing cancer outcomes [2]. For example, ER+ breast cancers can develop endocrine resistance in response to therapeutic pressures, rendering endocrine therapy less effective [44], or accrue additional actionable mutations [45]. However, few studies have evaluated changes in tissue appearance over time (longitudinally or in cohorts as a function of stage or other markers of progression). Our results in a large, diverse cohort of cancer patients suggest that histologic heterogeneity/homogeneity can be measured in histopathologic images and that the development of homogeneity (i.e., low visual ITH) reflects a late stage of tumor evolution, possibly as a result of a selective sweep and a single cancer phenotype overtaking the histologic field. Tumors with low visual ITH, especially when defined by a combination of epithelium and stroma, had more advanced clinical characteristics, evidence of monoclonality, and were more likely to have TP53 mutations. Another key finding from our analysis was that associations between visual and molecular heterogeneity were strongest for visual components, including stroma. This suggests that the visual shape and boundary of a tumor may be indicators of molecular progression and differentiation, and emphasizes the importance of considering interactions between tumor and non-tumor cells.
Previous studies of intratumor heterogeneity have primarily emphasized molecular markers. For example, in studies by Keenan et al. and Pereira et al., ITH was defined according to Mutant-Allele Tumor Heterogeneity (MATH) score, which is based on the distribution of somatic variant allele frequencies [46,47]. These studies found higher genetic ITH to be a poor prognostic indicator, associated with higher risks of recurrence and more aggressive tumor features. While these results may seem to conflict with those from our study, differences may be explained by the scale at which heterogeneity was measured: namely, our study defined heterogeneity at a tissue-level scale rather than a molecular one. Molecular heterogeneity reflects the degree of genetic instability in the tumor; as evolution progresses, DNA repair defects accumulate, favoring error-prone repair and an accumulation of multiple low-frequency variant alleles [48]. This heterogeneity (i.e., late heterogeneity) may be secondary to an initial selective sweep for driver mutations like TP53. It has not yet been determined which molecular markers of heterogeneity, or whether early vs. late heterogeneity, have greater prognostic value, as opposed to those that reflect stochastic drift in the population of cancer cells. In contrast, visual or tissue-level heterogeneity reflects the spatial distribution and organization of both tumor and non-tumor compartments without direct consideration of the identity and phenotypes of component cells.
Prior work has suggested that as tumors progress, the level of spatial heterogeneity decreases while molecular heterogeneity increases [49], in principle, because tumor regions outcompete other cell types and outgrowth precludes the existence of more (visually) complex secondary structures. This conclusion is further supported by the fact that the observed associations in our study tended to be stronger when visual components included both epithelial and stromal regions, such that the score could consider interactions between multiple cell types. Therefore, histologic heterogeneity may represent an independent direction of heterogeneity, capturing tissue-level evidence of whether the tumor has taken on a more unified appearance. This also raises another recent hypothesis: as cancer cells evade homeostatic barriers, they take on more growth independence, possibly even acquiring changes that are 'atavistic' and reminiscent of unicellular organismal growth [50]. Indeed, many of the low-visual-ITH tumors are high-grade and show a very uniform distribution of cancer cells with high nuclear volume. Pending validation by further studies, our results suggest that among invasive tumors, those that have worse outcomes are more likely to be sampled after a single clone has emerged.
In our data, low-visual-ITH tumors were also more likely to harbor TP53 mutations and to consist of only a single genetic clone, potentially reflecting the outcome of a selective sweep. These results were surprising, given that previous research has suggested that TP53 mutant tumors tend to have higher molecular intratumoral heterogeneity due to higher mutation burden [51] and chromosomal instability [52]; however, this may be partially explained by the fact that molecular measures are primarily based on epithelial tissue, whereas visual measures also consider stromal components. Interactions with stroma are becoming increasingly emphasized in tumor prognostication, including in conjunction with TP53 mutation status [53,54]. This highlights the importance of cross-talk between tumor and non-tumor cells, some of which may be captured by higher-level visible structures. Similarly, our results may seem to conflict with previous analyses of heterogeneity in ER protein expression, which have suggested that high heterogeneity leads to worse outcomes [4]. However, heterogeneity in ER is difficult to disentangle from overall ER expression: tumors with the highest ER positivity show low heterogeneity and are most likely to be responsive to estrogen-targeted therapy. Thus, we emphasize that the interpretation of heterogeneity likely depends on the method of measurement; for example, it may differ between immunohistochemistry (IHC) of specific protein biomarkers and pathologic assessment by H&E. It is also important to consider that tumor histology is captured only at one moment in time, and the preceding changes (i.e., direct observation of temporal evolution) are unknown. Overall, this underscores the importance of considering the cellular and histologic context of a tumor, suggesting that heterogeneity at different temporal phases or modes of detection may have different implications.
The major strengths of this analysis include the use of a diverse population-based sample, integration of visual and molecular data, and the ability to address the influence of tumor composition by segmentation of epithelial and stromal tissue. A limitation of this analysis is that molecular heterogeneity measures were based on DNA from bulk sequencing of the tumors, which may underestimate clonal diversity relative to multiregion sequencing [55]. This may have particularly limited analyses in CBCS, where tumors carried fewer observed mutations due to the use of targeted sequencing. However, the similar proportions of multi-clonal and TP53 mutant tumors in the CBCS and TCGA data, and the agreement between molecular and visual heterogeneity (which, by definition, capture multiple tumor regions), suggest that molecular measures still capture relevant between-tumor variation.

Conclusions
In summary, our results highlight the value of considering histologic parameters in tandem with molecular markers of tumor evolution. With the advent of high-quality, high-depth sequencing, it is increasingly common to consider genetic heterogeneity; however, other 'axes' of evolution (such as histologic evolution or even organ-level evolution, as depicted by radiology images) may elucidate how tumors progress over time, leading to novel insights for treatment strategies.

Figure 2. A toy example of the Normalized Merge Level (NML) calculation for a patient. A hierarchical clustering method [34] was first applied to the core features to divide the cores into 1655 clusters, which is the same as the total number of patients in CBCS3. Clustering results were summarized in a clustering tree. To obtain the NML of a patient, (A) the core IDs of the patient were first identified. (B) Based on the core IDs, the merge level of the patient (the level of the tree where all cores of the patient merge) was found via the clustering tree. (C) The NML was calculated by dividing the merge level by the total number of cores of the patient.
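The NML steps (A) to (C) can be sketched in a few lines of Python. This is a hedged illustration rather than the authors' implementation: it assumes that the "merge level" is the 1-based agglomeration step in a SciPy linkage tree at which all of a patient's cores first fall into one cluster, that a patient with a single core has merge level 0, and that average linkage is used. The function name `normalized_merge_levels` and the toy features are hypothetical, and the cut into 1655 clusters is not reproduced.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

def normalized_merge_levels(features, patient_ids, method="average"):
    """Return {patient_id: merge level / number of cores} (assumed definition)."""
    features = np.asarray(features, dtype=float)
    n = len(patient_ids)
    tree = linkage(features, method=method)

    # (A) Identify the core indices that belong to each patient.
    cores = {}
    for idx, pid in enumerate(patient_ids):
        cores.setdefault(pid, set()).add(idx)

    # Assumption: a single-core patient is trivially "merged" at level 0.
    merge_level = {pid: 0 for pid, c in cores.items() if len(c) == 1}

    # (B) Replay the merges; SciPy labels the cluster formed at step s as n + s - 1.
    members = {i: {i} for i in range(n)}
    for step, (a, b, _dist, _size) in enumerate(tree, start=1):
        merged = members.pop(int(a)) | members.pop(int(b))
        members[n + step - 1] = merged
        for pid, c in cores.items():
            if pid not in merge_level and c <= merged:
                merge_level[pid] = step

    # (C) Normalize by the number of cores per patient.
    return {pid: merge_level[pid] / len(c) for pid, c in cores.items()}

# Toy data: patient A has two near-identical cores, patient B two distant ones.
toy_features = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [10.0, 10.0]]
toy_patients = ["A", "A", "B", "B"]
nml = normalized_merge_levels(toy_features, toy_patients)
```

In this toy case patient A's cores merge at the first step, giving a lower NML than patient B, whose cores only join later in the tree; under the definition in the caption, a lower NML corresponds to a visually more homogeneous tumor.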

Figure 3. Examples of visual intratumor homogeneous and heterogeneous patients and their corresponding TMA cores. The left panel shows the information and TMA cores of three patients from the visual intratumor homogeneous group. The right panel shows the information and TMA cores of three patients from the visual intratumor heterogeneous group.

Figure 4. Associations between patient and tumor characteristics and visual ITH group (H&E) in CBCS3. The forest plot displays Relative Frequency Differences (RFDs) and 95% CIs for patient age, race, tumor grade, ROR, PAM50 type, ER status, and recurrence across visual intratumor homogeneous and heterogeneous groups. Unadjusted (original) RFDs are shown in red in the middle plot, and the corresponding statistics are available in the last two columns. The RFDs and 95% CIs for models adjusted for age and race (Adjust 1) are shown in green, while those for models adjusted for tumor grade, ER status, node status, and tumor size (Adjust 2) are shown in blue in the middle plot. Referent groups for each individual model are indicated in the figure, and sample size (N) and percentages are listed. Referent group: visual intratumor heterogeneous group for all models. Note that green bars (Adjust 1) are not shown for age and race, and blue bars (Adjust 2) are not shown for tumor grade, ER status, node status, and tumor size, as these variables were part of the respective adjustment sets and not main effects.
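As a rough guide to reading the unadjusted RFDs, the sketch below computes a difference in proportions between two groups with a Wald 95% CI. This is a simplification under stated assumptions: the study estimated RFDs with generalized linear models, and only the unadjusted two-group case reduces to a plain difference in proportions. The function name `rfd_with_ci` and the counts are hypothetical and are not taken from the study.

```python
import math

def rfd_with_ci(events_grp, n_grp, events_ref, n_ref, z=1.96):
    """Unadjusted relative frequency difference (group minus referent) with a Wald CI."""
    p_grp = events_grp / n_grp
    p_ref = events_ref / n_ref
    rfd = p_grp - p_ref
    # Standard error of a difference between two independent proportions.
    se = math.sqrt(p_grp * (1 - p_grp) / n_grp + p_ref * (1 - p_ref) / n_ref)
    return rfd, rfd - z * se, rfd + z * se

# Hypothetical counts: 30/100 with a characteristic in the homogeneous group
# versus 20/100 in the heterogeneous (referent) group.
rfd, lower, upper = rfd_with_ci(30, 100, 20, 100)
```

An interval that spans zero, as in this made-up example, would correspond to a non-significant difference between the visual ITH groups.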

Figure 5. Visual intratumor homogeneous tumors have poorer prognosis than heterogeneous ones. Kaplan-Meier curves for 5-year recurrence-free survival by original visual ITH (A) or epithelium visual ITH (B) from CBCS3 are shown along with hazard ratios (HRs) and 95% CIs estimated from Cox proportional hazards regression (Referent group: visual intratumor heterogeneous group).

Table 1. Original visual intratumor heterogeneity measure shows the strongest associations with clinical characteristics. Comparison RFDs (95% CIs) based on three types of cores (original, epithelium, and stroma) are shown and were calculated using generalized linear models. Red: not significant. Bold: the largest RFD among the three (original, epithelium, and stroma).

Table 2. Adjustment for age and race attenuates associations between visual intratumor heterogeneity and clinical characteristics. For each characteristic, unadjusted and race- and age-adjusted RFDs (95% CIs) based on epithelium and stroma cores are shown. Red: not significant.

Table 4. Odds ratios of the associations between low visual ITH and molecular indicators of heterogeneity in CBCS and TCGA (original only). Clonal/multi-clonal classes are defined based on the number of observed tumor subclonal populations, and TP53 mutations were identified using targeted (CBCS) or whole-exome sequencing (TCGA).