Impact of the Area of Residence of Ovarian Cancer Patients on Overall Survival

Simple Summary: The disparities in ovarian cancer care and outcomes have been linked to socioeconomic indicators. Our study mainly demonstrated that women living in economically and socially deprived areas had a significantly higher risk of death after adjustment for individual factors. Our results reflect the complexity of the environment in which the patients live and the impact on overall survival of the combination of negative social, economic and educational factors within that environment.

Abstract: Survival disparities persist in ovarian cancer and may be linked to the environments in which patients live. The main objective of this study was to analyze the global impact of the area of residence of ovarian cancer patients on overall survival. The data were obtained from the Surveillance, Epidemiology and End Results (SEER) database. We included all the patients with epithelial ovarian cancers diagnosed between 2010 and 2016. The areas of residence were analyzed by the hierarchical clustering of the principal components to group similar counties. A multivariable Cox proportional hazards model was then fitted to evaluate the independent effect of each predictor on overall survival. We included a total of 16,806 patients. The clustering algorithm assigned the 607 counties to four clusters, with cluster 1 being the most disadvantaged and cluster 4 having the highest socioeconomic status and best access to care. The area of residence cluster remained a statistically significant independent predictor of overall survival in the multivariable analysis. The patients living in cluster 1 had a risk of death more than 25% higher than that of the patients living in cluster 4. This study highlights the importance of considering the sociodemographic factors within the patient’s area of residence when developing a care plan and follow-up.


Introduction
Ovarian cancer is the fifth leading cause of cancer-related deaths in women in the United States [1]. Despite improvements in ovarian cancer care, disparities in the quality of care and overall survival persist [2,3]. Disease onset and progression are related to social, political, ecological and historical exposures and are, therefore, closely linked to the environments in which patients live [4]. Early oncological studies exploring the effect of neighborhood socioeconomic status focused on breast cancer patients and found that residential segregation had a negative impact on all-cause mortality [5][6][7]. In economically unfavorable environments, the risk of death was more than 50% higher than that for patients from better-off areas after adjustment for age, ethnicity, sex and comorbid conditions [8]. Specifically in ovarian cancer, several studies have investigated the impact of socioeconomic status and ethnicity on ovarian cancer outcomes and treatment disparities [9][10][11]. In Bristow et al. [2], black women with ovarian cancer had a more than 30% increased risk of not receiving the recommended treatment compared to white women. Disparities in ovarian cancer care and outcomes have been linked to socioeconomic indicators, such as education level, employment status and household income, but the results remain inconsistent. In several studies, a decreased socioeconomic status has been associated with both a lower likelihood of receiving the recommended treatment and worse overall survival [2,9,10,12]. In contrast, others have failed to identify education level, income or poverty as predictive of either treatment or outcome [13]. Most of the studies performed focused on analyzing sociodemographic factors separately rather than considering the overall impact of the area in which patients live. Understanding how these sociodemographic factors interact with each other and their overall influence on ovarian cancer outcomes is critical to improving patient survival.
The National Cancer Institute set up the Surveillance, Epidemiology and End Results (SEER) database in the United States in 1973 [14]. The SEER database is one of the best-known data sources for cancer patient follow-up anywhere in the world and provides reliable data for clinical research. A key feature of this database is the inclusion of numerous sociodemographic variables for the areas of residence of the patients. The main objective of this study was to analyze the global impact of the area of residence of ovarian cancer patients on overall survival, by applying hierarchical clustering on principal components to data from the SEER database.

Patient Selection
Data were obtained from the SEER database of the National Cancer Institute, the largest global open-access cancer database available, providing data from 18 population-based cancer registries covering about 28% of the total United States population [1]. The SEER database provides information about patient demographics, treatment methods, tumor characteristics, the follow-up period for survival analysis and sociodemographic data for the county of residence of the patients. We included all patients with epithelial ovarian cancer (serous, endometrioid, mucinous and clear cell) diagnosed between 1 January 2010 and 31 December 2016. Patients under the age of 18 years or diagnosed with ovarian cancer via their death certificate or at autopsy were not included. Patients with unknown American Joint Committee on Cancer (AJCC) stage or unknown tumor grade were also excluded. Finally, patients from Alaska were excluded due to the incomplete nature of sociodemographic data for this state. A flowchart of the study is provided in Figure 1.

Clinical and Sociodemographic Variables
The clinical features assessed in this study included age at diagnosis, ethnicity, marital status, insurance cover, tumor grade, AJCC stage, histological tumor type, chemotherapy, surgery, overall survival and vital status records. Age at diagnosis was analyzed both as a continuous variable and as a categorical variable with five classes (<40, 40-49, 50-59, 60-74, ≥75 years). Ethnicity was treated as a categorical variable with five classes: non-Hispanic (NH) white, NH American Indian/Alaska native, NH Asian or Pacific islander, NH black and Hispanic (all races). Marital status was classified as married, single (never married), divorced, separated, unmarried or with a domestic partner and widowed. The staging system definitions were based on the seventh edition of the AJCC staging system. For AJCC stage IV, an analysis of the different combinations of metastases was performed at diagnosis to assess their impact on overall survival and their relationship to the histological features of the tumor. In the absence of liver, lung, brain and bone metastases, patients were considered to be at stage IVA (i.e., without distant metastases). Tumor grade was classified as low (well differentiated, grade I; moderately differentiated, grade II) or high (poorly differentiated, grade III; undifferentiated, grade IV). Histological tumor types were classified according to the International Classification of Diseases for Oncology, 3rd Edition (ICD-O-3): serous (8020-8022, 8441-8442, 8460-8463, 9014), endometrioid (8380-8383, 8570), mucinous (8470-8472, 8480-8481, 9015) and clear cell (8290, 8310, 8313, 8443-8444). Finally, the 28 sociodemographic variables for each county included are listed in Table S1. These variables relate to the following 10 fields: income/poverty, education, demographic, employment, housing, immigration, smoking, medical follow-up, rural/urban nature of the area and mobility.
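The recoding rules described above can be sketched in a few lines of Python. This is only an illustration of the stated categories; the function and constant names are hypothetical and do not correspond to SEER field names or to the code used in the study.

```python
# Illustrative recoding of the categories described in the text.
# All names below are hypothetical; they are not SEER variable names.

# ICD-O-3 histology code ranges (inclusive) for each tumor type, as listed above.
HISTOLOGY_RANGES = {
    "serous": [(8020, 8022), (8441, 8442), (8460, 8463), (9014, 9014)],
    "endometrioid": [(8380, 8383), (8570, 8570)],
    "mucinous": [(8470, 8472), (8480, 8481), (9015, 9015)],
    "clear cell": [(8290, 8290), (8310, 8310), (8313, 8313), (8443, 8444)],
}


def histology_type(icdo3_code):
    """Return the histological type for an ICD-O-3 code, or None if not listed."""
    for name, ranges in HISTOLOGY_RANGES.items():
        if any(low <= icdo3_code <= high for low, high in ranges):
            return name
    return None


def age_class(age):
    """Assign age at diagnosis (years) to one of the five classes."""
    if age < 40:
        return "<40"
    if age < 50:
        return "40-49"
    if age < 60:
        return "50-59"
    if age < 75:
        return "60-74"
    return ">=75"


def grade_group(grade):
    """Group tumor grade I-II (1, 2) as low and III-IV (3, 4) as high."""
    if grade in (1, 2):
        return "low"
    if grade in (3, 4):
        return "high"
    return None  # unknown grade; such patients were excluded from the study
```

Codes outside the listed ranges return `None`, which mirrors the restriction of the cohort to the four epithelial types.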

Outcome Measurement
The primary endpoint was overall survival, defined as the time from cancer diagnosis to death from any cause or last follow-up. Data were censored for patients still alive at last follow-up. The final follow-up visit occurred on 31 December 2016.
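As an illustration of this endpoint definition, the following sketch derives a survival time and an event indicator for one patient. It assumes that dates are available and converts days to months using an average month length; both are assumptions made for the example, and the function name is hypothetical.

```python
from datetime import date

# Final follow-up date stated in the text.
STUDY_CUTOFF = date(2016, 12, 31)

# Assumption for this sketch: average month length of 365.25 / 12 days.
DAYS_PER_MONTH = 30.4375


def survival_record(diagnosis, death=None, last_contact=None, cutoff=STUDY_CUTOFF):
    """Return (months, event) for one patient.

    event = 1 for death from any cause on or before the cutoff,
    event = 0 for a patient censored at last follow-up.
    """
    if death is not None and death <= cutoff:
        end, event = death, 1
    else:
        # Alive at last follow-up (or death recorded after the cutoff): censored.
        end = min(last_contact or cutoff, cutoff)
        event = 0
    months = (end - diagnosis).days / DAYS_PER_MONTH
    return months, event
```

A patient diagnosed on 1 January 2016 with no recorded death is censored at the cutoff, with about 12 months of follow-up.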

Statistical Analysis
Demographic and clinical characteristics were compared using chi-squared tests for categorical variables and Student's t tests for continuous variables. To group together counties with similar characteristics, we used a clustering algorithm. Hierarchical clustering provides an excellent framework for identifying patterns and groups of similar observations in a dataset, in this case residential areas [15]. Principal component analysis (PCA) can be applied to multidimensional datasets containing several continuous variables as a means of decreasing data dimensionality, resulting in a smaller number of variables capturing the principal information present in the initial data [16]. The combination of these two approaches, through the application of a hierarchical clustering approach to principal components, can be used to obtain a better clustering solution [17]. The first step of our model was thus to perform a PCA on the 28 county-level sociodemographic variables [17]. If the proportion of the variance explained by a dimension exceeded 15%, the dimension was retained. We then ran an agglomerative hierarchical clustering algorithm on the PCA-reduced dataset to identify areas of residence with similar sociodemographic characteristics. Ward's method was used for cluster combination [18]. The optimal number of clusters was defined as that generating the most interpretable and best-isolated groups. All values from 2 to 6 were tested, and the results were interpreted on the basis of the sociodemographic characteristics of the groups and the differences between groups.
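The two steps described above, retaining dimensions by explained variance and then merging clusters with Ward's criterion, can be sketched as follows. This is a minimal pure-Python illustration on invented toy coordinates, not the R code used in the study; the function names are hypothetical, and the naive pairwise search is only suitable for small inputs.

```python
def retained_dimensions(explained_variance, threshold=0.15):
    """1-based indices of PCA dimensions explaining more than `threshold` of the variance."""
    return [i + 1 for i, v in enumerate(explained_variance) if v > threshold]


def centroid(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b are merged."""
    ca, cb = centroid(a), centroid(b)
    dist2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * dist2


def ward_clustering(points, n_clusters):
    """Agglomerative clustering: repeatedly merge the pair with the smallest Ward cost."""
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        pairs = (
            (i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        i, j = min(pairs, key=lambda ij: ward_cost(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Toy scores on two principal components (invented values, one row per "county").
toy_scores = [
    (-2.0, -1.0), (-2.1, -0.9),
    (-1.9, 1.0), (-2.0, 1.1),
    (2.0, -1.0), (2.1, -1.1),
    (2.0, 1.0), (1.9, 1.1),
]
toy_clusters = ward_clustering(toy_scores, n_clusters=4)
```

With explained-variance proportions such as `[0.36, 0.17, 0.08, 0.05]` (the first two values are those reported in the Results; the last two are invented for the example), `retained_dimensions` returns the first two dimensions only.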
Survival analysis was performed by the Kaplan-Meier method, with the estimation of survival probability and log-rank tests. The univariate survival analysis was stratified for stage, with two categories: early stages (AJCC stages I and II) and advanced stages (AJCC stages III and IV). After checking the assumption of proportionality, we fitted a multivariable Cox proportional hazards model to the data to evaluate the independent effect of each predictor on survival. We searched for interactions between clusters of counties and other sociodemographic variables, such as insurance, marital status and age. The best model was selected by a stepwise top-down procedure based on the minimization of Akaike's information criterion (AIC). Adjusted hazard ratios (HRs) and 95% confidence intervals (CIs) were generated. All statistical analyses were performed with R version 4.0.3.
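For readers unfamiliar with the Kaplan-Meier method, the estimator can be written compactly. The sketch below is a plain-Python illustration on invented follow-up times, not the study's R code; it uses the usual convention that a patient censored at time t is still counted as at risk for deaths occurring at t.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one death.

    times  : follow-up times (e.g., months)
    events : 1 if the patient died at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients whose follow-up ends at the same time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


def median_survival(curve):
    """First time at which the estimated survival drops to 0.5 or below, else None."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Invented toy data: eight patients, five deaths, three censored.
toy_times = [5, 11, 11, 18, 24, 29, 40, 52]
toy_events = [1, 1, 0, 1, 0, 1, 1, 0]
toy_curve = kaplan_meier(toy_times, toy_events)
```

The log-rank test and the Cox model build on the same at-risk bookkeeping, comparing observed and expected deaths at each event time.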

County Clusters
Patients from 607 different counties in four regions of the US (East, Northern Plains, Pacific Coast and Southwest) were included in this study. From the 28 county-level sociodemographic variables, we performed a PCA and selected the first two dimensions. The first and second dimensions accounted for 36% and 17% of the variance, respectively (Figure S2A). The contribution of each sociodemographic variable to the dimensions is shown in Figure S2B. The first dimension corresponded principally to the income and education variables. The second dimension corresponded to the immigration and demographic variables. Based on these two dimensions, the clustering algorithm assigned the 607 counties to four clusters (Figure 3).

The characteristics of the four clusters related to education, income/poverty, employment, immigration and access to care are shown in Figure 4. The distributions of the variables related to demography, housing, smoking and mobility are presented in Figures S3-S6. Income and education levels were lowest in cluster 1 (n = 225 counties), the counties of which had a low proportion of immigrants, were rather rural in nature and had low levels of access to healthcare. Cluster 2 (n = 67 counties) also corresponded to socially deprived areas, with low income and education levels, a high proportion of foreigners or immigrants and low levels of access to healthcare, but this cluster was more urban than cluster 1. Cluster 3 (n = 228 counties) corresponded to well-off areas with a high standard of living and high education levels, often rural, with a low proportion of immigrants and good access to healthcare. Finally, cluster 4 (n = 87 counties), like cluster 3, corresponded to well-off areas but mostly in urban settings and with a more diverse population. Access to healthcare was the greatest in cluster 4, in which 79.4% of the population had undergone a Pap smear test within the last three years, versus 72.4% in cluster 1 (p < 0.001). A map representing the distribution of the different clusters in the United States is shown in Figure 5.
The clinical and pathological characteristics of the patients are shown by county cluster in Table 1. The patients from clusters 1 and 2 were diagnosed at an advanced stage in 61.6% of cases versus 62.5% in clusters 3 and 4. Among the patients from cluster 1 with AJCC stage IV tumors, 96 (59.6%) were diagnosed with stage IVA and 65 (40.4%) with stage IVB, including 29 (18.0%) patients with isolated lung metastases. In cluster 4, 948 (63.6%) patients were diagnosed with stage IVA and 542 (36.4%) with stage IVB, of which 192 (12.9%) had isolated lung metastases (Figure S7, p < 0.001).

Survival Analysis
A univariable survival analysis revealed a significant difference in the overall survival between the patients from different county clusters for AJCC stages III-IV but not for AJCC stages I-II (Figure 6a,b). The five-year overall survival for the patients with early cancer was 78.0% for cluster 1 and 83.0% for cluster 4. The five-year overall survival rates for the patients with advanced-stage cancers were 29.5% for cluster 1 and 39.6% for cluster 4. The median overall survival for patients with AJCC stage III-IV tumors was 37 months for cluster 1 and 45 months for cluster 4. The median follow-up was 30.0 (range: 16.0-50.0) months.
A multivariable survival analysis confirmed the known negative effects on the prognosis of older age, higher AJCC stage, particularly in cases of associated metastases, being black, having a mucinous or clear-cell tumor, high-grade tumors and, finally, a lack of surgery and/or chemotherapy (Figure 7). No interaction between the clusters and the other variables was found. After adjustment for other factors, the county cluster remained a statistically significant and independent predictor of overall survival. The patients from cluster 1 had a risk of death more than 25% higher than that for the patients from cluster 4 (HR 1.3, 95% CI 1.1 to 1.4, p < 0.001).
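To make the link between a Cox model coefficient and the reported hazard ratio explicit, the sketch below converts a coefficient and its standard error into an HR with a Wald-type 95% CI, and an HR into a percentage excess hazard. The coefficient and standard error used in the usage note are hypothetical, since the study reports only the rounded HR and CI; the function names are also illustrative.

```python
import math


def hazard_ratio(beta, se, z=1.96):
    """HR and approximate 95% CI from a Cox coefficient and its standard error."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def percent_excess_hazard(hr):
    """Percentage increase in the hazard relative to the reference group."""
    return (hr - 1.0) * 100.0
```

For example, `percent_excess_hazard(1.3)` is approximately 30, and the reported CI bounds of 1.1 and 1.4 correspond to roughly 10% and 40%. A hypothetical coefficient of 0.25 with a standard error of 0.06 would give an HR of about 1.28.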

Discussion
In this study, we analyzed the combined effect of the sociodemographic characteristics of the patient's area of residence on the overall survival in patients with epithelial ovarian cancer. We found significant disparities in survival between counties defined as disadvantaged or well-off in terms of social and economic factors, education and access to care. In particular, women living in economically and socially deprived areas were found to have a significantly higher risk of death after adjustment for individual factors.
The major impact of socioeconomic inequalities on the survival of cancer patients has already been documented elsewhere [4]. Survival is consistently poorer in cancer patients of low socioeconomic status than in those of higher socioeconomic status, whether individual-level or geographic measures are used [4,19,20]. Studies on ovarian cancer have shown mortality to be higher, after adjustment for several individual factors, in unmarried women [21] and in women without health insurance [22]. Our study generated similar results. Ethnic differences were also confirmed in our study, in which overall survival was worst for non-Hispanic black women [2]. Published findings have also reported a negative impact of a poor neighborhood [9] or economic and racial residential segregation [12] on overall survival in patients with ovarian cancer. Our results, capturing the complexity of the environment in which the patients live and the impact on overall survival of the combination of negative social, economic and educational factors within that environment, confirm these previous findings.
Little is known about the mechanisms underlying the observed relationship between the area of residence and health, but several hypotheses can be put forward. One of the hypotheses frequently proposed in previous studies is that women with a lower socioeconomic status seek treatment late or are not directly referred to an expert center [23,24]. In women living in areas of low socioeconomic standing, a cancer diagnosis is often delayed, resulting in a decrease in survival. However, the symptoms of ovarian cancer are often mild or non-specific, and about 70% of cases are diagnosed at an advanced stage, even in well-off areas. Organized screening for ovarian cancer has not proved effective, and no screening test has yet been approved for the early diagnosis of ovarian cancer. In our study, patients were diagnosed at an advanced stage in 61.6% of cases in clusters 1 and 2 versus 62.5% in clusters 3 and 4. Comorbidity levels were also higher in cancer patients with a low socioeconomic status [25], increasing all-cause mortality [26,27]. The presence of multiple comorbid conditions may influence the time to diagnosis by leading to a delay in seeking care or the attribution of the symptoms to an existing secondary disease. Multiple comorbid conditions may also influence the choice or aggressiveness of cancer treatment in the face of a low performance status at diagnosis or contraindications for treatment in the cases of comedication [28]. Other studies have suggested that geographic variations in ovarian cancer-specific survival may be due to important predictors, such as receiving care in accordance with guidelines. By contrast to the situation for women with ovarian cancer in the general population, a British study of 1,406 ovarian cancer patients enrolled in two randomized controlled trials found no socioeconomic inequalities in ovarian cancer survival [29]. 
The authors concluded that the persistent socioeconomic gap in survival in the general population may be due to differential access to treatment and standards of care. Patients in areas of lower socioeconomic status tend to receive suboptimal care [2,23]. In our study, 31 (3.8%) patients in cluster 1 did not undergo surgery versus 243 (2.9%) in cluster 4 (p = 0.02). A similar disparity was observed for chemotherapy, with 205 (25.3%) patients in cluster 1 not receiving chemotherapy versus 1,699 (20.4%) in cluster 4. Overall, improving access to expert care and ensuring that guideline-adherent treatment is administered should be priorities in the optimization of survival in patients with ovarian cancer [30]. Most underprivileged areas are rural, and the distance to a hospital or an expert center may itself constitute an obstacle to seeking treatment or to receiving guideline-adherent care [31].

Conclusions
This study highlights the importance of considering the sociodemographic factors within the patient's area of residence when developing a care plan and follow-up. Improvements in the management of comorbid conditions, the provision of better services for patients of low socioeconomic status and the centralization of care are immediate actions that could be taken to ensure optimal care for all. Future studies of ovarian cancer should consider the area of residence of the patients and should develop analytical and statistical approaches appropriate for the geospatial and multilevel nature of the data. Incorporating social and environmental factors into ovarian cancer etiology and outcome research should help to improve our understanding of disease processes and the identification of vulnerable populations and should make it possible to generate outcomes relevant to the entire affected population.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/cancers14235987/s1, Table S1: List of sociodemographic variables; Figure S2: Results of the PCA; Figure S3: Distribution of the variables related to demography in the four clusters; Figure S4: Distribution of the variables related to housing in the four clusters; Figure S5: Distribution of the variables related to smoking in the four clusters; Figure S6: Distribution of the variables related to mobility in the four clusters; Figure S7: Distribution of metastasis combination by cluster.