Ovarian Cancer Incidence in the U.S. and Toxic Emissions from Pulp and Paper Plants: A Geospatial Analysis

Ovarian cancer is the fifth leading cause of female cancer mortality in the U.S. and accounts for five percent of all cancer deaths among women. No environmental risk factors for ovarian cancer have been confirmed. We previously reported that ovarian cancer incidence rates at the state level were significantly correlated with the extent of pulp and paper manufacturing. We evaluated that association using county-level data and advanced geospatial methods. Specifically, we investigated the relationship of spatial patterns of ovarian cancer incidence rates with toxic emissions from pulp and paper facilities using data from the Environmental Protection Agency’s Toxic Release Inventory (TRI). Geospatial analysis identified clusters of counties with high ovarian cancer incidence rates in south-central Iowa, Wisconsin, New York, Pennsylvania, Alabama, and Georgia. A bivariate local indicator of spatial autocorrelation (LISA) analysis confirmed that counties with high ovarian cancer rates were associated with counties with large numbers of pulp and paper mills. Regression analysis of state-level data indicated a positive correlation between ovarian cancer and water pollutant emissions. A similar relationship was identified from the analysis of county-level data. These data support a possible role of water-borne pollutants from pulp and paper mills in the etiology of ovarian cancer.


Introduction
Ovarian cancer is the fifth leading cause of female cancer mortality and accounts for five percent of all cancer deaths among women in the U.S. The estimated numbers of new cases of and deaths from ovarian cancer in the U.S. in 2017 are 22,440 and 14,080, respectively [1]. Although five-year survival rates for ovarian cancer have improved over the past four decades, the five-year survival rate is only 47 percent for all stages combined and 29 percent for late-stage (metastatic) cancers, the stage at which the majority of ovarian cancers are diagnosed. Presently, there is no accurate modality or biomarker for early diagnosis [1]. The latency period for ovarian cancer (i.e., the time from inception of disease to clinical presentation) is unknown, but is estimated at 15-20 years [2].
The best-known risk factor for ovarian cancer is family history, notably among women with BRCA1/BRCA2 mutations. However, this factor explains only about 10 percent of all cases [3]. Other known risk factors are reproductive: advanced age, nulliparity, and infertility. There is a protective effect for breastfeeding and other hormonal factors, such as late age at menarche and/or early age at menopause, that are associated with a reduced number of ovulatory cycles.

We obtained data on toxic air and water releases from pulp and paper mills (NAICS Code 322) from the Environmental Protection Agency's Toxic Release Inventory (TRI) for the time period 1988-2012 [34]. Due to the potentially long latency period of ovarian cancer, we examined toxic air and water releases for a 25-year time period. Included in these data were the number of facilities and the amount and type of discharge by ZIP code, county, and EPA region. Data were also available for individual facilities (i.e., points on a map), enabling a finer level of geographic precision. Using each site's latitude/longitude coordinates, we mapped locations of individual facilities and overlaid them with ovarian cancer incidence rates aggregated to state and county boundaries.
We also downloaded 2000-2012 data for two subcategories of pulp and paper mill emissions: Occupational Safety and Health Administration (OSHA) carcinogens and dioxin; the latter is a by-product of the chlorine-bleaching process of paper production [26,35]. Separate data on these chemicals have been available for download from EPA's website since 2000.
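The aggregation of facility-level releases to county totals over the 1988-2012 window can be sketched as follows. This is a minimal illustration, not the authors' workflow: the record layout, field names, county labels, and quantities are all hypothetical and do not reflect the actual TRI file schema.

```python
from collections import defaultdict

# Hypothetical facility-year records; field names and values are illustrative only.
records = [
    {"county": "A", "year": 1988, "medium": "air", "pounds": 1200.0},
    {"county": "A", "year": 2005, "medium": "water", "pounds": 300.0},
    {"county": "B", "year": 1999, "medium": "air", "pounds": 50.0},
    {"county": "B", "year": 2013, "medium": "air", "pounds": 999.0},  # outside the window
]


def aggregate_releases(records, start=1988, end=2012):
    """Sum air and water releases per county for years in [start, end]."""
    totals = defaultdict(lambda: {"air": 0.0, "water": 0.0})
    for rec in records:
        if start <= rec["year"] <= end:
            totals[rec["county"]][rec["medium"]] += rec["pounds"]
    return dict(totals)


totals = aggregate_releases(records)
print(totals)
```

Counties with facilities that reported only air releases keep a water total of zero, which corresponds to the "no water effluents" category on a county map.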

Exploratory Spatial Data Analysis Methods
We first performed exploratory spatial data analysis (ESDA) to examine the geographic distributions of ovarian cancer incidence and pulp and paper facilities and to gauge whether these two variables are spatially related to each other [36]. Among the various ESDA techniques, the global Moran's I diagnoses the overall spatial pattern of ovarian cancer incidence across the U.S. (i.e., clustered, dispersed, or random), whereas the local indicator of spatial autocorrelation (LISA) identifies local clusters or outliers composed of counties whose incidence rates are similar or dissimilar to those of their neighbors. Furthermore, a bivariate LISA analysis allows us to examine whether counties with high rates of ovarian cancer incidence were spatially associated with neighboring counties that had large numbers of paper mills, or vice versa [37,38]. All analyses were conducted using Environmental Systems Research Institute (ESRI) ArcGIS 10.5 and GeoDa, an open-source software package for spatial statistical analysis developed by Anselin and colleagues [39].
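To make the global Moran's I concrete, the following is a minimal pure-Python sketch with a one-sided permutation test. It assumes binary contiguity weights and uses a made-up chain of six areal units with hypothetical rates; it is not the ArcGIS or GeoDa implementation used in the study, and the function names are introduced here for illustration.

```python
import random


def morans_i(values, neighbors):
    """Global Moran's I with binary contiguity weights (w_ij = 1 for neighbors)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(len(nbrs) for nbrs in neighbors.values())  # sum of all weights
    cross = sum(dev[i] * dev[j] for i, nbrs in neighbors.items() for j in nbrs)
    return (n / s0) * cross / sum(d * d for d in dev)


def permutation_test(values, neighbors, n_perm=999, seed=0):
    """Pseudo p-value: share of random relabelings with I at least as large."""
    rng = random.Random(seed)
    observed = morans_i(values, neighbors)
    shuffled = list(values)
    at_least = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if morans_i(shuffled, neighbors) >= observed:
            at_least += 1
    return observed, (at_least + 1) / (n_perm + 1)


# Hypothetical layout: six units in a row, each adjacent to the next.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
clustered = [18, 17, 16, 9, 8, 7]   # similar values sit next to each other
alternating = [1, 2, 1, 2, 1, 2]    # neighbors always differ

print(morans_i(clustered, chain))    # positive: clustered pattern
print(morans_i(alternating, chain))  # negative: dispersed pattern
print(permutation_test(clustered, chain))
```

Positive values indicate that neighboring units tend to have similar rates, negative values indicate dispersion, and values near the expectation under randomness, -1/(n - 1), indicate no spatial pattern.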

Regression Analysis of the Relationship between Paper Mill Emissions and Ovarian Cancer
To permit comparison with prior research that reported positive associations between ovarian cancer incidence and paper manufacturing, we first conducted regression analyses using state-level data. Ovarian cancer incidence rates were the dependent variable; air and water emissions were the independent variables. These procedures were conducted for: (1)
Geographic research has suggested that issues including spatial autocorrelation and spatially varying relationships can bias the estimates of model parameters in regression analyses when geographically referenced data are used [39-41]. Alternative spatial regression models (i.e., lag or error) can be used to address spatial autocorrelation when a statistically significant Moran's I is detected. Likewise, geographically weighted regression (GWR) is often employed to generate local regression models that address inconsistent relationships between dependent and independent variables across a study area [41,42]. Therefore, to investigate whether ovarian cancer incidence rates were related to TRI emissions from pulp and paper plants, we first used ordinary least squares (OLS) regression for both the state- and county-level data. We next fitted spatial lag models to the state-level data in light of the statistically significant spatial dependence across the 45 states. For the county-level data, we fitted GWR models in consideration of the large sample size (i.e., n = 987) and possible regional heterogeneities in factors that potentially affect the incidence of ovarian cancer across U.S. counties.
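For reference, the model families named above are conventionally written as shown below. The notation is an assumption introduced here for illustration and does not appear in the original text: $y$ is the vector of incidence rates, $X$ the matrix of emission covariates, $W$ a row-standardized spatial weights matrix, $\rho$ and $\lambda$ spatial autoregressive parameters, and $(u_i, v_i)$ the coordinates of location $i$.

```latex
\begin{align}
\text{OLS:}           \quad & y = X\beta + \varepsilon \\
\text{Spatial lag:}   \quad & y = \rho W y + X\beta + \varepsilon \\
\text{Spatial error:} \quad & y = X\beta + u, \qquad u = \lambda W u + \varepsilon \\
\text{GWR:}           \quad & y_i = \beta_0(u_i, v_i) + \sum_{k} \beta_k(u_i, v_i)\, x_{ik} + \varepsilon_i
\end{align}
```

The spatial lag model adds the neighborhood average of the dependent variable as a regressor, whereas GWR lets every coefficient vary with location.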

Spatial Distribution of Ovarian Incidence Rates
The county-level age-standardized ovarian cancer incidence rates (2009-2013) varied from 6.1 to 30.3 per 100,000 (Figure 1), a range considerably larger than that for the state-level data (i.e., 9.6-13.4 for the same time period as displayed in Figure 3). Incidence rates were classified using the quantile method [43]. Rates above the 50th percentile are shown in orange and red. Representation in counties in the West can be misleading because of the large geographic size of these counties. Areas with higher rates are located in Washington, Wisconsin, New York, and New Jersey, with additional foci of high rates in northeastern Alabama, western Florida, and coastal North Carolina.
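The quantile classification mentioned above places roughly equal numbers of counties in each map class. A minimal sketch follows, assuming four classes (the number of classes is not stated in the text) and a hypothetical list of rates in which only the minimum and maximum, 6.1 and 30.3, are taken from the reported range.

```python
from bisect import bisect_left
from statistics import quantiles

# Hypothetical rates per 100,000; only the endpoints match the reported range.
rates = [6.1, 9.4, 10.8, 11.5, 12.2, 13.0, 15.7, 30.3]


def quantile_classes(values, n_classes=4):
    """Return a class index (0 = lowest) for each value using quantile breaks."""
    cuts = quantiles(values, n=n_classes, method="inclusive")
    return [bisect_left(cuts, v) for v in values]


classes = quantile_classes(rates)
print(classes)
```

With four classes, indices 2 and 3 correspond to rates above the 50th percentile. The single extreme value of 30.3 shares a class with 15.7, which illustrates how quantile maps can mask the magnitude of outliers.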
Analysis of the geographic distribution of ovarian cancer rates across the 987 counties using the global Moran's I indicates a significantly clustered pattern (Moran's I index value = 0.0298, p < 0.001). Outcomes of LISA analysis identify statistically significant clusters of high and low rates among counties with non-sparse data. Results indicate that localized clusters of high rates (in red) existed in several locations including northern Wisconsin, upstate and southern New York, northeast Pennsylvania, and dispersed counties in Alabama and Georgia (Figure 2). Three large clusters of low rates appeared in the northeastern and southeastern U.S. and Arizona. Several small clusters were mainly scattered in Missouri, Arkansas, and Louisiana. Orange counties represented outliers of high rates in areas where surrounding counties had lower rates. Light-blue counties were outliers of low rates in higher-rate regions. LISA results can vary slightly each time the procedure is run, due to the number of permutations, each of which randomly rearranges neighborhood values to determine whether the observed value for a county (i.e., incidence rate) is random or significantly different.
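The permutation logic behind LISA, and the reason results can differ slightly between runs, can be sketched as below. This is a simplified illustration on a hypothetical 4 x 4 grid with made-up rates, using rook contiguity, row-standardized weights, and a two-sided conditional permutation test. It is not GeoDa's exact procedure, and all function names are assumptions introduced here.

```python
import random


def rook_neighbors(rows, cols):
    """Neighbor lists for a rows x cols grid (cells that share an edge)."""
    nbrs = {}
    for r in range(rows):
        for c in range(cols):
            nbrs[r * cols + c] = [
                rr * cols + cc
                for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                if 0 <= rr < rows and 0 <= cc < cols
            ]
    return nbrs


def lisa(values, nbrs, n_perm=499, seed=None):
    """Local Moran's I, pseudo p-value, and quadrant label for each unit."""
    rng = random.Random(seed)
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    m2 = sum(d * d for d in dev) / n
    out = []
    for i in range(n):
        k = len(nbrs[i])
        lag = sum(dev[j] for j in nbrs[i]) / k       # mean deviation of neighbors
        local_i = dev[i] * lag / m2
        others = dev[:i] + dev[i + 1:]               # unit i stays fixed
        extreme = 0
        for _ in range(n_perm):
            perm_lag = sum(rng.sample(others, k)) / k  # random neighbors
            if abs(dev[i] * perm_lag / m2) >= abs(local_i):
                extreme += 1
        p = (extreme + 1) / (n_perm + 1)
        label = ("High" if dev[i] > 0 else "Low") + "-" + ("High" if lag > 0 else "Low")
        out.append((local_i, p, label))
    return out


# Hypothetical rates: a block of high values in the top-left corner of the grid.
rates = [20, 19, 10, 11,
         18, 21, 11, 10,
         10, 11, 9, 10,
         11, 10, 10, 9]
grid = rook_neighbors(4, 4)

run_a = lisa(rates, grid, seed=1)
run_b = lisa(rates, grid, seed=2)
print(run_a[0], run_b[0])  # same statistic and label; the p-value may differ
```

The local statistic and the quadrant label are deterministic; only the pseudo p-values depend on the random permutations, so counties near the significance threshold may switch between significant and non-significant from run to run unless the random seed is fixed.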

Spatial Patterns of Pulp and Paper Facilities
Overlaying state-level ovarian cancer incidence rates with the distribution of pulp and paper plants facilitates visual inspection of the spatial relationship between these variables. Figure 3 shows pulp and paper mill locations for the year 1989 (peak year, n = 688), with background shading representing state-level ovarian cancer incidence rates (2009-2013). It is noteworthy that states with the highest ovarian cancer incidence rates (e.g., New York, Pennsylvania, and New Jersey in the Northeast, Wisconsin and Michigan in the Midwest, and Georgia and Alabama in the South) tended to have the largest numbers of paper mills in the late 1980s.
There has been a notable decline in paper production in the U.S. in the past three decades, with a concomitant decrease in emissions, largely as a result of competition from other countries [44]. In 1988, there were 673 pulp and paper mills in 408 U.S. counties with reported emissions; by 2012, this had dropped to 348 mills in 286 counties. At the state level, in 1989, there were 10 states with more than 20 paper mills; in 2012, there were only two.
Mapping aggregated data on TRI emissions from pulp and paper mills (1988-2012) for U.S. counties enables us to better understand the link with ovarian cancer (Figure 4). Several geographic foci stand out: western Washington and Oregon, Maine, Wisconsin, Louisiana, southwestern Alabama and coastal Georgia, and South Carolina. Air releases reflect overall emissions, but water releases were more restricted. Dark-blue counties on the water releases map indicate that no water effluents were released. Chemicals reported through the TRI process include about 180 known or suspected carcinogens, which EPA refers to as 'OSHA carcinogens'. The geography of OSHA carcinogen emissions is similar to that of all emissions. However, reported dioxin emissions, measured in grams, are greater in Washington and the southeastern U.S.
Results of bivariate LISA analysis indicate an overall significant spatial association between county-level ovarian cancer incidence and the count of paper plants in nearby counties (Moran's I: 0.0127; p < 0.05). The bivariate LISA map (Figure 5) shows that 115 counties were labeled "High-High" (in dark red), representing clusters of counties that had high incidence rates and were surrounded by counties with high counts of paper mills. Likewise, 139 counties labeled "Low-Low" (in dark blue) had significantly low incidence rates and were surrounded by counties with low paper mill counts. Clusters of "High-High" and "Low-Low" collectively indicate a positive spatial association between ovarian cancer incidence rates and paper mill counts.
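The bivariate statistic relates each unit's incidence rate to the average mill count of its neighbors. A minimal sketch follows, assuming z-scored variables and row-standardized weights on a hypothetical chain of six units with made-up values; significance testing (done by permutation in practice) is omitted, and the function names are introduced here for illustration.

```python
from statistics import fmean, pstdev


def zscores(values):
    """Standardize values to mean 0 and (population) standard deviation 1."""
    mu, sd = fmean(values), pstdev(values)
    return [(v - mu) / sd for v in values]


def bivariate_moran(x, y, nbrs):
    """Global bivariate Moran's I and a quadrant label for each unit.

    x is the variable at each unit (e.g., incidence rate); y is the variable
    averaged over neighboring units (e.g., paper mill count).
    """
    zx, zy = zscores(x), zscores(y)
    lag_y = [fmean([zy[j] for j in nbrs[i]]) for i in range(len(x))]
    global_i = sum(a * b for a, b in zip(zx, lag_y)) / len(x)
    labels = [
        ("High" if a > 0 else "Low") + "-" + ("High" if b > 0 else "Low")
        for a, b in zip(zx, lag_y)
    ]
    return global_i, labels


# Hypothetical data: six units in a row.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
incidence = [18, 17, 16, 9, 8, 7]
mill_counts = [5, 6, 4, 1, 0, 1]

global_i, labels = bivariate_moran(incidence, mill_counts, chain)
print(round(global_i, 3), labels)
```

A "High-High" label here means a high incidence rate in the unit itself together with high mill counts in the surrounding units, which matches the interpretation of the dark-red counties described above.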

Impacts of Air and Water Emissions on Ovarian Cancer
Among the three OLS models, a statistically significant positive association between surface water emissions and ovarian cancer incidence rates was identified for the "All Emissions" model. This best-performing OLS model explained less than seven percent of the variation in state-level ovarian cancer rates. Significant positive Moran's I values were observed among the residuals for all three OLS models, warranting the further use of spatial regression methods.
Unlike the OLS models, the spatial lag models included spatially lagged values of the dependent variable, named "Lagged incidence", as an additional independent variable. This variable showed a statistically significant and positive effect on the ovarian cancer rate, supporting the rationale for mitigating spatial autocorrelation among geographically adjacent states. Surface water emissions remained statistically significant in the spatial lag models, although at a slightly lower magnitude than in the OLS models. The spatial lag models improved the predictive power of the regression analyses, as indicated by significantly larger adjusted R-squared values and lower corrected Akaike information criterion (AICc) statistics.
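The "Lagged incidence" variable is, for each unit, the weighted average of the incidence rates of its neighbors. A minimal sketch, assuming row-standardized contiguity weights and four hypothetical units labeled A to D with made-up rates:

```python
def spatial_lag(values, nbrs):
    """Average of neighboring values for each unit (row-standardized weights)."""
    return {unit: sum(values[j] for j in nbrs[unit]) / len(nbrs[unit]) for unit in values}


# Hypothetical units arranged in a row: A - B - C - D.
rates = {"A": 13.0, "B": 12.0, "C": 11.0, "D": 10.0}
adjacency = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}

lagged = spatial_lag(rates, adjacency)
print(lagged)
```

Because the lagged term is built from the dependent variable itself, it is correlated with the error term, so spatial lag models are typically estimated by maximum likelihood or instrumental variables rather than by adding this column to an OLS fit.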
Conversely, the regression analyses for the county-level data did not show significant regression coefficients for the two measures of paper mill emissions in any of the three OLS models (Table 2). Adjusted R-squared values ranged from 0.002 to 0.005, indicating that paper mill wastes explained less than one percent of the variation in ovarian cancer incidence. However, the GWR models dramatically improved the explanatory power of the regression analysis. For "All Emissions", the GWR models overall explained nearly five percent of county-level variation in ovarian cancer incidence rates, and the local adjusted R-squared values ranged from 0.059 to 0.15 for counties along the east coast, including Maine, Pennsylvania, Maryland, Delaware, Virginia, North Carolina, South Carolina, and Florida (Figure 6). Moreover, water emissions showed statistically significant and positive local regression coefficients for 48 counties located in the aforementioned regions (illustrated in the inset map in Figure 6), which confirms the finding from the state-level analysis of the relationship between TRI pollution and ovarian cancer incidence.
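The contrast between a weak global fit and stronger local fits can be illustrated with a stripped-down version of GWR: a separate weighted least-squares line is fitted at every location, with weights that decay with distance. The sketch below assumes a single predictor, a fixed Gaussian kernel, and an arbitrarily chosen bandwidth, and uses hypothetical data in which emissions are related to the rate only in the eastern half of a row of eight locations. It is not the ArcGIS GWR tool used in the study.

```python
import math


def gwr_local_fits(coords, x, y, bandwidth):
    """Return (intercept, slope) of a distance-weighted regression at each location."""
    fits = []
    for ui, vi in coords:
        # Gaussian kernel: nearby observations get weights close to 1.
        w = [
            math.exp(-0.5 * (math.hypot(ui - uj, vi - vj) / bandwidth) ** 2)
            for uj, vj in coords
        ]
        sw = sum(w)
        xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
        ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx
        fits.append((ybar - slope * xbar, slope))
    return fits


# Hypothetical data: eight locations on a west-to-east line.
coords = [(float(i), 0.0) for i in range(8)]
emissions = [1, 2, 3, 4, 1, 2, 3, 4]
# West half: rate unrelated to emissions. East half: rate = 10 + 0.5 * emissions.
rate = [10, 10, 10, 10, 10.5, 11, 11.5, 12]

fits = gwr_local_fits(coords, emissions, rate, bandwidth=1.0)
print(round(fits[0][1], 3), round(fits[7][1], 3))  # local slopes, far west vs. far east
```

A single global regression on these data would average the two regimes, whereas the local slopes recover a relationship near zero in the west and near 0.5 in the east. In practice the bandwidth is chosen by cross-validation or by minimizing AICc rather than being fixed in advance.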

Discussion
This is the first report to investigate the spatial patterns of ovarian cancer rates in the U.S. in relation to pulp and paper emissions using advanced geospatial techniques and national datasets at state and county levels. A comparison of the spatial distributions of ovarian cancer and paper mills indicates that areas with high ovarian cancer incidence rates are also more likely to have had large numbers of pulp/paper manufacturing facilities in the late 1980s. Global Moran's I analysis suggests that ovarian cancer incidence had an overall clustered pattern, and local clusters of high incidence rates were detected in Wisconsin, New York/Pennsylvania, Alabama/Georgia, and south-central Iowa. Bivariate LISA analysis of county-level data reinforces our observations of a positive spatial association between cancer incidence rates and concentrations of paper mills.
Results of regression analyses for state-level data revealed that the amount of water pollutants was positively related to ovarian cancer incidence in the models for all emissions. In particular, the spatial lag models significantly improved the predictive power of the regression analysis and showed that neighboring states tend to have similar ovarian cancer rates. Findings from the state-level data extend prior research that reported a correlation between ovarian cancer rates in the U.S. and pulp and paper manufacturing [27]. Local models from GWR analysis provide supportive evidence for a significant association between ovarian cancer rates and reported water emissions from pulp and paper mills. This finding supports the use of GWR to explore relationships between cancer incidence and environmental risk factors that may vary across space.
This research has several limitations, due mainly to the incompleteness of data on cancer incidence and pollutant emissions. For example, the five-year incidence data (2009-2013) provided by State Cancer Profiles cover only 31 percent of all counties in the U.S. (i.e., 987 counties). Surveillance, Epidemiology, and End Results (SEER) program registries do not exist in most of the states with high pulp and paper production. One option for additional research is to obtain better geographic coverage directly from CDC's Research Data Center (RDC); however, at least six of the 48 contiguous states do not provide county-level data to the RDC. Future research to clarify this discrepancy should focus on specific, paper-producing states with long-term cancer registries (e.g., Georgia) and could include local analyses of hot spots, including fate and transport modeling.
TRI data have been used in many studies of public health and exposure [45,46], but these data also have limitations. For example, chemical releases are industry-reported estimates and are subject to biases concerning accuracy and potential underreporting [47]. Indeed, there are financial and political incentives for polluters to minimize the quantity of pollutants reported. Additionally, the release of chemicals does not necessarily indicate exposure in a population or the concentration of chemicals in air or water. The latter requires fate and transport modeling.
Residential mobility and population migration could be a source of confounding in light of the long latency of ovarian cancer. In the 1950s and 1960s, one in five Americans moved annually [48]. This decreased to 14 percent in the 2000s and to 11.6 percent in 2011. American Community Survey five-year (2011-2015) estimates indicate that 5.5 percent of the U.S. population aged 1 year and older migrated across county or state boundaries [49]. Therefore, accounting for population migration in future research could shed new light on the relationship between ovarian cancer and TRI toxic emissions.
Most importantly, we emphasize that this is an ecological study in which measurements were made at the levels of states and counties, not the individual. Thus, one cannot conclude from these findings that exposure to toxic emissions from pulp and paper mills is associated with the risk of ovarian cancer at the individual level. However, several studies at the individual level, for example, Inoue-Choi et al. (2015) [50] and Langseth and Kjaerheim (2004) [25], suggest that water contamination or pollution from the paper industry may increase the risk of ovarian cancer. One method to pursue this lead in the future would be to employ a nested case-control study using stored sera. For example, a recent report from the JANUS cohort in Norway demonstrated that higher calcium levels in serum were significantly associated with an increased risk of ovarian cancer decades later [51]. Since a major class of chemicals involved in pulp and paper manufacturing is the organochlorines, a nested case-control study of organochlorine exposure and ovarian cancer, as has been reported for prostate cancer and for lymphoma, would be especially valuable [52,53].

Conclusions
Application of advanced geospatial methods to ovarian cancer incidence rates in the U.S. at the state and county levels demonstrates that these rates are significantly correlated with water-borne pollutant emissions from the pulp and paper industry. Analytic epidemiologic studies of ovarian cancer in relation to emissions from pulp and paper manufacturing are warranted.