1. Introduction
Cancer is the leading cause of death by disease past infancy among children and adolescents in the United States [1]. A recent study by Siegel et al. [2] evaluated age-adjusted cancer incidence rates and trends among children and adolescents from 2003 to 2019. Overall cancer incidence rates were highest for leukemia (46.6 per 1 million), brain and central nervous system (CNS) neoplasms (30.8), and lymphomas (27.3). Between 2003 and 2019, rates of leukemia, lymphoma, malignant bone tumors, and thyroid carcinomas increased, while melanoma rates decreased.
Many studies suggest that environmental contaminants may play a role in the development of childhood cancers. These investigations have focused on pesticides and solvents, such as benzene, and most recently on hazardous air pollutants, including motor vehicle exhaust and environmental tobacco smoke [3]. Associations have been noted for CNS neoplasms [4,5,6], leukemia [4,5,6,7,8,9,10,11], and lymphoma [4,5,6]. Exposure to diagnostic medical radiation, such as computed tomography (CT) and X-rays, during childhood and/or the mother’s exposure to radiation during pregnancy has been found to be associated with a slight increase in the risk of leukemia and brain tumors, and possibly other cancers [12]. Additionally, limited research indicates that pesticides may be a potential risk factor for Ewing’s sarcoma in children [6,7].
Hydraulic fracturing (also known as fracking) is a type of unconventional natural gas development (UNGD) used to extract natural gas from underground shale rock formations. UNGD has been rapidly expanding in Pennsylvania since 2005 [
13,
14] with a particularly high density in Southwestern Pennsylvania. In 2005, only 13 permits were issued, and two wells were completed in Pennsylvania [
13]. The number of active wells increased to approximately 11,500 in 2022 [
15]. This UNGD process consists of four phases: preparation, drilling, hydraulic fracturing, and production [
16]. To extract the gas, fracturing fluid is injected into the rock which “fractures” and releases the trapped gas [
16].
This relatively new drilling technique has raised concerns about the potential exposure to multiple hazardous substances released into the environment. Fracturing fluid typically consists of over 1 million gallons of water, sand, and chemical additives per well. A number of these chemicals are known and suspected carcinogens [
17]. Previous systematic exposure assessments by Xu et al. [
18] and Elliott et al. [
19] investigated both air and water pollutants from fracturing according to the International Agency for Research on Cancer (IARC)’s Classification of carcinogenicity [
20]. The studies reported the presence of known human carcinogens in the UNGD-related pollutants including 1,3-butadiene, benzene, cadmium, ethanol, ethylene oxide, formaldehyde, quartz, and heavy metals, as well as naturally occurring uranium and radium. Exposure to these human carcinogens may cause alterations in genes that lead to uncontrolled cell growth and eventually cancer [
1]. UNGD not only releases fossil fuel emissions during well preparation but also generates, after production, a large amount of wastewater (flowback) that must be disposed of.
There have been three studies that examined the associations between UNGD activity and risk of childhood cancer. Fryzek et al. [
21] conducted an ecological investigation of childhood cancer incidence rates at the county level within Pennsylvania and found no significant increase in the incidence rates of the total cancer or leukemia in post- versus pre-UNGD years, but a slight increase in CNS neoplasms in years after the UNGD activities began. This investigation was followed by two case–control studies of childhood cancer, one in Colorado [
22] and the other [
23] in Pennsylvania. Both used residential proximity to active UNGD sites and inverse-distance weighted (IDW) well counts as surrogates of exposure. The study by McKenzie et al. [
22] included children and adolescents (0–24 years of age) with acute lymphoblastic leukemia (ALL, n = 87) and non-Hodgkin lymphoma (NHL, n = 50) as cases and 528 children with cancers other than hematologic malignancies as controls in rural Colorado during 2001–2013. Overall, the study did not find an association between the risk of these malignancies and the annual IDW well count within a 16.1 km (i.e., 10 mile [mi]) radius of the residence. However, among those aged 5–24 years, children with ALL were 4.3 times as likely to live in the highest tertile of the IDW well count compared with those living more than 16.1 km (about 10 mi) away from a UNGD well (95% CI: 1.1 to 16), with an increase in risk across tertiles of well counts (P trend = 0.035). The study by Clark et al. [23] included 405 children (2–7 years of age) diagnosed with ALL in Pennsylvania during 2009–2017 and 2080 control children who were randomly chosen from the Pennsylvania birth records and matched by birth year, but not by sex, race, or county. The study found elevated odds of ALL (OR = 1.98, 95% CI 1.06–3.69) for children whose residences at birth were located within 2 km (1.2 miles) of any UNGD well during the primary exposure window studied (from 3 months prior to conception to 1 year prior to ALL diagnosis) compared with those living >2 km (>1.2 miles) away. Despite the limitations of these prior epidemiological studies in their study design and assessment of exposure to UNGD activities, their findings suggest a potential link between UNGD and the risk of hematologic malignancies and CNS neoplasms.
Given the proliferation of UNGD within Southwestern Pennsylvania since 2005 and the health concerns of local communities, we conducted a case–control study assessing the association of overall UNGD activity and residential proximity to UNGD sites with the risk of common childhood cancer types, including leukemia, lymphoma, and CNS neoplasms. These are also the types of cancer that the literature has found to be associated with various environmental exposures, including hydraulic fracturing, in both adults and children [
1]. Our study also included malignant bone tumors due to community concerns about a potential link for UNGD to Ewing’s family of tumors in Southwestern Pennsylvania [
24,
25]. In addition to the overall activities across the four phases of the UNGD process and the proximity to UNGD sites of each study subject’s residence, we assessed other potential environmental exposures, including Superfund, Toxic Release Inventory (TRI), and Uranium Mill Tailings Remedial Action (UMTRA) sites, as independent variables or covariates in relation to the risk of childhood cancer.
2. Materials and Methods
We employed a matched case–control study design for the present study. The Institutional Review Boards (IRBs) of both the University of Pittsburgh (STUDY21020141) and the Department of Health of the Commonwealth of Pennsylvania (PADOH) approved the present study.
2.1. Study Population
2.1.1. Cancer Cases
All cancer cases were identified by the Pennsylvania Cancer Registry, a part of the National Program of Cancer Registries (NPCR) administered by the Centers for Disease Control and Prevention (CDC). Eligibility criteria were as follows: Cancer types included leukemia, lymphoma, CNS neoplasms, and malignant bone tumors defined by the International Classification of Childhood Cancer Recoded Third Edition ICD-O-3/IARC 2017 (see details in
Table S1). Cases with leukemia, lymphoma, and CNS malignant neoplasms were diagnosed at 19 years of age or younger during 2010–2019. Individuals with Ewing’s family of tumors who were diagnosed at 20–29 years of age were also included in the study to increase the sample size, given the extreme rarity of the disease. The residences at both the time of birth and the time of cancer diagnosis had to be in one of the following eight counties in Southwestern Pennsylvania: Allegheny, Armstrong, Beaver, Butler, Fayette, Greene, Washington, and Westmoreland. Of the 593 cases ascertained from the Pennsylvania Cancer Registry in the original dataset for the period 2010 through 2019, the following were removed after application of the eligibility criteria cited above:
- A total of 41 were ineligible cancer cases based on the International Classification of Childhood Cancer Recoded Third Edition ICD-O-3/IARC 2017.
- A total of 20 were born outside of the eight-county study area.
- A total of 25 were diagnosed within the city of Pittsburgh, where there is no fracturing. Cases with a residence at birth or at cancer diagnosis within the city of Pittsburgh in Allegheny County were excluded from the present study because of a city ordinance against fracking.
In total, 507 cases were eligible for the study.
2.1.2. Control Subjects
The source for control subjects was birth records from the Pennsylvania Bureau of Health Statistics and Registries within the PADOH from 1990 through 2019. For each case, we randomly selected one control subject who was matched to the index case by date of birth (±45 days), race (white, black, or other), sex, and county of residence. If a control subject was matched to multiple cases, or multiple control subjects were matched to one case, a simple random sampling algorithm without replacement was used to determine the matched case–control pair. Nine case–control pairs were excluded because of mismatches on the county of residence or race that resulted from data entry errors. A total of 498 matched case–control pairs were available for the study.
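The matching rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's actual procedure: the record layout (dictionaries with `id`, `dob`, `sex`, `race`, and `county` keys), the function name `match_controls`, and the greedy case-by-case order are all hypothetical, and the example records are invented.

```python
import random
from datetime import date


def match_controls(cases, candidates, window_days=45, seed=0):
    """Assign each case one randomly chosen control matched on sex, race,
    county, and date of birth (within +/- window_days).

    Hypothetical sketch: controls are drawn without replacement, so a birth
    record can serve as the control for at most one case.
    """
    rng = random.Random(seed)
    used = set()   # ids of controls already assigned
    pairs = {}     # case id -> control id
    for case in cases:
        eligible = [
            c for c in candidates
            if c["id"] not in used
            and c["sex"] == case["sex"]
            and c["race"] == case["race"]
            and c["county"] == case["county"]
            and abs((c["dob"] - case["dob"]).days) <= window_days
        ]
        if eligible:
            chosen = rng.choice(eligible)
            used.add(chosen["id"])
            pairs[case["id"]] = chosen["id"]
    return pairs


# Invented example records (not study data).
cases = [
    {"id": "case1", "dob": date(2012, 3, 10), "sex": "F", "race": "white", "county": "Butler"},
    {"id": "case2", "dob": date(2012, 3, 20), "sex": "F", "race": "white", "county": "Butler"},
]
candidates = [
    {"id": "b1", "dob": date(2012, 4, 1), "sex": "F", "race": "white", "county": "Butler"},
    {"id": "b2", "dob": date(2012, 2, 20), "sex": "F", "race": "white", "county": "Butler"},
    {"id": "b3", "dob": date(2012, 3, 15), "sex": "M", "race": "white", "county": "Butler"},  # wrong sex
    {"id": "b4", "dob": date(2013, 3, 15), "sex": "F", "race": "white", "county": "Butler"},  # outside window
]
pairs = match_controls(cases, candidates)
```

In this toy example only `b1` and `b2` satisfy all four matching criteria for either case, so each case receives one of them and neither is reused.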
2.2. Measurement of Exposure to UNGD
The measurements of exposure to UNGD activity were estimated using official open-source electronic datasets from the Pennsylvania Department of Environmental Protection (PADEP) [
26] and the Department of Conservation and Natural Resources [
27]. These source activity datasets contained the coordinates of individual UNGD well locations and the start and end dates for each of the four phases of the UNGD well process. Wherever possible, efforts were made to independently verify and correct missing and/or inconsistent data elements. These source datasets served as the basis for determining when well sites were active. Any well that was active or completed during the defined period of exposure (i.e., from birth until the date of diagnosis of the index case) was counted toward the exposure metric. Any well that was active after the time of diagnosis was censored. The mean durations of the four UNGD process phases were as follows: 30 days for well pad preparation, 145 days for drilling, 20 days for hydraulic fracturing, and 2239 days for production.
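Counting only the well activity that falls inside the exposure period amounts to intersecting two date intervals. The sketch below is illustrative only: the function name `overlap_days` is hypothetical, and counting the interval length as the difference between end and start dates (end date exclusive) is an assumption, since the day-counting convention is not stated here.

```python
from datetime import date


def overlap_days(active_start, active_end, window_start, window_end):
    """Number of days a well's active interval overlaps an exposure window.

    Assumption: intervals are treated as [start, end), so the result is the
    plain date difference of the intersection, floored at zero.
    """
    start = max(active_start, window_start)
    end = min(active_end, window_end)
    return max((end - start).days, 0)


# Invented example: a well active through 2012, a child born 1 June 2012
# and diagnosed 1 January 2015.
days = overlap_days(date(2012, 1, 1), date(2012, 12, 31),
                    date(2012, 6, 1), date(2015, 1, 1))
```

Activity after the diagnosis (or index) date is truncated by the `min` call, which corresponds to the censoring described above.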
Two different exposure metrics were created as follows: an overall UNGD exposure measure, and a proximity measure. There are several previous papers related to UNGD activity and birth/cancer outcomes using an inverse-distance weighted index of UNGD activity [
22,
28]. We geocoded the residential address at birth for each of the study subjects using ArcGIS 10.6. This birth address was used for the calculation of the duration of exposure for cases and controls.
The cumulative exposure measure or overall UNGD exposure metric was created to estimate the length of the exposure from any activity of wells within a 5 mile buffer from the residence at birth. This cumulative exposure measure was calculated separately for two predefined windows of time: pregnancy time period was defined as the period of time prior to birth as derived from gestational age in weeks on the birth certificate; birth to diagnosis time period was defined as the date of birth until the cancer diagnosis date (for cases) or index date (for controls). Only those days of well activity that coincided with the exposure window (e.g., pregnancy time period or birth to diagnosis) were included in the calculation.
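The two exposure windows can be derived from the birth date, the gestational age, and the diagnosis or index date. The following is a minimal sketch under stated assumptions: the function name `exposure_windows` and the returned dictionary keys are hypothetical, and the start of pregnancy is approximated simply as the birth date minus the gestational age in weeks.

```python
from datetime import date, timedelta


def exposure_windows(dob, gestational_weeks, index_date):
    """Return the pregnancy and birth-to-diagnosis windows as (start, end) dates.

    Assumption: pregnancy start = date of birth minus gestational age in weeks,
    as recorded on the birth certificate.
    """
    pregnancy_start = dob - timedelta(weeks=gestational_weeks)
    return {
        "pregnancy": (pregnancy_start, dob),
        "birth_to_diagnosis": (dob, index_date),
    }


# Invented example: born 1 June 2012 at 40 weeks, diagnosed 1 January 2015.
windows = exposure_windows(date(2012, 6, 1), 40, date(2015, 1, 1))
```

A matched control would be passed the index date of its case in place of a diagnosis date.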
The overall UNGD exposure metric captures the length of exposure to well activity while simultaneously adjusting for the distance from the birth residence. Thus, there is less contribution from wells with greater distances than wells with shorter distances to the overall UNGD exposure.
For each well, the Euclidean distance between it and the residence at birth was calculated. If the distance was within 5 miles, the number of days of well activity was calculated and then weighted by the inverse of the squared Euclidean distance for that site. The overall UNGD exposure for each individual was estimated as the sum of the contributions from all wells within 5 miles of the birth residence, accumulated until the time of diagnosis of the index case. If the sum was zero, the individual was classified as non-exposed.
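Read literally, the metric described above is the sum, over wells within 5 miles, of active days divided by squared distance. The sketch below is one possible reading and not the authors' code: the function name `overall_ungd_exposure` is hypothetical, coordinates are assumed to be already projected to a planar system in miles, and skipping a well at exactly zero distance is an assumption made only to avoid division by zero.

```python
import math


def overall_ungd_exposure(residence_xy, wells, max_miles=5.0):
    """Sum of (active days / distance^2) over wells within max_miles.

    residence_xy: (x, y) of the birth residence in miles (assumed planar).
    wells: iterable of (x, y, active_days), where active_days is the number
           of days of well activity inside the exposure window.
    A result of 0.0 corresponds to a non-exposed individual.
    """
    rx, ry = residence_xy
    total = 0.0
    for x, y, active_days in wells:
        distance = math.hypot(x - rx, y - ry)  # Euclidean distance in miles
        if 0 < distance <= max_miles:
            total += active_days / distance ** 2
    return total


# Invented example: wells at 1, 2, and 6 miles from the residence.
wells = [(1.0, 0.0, 100), (0.0, 2.0, 400), (6.0, 0.0, 1000)]
exposure = overall_ungd_exposure((0.0, 0.0), wells)
```

In this example the well at 2 miles is active four times as long as the well at 1 mile but contributes the same amount, and the well at 6 miles contributes nothing because it lies outside the buffer.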
The second exposure metric was the closest well proximity metric for each person. The Euclidean distance from the residence at birth to each individual UNGD well was calculated, and the minimum value was selected. Individuals whose closest well proximity exceeded 5 miles were considered non-exposed. Similar to the overall UNGD exposure metric, the closest well proximity metric was calculated separately for the two predefined windows of time (pregnancy time period and birth to diagnosis time period). Only those wells active during the time window were used in the respective estimates.
For the purposes of analysis, the closest proximity measures were grouped into categories of pre-specified buffer zones: [0–0.5], (0.5–1], (1–2], and (2–5] miles [the “(” in front of a number denotes that the number was not included in the group category]. In addition, study subjects whose closest proximity measures were within 5 miles of any UNGD well(s) were classified as exposed; otherwise, they were considered non-exposed. The overall UNGD exposure metric was grouped into tertiles or quartiles.
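The interval notation above maps directly onto a small classification function. This is an illustrative sketch only: the function name `buffer_zone`, the string labels, and the use of `None` for a subject with no active well in the time window are hypothetical conventions.

```python
def buffer_zone(closest_miles):
    """Map the closest-well distance (miles) to its buffer-zone category.

    Upper bounds are inclusive and lower bounds exclusive, except for the
    first zone, which includes 0. None means no active well was found.
    """
    if closest_miles is None or closest_miles > 5:
        return "non-exposed"
    if closest_miles <= 0.5:
        return "[0-0.5]"
    if closest_miles <= 1:
        return "(0.5-1]"
    if closest_miles <= 2:
        return "(1-2]"
    return "(2-5]"


# Invented distances, including values exactly on the zone boundaries.
zones = [buffer_zone(d) for d in (0.5, 0.51, 2.0, 5.0, 5.01, None)]
```

A distance of exactly 0.5 miles falls in the first zone and a distance of exactly 5 miles still counts as exposed, which matches the bracket convention stated in the text.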
Given that UNGD did not begin until 2005 in Southwestern Pennsylvania and the age criterion for this study was 0–29 years at diagnosis from 2010 to 2019, 264 (52%) of the 498 cases and matched controls were born before 2004 and therefore had no exposure to UNGD during the pregnancy time period. This substantially limited the number of children available for the analysis of this exposure time period. Therefore, our primary focus was the birth to diagnosis time period.
2.3. Other Covariates
The maternal age at childbirth, maternal education level, maternal smoking status, gestational age in weeks at birth, and birthweight were derived from the birth records. These variables were included in statistical modelling to account for their potential confounding effect on the UNGD-cancer risk association.
2.4. Other Environmental Exposures
In addition to the UNGD activity metrics, three other environmental exposure sources were included in our analysis: UMTRA sites, TRI sites, and Superfund sites. We used the same method to separately estimate, for each study subject, the closest proximity distance metric from the residence at birth to TRI sites within a set of predefined circular buffer zones around the subject’s residence (0.5, 1, 2, and 5 miles) [
29,
30,
31]. As for Superfund and UMTRA sites, of which there were fewer (see
Figures S1 and S2), exposure was defined as living within 5 miles of these sites from birth to the diagnosis (index) date. UMTRA sites are stationary legacy sites that have been identified and monitored by US agencies since the mid-1970s. There were 235 TRI sites in the eight-county study area, representing over 650 hazardous compounds monitored by the US Environmental Protection Agency (EPA) [29]. It should be noted that TRI data are self-reported by each company and are estimates of the pounds of toxic compounds released into the environment. They may not reflect the true amounts released and thus can contribute to exposure misclassification.
Geographic locations of these stationary environmental exposures are shown in maps (
Figures S1–S3). Four locations of UMTRA are in
Figure S1. Mill tailings are defined as the sandy waste material from a conventional uranium mill. Milling is the first step in making fuel for nuclear reactors from natural uranium ore. UMTRA sites are areas designated by the US Department of Energy, which monitors the clean-up of these mills and works to prevent further contamination of groundwater [
30].
TRI sites are known facilities throughout the US that must report toxic chemical releases to the EPA yearly.
Figure S3 shows the location of the TRI sites in the study area and surrounding counties regulated by the EPA. For the present analysis, we downloaded the 2015 data on all TRI inventory sites for the eight-county study area and all surrounding counties [
29]. The year 2015 was chosen as a representative time-point based on the midpoint of the diagnosis time (i.e., 2010–2019) of cancer cases included in the study.
Figure S2 shows the Superfund sites. Superfund is an environmental remediation program administered by the EPA that is designed to investigate and clean up sites contaminated with hazardous substances; seven such sites are located within the eight-county study area [
31].
2.5. Statistical Analysis
All data analyses were conducted using SAS version 9.4 (SAS Institute, Cary, NC, USA). Descriptive statistics were computed and assessed for all outcomes and exposure measures, covariates, and characteristics of the 498 childhood cancer cases and their matched controls. For continuous variables, means/standard deviations and medians/interquartile ranges were used; for categorical variables, frequencies/percentages were used. Chi-square tests assessed differences in percentages for categorical sociodemographic and maternal characteristics between groups (e.g., cases vs. controls); t-tests evaluated differences in means between groups for continuous variables. When appropriate, nonparametric tests were used.
The study’s main aim was to examine the association between exposure to UNGD activity and childhood cancer risk. Conditional logistic regression modeling was therefore used to assess this relationship. Separate regression models were used to estimate odds ratios (ORs) and 95% confidence intervals (CIs), as measures of relative risk, for all cancers combined (i.e., leukemia, lymphoma, CNS neoplasms, and malignant bone tumors), with the exposure metrics evaluated as exposed/unexposed, overall UNGD exposure, and closest well proximity as indicated by the buffer zone. Regression analyses were performed with and without adjustment for additional covariates. Multivariable-adjusted models included the following covariates: maternal age at childbirth (continuous), maternal education level (≤8th grade, high school, some college, or college degree or higher), maternal smoking status at childbirth (yes/no), gestational age (continuous in weeks), birthweight (continuous in grams), TRI (delineated as non-exposed or exposed within 5 miles), and UMTRA (non-exposed or exposed within 5 miles), as well as Superfund sites (non-exposed or exposed within 5 miles).
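For the simplest special case, 1:1 matched pairs with a single binary exposure and no covariates, the conditional maximum likelihood estimate of the OR reduces to the ratio of discordant pairs, which gives a useful sanity check on model output. The sketch below illustrates only that special case and is not the adjusted SAS analysis described above; the function name `matched_pair_or`, the Wald-type interval, and the pair counts are hypothetical.

```python
import math


def matched_pair_or(case_only_exposed, control_only_exposed, z=1.96):
    """Unadjusted odds ratio and Wald 95% CI for 1:1 matched pairs.

    case_only_exposed:    pairs where the case is exposed, the control is not.
    control_only_exposed: pairs where the control is exposed, the case is not.
    Concordant pairs carry no information in the conditional likelihood.
    """
    b, c = case_only_exposed, control_only_exposed
    if b <= 0 or c <= 0:
        raise ValueError("both discordant-pair counts must be positive")
    odds_ratio = b / c
    se_log_or = math.sqrt(1 / b + 1 / c)  # standard error of log(OR)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Invented discordant-pair counts (not study data).
or_hat, lower, upper = matched_pair_or(30, 15)
```

Covariate-adjusted or multi-level exposure models, such as those used in the study, require fitting the full conditional likelihood rather than this closed form.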
Similar logistic regression modeling was also performed for each of the four individual cancer types. Although this might have led to some analyses being underpowered, our study team believed, a priori, that it was important to examine them separately because of their different biological characteristics. For the Ewing family of tumors (n = 20), unconditional logistic regression modeling was performed separately from other malignant tumors of the bone; to increase the sample size, all 498 controls were included, with adjustment for the matching variables (i.e., age at diagnosis, sex, race/ethnicity, and county of residence).
Significance testing was performed for individual ORs, as well as for the evaluation of the linear trend of increasing levels of UNGD activities using an ordinal variable (i.e., 0 for non-exposed and 1, 2, and 3 for tertiles or 1, 2, 3, and 4 for quartiles) with the risk of disease of interest. Similar logistic models were used for the decreasing buffer zone {non-exposed, (2–5] miles, (1–2] miles, (0.5–1.0] miles, and [0–0.5] miles} with the risk of disease of interest.
Additional conditional logistic regression modeling was performed to assess the relationship between the closest proximity to TRI sites (non-exposed, [0–0.5], (0.5–1], (1–2], and (2–5] miles) and all the cancers combined, adjusted for the covariates of maternal age at childbirth, maternal education level, maternal smoking status at childbirth, gestational age, and birthweight. This was repeated for individual cancers as well. We also constructed separate logistic models for UMTRA (non-exposed or exposed within 5 miles) and for Superfund sites (non-exposed or exposed within 5 miles).
4. Discussion
The present study examined the risk of four childhood cancer types in relation to residential proximity to UNGD and to an overall measure of UNGD activity that integrated both the distance and the duration of every active well within the time period of interest. Children diagnosed with any of the four malignancies included in the study were significantly more likely to live within a half mile of a UNGD site, with a dose–response trend toward a higher risk of childhood cancer with closer proximity to a UNGD site within 5 miles (P trend = 0.0041). A higher total UNGD activity was associated with an elevated risk of any of the four childhood cancer types, but the trend test was not statistically significant (P trend = 0.092). This positive association was mainly due to the elevated risk of lymphoma associated with higher levels of exposure to UNGD activities (both Ps for the trend <0.02). The present study did not find statistically significant, consistent associations between the measurements of exposure to UNGD and the risk of leukemia, CNS neoplasms, and malignant bone tumors including the Ewing family of tumors.
Our investigation was also the first to show a statistically significantly elevated risk of lymphoma associated with higher levels of exposure to total UNGD activities or with closer proximity to a UNGD site, with dose–response relationships. McKenzie et al. [22] was the only study that examined the association between hydraulic fracturing and the risk of childhood non-Hodgkin’s lymphoma. However, their analysis included only 50 cases of non-Hodgkin’s lymphoma. The 528 controls were children diagnosed with cancers other than hematologic malignancies. McKenzie et al. [22] did not find a statistically significant association of the risk of non-Hodgkin’s lymphoma with the density of oil and gas development or IDW well counts within a 16.1 km (i.e., 10 mile) radius. Neither a higher number of UNGD wells nor proximity to a UNGD site was associated with an elevated risk of non-Hodgkin’s lymphoma after adjustment for age, race, gender, socioeconomic status, elevation, and year of diagnosis. Of the 50 non-Hodgkin’s lymphoma cases, 18 were unexposed and 32 were exposed to UNGD activity within an 8 km (five-mile) buffer. ORs (95% CIs) of non-Hodgkin’s lymphoma were 1.5 (0.72–3.3) in the lowest tertile, 0.91 (0.37–2.2) in the medium tertile, and 1.6 (0.77–3.4) in the highest tertile of exposure to UNGD activities. None of their ORs were statistically significant, which could be due to the small number of cases. Our findings differ from those of this previous study, which examined only one kind of lymphoma. Future studies are warranted to confirm the findings of our study.
For comparison with the study by Clark et al., which reported 1.98 times the odds of ALL for children living within 2 km of a UNGD well, we conducted a subgroup analysis of ALL case–control pairs only (Table S2). An elevated, albeit non-significant, risk of ALL was observed for children in our study whose mothers lived within 0.5 miles of a UNGD site (OR = 5.77, 95% CI 0.42–79.16), but not for any other measure of exposure to UNGD. We should caution that our study differed from the study by Clark et al. [23] in terms of the exposure time period and age at diagnosis. Clark et al. noted that, after adjustments for maternal race and socioeconomic status, the observed odds ratio was no longer significant (OR = 1.74, 95% CI = 0.93–3.27 and OR = 2.35, 95% CI = 0.93–5.95, respectively) for children residing within 2 miles of a UNGD site. McKenzie et al. [22] also found an elevated risk of ALL (OR = 4.6, 95% CI 1.2–18.0) for children in the highest tertile of UNGD well counts (>33 wells) within 1.6 km (1 mile). Our findings are in line with those of these two previous studies, although our study produced a risk estimate with a wider 95% confidence interval because of the small sample size.
Our study found a statistically significant elevated risk of CNS neoplasms for children who lived within 5 miles of a UMTRA site (OR = 2.68, 95% CI 1.11–6.44) after adjustments for maternal and infant characteristics from birth records. The Office of Legacy Management of the US Department of Energy (DOE) manages over 100 sites in the US associated with past radiological and nuclear material production from the Manhattan Project; these sites are related to the mining and processing of uranium and hence to exposure to low-level radiation contamination [
30]. Four of these UMTRA sites are located in Southwestern Pennsylvania, the region of our study. The environmental cleanup has been completed or the treatment systems for groundwater are in place. DOE performs long-term surveillance and monitoring to make certain that remedies continue to protect the public health and the environment. Out of an abundance of caution, however, we included this environmental exposure in our analysis and observed a positive association for CNS tumors. In addition to radiation exposure from UMTRA, fracturing fluids also contain naturally occurring radioactive materials such as radium-226 and -228 [
18,
19]. Previous studies [
25,
33,
34] also found that the increased use of ionizing radiation modalities such as cranial computed tomography was associated with an increased risk of CNS tumors [
35]. Prospective epidemiologic studies are needed to examine the precise carcinogenic effect of the exposure to ionizing radiation.
This study did not find any evidence of an association between exposure to UNGD activities or other environmental factors and the risk of malignant bone tumors, including the Ewing family of tumors. Given the extremely small number of children with malignant bone tumors, particularly Ewing’s family of tumors, additional studies with a larger sample size may be warranted.
Our study considered all forms of lymphoma (52 Hodgkin’s, 22 NHL, 5 Burkitt’s lymphoma, 25 miscellaneous lymphoreticular neoplasms, and 5 unspecified), and we were able to consider multiple buffer distances and individual hydraulic fracturing phases as well as an overall metric based on the birth residence. In contrast, McKenzie et al. [
22] used geocoded addresses at the time of the cancer diagnosis as the only residence.
Lymphoma is more likely to emerge in the presence of infectious stimuli, chemical toxicity, or an immune system that has lost the ability of self-regulation [
36]. There are several studies investigating the possible environmental risk factors for lymphoma in children and adults. Some of the environmental risk factors investigated include polychlorinated biphenyls, organophosphate and organochlorine pesticides, benzene, nitrogen dioxide, and in utero exposure to smoking [
37]. Many of these chemicals are in the IARC carcinogen list and are also found in hydraulic fracturing fluids. Future studies with biomarkers for the exposure to UNGD activities may clarify the current study’s observed association between hydraulic fracturing and the risk of lymphoma.
The primary exposure measures for this study, UNGD activity and proximity to UNGD, were based on the residence in the birth records for both cases and controls. We sought to determine the extent to which the cases and their matched controls remained in the same county or the eight-county study region during the study period. We identified the cases’ addresses at the time of diagnosis from the Cancer Registry. Among the 507 children with childhood cancer, 445 (87.8%) lived in the same county at the time of the cancer diagnosis as at the time of birth (Table S8). Similarly, among the 219 control children who responded to our survey request, 201 (91.8%) lived in the same county at the time of the interview as at the time of birth (Table S9). Given that a large proportion of cases and controls remained in the same county from birth to the time of the cancer diagnosis (or index date for controls), changes in residence are unlikely to have materially affected our findings.
This study has many strengths. It is the first population-based study of UNGD activities and childhood cancer risk to randomly sample age-, race-, and sex-matched controls from birth records. The study population was restricted to Southwestern Pennsylvania counties that have permitted UNGD activities since 2005. As such, the City of Pittsburgh was excluded because of its ban on hydraulic fracturing. This minimized potential confounding and bias due to other environmental risk factors. The rigid matching criteria (a difference of no more than 45 days in birth dates between a case and a matched control) eliminated the potential confounding effect of age. The collection of other environmental exposure data from publicly available sources provided additional information on factors (e.g., TRI, UMTRA, and Superfund sites) that were adjusted for in multivariable logistic models.
In contrast to other studies that used well counts and IDW well counts as exposure variables, our study team was able to create a new metric called “overall UNGD exposure” to evaluate the cancer risk. The challenge in considering the health effects of individual hydraulic fracturing phases particularly with a condition such as cancer with a longer latency period, is that they may be occurring simultaneously in the background with other co-located wells. This overall UNGD exposure metric simultaneously accounted for the duration of UNGD activity (including all four phases, well pad preparation, drilling, fracturing, and production) and the distance of the wells from the residential address at the time of birth. Wells closer to the residence of birth are given a greater weight than those further away. Moreover, other potential environmental covariates including the proximity to TRI, UMTRA, and Superfund sites were included in the overall analysis. An additional strength was the application of multiple buffers for the proximity of residences within <0.5, 0.5–1.0, 1–2, and 2–5 miles of these sites, which allowed for the assessment of the cancer risk with UNGD proximity. The increased risk of childhood cancer with decreasing residential distance from UNGD sites suggests a probable link between UNGD activities and the childhood cancer risk.
This comprehensive analysis also revealed consistent associations between the risk of childhood cancer outcomes and various metrics of UNGD activities, which were highly correlated with each other, further strengthening a probable link between UNGD activities in general and the risk of childhood cancer.
Our study is also the first to include the four most common childhood cancers: leukemia, lymphoma, CNS neoplasms, and malignant bone tumors. The inclusion of multiple cancer types provided a larger sample size for the study and allowed for the assessment of cancer-specific risks with UNGD activities. The strongest association was observed between UNGD activities and the risk of childhood lymphoma, a novel finding that warrants assessment in future studies.
The present study also has some limitations. Although our overall UNGD exposure metric has duration, proximity, and density components, actual exposure may be affected by many other factors, such as the nearby topography and geological formations, weather patterns, water sources, and the behaviors of individuals residing near UNGD activity. It is possible that using the closest well proximity distance as a proxy for exposure resulted in misclassification bias, which may identify an association where there is none or vice versa. Although there was an attempt to include UNGD open-source data from Ohio and West Virginia, those data were reported only as annual estimates. In addition, we used the residence from the birth records as a proxy for UNGD exposure from birth until the index date, which also introduces the possibility of misclassification bias. However, there was high concordance (88–92%) between the county of residence at birth and the county of residence at diagnosis (cases) or at the index date (controls). This adds validity to the use of birth certificates as a proxy for UNGD metrics in this study. Another limitation of the study was the small sample size, particularly for bone cancer and the Ewing family of tumors, which resulted in large variations in risk estimates and wide confidence intervals.