Socioeconomic, Ethnocultural, Substance- and Cannabinoid-Related Epidemiology of Down Syndrome USA 1986–2016: Combined Geotemporospatial and Causal Inference Investigation

Background: Down syndrome (DS) is the commonest of the congenital genetic defects whose incidence has been rising in recent years for unknown reasons. This study aims to assess the impact of substance and cannabinoid use on the DS Rate (DSR) and assess their possible causal involvement. Methods: An observational population-based epidemiological study 1986–2016 was performed utilizing geotemporospatial and causal inferential analysis. Participants included all patients diagnosed with DS and reported to state based registries with data obtained from National Birth Defects Prevention Network of Centers for Disease Control. Drug exposure data was from the National Survey of Drug Use and Health (NSDUH) a nationally representative sample interviewing 67,000 participants annually. Drug exposures assessed were: cigarette consumption, alcohol abuse, analgesic/opioid abuse, cocaine use and last month cannabis use. Covariates included ethnicity and median household income from US Census Bureau; maternal age of childbearing from CDC births registries; and cannabinoid concentrations from Drug Enforcement Agency. Results: NSDUH reports 74.1% response rate. Other data was population-wide. DSR was noted to rise over time and with cannabis use and cannabis-use quintile. In the optimal geospatial model lagged to four years terms including Δ9-tetrahydrocannabinol and cannabigerol were significant (from β-est. = 4189.96 (95%C.I. 1924.74, 6455.17), p = 2.9 × 10−4). Ethnicity, income, and maternal age covariates were not significant. DSR in states where cannabis was not illegal was higher than elsewhere (β-est. = 2.160 (1.5, 2.82), R.R. = 1.81 (1.51, 2.16), p = 4.7 × 10−10). In inverse probability-weighted mixed models terms including cannabinoids were significant (from β-estimate = 18.82 (16.82, 20.82), p < 0.0001). 62 E-value estimates ranged to infinity with median values of 303.98 (IQR 2.50, 2.75 × 107) and 95% lower bounds ranged to 1.1 × 1071 with median values of 10.92 (IQR 1.82, 7990). Conclusions. Data show that the association between DSR and substance- and cannabinoid- exposure is robust to multivariable geotemporospatial adjustment, implicate particularly cannabigerol and Δ9-tetrahydrocannabinol, and fulfil quantitative epidemiological criteria for causality. Nevertheless, detailed experimental studies would be required to formally demonstrate causality. Cannabis legalization was associated with elevated DSR’s at both bivariate and multivariable analysis. Findings are consistent with those from Hawaii, Colorado, Canada, Australia and Europe and concordant with several cellular mechanisms. Given that the cannabis industry is presently in a rapid growth-commercialization phase the present findings linking cannabis use with megabase scale genotoxicity suggest unrecognized DS risk factors, are of public health importance and suggest that re-focussing the cannabis debate on multigenerational health concerns is prudent.


Introduction
Down syndrome (DS) was first described by British physician John Down in 1866 [1] and is well known to include a variety of facial features, mild to moderate growth retardation, mild to moderately impaired intellectual development, a single palmar crease, loose muscle tone or joints and a curved little finger [2]. It also includes a number of lesser known features including reduced life expectancy, a 6% rate of myeloid leukaemia development and a number of increased co-morbidities including congenital heart disease [2][3][4]. As it was shown in 1959 that the cause of the syndrome is an extra chromosome 21 [1] it represents an example of a disorder with an altered genome. Down syndrome is increasingly common which is usually attributed to women having children later in life. Indeed the only cause mentioned on the Centres for Disease Control (CDC) website relating to this disorder is advanced maternal age [2]. However, society is changing in other ways, particularly in exposure to drug substances. US data show that the overall use of tobacco and alcohol products is declining [5]. In some nations opioid abuse has become widespread and the cannabis industry appears to be entering a rapidly developing commercialization phase in nations such as Canada and USA.
We were particularly intrigued by the demonstration in Hawaii in 2007 that cannabis use alone was associated with an elevated risk of Down syndrome with a report of a rate ratio of 5.26 (95%C.I. 1.08-15.46) on the basis of only 3 exposed cases amongst 479 total cases [6]. Researchers from Canada Health have also reported higher rates of Down syndrome in the northern territories of Canada where more cannabis is known to be consumed [7][8][9][10]. A similar link was recently shown in Colorado where Down syndrome has risen over the decade of cannabis legalization and where the rise is associated more with increased cannabis use during a period where rates of tobacco and alcohol consumption were declining [11,12]. Similar findings also emerge from an Australian report [13] and recent European reports [14][15][16]. Interestingly congenital heart disease has also increased in line with cannabis consumption in Hawaii, Canada, Colorado and Europe [6,8,12,14,15,[17][18][19]. However, the important issue of the relationship of cannabis use with Down syndrome in the rich databases of the USA more generally has not as yet been considered in detail but has only been mentioned en passant [20]. We chose to focus on cannabis use since whilst similar associations have been described for cannabis they have not been described for other drugs.
Recent reports have found strong bivariate relationships between exposure to cannabis and many cannabinoids and both the raw Down syndrome rate (DSR) and estimates of the DSR corrected for early termination of pregnancy for anomaly (ETOPFA) across USA [17]. Similar recent powerful reports have issued from Europe [14,15]. However, this link has not been studied in a formal space-time relationship within a causal inferential framework in USA. The present report addresses this knowledge gap.
We therefore formed three related hypotheses prior to commencing our study. The first was that a demographic survey of drug use would be highly correlated with DS rates (DSR) across the USA after controlling for common covariates such as ethnicity, median household income and maternal age. Our second hypothesis was that cannabis use or cannabinoid exposure would be positively correlated with DSR epidemiologically across both space and time. The third hypothesis was that the various legal paradigms relating to cannabis may be significantly related to the DSR.

Data
Data on Down Syndrome rates was taken from the annual reports of the National Birth Defects Prevention Network (NBDPN) overseen by the CDC Atlanta Georgia 1986-2016 [21]. National US reports collate state based data which generally relate to five year periods, the latest being 2012-2016. Case ascertainment style for each registry was taken from the NBDPN annual reports. Drug exposure data was taken on a state basis from the National Survey of Drug Use and Health (NSDUH) an annual survey conducted by the Substance Abuse and Mental Health Services Administration (SAMHSA) which is a nationally representative survey of the non-institutionalized US population over 18 years [22]. The main drugs of interest and their NSDUH abbreviations were cigarette use in the past month (cigmon), abuse or dependence on alcohol in the past year (abodalc), last month cannabis use (mrjmon), past year analgesic abuse (anlyr) and past year cocaine use (cocyr). Birth census and maternal age structure data was obtained from the CDC Wonder birth registries [23]. National state ethnicity figures and median household income were downloaded from US Census Bureau via tidycensus in "R". Cannabinoid concentration data was taken from published Drug Enforcement Agency reports [24,25].

Derived Data
National cannabinoid concentration data was multiplied by state-based measures of monthly cannabis use to derive an estimate of state based exposure to the various cannabinoids. NSDUH data was used to derive a mean number of days of cannabis use by ethnicity at the national level. This was multiplied by the state monthly cannabis use and by the reported THC potency to derive an index of state-based ethnic exposure to ∆9-tetrahydrocannabinol (THC). States were divided into cannabis-use quintiles based on their ranking in the 2015 NSUDH survey. Similarly NSDUH data was used to calculate an average mean number of days of cannabis used in pregnancy nationally also. This was multiplied by mrjmon to derive a local estimate of state-based pregnancy use denoted "First Trimester Cannabis Exposure". CDC birth census data was used to derive a stateand year-specific fraction of mothers giving birth who were over 35 years of age to account for the known age effect on DS incidence.
The only longitudinal time series which could be identified describing the time course of early termination of pregnancy for anomaly (ETOPFA) for DS was that from the Western Australia Registry of Developmental Anomalies (WARDA) 1980-2014 [26]. The present ETOPFA for Down syndrome is 70%. The time course of the increase from the WARDA dataset until present was used to calculate a fractional maximal ETOPFA rate (FMaxTR) and the raw NBDPN-reported rates were standardized against that to derive an annual estimate of DSR inclusive of ETOPFA's. These data are supported by other international series [27][28][29].

Statistical Analysis and Data Cleaning
Data was processed in "R-Studio" version 1.2.1335 based on "R" from CRAN version 3.6.1. Data was manipulated and matched in R packages base and dplyr [30], linear regression was performed in base, graphs were drawn in ggplot2 [30] and sf [31] and geofacetted in geofacet [32], correlation matrices were visualized in corrgram [33,34], panel regression was conducted in plm [35], robust sandwich regressoin was performed in survey [36], spatial weights and matrices were calculated in spdep [37], and spatial regression was performed in splm [38,39]. Variables were log transformed based on the Shapiro test. Lagged instrumental variables were used in two step panel regressions as described. Models were tidied using broom and broom.mixed [40,41]. Datapoints lying outside 10 standard deviations (sd's) from the population mean were substituted by temporal kriging (temporal mean substitution) of that states' data. This was applied to Nebraska data for 2011-2015, lying 11.6 sd's outside the population mean. Data for multivariable regression was z-transformed to address parameter scale effects. Model reduction from first to final models was by the classical technique of serial omission of the least significant term. Only significant terms are presented. The standard spatial regression model used is the full spatial panel maximum likelihood (spml) model including a maximum likelihood panel looking at main effects over time and using the spatial error structure of Baltagi [42] conducted using the R package splm [39,43,44].
The net effect of cannabinoid parameters was summed in multivariable models from matrix multiplication by multiplying the parameter coefficient by the mean value of the covariate and multiplying these measures for interactive terms. The total value for the model was then summed across all covariates in this manner. For robust regression the survey design included the grouping parameter, the inverse probability weighting parameter and the standardized dataset.

Causal Analysis
Causal analysis was conducted in R using the package ipw to assign inverse probability weights to monthly cannabis use as the exposure of interest [45]. Truncation was not necessary. These weights were then used in weighted mixed effects models performed using the nlme package in R [46]. The strength required of unmeasured confounders to account for the described effects was estimated using E-Values derived from the EValue package in R [47]. p < 0.05 was considered significant.

Data Availability Statement
Data including R programming code has been made freely available in the Mendeley Data Archive at URL: http://dx.doi.org/10.17632/tn46tdhc4c.2 (accessed 13 October 2022).

Ethics
The study was approved by the Human Research Ethics Committee of the University of Western Australia on 7 June 2019 No. RA/4/20/4724.

Univariate Data
437 data points for the DS rate (DSR) were derived from the published NBDPN database from 1986-1988 to 2012-2016. The mid-year of each annual report was taken as the nominal reference year. This data is displayed in Supplementary Table S1 and map-graphically in Figure 1. In recent years, hotspots are seen to emerge in Colorado and Massachusetts. Supplementary Figure S1 illustrates these changes for estimates of the ETOPFA-corrected DS pregnancy rate calculated as described in Methods. Colorado, Georgia and Massachusetts stand out prominently both before and after correction for ETOPFA's. The details for the only data source in the world we could identify providing a longitudinal series of ETOPFA rates for DS was the 2015 WARDA report [26]. These data are provided in Supplementary Table S2. States were divided into quintiles of cannabis use based on the 2015 NSDUH survey as shown in Supplementary Table S3. Colorado, Vermont and Alaska were in Quintile 5 which is the highest cannabis exposure quintile, and Maine, Rhode Island, Oregon and New Hampshire were in the fourth cannabis use quintile.  Figure 2 shows the DSR categorized in in several ways. Panel A shows a rise over time of both the raw data and the estimates for ETOPFA corrections. Panel B shows a rise in time for the three rates reported by NBDPN, as the overall rate, those born to mothers less than 35 years and those older than this cut-off figure. Time-dependent rises are noted in each case. Panel C plots the DSR against the monthly cannabis use rate and notes rises in the overall and under 35 years groups, but not in the older group. Panel D charts the time course of DSR by each cannabis-use quintile. Importantly the highest cannabis use quintile has DSR's is obviously higher than the other quintiles with little overlap in the standard error shaded zones. Panel E provides the same information as groups irrespective of time. One reads the chart by noticing where the notches do not overlap for such non-overlapping areas indicate statistical significance. There is therefore an impression of a rise with quintile number. Panel F re-presents the same quintile data showing the highest quintile compared to all of the others, and thus dichotomizes the quintile data. Since the error zones are largely non-overlapping this suggests an important statistically significant difference. These dichotomized quintiles are again presented as boxplots in Panel G where the notches just overlap. Panels H and I present a similar quintile analysis of the ETOPFA-corrected estimates with generally similar findings.

Bivariate Relationships
Supplementary Figure S2 presents the DSR as separate panels on a gridded plot. Steeply rising rates in Colorado, Georgia, Illinois, Massachusetts and South Carolina are noted. Supplementary Figure S3 presents these data with each state in its approximate position on the map in a geofacetted display. This display allows the rises in the various states to be grouped by geographical region. This shows a group of states in the midwest which rose sharply-Wisconsin, Illinois, Tennessee and Mississippi, Georgia and Ohio-and a group in the northeast-Massachusetts and Rhode Island. Supplementary Figure S4 makes a similar plot for the ETOPFA-corrected estimates and notes widespread uniform rises across the country which vary only in the extent of the rise.
Supplementary Figure S5 presents the NSUDH data on daily or near daily cannabis use (20-30 days per month) and cannabis use in pregnancy at the national level. Dramatic rises in both indices are noted. As these covariates were not found to be significant in the following regression models they were omitted from final models.

Linear Regressions
Supplementary Table S4 presents linear regression of key covariates of the DSR from these opening data. One notes highly significant terms for time (β-est. = 0.  A strongly positive rising effect is noted with income which is well described in the literature. There appears to be a fall with tobacco use, no relationship with analgesic use, but a rise with alcohol abuse, cocaine and cannabis exposure. Four of the five cannabinoids listed-THC, cannabigerol, cannabichromene, cannabinol-show a positive relationship with DSR as does daily cannabis use and the First Trimester Cannabis Exposure index. A strong ethnic trend amongst Hispanic-Americans is also apparent. Year

Case Ascertainment
One issue relates to whether the style of case ascertainment of the different birth defects registries may impact the reporting rate of DS. As shown in Supplementary Figure S6 it does appear to, however the direction is the reverse of what would likely be expected with the rate in the active case-finding registries lower than those without. Compared to passive case-finding registries the rate in the active registries is significantly lower (β-est. = −0.10, (−0.17, −0.03), p = 0.0080; model F = 4.273, df = 2141, p = 0.0158). However, as this variable was not found significant in multivariable models during exploratory modelling it was not considered further.

Correlograms
Supplementary Figure S7 presents a correlogram for these correlations for drugs and CDC-derived data on maternal age (as the fraction of mothers over 35 years). The correlation coefficient together with its confidence interval appear in the upper triangle and the colour code is from yellow-For strongly positive to maroon strongly negative. The correlogram is sorted along its diagonal by principal component analysis. The correlation cannabis use with DSR is shown as 0.31 (95%C.I. 0.21, 0.40) and the correlation of cannabis use with the ETOPFA-correction for DSR as 0.39 (95%C.I. 0.30, 0.48). The first square in this chart is for "FrOlder35" the fraction of mothers older than 35 years.
Supplementary Figure S8 performs the same role for state-based exposure to cannabinoids. For example, the correlation between state-based THC exposure and ETOPFAestimates of DSR is 0.51 (95%C.I. 0.42, 0.58).
Supplementary Figure S9 performs a similar role for ethnicity which is weakly correlated, and for ethnic cannabis exposure which for many ethnicities is strongly correlated with both the DSR and ETOPFA-corrected DSR (ETOPFAC-DSR). Hence, for cannabis exposure in the Non-Hispanic African American, Non-Hispanic Caucasian-American

Panel Regression
Panel regression is a suitable technique to use to regress all of these variables in datasets such as this which have multiple missing data. Supplementary Table S5 presents these results for models lagged at zero, two, three and four years. Main effects were assessed for the five substances, five ethnicities, income and advanced maternal age. Lagged instrumental variables were used for THC exposure and cannabigerol exposure and for the THC exposure of the five ethnicities to account for the mediating effect of these covariates on the main effects. Terms including cannabis are significant from β-est. = 3.38, (1.61, 5.14), p = 1.72 × 10 −4 .

Geospatial Regression
This form of data suggests that geospatial analysis should be applicable to it. However, as no extant geospatial algorithm can cope with missing data it is necessary to impute the missing data by temporal kriging which is an acceptable method. The kriged dataset is shown in Supplementary Table S6 which for the period 2005-2014 lists 322 native points, to which 38 have been added totalling 360 points (with 10.5% kriged).
Supplementary Figure S10 shows this data map-graphically for the raw time series plot, and Figure 4 for the ETOPFA-adjusted estimates. Supplementary Figure S11 shows the geospatial links which were derived from R::spdep and edited as indicated to derive the geospatial weights matrix.  Table 1 shows the results from spatial regression of this data for multivariable models including cannabis, THC and cannabigerol including all the main socioeconomic, ethnicity and drug exposure variables. Table considers first the unadjusted DSR and then the DSR after adjustment for estimated ETOPFA rates. In each case first cannabis, then THC and then cannabigerol are considered. The columns of the right hand of the table list the value of the spatial coefficient rho along with its statistical significance.
12 of the 14 terms for cannabinoids in this Table are positive in direction so on the face of it the effect of cannabinoids appears to be in the positive direction. However, this may be formally determined in multivariable models by the technique of summation of term coefficients by the mean value, multiplying them together for interactive terms, and then adding them across all terms in the model. In this case, since the covariates have all been standardized (by z-transformation) their mean value in each case is zero. Hence, the terms can simply by added directly for all the cannabinoid terms in each model. The sum of these terms is called the Sum of the Cannabinoid Coefficients (SCC) and appears in the right hand column of the table. In all six models this parameter is strongly positive.

Legal Status
It was also of interest to see if the legal status of cannabis might be associated with the DSR. This is illustrated in Figure 5 where the four legal statuses are charted over time in panel A, the ETOPFA-corrected DSR is charted over time in panel B, the dichotomized legal status is charted against the DSR in Panel C, as boxplots in panel D, dichotomized boxplots in panel E and dichotomized boxplots for the ETOPFA-corrected estimates in panel F. In each case apparently highly significant changes are shown.
These changes are quantified in Table 2 which regresses the DSR against each of the parameters shown. Cannabis legalization is shown to be associated with an increased DSR (β-est. = 6.

Inverse Probability Weighted Mixed Effects Regression
Given the strong evidence established from the geospatial models of a close association across both space and time between DS and survey measurements of drug and cannabinoid exposure the next logical issue related to a formal investigation of whether it may be formally possible to demonstrate a causal relationship. This was facilitated by the use of inverse probability weights in the final kriged model, derived from the ipw Package in R. Table 3 presents the results of the fixed effects analysis of inverse probability weighted mixed effects models for three models, a simple additive model, a model interactive in drug exposures and a model interactive in both drug exposures and ethnicities. All covariates in this Table have been standardized by z-transformation. All models include all substance exposures, ethnicities and median household income. For each model covariates are listed in descending order of their coefficient. In addition to the usual mixed effects models parameters the Model Parameter column on the right hand of the table include a Sum of the Cannabinoid Coefficients (SCC) term which has the same meaning as in Table 1. It is noted that in the second and third models the SCC is strongly positive.
Supplementary Table S7 shows the fixed effects of final model outputs from increasingly complex mixed models regressing the DSR against six addictive agents (tobacco, alcohol abuse, opioid analgesics, cocaine, THC and cannabigerol), four races (Caucasian American, African American, Hispanic American and Asian Americans), the ethnic cannabis use index for these four ethnicities and median household income. The first model is an additive model, the second is interactive in terms of the addictive agents (as indicated in Table 3) and the final model has interactions included both amongst the drugs and also amongst the ethnic use of cannabis (as indicated). The best model is the final model (AIC's of additive and final interactive models 24,811.022 vs. 1757.386, ANOVA: df 15 vs. 27, Log Likelihood ratio =747.6357, p = 1.50 × 10 −17 ). Many highly significant terms including cannabinoids appear in final models for both the substances and ethnic cannabis use indices.

Robust Regression
Robust regression which accounts for model structure has also been employed using the R package survey. In these models the state identity, the dataset and inverse probability weights were all accounted for in the model design term. Table 4 presents these results for the ETOPFA-corrected DSR. In five of the six models terms for cannabis or cannabinoids remain significant in final models. Mechanistically these changes may be summarized in the following Figure 6 which presents a Directed Acyclic Graph (DAG) of the various covariates studied. Figure 6a shows four domains namely, Supervariables, Substances, Ethnicity and Genomic pathology. Age is a well recognized super-variable as chromosomal and genetic disorders become more common for many reasons with age. Income is also involved as it affects access to substances and lifestyle factors. All the substances are implicated in genotoxic outcomes, most particularly cannabinoids. A strong link is therefore shown between cannabis and genotoxic outcomes. Ethnicity is also implicated but regression studies show that its effect is largely accounted for by cannabinoid exposure by ethnicity as its effect disappears after adjustment. These various domains are joined by two headed arrows indicating largely two way effects throughout.
However, these effects are not conceptualized in two dimensions. Rather, as shown in Figure 6b each of these four domains is seen as a vertex connected to each of the other vertices as in a double tetrahedron in three dimensions. Hence, all four domains are interconnected both directly and via the other domains. That is both direct (main) effects and interactive and interdependent effects will be seen and described.

Causality and Uncontrolled Confounding
The final question related to quantitation of unmeasured uncontrolled confounding. This was considered using E-Values (from package EValue in R). These are listed in Table 5. One notes that minimum E-Values are generally considered to be potentially causal if they are greater than 1.25 [48]. 52 Table 6. The nexus between the E-value pairs has been broken in this Table to allow both lists to be presented in consecutive descending order.  Table 1 Cannabis

Multivariate Regression of Cannabis Legal Status
It was of interest to determine if cannabis legal status was significant after multivariable adjustment. Inverse probability weight linear regression was used for this study. Supplementary Table S8 presents the results of additive and interactive models for the legal status and the dichotomized legal status with the Down syndrome rate as the dependent variable. Table 7 performs the same exercise for the ETOPFA-adjusted DSR in additive and interactive model. These studies showed that in each case legal status continued to be significant in all models after adjustment for the full panel of substance exposure, ethnicity and income covariates.

Main Findings
The study examined socioeconomic, ethnocultural and substance exposure data that may represent key variables underlying the present rise in US DS rate. We found that the rate is rising quite sharply in many US states and particularly so when estimates of ETOPFA-corrected DSR are considered. The DSR also rises with increase of cannabis exposure and cannabis exposure quintile. When considered by formal multivariable regression across both time and space including median household income, state ethnic composition, cannabinoid-and other drug (cigarette, alcohol, analgesic/opioid, cocaine) exposure and the state rate of mothers older than 35 years, terms including tobacco, ∆9THC, cannabigerol and analgesics/opioids were significant in the final model (p < 0.001), and terms including ethnicity, income and maternal age were not.
Exploration of the potentially causal nature of the cannabis-DS relationship by the use of inverse probability weighting in mixed effects models and E-Value calculation (which quantitates uncontrolled and unmeasured confounding) showed that the association fulfilled causal criteria with tiny probabilities on weighted mixed effects modelling and significant E-Values in geospatial modelling.
Interestingly our studies linking Down syndrome and cannabis are consistent with prior reports from Hawaii, Colorado, Canada and Australia [6,8,12,13]. The strength of evidence provided in this paper is stronger than that reported previously from other independent datasets and jurisdictions. The present results are reminiscent of recent powerful studies in Europe on this issue [14,15,18].

Pathways and Mechanisms
Various biological pathways have been described which may contribute to the aetiology of chromosomal aneuploidy. In particular cannabis exposure has been shown to interfere with the synthesis and action of tubulin which is the key monomer from which microtubules are polymerized [49]. Microtubules form the cytoskeletal framework upon which chromosomes are segregated during anaphase and the de-railing of chromosomes during their separation can cause micronucleus formation (to which cannabis has long been known to contribute) and chromosomal pulverization [50]. For these reasons cannabis has been described as an indirect genotoxin. Cannabis has been described as deforming human gross sperm morphology [51] and cannabinol and cannabidiol have been linked with sperm chromosomal breaks [52][53][54]. Cannabis has also been described as being highly toxic to primary oocytes with chromosomal breakages, linkages, rings and microsatellites formed after in vitro culture and exposure to ∆9THC [55].
It has also been well established that most DS is related to disturbances of female meiosis I, which is a very prolonged phase which can extend from foetal in utero life to the end of a woman's reproductive life at age 50 years [56][57][58]. This implies that the female gametes are actually susceptible to genetic damage throughout the woman's life which implicates on a theoretical basis prenatal, adolescent and young adult cannabinoid exposures as potentially genotoxic and teratogenic.
Cannabis also has quite marked epigenotoxic effects with marked genome-wide alteration of DNA methylation profiles shown in both human and rat sperm [59,60]. This is important as the epigenome is increasingly being understood to effect many genomic features including double-stranded DNA breakage [61,62]. Moreover, epigenomic disruption of key steps in formation of the mitotic spindle, disruption of the post-translational modifications of tubulin which polymerize to form microtubules of the spindle rays, kinetochore function and binding of chromosomes to the microtubules of the mitotic and meiotic spindles, formation of the centromeric chromatin at which the kinetochores will bind, formation of the centrosomes which draw together then ends of the mitotic spindles to form the normal bipolar arrangement which predicts cell division into two daughter cells, have all been now documented in the detailed epigenomic findings of cannabis withdrawal and dependence [63]. These data imply that the marked degree of epigenomic derangement inherent in cannabinoid exposure is likely a key factor driving the chromosomal mis-segregation events underlying reported cannabis-related chromosomal and genetic anomalies in recently population wide studies [6,13,14,17,20,64,65].
Importantly a longitudinal study of the effects of cannabis on the human and rodent epigenome by DNA methylation noted many changes in the machinery of the centromere and kinetochore and damage to the polymerized tubulin polymers of the microtubules of the mitotic spindle induced by cannabis dependence and withdrawal which will greatly increase the error rate of chromosomal segregation, and lead to chromosomal isolation independent of the mitotic spindle, aneuploidy and micronucleus formation [63]. These changes are discussed at length elsewhere [9,15,66,67].

Causal Assignment
Some exposition of the causal inference analysis employed in this paper is appropriate to clarify the techniques employed for the benefit of readers who might be less familiar with the application of modern statistical techniques to causal inference in direct refutation of the "causalophobic" reputation for which classical statistics is so well known [68]. The major short-coming of classical regression techniques in causal assignment has been the potential for unobserved variables to confound the analysis, and in particular the potential for the exposure of interest to be more common amongst the tested group than amongst the control group, a fallacy which results in a comparison of "apples with oranges". Inverse probability weighting of an observational or ecological population can be applied to the regression model so that the exposure in the two groups becomes equivalent thereby constructing so-called "pseudo-randomized" equally exposed populations, from which causal conclusions can indeed be drawn. This is equally true for continuous variables as was used in the present study. This methodology is further strengthened by the use of robust regression techniques as was employed herein.
Another major way in which causality can be assessed is by the use of "Expected Values", known as "E-Values" [69]. This is best illustrated by a classical illustration from recent medical history which has been eloquently recounted by one of the foremost causal inferential statisticians of our time Judea Pearl [70]. Prior to the 1950 s lung cancer was a very rare disease. However, with the rise of tobacco and then cigarette smoking lung cancer became common. It was realized before long that cigarette smoking posed a considerable exposure risk with a relative risk elevation amongst smokers of nine-fold. However, the debate was protracted and conflicted since many of the world's leading scientists-including leading statisticians-smoked tobacco. The debate centred upon whether the observed effect might be confounded by unobserved variables-such as a pre-disposing genetic risk-which might confound the observed relationship. Eventually a genetic risk was indeed identified in 2016 with the identification of a gene which doubled the lung cancer risk. The gridlock was resolved when Robins showed that it was inconceivable that an unobserved confounder variable-which has to be related to both the exposure and the outcome-would be sufficiently powerful to account for all of the observed tobacco effect.
The demonstration in the present study then of many elevated minimum E-Values strongly suggests that the association between cannabis use and Down syndrome is likely to be causal. For example, the present results compares favourably with many modern reports which often have typical E-Values of about 1.25 [48]. However, since both inverse probability weighting and E-values have a number of underlying theoretical assumptions formal demonstration of the causal nature of the link must await further experimental studies. Given the strong evidence presented in the present report such studies should be pursued with high and urgent priority.
Moreover, it is further noted that the present work fulfills all the qualitative Hill criteria of causality as well [71]. Strength of association, consistency amongst studies [6,13,64,65], specificity, temporal sequence (see geotemporospatial analysis), coherence with known data, biological plausibility, dose-response relationships, analogy with similar situations elsewhere [6,13,64,65] and experimental confirmation are all very adequately fulfilled by the epidemiological analyses presented herein and published experimental work as summarized in the Section 4.2 above. These findings may be summarized in the following Table 8.

Implications for Policy
Chromosome 21 is 48 million megabases in length so the now repeated demonstration of strong associations between cannabis use and Down syndrome in a sixth location [6,8,12,13] and in a causal paradigm necessarily implies the implication of cannabis in clinical genotoxic syndromes at the chromosomal megabase scale. The known associations of Down syndrome with congenital heart disease on the one hand and acute lymphoid leukaemia on the other [2][3][4] further imply that adult exposure to cannabinoids is necessarily linked with intergenerational and transgenerational transmission of cannabinoid-related teratogenicity and cannabinoid-related carcinogenicity which are each major aspects of cannabinoid-related genotoxicity. Indeed cannabis has been noted to be an indirect chromosomal clastogen by lab researchers [54,72,73] and has been implicated in a host of genotoxic and epigenotoxic activities as enumerated briefly above [50,52,54,59,[72][73][74][75][76][77][78][79][80][81][82][83].
Since data implicate cannabis in heritable genotoxic syndromes it follows that access to cannabinoid products should be restricted and controlled in the same way as other known genotoxic agents including chemotherapeutic cancer drugs and known genotoxins such as thalidomide. Indeed it is of considerable interest that cannabis shares many of the pathophysiological cellular and molecular mechanisms of thalidomide [84][85][86][87][88][89][90] and is also linked with congenital heart defects [91] and limb outgrowth defects [6,13,65,92] just as thalidomide has been [21,84,89,[93][94][95][96].
In short the well described implication of cannabis with heritable clinical genotoxic syndromes is a major motivation for carefully controlling the community penetration of cannabinoid products of all types in line with the way known genotoxins are handled, addressed and carefully controlled in every other entry in the pharmacopeia of western medicine. This statement in relation to cannabinoid genotoxicity is strongly supported by the now known causal links relating to cannabinoid neurotoxicity both in the exposed, expressed as high rates of mental illness in young American adults [97], and in their offspring, expressed as high and rapidly increasing rates of child autism across USA [98][99][100].

Time-Dependent Changes in Down Syndrome Rates
It is clear that improved diagnostic techniques are applied to antenatal diagnosis in more recent times so it is possible that there has been a secular rise in the diagnosis of DS which might contribute in part to some of the findings reported. However, the principle results persist even when the uncorrected DSR's are used (results not shown). Moreover, it may be argued that the rate of termination for anomaly has risen over time. Whilst the present analysis accounts for this it is important also to note that similar-and even more marked-findings were noted in the European dataset where this effect was explicitly included in the raw data themselves [14]. Thus the European data is more robust that the USA data in this respect as the ETOPFA effect could only be estimated for the USA data. Where the ETOPFA effect was known and available the cannabis effect was actually found to be stronger than that documented in the present report [14].

Generalizability
Since the subject of our study was USA, which by many metrics is the world's leading nation, and is consistent with reports from several other western nations [6,8,[12][13][14] and especially from the large European dataset [14], we feel that study findings are generalizable to other developed nations. However, in view of the rapidly expanding nature of cannabis commercialization, and the widespread misunderstanding of its supposedly benign nature, more research needs to be performed at both the genotoxicity / epigenotoxicity level, and at the epidemiological level. NBDPN-CDC recently published an analysis of gastroschisis at the US county level [101] and a similar high definition geospatiotemporal investigation in the area of cannabis and DS is urgently required. This highly generalizable view is supported by similar reports from multiple other jurisdictions including Hawaii, Colorado, Australia, Canada and most recently Europe [6,13,14,17,20,64,65].

Strengths and Limitations
This study has various strengths and limitations. Its strengths relate to its study of the USA providing the best public data in the world, its population-census wide nature of both its underlying population and birth defect data, its use of the NSDUH which is a robust widely used nationally representative sample, the use of a wide spectrum of covariates, and the application of geospatial and causal inference statistical techniques to this area for the first time to our knowledge. Its limitations relate mainly to its design and the unavailability of individual case-level data. This implies that strictly causal relationships cannot be established. Nevertheless, the present ecological report does pursue this line of investigation as far as this is possible with population level data. Moreover, cannabis use may be associated with other covariates of teratological significance such as tobacco and alcohol use. While these have been adjusted for herein using regression techniques some residual confounding might still exist. Inverse probability weighting minimized this effect. The use of E-Values quantifies the extent to which such effects may be significant. As demonstrated uncontrolled confounding does not perturb the major conclusions emerging from geospatial models. Cannabis use has been shown in many studies to be inversely associated with maternal age so consideration of this variable would likely increase the observed effects. For these reasons further research in this area is urgently required.

Conclusions
Our interpretation of study findings is that there exist solid grounds for accepting a positive epidemiological association between cannabis exposure and DS across both space and time in the present study and also in many other recent series. These results from USA are consistent with similar data from several other states and nations [6,13,14,17,20,64,65] and are further strengthened in the context of the known genotoxicity and epigenotoxicity of multiple ethnobiological cannabinoids. Clearly more research in this area is necessary. Furthermore, in view of its public health implications a greatly augmented public education campaign of current empirical findings should be encouraged following the public health model and approach to tobacco.  Figure S1: Map graph of the logarithm of the Down syndrome rate corrected for estimates of the early termination of pregnancy for anomaly (ETOPFA) rate; Figure S2: Down syndrome rate across USA over time as a panelled and facetted scatterplot in alphabetical order; Figure S3: Down syndrome rate across USA over time as a geospatial panelled geofacetted scatterplot with each state in approximately its correct spatial location; Figure S4: ETOPFA-corrected Down syndrome rate across USA over time as a geospatial panelled geofacetted scatterplot with each state in approximately its correct spatial location; Figure S5 Author Contributions: A.S.R. assembled the data, designed and conducted the analyses, and wrote the first manuscript draft. G.K.H. provided technical and logistic support, co-wrote the paper, assisted with gaining ethical approval, provided advice on manuscript preparation and general guidance to study conduct. A.S.R. had the idea for the article, performed the literature search, wrote the first draft and is the guarantor for the article. All authors have read and agreed to the published version of the manuscript.

Informed Consent Statement:
Patient consent was waived in this study as it was not applicable, all data was grouped and non-identifiable and anonymous. Data Availability Statement: All data generated or analysed during this study are included in this published article and its supplementary information files. Data along with the relevant R code has been made publicly available on the Mendeley Database Repository and can be accessed from this URL: https://data.mendeley.com/datasets/tn46tdhc4c/2 (accessed 13 October 2022).