Cannabis- and Substance-Related Epidemiological Patterns of Chromosomal Congenital Anomalies in Europe: Geospatiotemporal and Causal Inferential Study

Introduction: Laboratory data link cannabinoid exposure to chromosomal mis-segregation errors. Recent epidemiological reports confirm this link and raise concern that elevated chromosomal congenital anomaly rates (CCAR) may be occurring in Europe which is experiencing increased cannabis use, daily intensity of use and cannabinoid potency. Methods: CCAR data from Eurocat. Drug use data from the European Monitoring Centre for Drugs and Drug Addiction. Income from World Bank. Bivariate, multivariate, panel and geotemporospatial regressions analyzed. Inverse probability weighting of panel models and E-values used as major quantitative causal inferential methodologies. Results: In countries where daily cannabis use was rising the trend for CCA’s was upwards whereas in those where daily use was declining it was usually downwards (p = 0.0002). In inverse probability weighted panel models terms for cannabis metrics were significant for chromosomal disorders, trisomies 21 and 13 and Klinefelters syndrome from p < 2.2 × 10−16. In spatiotemporal models cannabis terms were positive and significant for chromosomal disorders, genetic disorders, trisomies 21, 18 and 13, Turners and Klinefelters syndromes from 4.28 × 10−6, 5.79 × 10−12, 1.26 × 10−11, 1.12 × 10−7, 7.52 × 10−9, 7.19 × 10−7 and 7.27 × 10−7. 83.7% of E-value estimates and 74.4% of minimum E-values (mEV) > 9 including four values each at infinity. Considering E-values: the sensitivity of the individual disorders was trisomy 13 > trisomy 21 > Klinefelters > chromosomal disorders > Turners > genetic syndromes > trisomy 18 with mEV’s 1.91 × 1025 to 59.31; and daily cannabis use was the most powerful covariate (median mEV = 1.91 × 1025). Conclusions: Data indicate that, consistent with reports from Hawaii, Canada, Colorado, Australia and USA, CCARs are causally and spatiotemporally related to metrics and intensity of cannabis exposure, directly impact 645 MB (21.5%) of the human genome and may implicate epigenomic-centrosomal mechanisms.


Introduction
Chromosomal congenital anomalies (CCA's) have been an important feature of several recent epidemiological studies of the associations and causal links of various prenatal drug exposures especially cannabinoids. In reports from Hawaii, Colorado, Canada, Australia and USA congenital chromosomal anomalies have featured prominently on long lists of congenital anomalies elevated after prenatal or community cannabinoid exposure [1][2][3][4][5][6][7][8][9]. Given that most of Europe has recently experienced a triply convergent rise in the prevalence of cannabinoid use, the intensity of daily use and the ∆9-tetrahydrocannabinol (THC) content of available cannabis products [10,11], it was of interest to learn if trends observed elsewhere might be reflected in European data. Since all three prongs of this triple convergence tend to increase cannabinoid exposure it is likely that cannabinoid exposure has increased on the ground more than is expressed by any one of these common metrics [3,10]. Indeed recent studies have confirmed that compound metrics of cannabinoid exposure reflect patterns of genotoxic and congenital anomaly disorders better than single more traditionally employed measures.
Particularly concerning in this regard is the well documented exponential dose response of cannabis genotoxicity [12][13][14][15][16][17][18]. It might be reasonably expected that a marked jump in community cannabinoid exposure could be expressed as a switch like mechanism in epidemiological patterns of disease as indeed appears to have occurred recently in northeastern France where both calves and human babies are suddenly being born without limbs at greatly elevated rates 60-times those of background [19][20][21]. There are indications that in these areas large crops of cannabis are being cultivated and food chain contamination seems likely. Since epidemiological studies have confirmed that the exponentiation of cannabinoid genotoxicity seen in the laboratory is also reflected in patterns of congenital anomaly incidence [1,3,4,8,[22][23][24][25] a relatively abrupt rise in community cannabinoid exposure would be expected to be associated with a relatively sudden and abrupt step-wise rise in congenital anomaly rates. This issue seems to not be well understood in the public health community.
Considerable strides have been made in the epigenomic understanding of drug addition and drug dependency in recent years with several multigenerational animal studies appearing [41,42,[44][45][46] and several studies of alterations of the DNA methylation patterns of rat and human sperm [47,48]. Particularly fascinating in this regard was the recent elegant demonstration that altered patterns of chromatin looping and 'gene melting' define uniquely active gene cassettes and topologically defined transcriptional domains which may account for many of the features of addiction within the dopaminergic single cell nucleus of the ventral tegmental area [63].
The subject of this study is the seven chromosomal disorders tracked by the European Network of Population-Based Registries for the Epidemiological Surveillance of Congenital Anomalies (Eurocat) namely trisomies 13, 18, 21 (syndromes of Down, Edwards and Patau) along with Klinefelters syndrome (male XXY), genetic disorders including deletions and chromosomal disorders as a group [64]. One of the major advantages of the European data is that it is complete and records all cases including stillbirths and therapeutic terminations for anomaly. The Eurocat database also reports on more congenital anomalies (95) than the USA CDC (42) [65] which means that we are able to analyze the subject of chromosomal anomalies with greater completeness than from the USA data where only five anomalies are tracked.
One of the most intriguing things about the pattern of CCA disease to be discussed is the cellular mechanisms which might be responsible. Whilst various disruptions of normal chromosomal physiology were described several decades ago as noted above [16,27,66,67], major advances in our understanding of the machinery of mitosis and meiosis have occurred in the decades since that time [68][69][70][71][72][73] and it would seem obvious from the pairing of trisomic chromosomal disorders with the monosomy of Turners syndrome [2,8] that chromosomal mis-segregation must be involved by some mechanism. However, as explored in some detail in the Discussion section, the exact mechanism of this is not understood in molecular terms. Perhaps the most likely explanations relate to disruptions of the post-translational modification code on either the tubulin which comprise the microtubules along which the chromosomes are pulled by cellular kinesin motors [74][75][76], the 90 proteins of the mammalian kinetochore which bind the chromosomes to the microtubules [77], or to the centrosomal chromatin itself [77]. Chromosomal scission and one or two whole genome doubling events [78] are also described as part of cannabinoid genotoxicity [79] all of which suggest gross disruption of the meiotic I machinery of spermatid division. These fascinating mechanistic issues are explored in more detail below.
The objective of the present study was to examine the epidemiology of congenital anomalies across Europe with regard to population substance exposure. Use of inverse probability weighting transformed the analysis from a merely observational study to a pseudo-randomized study from which it is proper and appropriate to draw causal inferences. Space-time regression was also employed to examine these changes accounting for effects such as serial correlation, spatial correlation, spatial autocorrelation and random effects. The study builds on an earlier study [3] and explores the aspects of CCA's in more detail than was possible previously.

Data
Data on all available congenital anomaly rates was downloaded by each individual year for each of 14 nations from the European Network of Population-Based Registries for the Epidemiological Surveillance of Congenital Anomalies (EUROCAT) website [64] and analyzed. EUROCAT total congenital anomaly rate includes anomaly rates amongst live births, stillbirths and cases where early termination for anomaly was practised all combined together so that it represents a total overall picture across all classes of births. The nations selected were chosen on the basis of the availability of their congenital anomaly data for most of the years 2010-2019. National tobacco (percent daily tobacco use prevalence) and alcohol (litres of pure alcohol consumed per capita annually) use data were downloaded from the World Health Organization [80]. Drug use data for cannabis, amphetamines and cocaine was taken from the European Monitoring Centre for Drugs and Drug Addiction (EMCDDA) [81]. Last month cannabis use data was also supplemented by data on the tetrahydrocannabinol (THC) content of cannabis herb and resin provided in recent published reports [11]. Data on daily cannabis use was also available from EMCDDA and was collated in recent reports [11]. Median household income data (in $USD) was taken from the World Bank [82].

National Assignment
Nations were categorized as being either high and rising daily cannabis use or low and/or falling daily cannabis use based on a recent European epidemiological study (see Supplementary Figure S4 in [11]). Thus Belgium, Croatia, France, Germany, Italy, The Netherlands, Norway, Portugal and Spain were categorized as nations experiencing increasing daily use, while Bulgaria, Finland, Hungary, Poland and Sweden were nations which were experiencing low or falling levels of daily cannabis use.

Derived Data
The availability of several metrics of cannabis use, exposure and consumption made it possible to calculate various derived metrics. Hence, last month cannabis use prevalence data was multiplied by the THC content of cannabis herb and resin to derive compound metrics. These metrics were also multiplied by imputed daily cannabis use prevalence rates to derive further compound metrics for both cannabis herb and resin.

Data Imputation
Missing data was completed by linear interpolation. This was particularly the case for daily cannabis use. 59 data points on daily cannabis use from EMCDDA were available for these 14 nations across this period. Linear interpolation expanded this dataset to 129 datapoints (further details provided in Results section). Data on cannabis resin THC concentration were not available for Sweden. However, it was noted that the resin to herb THC concentration was almost constant in nearby Norway at 17.7 so this ratio was applied to the Swedish cannabis herb THC concentration data to derive estimates of Swedish cannabis resin THC concentration. Similarly data for the cannabis resin THC concentration in Poland were not available. The resin to herb THC concentration ratio of nearby Germany was used to estimate the resin THC content in Poland from the known Polish herb THC concentrations. Since geospatial analytical techniques do not tolerate missing data the dataset was completed by the last observation carried forward or backwards for Croatia in 2018 and 2019 and The Netherlands in 2010. It was not appropriate to use multiple imputation methods for this dataset as multiple imputation cannot be applied in panel or spatial multivariable regression techniques.

Statistics
Data was processed in R Studio version 1.4.1717 based on R version 4.1.1 from the Comprehensive R Archive Network and the R Foundation for Statistical Computing [83]. The analysis was conducted in December 2021. Data was manipulated using dplyr from the tidyverse [84]. Data were log transformed where appropriate to improve compliance with normality assumptions based on the results of the Shapiro-Wilks test. Graphs were drawn in ggplot2 from tidyverse. Maps were drawn using ggplot2, sf (simple features) [85] and both custom colour palettes and palettes taken from the viridis and viridisLite packages [86].
Bivariate maps were drawn with package colorplaner [87]. All illustrations are original and have not been published previously. Linear regression was conducted in Base R. Mixed effects regression was performed using package nlme [88]. In all multivariable models model reduction was by the classical technique of serial deletion of the least significant term to yield a final reduced model which is the model presented. Multiple linear models were processed in a single pass using combined techniques from R packages purrr and broom [84,89,90]. The overall effect of covariates in multivariable models may be quantified as the marginal effect. In this case, the overall marginal effect was calculated using the R package margins [91].

Covariate Selection
The presence of multiple different metrics for cannabis consumption and exposure created a problem for analysis as it was not clear which was the most appropriate metric to employ for any particular model. Indiscriminate use of excessive covariates in a multivariable model would unnecessarily consume degrees of freedom and thereby restrict ability to assess interactions. This issue was formally addressed by the use of random forest regression using the R package ranger [92] with variable importance being formally assessed via the R package vip (variable importance plot) [93]. The most predictive covariates from this process were entered into the regression modelling equations. The tables from this analysis are presented in the Results section.

Panel and Geospatial Analysis
Panel analysis was conducted using R package plm [94] across both space and time simultaneously using the "twoways" effect. The spatial weights matrix was calculated using the edge and corner "queen" relationships using R package spdep (spatial dependency) [95]. Geospatial modelling was conducted using the spatial panel random effects maximum likelihood (spreml) function from the package spml which allows detailed modelling and correction of model error structures [96,97]. Such models may produce four model coefficients of interest which are useful in determining the most appropriate error structure for the model. These coefficients are phi the random error effect, psi the serial correlation effect, rho the spatial coefficient and theta the spatial autocorrelation coefficient. In each case the most appropriate error structure was chosen for each spatial model generally taking care to preserve the model error specification across related models. The appropriate error structure was determined by the backwards methods from the full general model to the most specific model as has been described [98]. Both panel and geospatial models were temporally lagged as indicated by one to two years.

Causal Inference
The formal tools of causal inference were used in this analysis. Inverse probability weighting (ipw) is the technique of choice to convert a purely observational study into a pseudo-randomized study from which it is appropriate to make causal inference [99]. All multivariable panel models presented herein were inverse probability weighted. Inverse probability weighting was performed using the R package ipw. Similarly E-values (expected values) quantify the correlation required of some hypothetical unmeasured confounder covariate with both the exposure of concern and the outcome of interest in order to explain away some apparently causal relationship [100][101][102]. It therefore provides a quantitative measure of the robustness of the model to extraneous covariates which have not been accounted for within the measured parameters. E-Values have a confidence interval associated with them and the 95% lower bound of this confidence interval is reported herein. E-Value estimates greater than 1.25 are said to indicate causality [103] with E-values greater than nine being described as high [104]. E-values were calculated from the R package EValue [105]. Both inverse probability weighting and E-values are foundational and pivotal techniques used in formal causal inferential methods in order to allow causal relationships to be assessed from real world observational studies.

Data Availability
Raw datasets including 3800 lines of computation code in R has been made freely available through the Mendeley data repository at the following URL's: 10.17632/mvwvxk756z.1 and 10.17632/vd6mt5r5jm.1 (accessed on 22 March 2022).

Ethics
Ethical approval for this study was provided from the Human Research Ethics Committee of the University of Western Australia number RA/4/20/4724 on 24 September 2021.

Data
As shown in Supplementary Table S1 data points on total rates of 854 chromosomal congenital anomalies (inclusive of stillbirths and therapeutic abortions) were downloaded from the Eurocat database. Data was from fourteen nations as listed. Data on chromosomal anomalies, Down syndrome (Trisomy 21), Edwards syndrome (trisomy 18), Patau syndrome (trisomy 13), genetic syndromes including microdeletions, and Turner and Klinefelters syndrome were obtained with 122 data points in each group. Details on drug use, cannabis exposure, compound metrics of cannabis exposure and median household income are also shown.
As shown in Supplementary Table S2 data on daily cannabis use obtained from EMCDDA was largely incomplete with only 59 points being available. These data were completed by linear interpolation with the addition of 70 datapoints as described in the Methods section so that a total of 129 data points were available for analysis (Supplementary  Table S3). Figure 1 shows the rate of the various chromosomal anomalies as a function of substance exposure. Interestingly the regression lines of best fit for tobacco, alcohol and amphetamine are either negatively sloping or flat. In contrast the regression lines for daily cannabis use and cocaine exposure are all positively sloping, sometimes quite steeply.     Figure 4 represents the time course of genetic disorders and microdeletions. High rates are noted in The Netherlands, Belgium, Spain and Bulgaria at different periods. Figure 5 shows the time course of the compound cannabis exposure metric last month cannabis use × cannabis herb THC concentration × daily use interpolated. Highest rates are noted in Spain and France. Figure 6 is a bivariate map of the chromosomal disorder rate as a function of the cannabis use × cannabis herb THC concentration × daily use interpolated compound metric. One notes that Spain turns from green to purple shading indicating that a nation that was once low in both the compound cannabis exposure metric and chromosomal disorders is now high in both metrics. Similarly the area of France has changed colour from red to bright pink again indicating that both the cannabis exposure metric and chromosomal disorders have now become elevated. Figure 7 is a bivariate mapped plot of genetic disorders against the same compound cannabis exposure metric. Here, again a similar pattern is observed with Spain and France moving to a pattern where both exposure and outcome incidences are simultaneously high.

Bivariate Analysis
The time trends for these seven chromosomal anomalies can also be considered dividing the nations into those where daily cannabis use is low or decreasing and those where it is rising based on recent published reports ( [11] Figure 8). In most cases there was a clear temporal separation between the two groups. Indeed this is confirmed on formal testing. A mixed effects regression procedure with anomaly as the random effect found that indeed the congenital anomaly rates (CARs) were significantly higher in the group of nations where cannabis use was rising (β-est. = 0.688, t = 3.709, p = 2.29 × 10 −4 ; model AIC = 1811.6, LogLik. = −899.798). The slopes of the regression lines show in Figures 1 and 2 were formally studied by linear regression in a single pass analysis using a combined purr-broom workflow for each anomaly and for each substance exposure. The results are presented in Supplementary  Table S4.
When those slopes which were positive and statistically significant were extracted 51 bivariate relationships remained and these are shown in Table 1. The models are listed in descending order of minimum E-Value (mEV). It is noted that some of the E-value estimates and the mEV's are very high. The first seven lines of the table are all occupied by daily cannabis use interpolated as the most closely associated cannabis exposure metric. Indeed the first seven lines of table list consecutively all seven of the chromosomal anomalies of interest. Indeed 24 of the 51 mEV's reported (47.6%) are above nine and so are in the high range. All 51 mEV's reported exceed 1.25 which is the generally accepted cut-off point for potentially causal relationships [103]. The only other covariate to appear in table is cocaine which fits with the data presented in Figure 1. Clearly these results demonstrate a very close bivariate relationship between cannabis metric exposure and the various different chromosomal pathologies.

Random Forest Regression
The next question therefore is how would these various covariates behave in a multivariable analytical context. However, this question is not straightforward to answer due to the multiplicity of covariates listed. This issue was addressed with the use of random Forrest regression using the R package ranger together with the package vip to assess variable importance. These variable importance tables are shown as Supplementary Tables S5-S11 for the seven CAs respectively.

Inverse Probability Weighted Panel Regression
Supplementary Table S12 presents final multivariable panel regression models (after model reduction) for chromosomal anomalies as a class. All panel models were inverse probability weighted allowing causality to be directly inferred. Selection of covariates was undertaken by the Random Forrest analysis described above. Additive and interactive models are presented together with interactive models lagged by one and two years. Confirming the bivariate results in each case daily cannabis use interpolated has invariably positive coefficients and is highly statistically significant (from p = 3.61 × 10 −5 to p < 2.2 × 10 −16 ) along with other cannabis exposure metrics.
Supplementary Tables S13-S18 repeat these procedure for the other six chromosomal anomalies. In each case metrics of cannabis exposure are positive and highly statistically significant. Since these models are all inverse probability weighted and convert the analysis from an observational study to a pseudorandomized framework they are reporting not just associations but causal inferences.

Geospatiotemporal Regression
Data was also considered in a formal space-time framework that formally accounts for random effects, serial correlation, spatial correlation and spatial autocorrelation. Supplementary Figure S1 illustrates the derivation, editing and finalization of the spatial links between the various nations which were used to form the basis of the spatial weighting matrix used for formal space-time regression.
Tables 2-6, Tables S19 and S20 present the results of geospatial regression in additive, interactive and temporally lagged models. In each case one and sometimes several cannabis metrics are associated with positive regression coefficients are highly statistically significant.
The case of Klinefelter syndrome shown in Supplementary Table S20 is somewhat different. Although cannabis metrics are positive in the additive model they do not appear to be so in the interactive and lagged models. However, this dataset is deficient in that 30 of the results are artificially set to zero and are not true readings. As it has not been possible to correct this data by mathematical means the results are presented as they stand however they should only be interpreted as a result which is the least powerful of all the available scenarios (Supplementary Table S20).

Quantitative Causal Inference-E-Values
E-Values may be extracted from these panel (Supplementary Table S21) and geospatial models (Supplementary Table S22). When they are compiled into two lists (one for each of the E-value estimate and its mEV) it is noted that 72/86 (83.7%) E-value estimates exceed 9 an are therefore high, and all 86 (100%) exceed 1.25. When the mEV's are considered it is noted that 64/86 (74.4%) are greater than 9 and all 86 (100%) are also above 1.25 (Table 7). Hence, this Table reports some of the most important findings in this study.            Supplementary Table S23 lists these E-values by anomaly. These results are considered collectively in using various summary statistics for the E-value estimates and the mEV's in Table 8. In this table the various anomalies are listed in descending order of median minimum E-value. Table is notable for the remarkable size of many of these E-values.
Supplementary Table S24 lists these E-values again in order of the substance exposure described. In this table the substances are broadly grouped into the primary covariate as either herb or resin THC or daily cannabis consumption. The findings from this grouped analysis are collated into Supplementary Table S25 which presents summary statistics for these grouped exposures and again lists the cannabis exposures in descending order of median mEV. Once again daily cannabis consumption is conspicuously the most tightly related covariate.
These groups may be compared formally using Wilcoxson non-parametric tests. The results of these tests are shown in Supplementary Table S26. The E-value estimates are shown to be significantly different across all three primary exposure groups with the order of effect daily use interpolated > herb THC > resin THC as indicated in Supplementary  Table S25. For the mEV's the daily cannabis use interpolated exposure data is significantly more strongly associated than either the herb or resin groups (Supplementary Table S26).

Main Results
All seven chromosomal disorders studied, both as a class and individually, are closely related to various metrics of cannabis exposure. This strong statistical signal assessed by various regression models is in close accord with recently reported results from other jurisdictions including Hawaii, Colorado, Canada, Australia and USA [1][2][3][4][5][6][7][8][9], and an earlier simpler study of the European dataset [3].
At bivariate analysis strongly positive trends for cannabis metrics, especially daily cannabis, and cannabis herb were noted. Contrariwise those for tobacco and alcohol are relatively weak (Figures 1 and 2). Countries with rising levels of daily cannabis use had higher and rising levels of most chromosomal anomalies whereas the temporal trajectory of the trend in nations which did not was downwards for most anomalies (p = 0.0002) (Figures 8 and 9).
Terms for cannabis metrics were positive and significant in inverse probability weighted panel models from p < 2.2 × 10 −16 for chromosomal disorders at 1 and 2 temporal lags, Down syndrome in additive model, Patau syndrome in additive model, and Klinefelters syndrome at 2 lags (Supplementary Tables S12-S18).
Spatial regression terms including daily cannabis use interpolated were significant for chromosomal disorders from 6.13 × 10 −6 in additive models, and in models lagged by two years 4.28 × 10 −6 ( Table 2). In an additive spatial model for genetic disorders daily cannabis use was from 5.79 × 10 −12 (Table 3). Down syndrome assessed in an additive model was significant for cannabis herb THC concentration from 1.26 × 10 −11 (Table 4). For trisomy 18 daily cannabis use was significant from 1.12 × 10 −7 (Table 5). For trisomy 13 terms including cannabis herb THC concentration were significant from 7.52 × 10 −9 ( Table 6). Turner's interactive model terms including daily use significant from 7.19 × 10 −7 (Supplementary Table S19). Klinefelters syndrome terms including cannabis herb THC concentration significant in additive models from 7.27 × 10 −8 (Supplementary Table S20). 83.7% of 86 E-value estimates exceeded 9 and were thus in the high range including four values at infinity. 74.4% of 86 minimum E-values exceeded 9 including four at infinity (Table 7). Daily cannabis use was shown to be a more powerful covariate than cannabis herb or resin THC concentration (Supplementary Tables S5-S11 and S26).

Choice of Anomalies
In view of the importance of accurately studying the detailed epidemiology of chromosomal disorders, which themselves chronicle direct genetic damage at the hundred megabase scale, it was important to consider all seven of the lesions by full and formal analysis.

Qualitative Causal Inference
In 1965 the renowned epidemiologist A.B. Hill described nine criteria which have become established as qualitative criteria for properly attributing a causal relationship to an association [106]. It is noted that the present analysis fulfils all nine of these criteria: strength of association, consistency amongst studies, specificity, temporality, coherence with known data, biological plausibility, dose-response effects, analogy with similar situations elsewhere, and experimental confirmation.

Quantitative Causal Inference
The methodologies adopted in this study formally address the two major analytical techniques employed by quantitative causal inference namely inverse probability weighting and E-values. Inverse probability weighting has the effect of evening out exposures across a whole groups of data thereby making the groups broadly comparable across the whole investigative domain and addressing the non-equivalence issue which is one of the primary criticisms levelled at observational studies. This procedure effectively transforms an observational dataset into a pseudorandomized analysis from which it is entirely appropriate to draw causal inferences [107].
E-values, or expected values take into account the degree of association required of some unidentified and uncontrolled confounding variable with both the exposure of concern and the outcome of interest to discount an apparently causal relationship [100][101][102][103]. It is noted that the very high magnitude even of the lower E-values in the present study (74.4% greater than 9 which is high [104]) ensures that discount of a causal relationship is excluded.

Chromosomal Overview
The fact that chromosomal anomalies have been positively identified in all six of the major epidemiological surveys of cannabis teratology to be conducted in recent years [1][2][3][4][5][6][7][8][9] clearly indicates that disturbances of chromosomal physiology must be one of the defining features of cannabinoid teratogenesis. This begs the intriguing mechanistic question as to the underlying cellular mechanisms which might be responsible for this broad spectrum of anomalies. Moreover, mechanistic considerations are central to the whole causal argument as outlined by Hill [106] including biological plausibility for the cannabis -congenital chromosomal anomaly association. For these reasons mechanistic considerations well merit careful consideration and detailed molecular dissection so that state-of-the-art insights and understandings can be brought to bear on this pivotal and fundamental question.
Considering the list of disorders presented by the rich European dataset two separate sets of processes appear to be operating when considered in the most general of terms. First, chromosomal mis-segregation likely attributable to disruptions of the mitotic machinery and/or the mitotic spindle (responsible for the series of chromosomal trisomic syndromes reported including Klinefelters disorder and the monosomy Turners syndrome). Second, genetic deletions and microdeletions which clearly involve DNA breaks and chromosomal breakage lesions. If one adds together the lengths of the chromosomes involved, here chromosomes 13, 18, 21 and X it is noted that this accounts for (113 + 76 + 46 + 153 = 388) megabases (12.9%) of the 3000 megabases in the human genome.
If the length of all of the chromosomes implicated in cannabis-related genotoxic disease is added together (excluding duplicates) it totals and impressive 1765 megabases (58.8%) of the human genome which by any measure is a sizeable segment to be registering direct genetic damage.
The case of the chromosomal pathologies in testicular cancer is particularly intriguing and its rapidity opens up special insights into chromosomal pathological mechanisms. Metanalysis has demonstrated that the incidence of testicular cancer is 2.6 times greater after cannabis exposure than without it [112]. Moreover, if one assigns a median age of 20 years to cannabis exposure, and the known median age of testicular cancer is 33 years, then cannabis exposure abbreviates the usual oncogenic incubation period from 33 to just 13 years which is a 2.5-fold acceleration. If one considers the compound index of incidence-rapidity it is noted that the oncogenic process following cannabis is (2.6 × 2.5 =) 6.5 times faster.
Moreover, the biogenesis of testicular cancer is known to involve genome demethylation, one or two rounds of whole genome doubling events followed by the loss of (usually) 50-70 chromosomal arms [78]. This oncogenic scenario is thus intriguing as cannabis has been previously linked with disruption of the meiotic machinery [47], and is also known to cause widespread alteration in patterns of DNA methylation. Most curious of all however is the issue of losing so many chromosomal arms in such a short period.
It was noted in the Introduction that chromosomal breaks, fusion and bridges have been identified in cells following cannabis exposure. In 1938 the breakage-fusion-bridge cycle was identified by which broken ends of chromosomes became joined to other broken ends as fusions which became disrupted with cell division thereby perpetuating the cycle [113].
Indeed cannabis has been linked with inhibition of telomerase potentially leading to uncapped and unprotected chromosomal ends, and accelerated aging syndromes [47,114].
If the breakage-fusion-bridge cycle was operating in the testicular germ cell niche this would explain both the relatively rapid appearance of testicular carcinogenesis after young adult exposure and its requirement to take an average of 13 years to occur whilst the cycle turns, builds up genetically modified clones which become selected out and expanded and eventually achieve a growth advantage and become clinically revealed as overt malignant disease.
Whilst such oncogenic mechanisms are not relevant to congenital chromosome inherited disorders they do powerfully inform the broader subject of cannabis-induced chromosomal pathology more generally and expand the potential repertoire of human chromosomal pathophysiology which is of relevance to the broader subject.
In considering the spectrum of chromosomal pathology in relation to cannabis it should also be noted that cannabis has long been known to test positively in the micronucleus assay [115]. Micronuclei have been identified as a key engine for the formation of chromosomal shattering, so-called chromothriptic events, which are a key engine of genetic damage and implicated in cancerogenesis, congenital anomalies mental retardation and pregnancy loss [116][117][118][119][120][121][122][123][124][125][126][127][128].

Chromosomal Detachment
Given that chromosomal mis-segregation is so a major part of cannabinoid teratogenesis a major outstanding question must relate to the mechanism whereby chromosomes detach from the mitotic half spindle. Why do the chromosomes detach? This is clearly the source of mis-segregations which lead to trisomies, monosomies, micronucleus formation and chromothripsis.
Whilst the exact answer is not known there are many possibilities owing to the impressive complexity of the system of cell division. Tubulin proteins are a complex family of proteins. Microtubules are made of αand βtubulin dimers which polymerize into high order elongated structures which form the mitotic spindle. There are 25-30 microtubules in each ray of the spindle. Cannabis was shown in a proteomic screen to reduce tubulin synthesis [38] so this would obviously impede the function of the whole structure.
As mentioned there are many different types of tubulins. They are all subject to a complex system of post-translational modifications (PTM's) which control subcellular function, address and reactivities. These PTM's include methylation, acetylation, tyrosination, removal or addition of terminal glutamates including polyglutamylation, glycation or polyglycation, polyamination, phosphorylation, ubiquinylation and palmitoylation [76]. Acetylation of α-tubulin on lysine 40 controls microtubule flexibility and microtubule "aging" [76].
Microtubules of the spindle pointing to the cell equator are enriched for detyrosination and polyglutamylation. Unaligned chromosomes are carried to the metaphase plate by the kinetochore associated protein kinesin-7 motor protein CENPE (centromere protein E) in a manner which is dependent on the detyrosination status of the microtubules implying that the motor protein can read the tyrosination state of the microtubules [76]. Such steps are clearly critical steps which either direct or epigenomic impact of cannabinoids may perturb.
PTM's have interesting effects, as those near the centrioles encourage nucleation and the formation of the poles of the mitotic spindle by polyglutamylation. Importantly phosphorylation of serine 172 on β-tubulin by CDK1 (cyclin dependent kinase 1) prevents incorporation of the tubulin dimer into the growing microtubule and reduces the pool of tubulin available for polymeric assembly. As noted below CDK1 is perturbed epigenomically by cannabinoid withdrawal [47].
Microtubule dynamics were shown to be affected epigenomically by differential DNA methylation in cannabis withdrawal on functional annotation of a recent Ingenuity Pathway Analysis longitudinal whole epigenome screen (58 genes, Supplement S5 page 300, p = 0.000033; and again on page 352, p = 0.00445) [47].
The microtubules are bound to the spindle by a complex multiprotein assembly called the kinetochore which is a giant molecular complex comprising 90 proteins all highly regulated and highly post-translationally modified [77]. It has been argued that sumoylation is a particular key and foundational PTM placed on several of the key subunits of the kinetochore which control the assembly of the subsequent PTM's [77]. ∆9THC has been shown to impair desumoylation thereby disrupting the PTM code and thus kinetochore control [129].
Clearly this would be a very fruitful area for active investigation.

Epigenomic Mechanisms
An important recent whole genome DNA methylation screen identified 163 differentially methylated regions (DMR's) affecting hundreds of genes across the genome during cannabis dependence, and then 127 DMR's which appeared after 11 weeks of cannabis withdrawal [47]. Three functional annotations from the Ingenuity Pathway Analysis identified for chromosomal genes were found during this process involving homologous pairing of chromosomes (MSH5, RAD21L1, SMC1B and SYCP3), chromosomal assembly (CDK1) and Philadelphia-chromosome-negative acute lymphoblastic leukaemia (CD3D, EPHA2, FYN, KMT2A).
MSH5, MutS homolog 5 is involved in DNA mismatch repair and facilitates DNA crossing over during meiosis [130]. Abnormalities of it are implicated in azoospermia and premature ovarian failure and thus infertility in both sexes. RAD21L1, Double Strand Break Repair Protein RAD21-like is involved in male prophase I of meiosis, sister chromatid paring, crossing over, synaptonemal complex formation and synapsis initiation [131]. When mutated it causes spermatogenic failure. SMC1B, Structural Maintenance of Chromosomes forms a key part of the meiosis-specific cohesin complex with RAD21 which hold sister chromatids together and is involved in DNA recombination events [132]. It is expressed in brain, heart and testis. Mutations cause gonadal dysgenesis and corneal dystrophy [132]. SYCP3, Synaptonemal Complex Protein 3, is expressed in human testis germ cells and functions during meiosis I to control chromosome separation, recombination events and synaptonemal complex formation. When mutated it causes infertility, testicular degeneration, spermatogenic failure and azoospermia in males and pregnancy loss in females [133].
CDK1 is the serine/threonine cyclin dependent kinase I which stands at the very threshold of cell cycle control [134]. CDK1 phosphorylates over 75 binding partners which control cell cycle entry, mitotic spindle formation and degradation, centromere assembly and many features relating to cell cycle progression [135]. It is also highly regulated by multiple interacting control systems including positive feedback loops which operate in switch like fashion to suddenly activate or shut down its functions [135]. It controls entry into key cell cycle phases such as the G1/S transition and the G2/M transition.
CD3D (T-cell surface glycoprotein CD3) is a cell surface glycoprotein on T-cells encoding the CD3 receptor [136]. EPHA2 is the EPH Receptor 2A involved in CNS development and congenital cataract [137]. FYN is a protein tyrosine kinase involved intracellular sig-nalling cascades in T-cells and neurons, cell growth, fertilization, mitosis, development and cancer [138]. KMT2A is a lysine specific demethylase 2A involved in gene silencing and heterochromatin maintenance including at centromeric chromatin [139]. It performs demethylation of dimethylated lysine-36 on histone 3 (H3K36me2) at CpG islands. It performs a vital role in the maintenance of the histone code and thus controlling chromatin availability to the transcription machinery [139].
The epigenomewide screen by Schrott and colleagues also identified 256 pathway hits on terms including DNA mainly during cannabis withdrawal [47]. Central DNA functions including transcription (60 genes), promoter activation (106 genes), DNA replication, recombination and repair (12 genes), DNA binding (24 genes), DNA synthesis, replication, recombination and repair (20 genes) and cell signalling, DNA replication, recombination and repair, nucleic acid metabolism and small molecular biochemistry were a few of the pathway annotations identified.
Interestingly it was recently shown that neural progenitor cells derived from induced pluripotent stem cells from patients with Down syndrome had global suppression of transcription related to major alterations in the epigenomic state including chromosomal "introversion", destruction of the nuclear lamina, abrogation of lamina-associated chromosomal domains, and a reduction in topologically active transcriptional domains which collectively closely recapitulated changes of neural stem cell senescence [140]. This global transcriptional suppression was thought to underlie the many neurodevelopmental and morphogenic anomalies with which Down syndrome is associated. Importantly these changes could be collectively improved in vitro when senolytic drugs were administered to the growing neural stem cells which also had the effect of rescuing overall transcription rates [140]. Such findings clearly underscore the close link between CCAR's and accelerated aging pathophenotypes.

Epigenomic Centrosomal Mechanisms
A fascinating recent study described the manner in which centromeric chromatin is a hotspot for DNA breaks and chromosomal translocations [141]. Indeed centromeric instability is known to be linked with cancer, congenital abnormalities and premature aging [141]. Centromeres are hotspots for chromosomal breakage and their inherent fragility is linked to recurrent breakage-fusion-bridge cycles in solid tumours. Indeed spontaneous centromeric chromosomal breakages apparently for simply structural reasons are not uncommon [141].
Investigators revealed that the modified centrosomal histone 3 variant CENP-A together with its chaperone HJURP and dimethylated lysine 4 of histone 4 (H3K4me2) enable a succession of events to allow the recruitment of the RAD51-BRCA1-BRCA2 complex of the homologous recombination pathway to induce formation of a DNA-RNA hybrid and high fidelity end repair. Moreover, inhibition of RAD51 led to the involvement of RAD52 and the activation of the low fidelity mutagenic microhomology end joining pathway leading to centromeric instability and chromosomal translocations [141].
In this context some of the findings of the recent epigenomic screen of Schrott and colleagues is particularly relevant [47]. It has recently been shown that Nuclear Mitotic Apparatus 1 (NUMA1) and the centrosomal proteins (CEP) CEP152 and CEP 55 are key to centrosomal function. The motor proteins kinesin and dynein-dynactin preform key role not only moving chromosomes but also on collecting the ends of microtubules together into cohesive mitotic spindle poles in a bipolar orientation [74,75,[142][143][144][145][146][147]. Table 9 lists the EWAS hits identified in the Schrott datasets for some of these proteins. Interestingly there were 218 hits for the kinesins which were some of the most strongly associated of all the hits identified in cannabis dependence in Schrott's Tables S1 and S4 [47]. It is noted that KIF14 is an alternate nomenclature for KIFC3 which was shown in human and animals models to be particularly involved in these roles [147]. Table 10 lists further EWAS hits from kinesin family members from the Schrott database [47].
Considering the CENP proteins 105 hits were identified (Table 11) including 86 for  CENPN alone (examples shown in Table 12) which is the second CENP to bind in the cascade suggesting global disruption of centromeric chromatin decoration induced by cannabinoids and necessarily downstream centrosomal-kinetochore dysfunction as is indeed observed so prominently epidemiologically. The high level of probability reached for the CENPN epimutations is notable from this study from p = 7.73 × 10 −20 . Indeed this group of cancer linked mutations is amongst the most highly significant of all the DNA methylation hits identified. Laboratory studies of CENP -A, -C and -N have shown that interference with all three key CENP proteins leads to chromosomal mis-segregation errors, aneuploidy, accelerated aging and cancer [73]. Highly penetrant mutations in all three genes individually results in early embryonic lethality [73].
Indeed it has been demonstrated that as many as 90% of solid organ and 50% of haemopoietic cancers are aneuploid which directly implicates epigenomic centromeric/ kinetochore processes in tumourigensis [69].
As shown in Table 13 there were nine hits for RAD51 which occurred in both cannabis dependence and withdrawal (Bonferroni-adjusted p-values from 0.0009 to 0.0262) and only one for RAD52 indicating that cannabis blocks RAD51 and homologous recombination much more severely than lower fidelity pathways such as microhomology-mediated repair which forces errors into centrosomal replication and downstream chromosomal mis-segregation events [141].

Epigenomic Gene Activation Post-Human Fertilization
The patients described in this report were born non-mosaically aneuploid, that is uniformly aneuploid in all body cells. This important finding localizes the genotoxic lesion to either the pre-fertilization male or female gametes or the first cell of the embryo, the bipronuclear fertilized zygote.
A fascinating paper recently appeared which provided details on gene activation from the transcriptome analysis of the earliest stages of human life from the oocyte held at the second meiotic division (MII) across fertilization and into the first three zygotic cell divisions to the eight cell stage [148]. An intriguing group of 21 DNA damage repair, replication, cell growth and cancer related pathways were identified by Ingenuity Pathway Analysis. Pathways identified included particularly kinetochore metaphase signalling pathways and cell cycle control of chromosomal replication pathways which are of particular relevance to the present enquiry into the possible aetiopathogenesis of cannabinoid-induced chromosomal aneuploidy.
6 of 9 (66.7%) of the genes identified in the cell cycle control of chromosomal replication pathways were similarly identified in the EWAS analysis of Schrott namely CDK1, MCM6 (minichromosome maintenance complex component 6, involved in the initiation of chromosomal replication), ORC2 (origin replication complex 2, initiates one round of chromosomal duplication per cell cycle), PCNA (proliferating cell nuclear antigen, increases processivity of DNA polymerase delta and high fidelity DNA damage repair), POLA1 (DNA polymerase 1, which replicates DNA), and TOP2A (topoisomerase 2A, a major topoisomerase which cuts and unwinds DNA for replication and transcription) ( Table 16).        The implication of cannabis in the formation of double minute chromosomes, chromosomal rings and micronuclei was noted above [26,128]. It was recently shown that one or more of these extrachromosomal DNA (ecDNA) rings can form transcription hubs within the cell nucleus which form an oncogenic factory and lead to the expression of various oncogenes acting in trans as enhancers for multi-chromosomal transcription [149]. Such elements are well described in several cancers where they form a key driver of transformation by acting as promiscuous enhancers. Indeed oncogenic selection of ecDNA's occurs and oncogenic enhancer-ecDNA co-selection has also been documented in these ecDNA oncogenic hubs [149].

Exponential Genotoxicity
Numerous laboratory studies document that cannabis genotoxicity and the processes on which genetic stability rests, exhibit an exponential dose-response relationship. Assays of cell growth and cell viability, mutagenesis, inhibition of tubulin, action, DNA and RNA synthesis, and inhibition of DNA methylation resulting in morphine self-administration in offspring all demonstrate exponential dose response relationships [12][13][14][15][16][17][18].
Moreover, this clear exponentiation of dose-response genetic damage in the laboratory is reflected in similar changes epidemiologically as has been well demonstrated in USA [2,[22][23][24].
This now well documented exponentiation becomes a major issue downstream of the rising triple convergence of cannabis use prevalence, daily dosing intensity and cannabinoid potency which is widely reported on both sides of the Atlantic [10,11,166,167] and opens whole communities to sudden encounters with major genotoxic outcomes such as has recently been reported from France with limbless cows and babies occurring at 60 times the background rate [19][20][21] and in Mississippi and Kentucky where rate of atrial septal defect has suddenly shot up 20 times that of ten years ago [168].

Generalizability
A number of factors provide confidence that study results are robust and widely generalizable: 1. Data is based on one of the largest most comprehensive datasets globally; 2. Findings are strongly concordant with all of the other peer reviewed published data in the field internationally [1][2][3][4][5][6][7][8][9]; 3. Both the quantitative and qualitative epidemiological criteria of causality are fulfilled; 4. There is an abundance of causal mechanisms to explain this findings; 5. Results have both internally and external consistency; 6. Similar outcomes were derived by several different regression techniques; 7. Findings have been demonstrated in their native space-time context. Indeed the signals derived are some of the strongest statistical signals seen in all of the field of cannabis teratology.

Strengths and Limitations
The strengths of the present study are that it used one of the world's largest most comprehensive datasets, used the database with information on the most number of genetic pathologies, employed both inverse probability weighting and E-values [99,100,104,169,170], the major techniques of causal inference [102][103][104]171], employed several different forms of regression including spatiotemporal regression [96][97][98], demonstrated both internal consistency and found external concordance with other published results in this field [1][2][3][4][5][6][7][8][9]. Random Forrest regression was used for formal variable selection [92]. Study limitations include that of many epidemiological studies that it did not have availability to individual participant level data. Incomplete data was also an issue both for daily cannabis exposure and Klinefelter's syndrome. Nevertheless, we feel that the way we have addressed the issue of missing data is reasonable and has been openly transparent.

Conclusions
Data present strong evidence demonstrating in inverse probability weighted and causal inferential models that various metrics of cannabis use, especially daily use, are strongly associated with all seven of the chromosomal disorders studied. These findings confirm both many decades of in vitro laboratory work and also several recent epidemiological series from Hawaii, Colorado, Australia, Canada and the USA, as well as an earlier simpler analysis of this same dataset. Naturally these findings are of great concern in their own right. However, when considered in terms of the known exponential dose-response relationship of cannabis genotoxicity and the increasing contamination of the food chain in Europe and parts of the USA, public health concerns must be heightened. The present findings form part of the interface of cannabinoids with the broader spectrum of congenital anomalies which is also an area of great concern. Together with cannabinoid-related carcinogenesis and cannabinoid-potentiated aging syndromes cannabis-related congenital chromosomal anomalies, findings form one pivotal plank of overall cannabinoid genotoxicity. As has been shown elsewhere and also in the present results that cannabis is a much more potent genotoxic than tobacco and alcohol combined (see Tables 1-8 and Tables  S3-S17) it is clear that a rational evidence driven policy in relation to cannabis would restrict access to cannabinoids in the same way as for other powerful genotoxic environmental compounds. As discussed above the field is wide open for genotoxic and epigenotoxic mechanistic studies to investigate epigenetic mechanisms of chromosomal mis-segregation and mitotic spindle dysfunction, cannabinoid-induced chromosomal scission and the gross disorders of meiosis I which lead to increased whole genome doubling events and are part of the testicular cancer oncobiogenesis pathway and the cellular pathology of epigenomic aging. Because CCA's are relatively common it would also be important for future research studies to focus on higher spatial and temporal resolution space-time studies. In time, as inverse probability weighting of geospatial models becomes available such new methodologies which directly interrogate causality could also be applied to these important datasets.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/ijerph191811208/s1,  Figure S1: International geospatial links used for computing the sparse spatial weights matrix for use in the geospatial regressions (A) edited and (B) final links. Data along with the relevant R code has been made publicly available on the Mendeley Database Repository and can be accessed from these URL's: 10.17632/mvwvxk756z.1 (accessed on 20 January 2022) and 10.17632/vd6mt5r5jm.1 (accessed on 10 January 2022).
Author Contributions: A.S.R. assembled the data, designed and conducted the analyses, and wrote the first manuscript draft. G.K.H. provided technical and logistic support, co-wrote the paper, assisted with gaining ethical approval, provided advice on manuscript preparation and general guidance to study conduct. A.S.R. had the idea for the article, performed the literature search, wrote the first draft and is the guarantor for the article. All authors have read and agreed to the published version of the manuscript.

Funding:
No funding was provided for this study. No funding organization played any role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.

Institutional Review Board Statement:
The Human Research Ethics Committee of the University of Western Australia provided ethical approval for the study to be undertaken 24 September 2021 (No. RA/4/20/4724). The requirement for patient consent was waived as all study data is deidentified, anonymous, grouped and publicly available.

Informed Consent Statement:
Written informed consent has been obtained. Data Availability Statement: All data generated or analyzed during this study are included in this published article and its supplementary information files.