Pan-Cancer Study on Variants of Canonical miRNA Biogenesis Pathway Components: A Pooled Analysis

Simple Summary Previous relations between microRNA machinery gene variants and human cancer risk have not reach the required statistical power. For the first time, the authors pooled data from 33 studies in “The Cancer Genome Atlas” database that cover 8176 pan-cancer samples which yielded 22 eligible variants within 11 genes subjected to further testing under different genetic association models, Cox regression, and trail sequential analyses. The potential roles of miRNA biogenesis genes in cancer development and prognosis have been concluded. Abstract Single nucleotide polymorphisms in genes involved in microRNA processing/maturation and release may deregulate the microRNAome expression levels. We aimed to assess the relationship between miRNA machinery genetic variants and human cancer risk using integrative bioinformatics analyses to identify the role of these genes in cancer aggressiveness. Mutations of 8176 pan-cancer samples were retrieved from 33 studies in “TCGA” database, and a Cox regression model for survival was performed. Next, 22 computationally identified variants within 11 genes were selected based on their high citation rate and MAF. Relevant articles through March 2020 were included. Pooled estimates under the five genetic association models were calculated. Publication bias and heterogeneity between articles were evaluated. Trial Sequential Analysis (TSA) was applied to assess the power and reliability of the draw conclusions. TCGA patients with different cancer types revealed significant alterations in miRNA machinery genes, with mutation frequency ranging from 0.6–13% of samples. RAN was associated with LN metastasis, while TARBP2 and PIWIL1 gene mutations exhibited better overall survival. In the meta-analysis, 45 articles (74,593 cases and 89,198 controls) met the eligibility criteria. Pooled analysis revealed an increased cancer risk with DROSHArs10719*G, RANrs3803012*G, DGCR8rs417309*A, and GEMIN3rs197414*A. In contrast, both DICER1rs1057035*T and GEMIN4rs2743048*G conferred protection against developing cancer. TSA showed the cumulative evidence is inadequate, and the addition of further primary studies is necessary. This study suggests a potential role of miRNA biogenesis genes in cancer development/prognosis. Further functional studies may reveal biological explanations for the differential risks of the machinery variants in different cancer types.


Introduction
microRNAs (miRNAs) are endogenous small non-coding RNAs that are critical regulators of gene expression at the post-transcriptional level [1]. Typically, miRNAs interact with specific target genes through complementary base-pairing to influence the translation or stability of the target mRNA molecules [2]. Individual miRNAs can control the expression of several hundred genes, and each mRNA can be regulated by multiple miRNAs, forming a complex dynamic crosstalk regulatory network [3,4]. miRNAs are involved in almost all biological processes, including cellular development, metabolism, differentiation, proliferation, and angiogenesis. They may function as either tumor suppressors or promoters, giving miRNAs potential diagnostic and prognostic utility [5,6]. Despite compelling evidence demonstrating that overall global miRNA expression is deregulated in human cancers, the exact underlying mechanisms of miRNA deregulation have not been fully elucidated [7]. In addition, multiple emerging studies delineating the role of selected miRNA panels in tumors have produced inconsistent results.
Canonical miRNA biogenesis begins with generating the primary miRNA (pri-miRNA) transcript in the nucleus, where the microprocessor complex, comprising Drosha and DGCR8, do further cleavage to produce the hairpin-shaped precursor miRNA (pre-miRNA). This pre-miRNA is exported to the cytoplasm in an XPO5/RAN GTP-dependent manner and processed to produce the mature miRNA duplex [8]. Finally, either the 5p or 3p strands of the mature miRNA duplex are loaded into the Argonaute (AGO) family of proteins to form a functional miRNA-induced silencing complex (miRISC) to guide the binding to target mRNAs or be released by the cells (Figure 1).
Kinetic modeling studies reveal that miRNAs rank among the fastest produced and longest-lived cellular transcripts [15]. There is a strong correlation between the rate of primary miRNA transcription and that of mature miRNAs, reflecting the rapid turnover of miRNA duplexes even before being loaded into miRISC [9]. Notably, an increase in AGO proteins was directly correlated with the increase in miRNA abundance, indicating that miRISC formation can act as a kinetic bottleneck in miRNA homeostasis [16,17]. Another factor that may contribute to the dynamics of miRNA-mediated gene regulation is the functionalized compartmentalization and shuttling of miRISC within the cells [8]. Emerging evidence showing dysregulation of the miRNA biogenesis pathway components in cancer highlights its pathophysiological relevance in tumor development and progression [18,19]. The expression levels of miRNA machinery components such as DROSHA and DICER1 are suppressed in cancer and associated with advanced tumor stage and poor clinical outcomes [20,21]; therefore, studying the genes involved in miRNA biogenesis could provide a comprehensive insight into the spatial perturbation along the signaling pathway and help identify the putative vital players in the miRNA machinery system that may function as promising therapeutic targets for cancer.
Here, we conducted a meta-analysis to assess the associations between these genetic variants and human cancer risk and applied integrative bioinformatics methods to identify their role in cancer aggressiveness. Our results reveal another level of fundamental principles that control miRNA transcriptomic profile and regulation and open new avenues toward understanding the intracellular dynamics of miRNA biogenesis in cancer. Regulation of microRNA biogenesis (processing, maturation, and release). After transcription of miRNA genes by polymerase II in the nucleus, primary microRNAs (pri-miRNAs) are cleaved by microprocessor DROSHA/DGCR8 complex to liberate 60-70 nucleotide hairpin-shaped precursor microRNAs (pre-miRNAs), which are then exported in the cytoplasm by XPO5/RAN/GTP complex and further cleaved by DICER1 to produce 21-24 nucleotide duplexes. One strand of the miRNA duplex can either associate with the miRNA-induced silencing complex (miRISC) and guide the binding to target mRNAs or be released by the cells. In the former case, sequence complementarity between the seed region of miRNAs and mRNAs mediates gene silencing by targeting mRNA degradation or translational repression in processing bodies (P-bodies). In the latter case, the mature miRNAs can either bind to RNA-binding proteins such as argonaute-2 or to lipoproteins, or miRNAs can be loaded in microvesicles formed by plasma membrane blebbing or in exosomes that are released in the extracellular space upon exocytic fusion of multivesicular bodies with the plasma membrane.
As depicted in Figure 2, a meta-analysis was performed to identify the association between the 22 SNPs and susceptibility to cancer development. In addition, due to the low number of published papers on the association between these genes and cancer progression and aggressiveness features, we performed bioinformatics analysis using TCGA data of 33 cancer types in 8176 cancer patients.

Meta-Analysis on miRNA Biogenesis Gene Variants and Cancer Risk
We performed a systematic review and meta-analysis according to the "meta-analysis of observational studies in epidemiology (MOOSE)" guidelines [24]. The study results were reported following "preferred reporting items for systematic reviews and meta-analyses protocols (PRISMA-P)" regulations [25].

Identification of Eligible Studies
Articles were included in the study if (1) they reported an association between at least one of the selected gene variants and cancer risk, (2) genotype frequencies were available, and (3) the paper was presented as a case-control study. Articles were excluded if they (1) were review articles, abstracts, conference proceedings, or case reports, (2) were nonhuman studies, (3) lacked data necessary for a systematic review, or (4) retrieved repeated published data from previous articles. The discrepancy in study selection was resolved by a discussion with a third author (MSF). Workflow followed for in silico data analysis and meta-analysis. We followed three step (1) Selection of genes enrolled in microRNA machinery canonical pathway followed by a selectio of genetic variants with five or more published articles and minor allele frequency (MAF) of mor than 0.3 after identification of 11 genes, including 22 single nucleotide polymorphisms. (2) Meta analysis was performed to pool the results of 45 published articles. Pairwise comparisons were car ried out for each genetic association model to identify the association between gene polymorphism and cancer risk. Next, heterogeneity analysis, subgroup analysis, trials sequential analysis, and pub lication bias were assessed. (3) To explore the association between genetic variants and cancer prog nosis, transcriptomic and genomic data of 33 different types of cancer were retrieved from the TCGA database. Genetic mutations of the 11 studied genes were screened in association with the expres sion of corresponding genes and the total microRNAs expression in each patient. The Mann-Whi ney U and Kruskal-Wallis tests were employed to test the association between gene mutation an clinicopathological features, including lymph node metastasis, distant metastasis, and pathologica grade. Survival analysis was performed using Cox regression analysis and Kaplan-Meier curv plots were generated. Both univariate and multivariate analyses using total miRNAome level wer conducted.

Meta-Analysis on miRNA Biogenesis Gene Variants and Cancer Risk
We performed a systematic review and meta-analysis according to the "meta-analy sis of observational studies in epidemiology (MOOSE)" guidelines [24]. The study result were reported following "preferred reporting items for systematic reviews and meta-ana yses protocols (PRISMA-P)" regulations [25]. Workflow followed for in silico data analysis and meta-analysis. We followed three steps: (1) Selection of genes enrolled in microRNA machinery canonical pathway followed by a selection of genetic variants with five or more published articles and minor allele frequency (MAF) of more than 0.3 after identification of 11 genes, including 22 single nucleotide polymorphisms. (2) Meta-analysis was performed to pool the results of 45 published articles. Pairwise comparisons were carried out for each genetic association model to identify the association between gene polymorphism and cancer risk. Next, heterogeneity analysis, subgroup analysis, trials sequential analysis, and publication bias were assessed. (3) To explore the association between genetic variants and cancer prognosis, transcriptomic and genomic data of 33 different types of cancer were retrieved from the TCGA database. Genetic mutations of the 11 studied genes were screened in association with the expression of corresponding genes and the total microRNAs expression in each patient. The Mann-Whitney U and Kruskal-Wallis tests were employed to test the association between gene mutation and clinicopathological features, including lymph node metastasis, distant metastasis, and pathological grade. Survival analysis was performed using Cox regression analysis and Kaplan-Meier curve plots were generated. Both univariate and multivariate analyses using total miRNAome level were conducted.

Data Extraction
Two-step screening of articles was performed by two independent authors (RE and ET), inspecting titles/abstracts and examining full text published articles for eligibility. Appropriate data were extracted from the previously published articles in consistent tabulated forms. Study characteristics were collected including first author surname, publication year, country of origin, ethnicity of subject, sample size for cancer and noncancer cohorts, genotyping method, source of controls ("hospital-based" or "populationbased"), the allelic and genotypic distributions for gene variants among cancer and noncancer subjects, and p-value related to Hardy-Weinberg Equilibrium (HWE) for non-cancer controls. Published articles including different sets of comparison groups were extracted individually, and disagreements were discussed to reach an optimum conclusion with the aid of a third investigator (MSF).

Quality Scoring
Quality assessment of articles was conducted based on defined criteria described in a prior publication [26] (Supplementary Tables S2 and S3). These included the patients' representativeness, control source, ascertainment, case-control match, genotyping method, quality control, genotyping analysis, specimens used for ascertaining genotypes, HWE in controls, total sample size, and association evaluation. Quality scores ranged from 0 (worst) to 21 (best), and articles with scores ≥12 were designated as "high quality".

Statistical Analysis for Pooling Results
The genotypic and allelic distributions of the gene variants were analyzed among different ethnic subjects as previously described [27]. The HWE within non-cancer subjects was evaluated using the chi-square test [28]. Pooled odds ratio (OR) along with their equivalent 95% confidence interval (CI) were estimated to explore the potential impact of variants on the susceptibility of developing cancer under various inheritance models (allelic model, recessive model, dominant model, homozygote comparison, and heterozygote comparison) [29]. The comprehensive meta-analysis software version 3.0 (Biostat, Englewood, NJ, USA) was used for statistical calculation, and STATA 16.0 was used to generate the summarized forest plots.

Heterogeneity Assessment
The chi-square-based Q-statistic test [30] along with the I 2 index [31] were used to estimate heterogeneity among numerous published articles. A fixed-effects model was designated to compute the pooled OR for each individual study in cases where the I 2 index < 50% (p-value ≥ 0.10) [32], while a random-effects model was applied to ascertain within-study inaccuracies if the heterogeneity was achieved with the I 2 index > 50% (p-value < 0.10) [33].

Publication Bias
The impact of publication bias was quantified by the utilization of Begg's funnel plot along with Egger's linear regression approach [34]. A significant publication bias was detected using the asymmetric funnel model (p-value < 0.1), and the adjustment of bias was achieved by applying the trim and fill procedure [35].

Trial Sequential Analysis (TSA) ta Extraction
The statistical reliability of this meta-analysis was tested with the aid of TSA by combining the cumulative sample trials of all available studies using the statistical threshold to decrease inadvertent errors and improve the strength of expectations [36,37]. Firstly, two side trials were used with type I error (α) and a power set of 5% [38]; then, when the cumulative Z-curve crossed the monitoring boundaries, a reasonable level of influence was achieved, and no further complementary trials were necessary. However, if the Z-curve failed to reach the threshold boundaries, the expected sample sizes were shown to have not completed the requirements to achieve a suitable significance, and additional trials were required. TSA software version 0.9.5.10 beta was applied in this trial.

The Total microRNAome Expression Level
Across all cancers, the expression of all mature microRNAs of 717 paired tumors and normal samples were compared using the Mann-Whitney U test. Next, the z scores of mutants versus wild-type samples were compared for all tumors using the Mann-Whitney U test, and boxplots were generated. The total miRNome has been estimated as the sum of all the miRNAs expressed in each sample, hence reflecting the total quantity of miRNA present per sample.

Association between Gene Mutations and Total miroRNome and Clinicopathological Characteristics
To identify the impact of miRNA machinery gene mutation on clinicopathological features of tumors, we performed linear regression (for continuous variables) and logistic regression (for categorical variables) using lm and glm R functions. Heatmap and dot plots were generated with pheatmap and ggplot2 R packages. Data are presented as OR and 95% CI. Multivariate analysis was performed using the z scores for total microRNome levels.
Overall analysis for all cancers and stratification by the type of cancer were performed.

Survival Analysis
Univariate and multivariate Cox regression and Kaplan-Meier survival analyses were performed using survival and survminer R packages. Briefly, Cox analysis was run with the coxph function, and the Kaplan-Meier curve was fit with survfit (conf.type = "log-log") and plotted with ggsurvplot. Statistical differences were analyzed with survdiff (rho = 0). Definitions of survival times are described in our prior publication [41].

Principal Component Analysis
A principal component analysis (PCA) was performed with the 11 gene mutants across the pan-cancer study using the FactoMineR and factoextra R packages. Next, a PCA index score was calculated according to the equation: with Lij, loading value of the ith variable of grouping on jth PC, Ei, the eigenvalue of the jth PC, Wi, weight of the ith variable, Xi, the normalized value of ith variable.
From the PCA, hierarchical clustering was performed with the HCPC R function (Fac-toMineR R packages). Ten clusters were identified. Regression and distribution analyses were performed between the ten PCA clusters and clinical/survival parameters. For each parameter analyzed, clusters were compared with each other, and p-values were clustered to extract similar and dissimilar clusters, reducing the number of clusters. Further analyses with all the clinical and survival parameters were performed. The cluster classification presenting the highest significant p-value was chosen as the most optimized cluster stratification. Cluster A gathered the PCA clusters 1, 2, 3, and 5; cluster B, the PCA clusters 7 and 6; cluster C, the PCA clusters 9 and 10; and cluster D, the PCA clusters 4 and 8.  Table S4). The sample size for each study ranged from 159 to 3003. The number of studies and the study population for each SNP are shown in Figure 3A,B. Based on the literature of eligible articles, a total of 20 different cancers were analyzed ( Figure 3C and Supplementary Table S5). Studies covered different geographical distributions, with the majority in China (58 comparisons) and the USA (48 comparisons), followed by Korea (30 comparisons) and Spain (12 comparisons). The assessment score of included studies is listed in Supplementary Table S3. A high score was reported in 138 comparisons compared to 50, with a low-quality score of <12. In addition, 22 genotyping data deviated from HWE and were considered during sensitivity analysis.

Characteristics of Studies Included in the Meta-Analysis
In the current study, 45 articles with 188 allelic discrimination comparisons of 163,791 cohorts (74,593 cancer cases and 89,198 controls) were included   (Supplementary  Table S4). The sample size for each study ranged from 159 to 3003. The number of studies and the study population for each SNP are shown in Figure 3A,B. Based on the literature of eligible articles, a total of 20 different cancers were analyzed ( Figure 3C and Supplementary Table S5). Studies covered different geographical distributions, with the majority in China (58 comparisons) and the USA (48 comparisons), followed by Korea (30 comparisons) and Spain (12 comparisons). The assessment score of included studies is listed in Supplementary Table S3. A high score was reported in 138 comparisons compared to 50, with a low-quality score of <12. In addition, 22 genotyping data deviated from HWE and were considered during sensitivity analysis.

Pooled Analysis of Pairwise Comparisons
As summarized in Figure 4A, the overall analysis revealed higher susceptibility of cancer with the following polymorphisms;

Pooled Analysis of Pairwise Comparisons
As summarized in Figure 4A, the overall analysis revealed higher susceptibility of cancer with the following polymorphisms; There was homogeneity between studies except RAN (C > T; rs14035) (I 2 = 58.9%, p = 0.005), TARBP2 (G > A; rs784567) (I 2 = 86.7%, p < 0.0 and GEMIN4 (G > C; rs2740348) (I 2 = 47.6%, p = 0.039). Figure 4B shows the cellular locat to highlight the biological function of miRNA machinery genes studied in the curr meta-analysis and depicts the significant SNPs that were associated with higher and lo risk of cancer development. showing cellular localization of action for the significant genes and polymorphisms. Out of th studied genetic variants, 9 SNPs were associated with cancer risk. The red gene with SNP I associated with higher cancer risk, while the blue gene and SNP ID conferred protection aga cancer.

Subgroup Analysis and Publication Bias
Iteration of pairwise comparisons stratified by geographical region, cancer type, g otyping method, sources of control subjects, agreement with HWE, and quality score There was homogeneity between studies except for RAN (C > T; rs14035) (I 2 = 58.9%, p = 0.005), TARBP2 (G > A; rs784567) (I 2 = 86.7%, p < 0.001), and GEMIN4 (G > C; rs2740348) (I 2 = 47.6%, p = 0.039). Figure 4B shows the cellular location to highlight the biological function of miRNA machinery genes studied in the current meta-analysis and depicts the significant SNPs that were associated with higher and lower risk of cancer development.

Subgroup Analysis and Publication Bias
Iteration of pairwise comparisons stratified by geographical region, cancer type, genotyping method, sources of control subjects, agreement with HWE, and quality score are depicted in Supplementary Tables S6-S27. Egger's regression test revealed publication bias in the association between RAN*rs14035 and cancer susceptibility (p = 0.046).
For each type of cancer, only one or two studies were available; thus, heterogeneity and publication bias assessments were not performed. However, it was noted that some genetic variants were associated with an increased risk of specific cancer and decreased susceptibility to other types.

Trial Sequential Analysis
TSA was applied to assess the power and reliability of drawn conclusions. Our results revealed that the cumulative Z-curve for DGCR8*rs417309, RAN*rs14035, RAN*rs3803012, DICER1*rs1057035, TARBP2*rs784567, GEMIN3*rs197414, and GEMIN4*rs2740348 transverses the monitoring boundaries of the trial sequential before it approximates the requested sample sizes, indicating that accumulative evidence was convenient, and no further trials are required. However, cumulative evidence for the other genetic variants was inadequate, and additional primary studies are necessary to validate the outcomes ( Figure 5).

Mutation Rates in Cancer
Thirty-three different cancer studies were collected through The Cancer Genome Atlas, corresponding to 8176 samples with mutation information for 11 genes. In the pancancer study, average mutation rates ranged from 0.23% for the RAN gene to 1.87% for the DICER1 gene. As for cancer type, there was a wide variation across studies ( Figure 6). Higher mutation rates were evident in gastrointestinal tumors (esophagus, gastric, gall bladder, and colorectal), uterine cancer, melanoma, and lung carcinoma.

Mutation Rates in Cancer
Thirty-three different cancer studies were collected through The Cancer Genome Atlas, corresponding to 8176 samples with mutation information for 11 genes. In the pan-cancer study, average mutation rates ranged from 0.23% for the RAN gene to 1.87% for the DICER1 gene. As for cancer type, there was a wide variation across studies ( Figure 6). Higher mutation rates were evident in gastrointestinal tumors (esophagus, gastric, gall bladder, and colorectal), uterine cancer, melanoma, and lung carcinoma.

Survival Analysis
As depicted in Figure 7B,C, multivariate Cox regression analysis revealed that PIWIL1 mutant was associated with better disease-specific survival (DSS) (HR = 0.

Total microRNome Analysis
A total of 717 paired tumor and normal samples were compared, and the average total microRNome levels were significantly higher in pan-cancer samples. Across different types of tumors, elevated levels were found in uterine, bladder, lung squamous, stomach, prostate, and kidney clear cell cancers. In contrast, lower expressions of mature miRNAs were observed in thyroid, pancreatic, and kidney chromophobe carcinoma samples compared to paired normal counterparts ( Figure 8A). Higher levels of total microRNome were associated with poor survival times as demonstrated by Cox regression models (DFS: HR = 1.29, 95% CI = 1.19-1.39, p = 1.04 × 10 −10 , and OS: HR = 1.33, 95% CI = 1.24-1.43, p = 2.65 × 10 −15 ) and Kaplan-Meier curves ( Figure 8B). Genetic alterations in DICER1, DROSHA, PIWIL1, and GEMIN4 exhibited higher levels of expressed mature miRNAs ( Figure 8C).

Total microRNome Analysis
A total of 717 paired tumor and normal samples were compared, and the average total microRNome levels were significantly higher in pan-cancer samples. Across different types of tumors, elevated levels were found in uterine, bladder, lung squamous, stomach, prostate, and kidney clear cell cancers. In contrast, lower expressions of mature miR-NAs were observed in thyroid, pancreatic, and kidney chromophobe carcinoma samples compared to paired normal counterparts ( Figure 8A). Higher levels of total microRNome were associated with poor survival times as demonstrated by Cox regression models (DFS: HR = 1.29, 95% CI = 1.19-1.39, p = 1.04 × 10 −10 , and OS: HR = 1.33, 95% CI = 1.24-1.43, p = 2.65 × 10 −15 ) and Kaplan-Meier curves ( Figure 8B). Genetic alterations in DICER1, DROSHA, PIWIL1, and GEMIN4 exhibited higher levels of expressed mature miRNAs ( Figure 8C).
Finally, we assessed the prognosis ability of the mutation count in different cancer types. A higher number of mutations was associated with (a) lymph node dissemination in ESCA (p = 0.024) and (b) late stages in HNSC (p = 0.013) and KICH (p = 0.045) tumors. Inversely, a higher mutation count was more frequent in (a) non-disseminated BRCA  Taken together, these results suggested that miRNA machinery gene alterations present a diverse prognosis prediction. However, the accumulation of mutation tends to be more associated with poor prognosis.
As the accumulation of mutation presented a specific association with clinical and survival parameters, we performed PCA to determine how the panel of gene mutants differentiated cancer samples ( Figure 10). Interestingly, the eleven mutants' distribution could explain more than 25% of the heterogeneity observed across the tumor samples ( Figure 10A), with a clear stratification following mutation count ( Figure 10B). Hence, we used hierarchical clustering on the PCA analysis to further stratify the pan-cancer samples based on the eleven-gene mutant panel. Ten clusters were first identified ( Figure 11A). After a robust analysis of the clinical-survival parameter association, the ten clusters were classified into four optimized subgroups ( Figure 11B), with a significant distribution of the PCA index score (p = 0, Figure 11C) and accumulation of mutations (p = 0, Figure 11D). Cluster C presented the highest number of mutations, whereas cluster A bore the smallest. The four clusters showed a significant distribution across the cancer type (p = 5.16 × 10 −26 ), with cluster A specifically enriched in LAML, LGG, PCPG, THYM, and UVM cancer types, cluster B specifically characterized by KICH, TGCT, and UCS cancer types, cluster C defined by KIRC, KIRP, LIHC, MESO, and PAAD, and finally cluster D by LUAD cancer type.
Hence, we used hierarchical clustering on the PCA analysis to further stratify the pancancer samples based on the eleven-gene mutant panel. Ten clusters were first identified ( Figure 11A). After a robust analysis of the clinical-survival parameter association, the ten clusters were classified into four optimized subgroups ( Figure 11B), with a significant distribution of the PCA index score (p = 0, Figure 11C) and accumulation of mutations (p = 0, Figure 11D). Cluster C presented the highest number of mutations, whereas cluster A bore the smallest. The four clusters showed a significant distribution across the cancer type (p = 5.16 × 10 −26 ), with cluster A specifically enriched in LAML, LGG, PCPG, THYM, and UVM cancer types, cluster B specifically characterized by KICH, TGCT, and UCS cancer types, cluster C defined by KIRC, KIRP, LIHC, MESO, and PAAD, and finally cluster D by LUAD cancer type.   (top table) with corresponding description of cancer type impoverishment (orange) or enrichment (green) in each cluster (table bottom). Purple highlights cancer types presenting a specific enrichment in one cluster.

Stratification Analysis for Total miRNome by Cancer Type
The total miRNome presented a heterogenous distribution across the cancer types, with a lower expression in KIRP, MESO, and TCGT cancer types (p = 0) ( Figure 13A). We investigated the prognosis propensity of the total miRNome marker in a pan-cancer study. We observed that a high score was associated with (a) an absence of distant metastasis (OR = 0.44, 95% CI = 0.31-0.61, p = 1.99 × 10 −4 ) and (b) early stages (I/II, OR = 0.46, 95% CI = 0.37-0.58, p = 1.16 × 10 −12 ; I/II/III, OR = 0.74, 95% CI = 0.58-0.95, p = 0.026) ( Figure 13B). Furthermore, a high total miRNome score was associated with a good prognosis, Interestingly, in the pan-cancer study, the association between the total miRNome score and the cancer aggressive phenotype and outcome partially depended on the accumulation of miRNA machinery gene epigenomic alteration or specific mutations. Indeed, after a multivariate analysis with the mutation count or a single mutation status, the total miRNome score continued to be significantly associated with clinical and survival parameters but with a weaker p-value. Furthermore, we observed a different behavior of the total miRNome score across the cancer types, Supplementary Tables S34-S36. A low total miRNome was still associated with late disease stages in KIRC (III/IV, p = 0.005), dependently on any single gene mutant or the accumulation of mutations, in BRCA (II/III/IV, p = 0.016), dependent on DICER1 and AGO1 mutation status, and in LUSC (II/III/IV, p = 0.046), independently of TARBP2, LIHC (III/IV, p = 0.003) and KICH (IV, p = 0.014), both independently of the miRNA machinery gene alteration. Furthermore, a low total miRNome was also significantly found in lymph node disseminated LUSC cancer type (p = 0.028), independently of the miRNA machinery genes (Figure 14).
A high total miRNome score was more frequent in lymph node metastatic PAAD (p = 0.041) and late stages CESC (III/IV, p = 0.030) tumors, controlled by the miRNA machinery gene alterations. Interestingly, after adjusting the miRNA machinery gene alteration, a high score is also independently associated with distant metastasis in BLCA tumors ( Figure 14).
A low total miRNome score was associated with a recurrence and or survival in ACC   The total miRNome marker is dependent on the miRNA biogenesis epigenomic alterations to be a potential prognosis marker in cancer. Heatmap representing the hierarchical clustering of the TCGA cancer types with the likelihood ratio p-values [Log(p-value)] obtained after univariate and multivariate logistic (for clinical parameters) and Cox (for outcome parameters) regression analyses between z-score miRNome count and clinical or survival parameters, adjusted to gene mutants. The sign of Log(p-value) follows the sign of the regression beta-coefficient exponential. Annotation represents similar logistic regression in the pan-cancer study. * representing significant (p < 0.05) logistic regressions. Figure 14. The total miRNome marker is dependent on the miRNA biogenesis epigenomic alterations to be a potential prognosis marker in cancer. Heatmap representing the hierarchical clustering of the TCGA cancer types with the likelihood ratio p-values [Log(p-value)] obtained after univariate and multivariate logistic (for clinical parameters) and Cox (for outcome parameters) regression analyses between z-score miRNome count and clinical or survival parameters, adjusted to gene mutants. The sign of Log(p-value) follows the sign of the regression beta-coefficient exponential. Annotation represents similar logistic regression in the pan-cancer study.

Discussion
Authors should discuss the results and how they can be interpreted from the perspective of previous studies and of the working hypotheses. The findings and their implications should be discussed in the broadest context possible. Future research directions may also be highlighted. Insights into core miRNA biogenesis mutations reveal a link between cancer development and progression [18][19][20][21]87,88]. Knockouts or mutations of core miRNA factors illustrate their requirement for normal development, differentiation, and cell maturation in most tissues [89]. Phenotypes in cancer may stem from concomitant depletion of multiple functional miRNAs or failure to maintain subclasses of active miRNAs. Here, we addressed a diverse array of mutations in different core miRNA machinery genes. A meta-analysis of 22 gene variants was conducted. Pooling the data from 45 articles, our results reveal the association of DROSHA (A > G; rs10719), DGCR8 (G > A; rs417309), RAN (A > G; rs3803012), and GEMIN3 (C > A; rs197414) with a 19-93% higher susceptibility to overall cancer.
The DROSHA-DGCR8 microprocessor complex could mediate microRNA maturation by precise cleavage of the stem-loops that are embedded in primary transcripts in a complementary way, and neither is sufficient to process alone [48]. Based on the HaploReg v4.1 online tool for variant annotations (https://pubs.broadinstitute.org/mammals/haploreg/ haploreg.php), DROSHA (A > G; rs10719) variant can alter HMG box protein 1 (hbp1) motif sequence, a transcription factor, and a potent cell cycle inhibitor in normal and cancer cells [90]. DROSHA* rs10719 is located in the 3 UTR region. Based on bioinformatics website prediction (https://snpinfo.niehs.nih.gov/), this point mutation was found to disrupt the binding site of miR-27b and lead to overexpression of DROSHA gene, which in turn was found to facilitate proliferation and inhibit apoptosis of cancer cells [91,92]. Our findings indicate that the rs417309 SNP of DGCR8 might facilitate oncogenesis. The DGCR8 (G > A; rs417309) variant was the most extensively investigated polymorphism in this gene, and the rs417309-A allele was strongly associated with increased cancer susceptibility. This SNP also exists in the 3 UTR region at the binding sites of the miR-106b and miR-579 genes. Gene overexpression was associated with the risk allele rs417309*A, thus facilitating cancer development [48]. The HaploReg website showed that the SNP could modify the binding sites for Doublesex and Mab-3 Related Transcription Factor 1 (DMRT1) and 2 (DMRT2) and zinc finger-like DNA-binding domain, which are associated with testicular germ cell cancer [93]. RAN is essential for the translocation of pre-miRNAs from the nucleus to the cytoplasm through the nuclear pore complex in a GTP-dependent manner [94]. Nucleotide sequence change in RAN (A > G; rs3803012) leads to modification of the binding site of the transcription factor NANOG, which is associated with the differentiation of pluripotent stem cells and various types of cancer [95]. Studies also proposed that the RAN rs3803012*G allele might affect miRNA-199a-3p targeting and decrease RAN mRNA expression in cancer cells, which may affect miRNA biosynthesis [68]. The GEMIN3 (C > A; rs197414) variant is located in the CTCF transcription factor binding site, a transcription factor linked to chromatin remodeling and genome topology [96]. Taken together, these findings are in line with our meta-analysis results and support the role of miRNA machinery gene mutation in human tumors. Apart from TARBP2 and RAN, our trial sequential analysis results showed that the cumulative evidence for miRNA gene machinery variants was inadequate, and additional primary allelic discrimination studies for the other nine genes are warranted to validate the outcomes in various types of cancers.
To identify the role of miRNA machinery gene mutation in association with clinical and pathological parameters, 33 pan-cancer studies in the TCGA were analyzed. Average mutation rates varied, with the most common being DICER1 (accounting for 1.87% of gene mutations in the sample population), followed by PIWIL1 and DROSHA gene mutations (1.54% and 1.36%, respectively). Higher mutation rates of machinery genes were evident in gastrointestinal tumors (esophagus, gastric, gall bladder, and colorectal), uterine cancer, melanoma, and lung carcinoma. Consistently, genetic variants of DICER1 have been found to modulate the risk of gastric cancer [97], endocrine tumors [98], ovarian cancer [99], testicular germ-cell tumors [98], and neurofibromatosis [100], and have been found to be associated with head and neck cancer recurrence [98] and ovarian cancer prognosis [98]. Gene mutations in Piwi-interacting RNA pathway genes confer susceptibility to esophageal cancer [85], renal cell carcinoma [51], hepatocellular cancer [68], and glioblastoma progression [101]. Moreover, multiple SNPs in DROSHA mutations are common in diverse human malignancies, including Wilms tumor [102], esophageal cancer [85], and breast cancer [48].
Another finding in the TCGA was that seven miRNA-related genes were associated with poor prognostic parameters: the nuclear DROSHA and DGCR8 complex, the cytoplasmic DICER1 and TARBP2 enzymes, and miRISC-related components such as AGO1, AGO2, and PIWIL1. Patients with DGCR8, DROSHA, and TARBP2 mutations were twice as likely to present with lymph node metastasis. DGCR8 was associated with higher susceptibility to distant metastasis. While AGO1 and DGCR8 mutations conferred around 2.5 times higher risk of advanced tumor size, DROSHA mutations were 2.5 times more likely to present with poorly differentiated tumors at the time of diagnosis. Multivariate Cox regression analysis revealed that the PIWIL1 mutant was associated with better survival. In contrast, DGCR8 mutation carriers presented with shorter five-year survival. It remains to be defined whether these SNPs affect the expression and function of miRNA machinery genes directly or are merely tagging SNPs.
While the canonical pathway generates the large majority of miRNA species, it is worth noting that emerging studies reveal a growing number of Drosha-independent and Dicer-independent alternative mechanisms that generate functional microRNAs and contribute to distinct phenotypes among core microRNA biogenesis mutants. Interpretation of the consequences of these mechanisms is challenging, given the paucity of currently available publications and public databases. A better understanding of these mechanisms will lead to better prediction of the health effects of each perturbation. Overall, we provide the most comprehensive bioinformatics analysis and meta-analysis on genetic variations in 11 miRNA machinery pathway genes supporting their association with the risk and prognosis of 33 different types of cancer. Nonetheless, it should be noted that although our study has one of the largest collections of pan-cancer patients, it differs from the recent interesting bioinformatic study carried out by Galka-Marciniak et al. [103] in running a systematic review and metanalysis for all published related studies in the specified period of the analysis. A limited number of studies per SNP constrains the ability of polygenic assessment and decreases the confidence of having sufficient power for each cancer subtype analysis. Future independent replication studies and functional characterizations are needed to explore the potential gene-gene and gene-environment interactions.

Conclusions
Our results suggest that genetic polymorphisms of the miRNA-machinery genes may affect cancer susceptibility and progression. Genetic mapping and functional characterizations are warranted to identify causal SNPs and their underlying molecular mechanisms. Defining the landscape of perturbation with a functional consequence may aid in risk stratification for high-risk populations and facilitate the development of future treatments.
Supplementary Materials: The following supporting information can be downloaded at: https: //www.mdpi.com/article/10.3390/cancers15020338/s1, Table S1: Characteristics and selection criteria of single nucleotide Polymorphisms in microRNA machinery genes; Table S2: Scale for methodological quality assessment; Table S3: Quality score assessment for included studies; Table S4: Characteristics of the included studies., Table S5: Distribution of samples according to the type of cancer, Table S6: Meta-analysis of the association between DROSHA (A > G; rs10719) variant and cancer risk.; Table S7: Meta-analysis of the association between DROSHA (G > C; rs6877842) variant and cancer risk.; Table S8: Meta-analysis of the association between DGCR8 (G > A; rs3757) variant and cancer risk; Table S9: Meta-analysis of the association between DGCR8 (G > A; rs417309) variant and cancer risk; Table S10: Meta-analysis of the association between DGCR8 (T > G; rs1640299) variant and cancer risk; Table S11: Meta-analysis of the association between XPO5 (T > G; rs11077) variant and cancer risk; Table S12: Meta-analysis of the association between RAN (C > T; rs14035) variant and cancer risk; Table S13: Meta-analysis of the association between RAN (A > G; rs3803012) variant and cancer risk; Table S14: Meta-analysis of the association between DICER (A > T; rs13078) variant and cancer risk; Table S15: Meta-analysis of the association between DICER (T > C; rs1057035) variant and cancer risk; Table S16: Meta-analysis of the association between DICER (A > G; rs3742330) variant and cancer risk; Table S17: Meta-analysis of the association between TARBP2 (G > A; rs784567) variant and cancer risk; Table S18: Meta-analysis of the association between AGO1 (A > G; rs595961) variant and cancer risk; Table S19: Meta-analysis of the association between AGO1 (G > A; rs636832) variant and cancer risk; Table S20: Meta-analysis of the association between AGO2 (C > A; rs4961280) variant and cancer risk; Table S21: Meta-analysis of the association between GEMIN3 (T > C; rs197412) variant and cancer risk; Table S22: Meta-analysis of the association between GEMIN3 (C > A; rs197414) variant and cancer risk; Table S23: Meta-analysis of the association between GEMIN4 (G > A; rs7813) variant and cancer risk; Table S24: Meta-analysis of the association between GEMIN4 (G > C; rs2740348) variant and cancer risk; Table S25: Meta-analysis of the association between GEMIN4 (C > T; rs3744741) variant and cancer risk; Table S26: Meta-analysis of the association between PIWIL1 rs1106042 G > A variant and cancer risk; Table S27: Meta-analysis of the association between PIWIL1 rs10773771 C > T variant and cancer risk; Table S28: Overall survival analysis for TCGA pancancer dataset, Table S29: Disease-specific survival analysis for TCGA pan-cancer dataset; Table S30: Analysis of mutation count and gene mutants association with clinical parameters-PER-cancer study; Table S31: Analysis of mutation count and gene mutants association with survival parameters-PERcancer study; Table S32: Logistic and cox regression analysis with PCA Index score; Table S33: Analysis of PCA clusters association with clinical and survival parameters; Table S34: Logistic and cox regression analysis with z-score miRNome; Table S35: Logistic and cox regression analysis with z-score miRNome in univariate and multivariate analyses (adjusted to a single gene mutant or the mutation count)-PAN cancer analysis; Table S36: Logistic and cox regression analysis with z-score miRNome in univariate and multivariate analyses (adjusted to a single gene mutant or the mutation count)-PER cancer analysis.  Acknowledgments: A sincere thank you to Loula Burton from Tulane's Research Proposal Development Office for her diligent editing and proofreading of this paper.

Conflicts of Interest:
The authors declare no conflict of interest.