Pathway Analysis of Genome Wide Association Studies (GWAS) Data Associated with Male Infertility

: Background: Infertility is a common condition affecting approximately 10–20% of the reproductive age population. Idiopathic infertility cases are thought to have a genetic basis, but the underlying causes are largely unknown. However, the genetic basis underlying male infertility in humans is only partially understood. The Purpose of the study is to understand the current state of research on the genetics of male infertility and its association with signiﬁcant biological mechanisms. Results: We performed an Identify Candidate Causal SNPs and Pathway (ICSN Pathway) analysis using a genome-wide association study (GWAS) dataset, and NCBI-PubMed search which included 632 SNPs in GWAS and 451 SNPs from the PubMed server, respectively. The ICSN Pathway analysis produced three hypothetical biological mechanisms associated with male infertility: (1) rs8084 and rs7192 → HLA-DRA → inﬂammatory pathways and cell adhesion; rs7550231 and rs2234167 → TNFRSF14 → TNF Receptor Superfamily Member 14 → T lymphocyte proliferation and activation; rs1105879 and rs2070959 → UGT1A6 → UDP glucuronosyltransferase family 1 member A6 → Metabolism of Xenobiotics, androgen, estrogen, retinol, and carbohydrates. Conclusions: We believe that our results may be helpful to study the genetic mechanisms of male infertility. Pathway-based methods have been applied to male infertility GWAS datasets to investigate the biological mechanisms and reported some novel male infertility risk pathways. This pathway analysis using GWAS dataset suggests that the biological process related to inﬂammation and metabolism might contribute to male infertility susceptibility. Our analysis suggests that genetic contribution to male infertility operates through multiple genes affecting common inﬂammatory diseases interacting in functional pathways.


Introduction
Infertility is an emerging global problem affecting men and women equally [1]. Several causes contribute to both male and female infertility, and male infertility is responsible for 50% of the total number of cases [2,3]. The molecular genetics of male infertility are multifactorial and numerous [4][5][6]. Although several causes have been attributed to impaired male infertility, the majority of cases with inadequate spermatogenesis or defective sperm remain to be understood. Factors contributing are the genetics of an individual, the damaging effect of oxidative stress, sperm DNA Integrity, environmental conditions, lifestyle, patient age, and body mass index (BMI) responsible for the onset of this disorder [6][7][8][9][10][11][12]. Anomalies in genetic factors account for 15-30% of cases of impaired spermatogenesis and male infertility [13,14]. Oxidative stress is the major cause of DNA fragmentation in spermatozoa [7,15]. Antioxidants play a protective role, although a balance of reduction and oxidation is required for essential functions, including fertilization [15]. Studies have confirmed the assumption that polymorphisms in oxidative stress genes might increase the risk of sperm DNA damage and unexplained male infertility [16]. During spermatogenesis, the sperm nucleus undergoes a marked rearrangement, which involves the removal of histones and their replacement by various basic nuclear proteins, including highly positively charged protamines [17,18]. It has been suggested that disturbances in the nuclear condensation of sperm might affect genes involved in the production of functional germ cells [19]. In order to investigate whether changes in the amount or structure of protamine could be the cause of human infertility, sequencing of the protamine gene was performed [20]. The authors first identified a heterozygous SNP in the protamine 1 gene in three infertile men [20]. This SNP causes an amino acid change in a highly conserved arginine cluster. Moreover, the SNP created improper phosphorylation, which could alter both DNA binding and protamine to protamine interaction in the sperm nucleus [20].
In a case control study with the Indian population, two rare novel mutations in sperm nuclear protein genes (protamine and transition nuclear protein) are associated with higher sperm DNA fragmentation [21]. In another study, the involvement of sperm nuclear basic proteins in DNA oxidative damage has been elucidated [7]. The altered ratio of protamines and histones determines an unstable binding to DNA and reduced DNA protection from external agents such as heavy metals [7]. They have shown the importance of arginine residues in the Copper ion induced DNA breakage of sperm histones inducing oxidative damage [7]. In another study, the authors reported molecular alteration of small nuclear basic proteins, DNA, and semen parameters in transgenerational inherited effects due to environmental contaminants [22]. They have found involvement in DNA oxidative damage of human small nuclear basic proteins exposed to pollutants, giving new insights into the toxicity mechanisms of some heavy metals [22].
Moreover, little is known about environmental chemical exposure in young women where kallikrein related serine peptidase is regulated [23]. Environmental factors play role in kallikrein related serine peptidase 3 (KLK3) secretion [23]. Some environmental pollutants could activate KLK3 secretion by prostatic epithelial cells and identify as a new biomarker to assess the early effect of environmental exposure on female reproductive health [23]. It is assumed that genetic makeup contributes to 10-15% of spermatogenic failure for in vitro fertilization [24].
It is helpful to understand sperm DNA integrity and unexplained infertility because DNA fragmentation is present in normozoospermic sub-fertile men [25,26]. Previous studies have shown the involvement of genetic polymorphism in specific genes associated with male fertility [27][28][29]. In one of the study, the authors have reported evidence of a single nucleotide polymorphism in the histidine-rich glycoprotein (HRG) named HRG C633T, which is required for male fertility [30].
Although no studies to prove the relationship between HRG C633T SNP and sperm DNA integrity was observed. It is still not clear that tests of sperm DNA integrity with a high predictive value can determine the result of in-vivo fertilization although the outcome of IVF and ICSI require more analysis and clarification [31]. Previous findings did not support the relationship between sperm DNA integrity and assisted reproduction technologies (ART) success [32][33][34]. While others reported a significant reduction in pregnancy rates following IVF or ICSI among couples with high sperm DNA fragmentation [35][36][37][38].
The current understanding of sperm DNA damage and its effects on reproductive outcomes requires further investigation. Thus, association studies from larger genome wide analysis to identify polymorphisms (SNPs) affecting spermatogenesis, which might lead to oligozoospermia or azoospermia induced male infertility [39]. Genotyping helps us in the identification of genetic factors, and the variants can be explored by single nucleotide polymorphism (SNP) array, NGS next-generation sequencing (NGS), microarrays, comparative genomic hybridization-array (array-CGH) to understand rare variants and disease etiology on a molecular level [13,40].
In this study, we used Genome Wide Associations Studies (GWAS) platform to identify potential SNPs which are related to male infertility. The study of the GWAS dataset is an emerging and powerful source to understand the variants associated with particular diseases [41,42]. This study will help us search for hidden prognostic marker genes which are related to male infertility. There are various modifications associated with GWAS studies which include a pathway-based approach (ICSN pathway). This helps us to understand causal variants associated with complex diseases [43,44]. The combination of the GWAS dataset and ICSN pathway analysis will help us to find out candidate SNPs that target key pathways related to complex diseases. We have collected SNPs from the GWAS dataset and PubMed literature related to male infertility and performed an ICSN analysis to find possible mechanisms associated with male infertility. The consideration of these SNPs and pathways associated with these SNPs can help us screen infertile patients and improve prognosis.
HLA is a large, highly polymorphic gene set located on human chromosome 6 and is subdivided into class I and class II regions (MHCI and MHCII, respectively) [45]. HLA-DRA is mostly present on the cell surface of B-Cell dendritic cells, and macrophages. It plays important role in generating an immune response against pathogen derived peptides by presenting the antigenic peptides to T Cell [46]. HLA-DRA molecule gives the ability to individuals to respond to a variety of pathogens [47]. The polymorphism in the subunit region of the HLA-DRA molecule plays important role in generating an immune response against a variety of pathogens [48,49].
TNFRSF14 belongs to TNF (tumor necrosis factor) receptor superfamily. It plays important role in the inflammatory and inhibitory signaling network. It also plays an important role in the proliferation of T cells and the production of antimicrobial agents. In the case of adaptive immune response interaction of TNFRSF14 with T cell provide survival cell for effector T-cell [50]. UGT1A6 encodes for the gene UDP-glucuronosyltransferase 1-6 enzyme, which plays an important role in the metabolism of xenobiotics [46]. It helps in linking glucuronic acid to its substrate, due to which excretion of xenobiotics from cells becomes easier [46].

Candidate SNPs as input:
The SNPs used in the study were extracted from the NHGRI-EBI catalog of human genome wide association studies. We selected 632 SNPs from the GWAS dataset, which are associated with male infertility. We have selected the most common SNPs having Risk Allele Frequency (RAF) > 0.02 and p-value < 5 × 10 −2 . We also selected significant SNPs from NCBI PubMed (pubmed.ncbi.nlm.nih.gov) associated with male infertility. Finally, 1083 SNPs were used for pathway analysis Pathway Analysis associated with male infertility: Pathway Analysis was applied to understand male infertility related SNPs associated with other complex diseases. Thus, we conducted ICSN (Identify Candidate Causal SNPs and Pathway) analysis and utilized SNPs from the GWAS dataset and NCBI PubMed as input. The selection of candidate causal SNPs by LD analysis and most significant functional SNPs were annotated. The second step includes understanding the biological processes for the preselected candidate SNPs using Pathway Based Analysis (PBA). The False Discovery Rate (FDR) and p-value cut-off for PBA were set as 0.05. There are various options available we selected KEGG, GO Biological process, Biocarta and GO Molecular function to get properly defined and descriptive pathways. We selected the parameters for LD analysis CEU: European American (CEPH) as the HapMap population, LD measurement cut off of r2 = 0.8, and distance search LD neighborhood = 200 kb. Functional SNPs defined can alter protein or gene expression in close association with pathway. Functional SNPs include essential splice site and intronic, non-synonymous coding, and regulatory region. Two conditions were applied for ISCN analysis. the first parameter applied was "within gene" only, and the second parameter was "gene set size minimum 1 and maximum 500". Pathways that were enriched from the ICSN based webtool were drawn using a Bubble plot. The plot was generated using the ggplot2 package of R (version 4.0.5) [51]. Moreover, the network of genes associated with enriched pathways was created using the igraph package of R (version 4.0.5) [52].

Results
To perform pathway enrichment analysis, we utilized the most significant SNPs from GWAS (632) and PubMed survey of male infertility related SNPs (451) having p-value < 5 × 10 −2 as input. We performed in-silico analysis using the ICSN pathway web server and identified six candidate SNPs and thirty-two candidate pathways. The SNPs rs8084, rs7550231, rs2234167, rs1105879, and rs2070959 are in linkage disequilibrium (LD) with rs7192, rs4486391, rs10910093, rs2070959, respectively (r2 = 0.80-0.96), (−log10(P) = 5.523, 1.867, 1.745 respectively). The SNP rs7192, which was represented in the original GWAS (−log10(P) = 5.523), was not in LD with any SNP (Table 1). In our study, the SNPs were associated with the three most significant biological pathways which were as follows: (i) SNP rs8084 and rs7192 encoding for the gene HLA-DRA is a Class II MHC molecule displays antigenic peptides on the surface of APCs and is involved in several inflammatory pathways and cell adhesion. HLA-DRA does not have polymorphism in the peptide binding part and acts as a sole alpha chain for DRB1, DRB3, DRB4, and DRB5. SNPs rs7550231 and rs2234167 encode for the gene TNFRSF14 (TNF Receptor Superfamily Member 14) that are expressed in T cells and activate a cascade of T cell signaling in response to ligand binding. SNPs rs1105879 and rs2070959 encoding for UGT1A6 gene (UDP glucuronosyltransferase family 1 member A6) involved in metabolic pathways. UGT1A6 metabolizes Xenobiotics, androgen, estrogen, retinol, and carbohydrates. The candidate SNPs identified after analysis are thus involved in biological and cellular processes including inflammation, metabolism, and cell cycle regulation. The six candidate causal SNPs and 32 candidate causal pathways in Table 1 are associated with several hypothetical biological processes as represented in Supplementary Materials Table S1 (a) rs8084 (essential splice site and intronic) encoding for the gene HLA-DRA (Major Histocompatibility Complex, Class II, DR Alpha) is influencing 20 inflammatory pathways such as MHC pathway, T Cell Receptor activation, Interleukin 5 signaling, B cell activation, Eosinophil signaling, T.Cell Activation, Inflammatory response, B-Cell surface molecule signaling, Graft Versus Host disease signaling, Allograft rejection, Autoimmune thyroid disease, Type 1 Diabetes, Asthma. (b) rs7192 (non-synonymous coding) also encoding for the gene HLA-DRA (Major Histocompatibility Complex, Class II, DR Alpha) is involved in the above mentioned signaling pathways, (c) rs7550231(regulatory region) and rs2234167 (non-synonymous coding) encoding for the gene TNFRSF14 (TNF Receptor Superfamily Member 14) are involved in 2 inflammatory pathways such as Lymphocyte proliferation and T Cell activation signaling pathways, (d) rs1105879 (non-synonymous coding) and rs2070959 (non-synonymous coding) encoding for the gene UGT1A6 (UDP Glucuronosyltransferase Family 1 Member A6) are involved in 10 metabolic pathways such as Drug metabolism, Xenobiotic metabolism, Androgen, and estrogen metabolism, Pentose and Glucuronate conversion, Porphyrin metabolism, Retinol metabolism, UDP-Glycosyl Transferase Activity, Transferase activity, starch, and sucrose metabolism, and response to xenobiotic stimulus signaling pathways (Supplementary Materials Table S1).
We represented causal pathways based on FDR and the number of variants associated with a pathway in a graphical representation (bubble plot). The plot is drawn with the X axis as enriched pathways and the Y axis as −log10(p) value. The bubbles represent the false discovery rate (FDR) and the number of genes mapped with variants. The genes and the mapped variants in each pathway are shown in Supplementary Materials Table S1. In pathways such as UDP Glycosyl Transferase Activity and Transferase activity Transferring Hexosyl Group, the highest number of genes mapped with variants is nine and FDR 0.024. The nine variants are UGT1A8, UGT1A9, UGT1A10, UGT1A6, UGT1A7, UGT1A4, UGT1A5, UGT1A3, CDA, EXTL1, and ATP13A. Furthermore, the most significant FDR (0.001) having 2 variant genes (HLA-DRB1 and HLA-DRA) influences inflammatory pathways such as the MHC pathway, TCRA pathway, IL5 signaling, B-cell Pathway, Eosinophils Pathway, CTLA pathway, Inflammatory pathway, BB cell pathway and B-Lymphocyte pathway (Figure 1). Figure 1. Bubble plot depicting enriched pathways. X-axis represents enriched pathways and Y-axis represents negative log of p-values. The bubble size is proportional to the false discovery rate (FDR) and color represents the number of genes associated with the pathways. The plot was generated using ggplot2 R package [51].
We have also represented the causal pathways in the network form. We started with the GWAS summary table for 24 genes or variants and 32 pathways downloaded from ICSN Pathway (Supplementary Materials Table S1). Figure 2 illustrates our analysis framework.
We grouped this network into 2 clusters (i) 22 immune related traits and (ii) 10 metabolic traits. The yellow and green circular nodes come under the category of immune related pathways; Blue circular nodes represent metabolic pathways. The immune related genes had more associated pathways individually and more shared pathways. We summarized the features of these datasets in Supplementary Materials Table S1. Figure 2. Network of genes associated with enriched pathways. Square and circular nodes represent the genes and enriched pathways respectively. Size of square node is proportional to the number of pathways associated with each gene. The circular node is segmented into colored pies to illustrate the associated genes with each pathway. Human leukocyte antigen (HLA) and UDP glucuronosyltransferase (UGT) genes are colored respectively with shades of yellow and blue. Remaining genes are indicated in shades of green. Edge represents a gene-pathway association. The network was created using igraph R package [52].
Our goal is to investigate infertility related genetic components which share multiple traits at the pathway level. We employed the igraph R package to evaluate and investigate pathways based on the combined effect of multiple variants and genes which may individually represent associations with the diseases or traits and pathways. Although the pathway itself did not show a clear relationship with male infertility related traits, several genes in this pathway are observed to be associated with metabolism and inflammation. For immune related traits such as MHC pathway, CTLA4 pathway, Eosinophil pathway, TCRA pathway, Inflammatory pathway, B lymphocyte pathway, Th1/Th2 Pathway, IL5 Pathway, and CSK pathway. For metabolism such as Drug metabolism, Metabolism of xenobiotics, Androgen and estrogen metabolism, pentose and glucoronate metabolism, Porphyrin and chlorophyll metabolism, Starch and Sucrose metabolism, and retinol metabolism are associated with the variants.
By exploring the diseases associated with the immune related pathways such as Systemic Lupus Erythematosus, Autoimmune thyroid disease, Graft versus host disease, Type I diabetes mellitus, and Asthma we found that in most cases there is a combined effect of multiple genes rather than driven by a single gene (Figure 2 and Supplementary Materials Table S1).

Discussion
In this study, GWAS analysis has been applied with significant thresholds (5 × 10 −2 ) to select genetic variants for ICSN pathway-based analysis. We explored GWAS and male infertility SNP data to examine functional related genes for their combined effects in a pathway based study. Our results provide preliminary knowledge and insights into functional association related to each gene. One of the advantages of pathway-based analysis is that the combined effect of multiple variants instead of the effects of each individual gene, was examined for disease associations. We thus asked whether the same genes contributed to the disease association at the pathway level. We found individual variants are associated with multiple traits to share biological pathways related to inflammation and metabolism. We hypothesized that indirect pathways are likely due to the pathogenicity of male infertility related genetic variants. This finding was consistent with previous studies that immune-related traits (variants in the TGFB3 gene) may be associated with male infertility [53].
Oxidative stress, sperm DNA integrity, changes in the structure of sperm nuclear proteins, and exposure to environmental pollutants are some elements that can impair fertility in men [7,8,14,17,19,22,23]. Previous studies have shown the association of oxidative stress with male infertility [54]. However, no significant associations between the seven polymorphismic genes (NQO1 rs1800566, SOD2 rs4880, GSTM3 rs1571858, rs3814309, rs7483, GSTM5 rs11807, and GSTP1 rs1695) and risk of male infertility has been noted [55]. But the authors have reported by gene-gene interactions a decreased risk of male infertility in GSTM3 and azoospermia [55]. Genome Wide Association studies have identified the SNPs which are associated with complex genetic traits or diseases. From our current study, we have identified 6 SNPs and 32 significant causal pathways. We identified polymorphic variants in the HLA-DRA gene which are rs8084 and rs7192 located on chromosome 6p21.32 and associated with male infertility. rs8084 and rs7192 are found to control spermatogenesis in humans [56]. It is also expressed on spermatocytes and spermatids which helps in cell adhesion and motility [57]. rs7194 which is 834bp apart from rs7192 is found to be associated with non-obstructive azoospermia (NOA) [39]. This key gene HLA-DRA (MHC class II) is responsible for the selection of T cells, antigen sampling, and activation of an immune response. Previous studies have shown variations of certain alleles in the HLA region to be associated with the development of a wide range of diseases from rheumatoid arthritis [58] to narcolepsy [59] and Parkinson's Disease [60]. Krausz and coworkers reviewed meta-analysis data, case-control, and GWAS studies performed on more than 370,000 SNP in key genes related to cell maintenance (general cell functions), apoptotic process, DNA repair, response to reactive oxygen species, etc., in regard to expression in germ cells [13]. The study concluded that a few SNPs are correlating with infertility phenotype. The five genes that were positively correlated are MTHFR, ESR1/ESR2, NOS3, SIRPA/SIRPG and HLA-DRA13 [40]. For HLA-DRA gene, which is involved in inflammation and responsive to immune regulation, shows a significant decrease in male infertility. Concretely, c.82 + 926A > C, c.*63G > A, and c.724 T > G are the susceptibility factors for autoimmune diseases [54]. They assumed that these changes in the nucleotide and variants may have a role in inflammatory responses of patients with antecedents of urogenital in-flammation54. TNFRSF14 (TNF Receptor Superfamily Member 14) is a member of the TNF receptor superfamily also known as herpes virus entry mediator (HVEM) [61]. Mutation and Polymorphism in the genes cause multiple sclerosis, celiac diseases, and rheumatoid arthritis [62][63][64]. The encoded protein activates inflammatory and inhibitory T cell responses [65]. In the context of male infertility, no evidence has been found.
UGT1A6 (UDP Glucuronosyltransferase Family 1 Member A6) plays an important role in the metabolism of xenobiotics and their conversion into water soluble metabolites [66]. Genetic Polymorphism in UGT1A6 has been observed in Pediatric Epileptic Patients and people with an increased risk of cancer, especially lung and breast cancer [67][68][69]. In the context of male infertility, no association studies have directly been reported. However, we also cannot exclude the possibility that these SNPs may not directly cause infertility. To reach a conclusion, further studies with large cohort and molecular studies would be essential. Thus, there is an urgent requirement to understand the multiple dysregulated effects or new associations of these key genes in terms of male fertility. This study will help in understanding complex conditions and various factors that may be essential for the prevention of adverse effects of a considerable number of drugs and to predict infertility risks. Our ICSN pathway analysis may provide the bases for new hypotheses that could not explain the interaction of genes and pathways in the pathogenesis of male infertility. Futuristic studies are needed to validate the causal nature of SNPs and genes underlying the association between identified pathways and human male infertility. Functional studies will identify the potent SNPs in males having oliozoopermia, asthenozoopermia, and male infertility related disorders. Our studies have indicated a better chance to understand polymorphism and various pathways that are involved in multi-systemic diseases.

Conclusions
In conclusion, we analyzed SNPs related to male infertility to identify genetic association with male infertility susceptibility at both SNP and pathway levels. Additional studies are needed to explore the genetic variation of pathways underlying the pathogenesis of male infertility.