Evaluation of Male Fertility-Associated Loci in a European Population of Patients with Severe Spermatogenic Impairment

Infertility is a growing concern in developed societies. Two extreme phenotypes of male infertility are non-obstructive azoospermia (NOA) and severe oligospermia (SO), which are characterized by severe spermatogenic failure (SpF). We designed a genetic association study comprising 725 Iberian infertile men as a consequence of SpF and 1058 unaffected controls to evaluate whether five single-nucleotide polymorphisms (SNPs), previously associated with reduced fertility in Hutterites, are also involved in the genetic susceptibility to idiopathic SpF and specific clinical entities. A significant difference in the allele frequencies of USP8-rs7174015 was observed under the recessive model between the NOA group and both the control group (p = 0.0226, OR = 1.33) and the SO group (p = 0.0048, OR = 1.78). Other genetic associations for EPSTI1-rs12870438 and PSAT1-rs7867029 with SO and between TUSC1-rs10966811 and testicular sperm extraction (TESE) success in the context of NOA were observed. In silico analysis of functional annotations demonstrated cis-eQTL effects of such SNPs likely due to the modification of binding motif sites for relevant transcription factors of the spermatogenic process. The findings reported here shed light on the molecular mechanisms leading to severe phenotypes of idiopathic male infertility, and may help to better understand the contribution of the common genetic variation to the development of these conditions.


Introduction
Male infertility is considered one of the major health concerns in developed societies, affecting 10-15% of couples of childbearing age worldwide. The clinical manifestations of this condition are highly heterogeneous, as they may be influenced by physical, environmental, or genetic causes, the latter being one of the major causes [1]. Indeed, it has been reported that the two most extreme phenotypes of male infertility, i.e., severe oligospermia (SO, very low concentration of spermatozoa in semen) and non-obstructive azoospermia (NOA, complete lack of sperm in the ejaculate due to non-obstructive causes), have an important genetic component [2]. These two male infertility manifestations are characterized by severe spermatogenic impairment (SpF) and their known primary causes include different genetic alterations, such as point mutations on genes with key roles in the male gametogenesis process, Y-chromosome microdeletions, and karyotype abnormalities [3]. However, the etiology remains obscure in most SpF cases, and different pieces of evidence suggest that this idiopathic form of male infertility has a complex etiology in which common variation of the human genome, mostly single-nucleotide polymorphisms (SNPs) and copy-number variants (CNVs), may play a relevant role [1,3].
One of the most successful strategies to investigate the possible influence of common genetic variation in the development of complex traits is the genome-wide association study (GWAS) approach, in which millions of genetic polymorphisms are interrogated in a hypothesis-free fashion across the whole genome [4]. In a previous study, Kosova and colleagues [5] performed a GWAS to determine the possible causes of reduced male fertility in a study cohort composed of Hutterite men with reported fatherhood. Hutterites are a North American ethno-religious population of European descent in which contraception is proscribed, resulting in large family sizes. The authors described different genes associated with family size and several semen parameters, including TUSC1 (MIM*610529; encoding the tumor suppressor candidate 1, which is down-regulated in non-small-cell lung cancer and small-cell lung cancer cell lines), PSAT1 (MIM*610936; encoding a phosphoserine aminotransferase expressed in the testis), EPSTI1 (MIM*607441; encoding the epithelial stromal interaction protein 1 highly expressed in the testis), USP8 (MIM*603158; encoding a ubiquitin specific protein that regulates endosome morphology and it is also highly expressed in the testis), and DPF3 (MIM*601672; encoding a transcription regulator involved in chromatin remodeling) [5].
Taking all the above into consideration, we decided to analyze for the first time whether the genetic markers of male fertility identified in the Hutterite population also conferred risk to severe spermatogenic failure (SpF), in a large cohort of Iberian men diagnosed with SO and NOA. Specific clinical entities of NOA, as well as probability of success in sperm retrieval with testicular sperm extraction (TESE) techniques, were also tested for association.

Study Design and Study Population
An Iberian population of 725 infertile men due to SpF (comprising 495 NOA patients and 230 SO patients) and 1058 unaffected Iberian male controls (both of European descent) were enrolled in this study. Although no principal component analyses (PCA) were performed to detect possible outliers, all participants provided a self-reported European ancestry of the Iberian Peninsula. SpF cases were recruited in different private fertility clinics as well as public centers and hospitals from Spain and Portugal. The control population included 700 population-representative healthy subjects with self-reported fatherhood as well as 358 men with normal spermatozoa number and motility, as previously described [6]. Case and control populations were matched by age, ethnicity and geographical origin (that is, all cases and controls were Iberians with European ethnicity).
Informed written consents were signed by all participants, and the procedures followed in this study were approved by the local ethical committees of every participating center, according to the tenets of the Declaration of Helsinki.
The selection criteria used to include the infertile men were based on a thorough exam of individuals showing total absence of sperm in the ejaculate (NOA) or <5 million spermatozoa/mL semen (SO) confirmed by two high-speed centrifugation processes in two different semen samples, consistent with the guidelines of the World Health Organization [7]. The medical history records were revisited to extract information related to physical examination, karyotype analysis, endocrine analysis of follicle stimulating hormone (FSH), luteinizing hormone (LH), and testosterone, as well as Y-chromosome microdeletions screening, and patients with known, genetic and non-genetic, causes of infertility were excluded from the study. In this regard, only individuals with normal karyotype, absence of Yq azoospermia factor (AZF) deletions, and a normal history of testicular development were included. In addition, those patients with a testicular biopsy performed, were classified into different subgroups according to clinical and histological data, including hypospermatogenesis (HS, extremely low numbers of mature motile sperm cells in few testicular locations), maturation arrest of germ cells (MA, >90% of maturation arrest of the germ line either at spermatogonia or at primary spermatocyte stages), and Sertoli cell-only syndrome (SCO, total absence of germ cells). Two additional subgroups were also established accordingly with the outcome in the TESE techniques (including both TESE and micro-TESE), named TESE-(including those NOA individuals in which no mature sperm cell could be retrieved from the biopsy) and TESE+ (patients with a successful sperm extraction from the biopsy). All the available information about clinical features of the patients is shown in Table S1.

SNP Selection and Genotyping
Three intronic variants of USP8 (rs7174015), DPF3 (rs10129954), and EPSTI1 (rs12870438), as well as two intergenic variants in the regions harboring PSAT1 (rs7867029) and TUSC1 (rs10966811) were selected to determine their possible association with male infertility traits in our study population. The SNP selection was based on the findings by Kosova et al. [5], where they were reported to correlate with family size in a Hutterite population and with semen parameters in an independent cohort of Chicago men.
For newly recruited individuals, genomic DNA was extracted from peripheral white blood cells using the QIAamp ® DNA Blood Midi/Maxi (Qiagen, Hilden, Germany), Wizard ® Genomic DNA Purification Kit Protocol (Promega, Madison, WI, USA), or MagNA Pure LC-DNA LV Isolation kit I (Roche, Basel, Switzerland), following the procedures described by the manufacturers. The genotyping was carried out using the TaqMan TM SNP genotyping technology (Applied Biosystems, Foster City, CA, USA). The real-time quantitative polymerase chain reactions (PCR) and the post-PCR allelic discriminations were per-formed with predesigned TaqMan TM probes (assay IDs: C__26249696_10, C__31364474_20, C__32072246_20, C__30534824_10 and C___3123309_10) on a 7900HT Fast Real-Time PCR System (Applied Biosystems, Foster City, CA, USA), as described elsewhere [6].

Statistical Analysis
CaTS Power Calculator for Genetic Studies program [8] was used to estimate the statistical power of our study. All the statistical analyses were performed with the software Plink v1.9 [9]. Possible deviance from Hardy-Weinberg equilibrium (HWE) was evaluated in both cases and controls at the 5% significance level. To test for association between the candidate SNPs and male infertility traits, different case-control comparisons were conducted. In a first step, the whole group of SpF cases was compared against the control one. Afterwards, SpF men were divided into two different subgroups (SO and NOA) and, finally, the NOA set was further subdivided into four additional subgroups (SCO, MA, HS and TESE-). All the established case subgroups were tested against both the control group and the remaining cases not showing the specific clinical phenotype for every subgroup (in order to eliminate having NOA or SO as possible confounding variable). Allele and genotype frequencies of every tested group were compared by means of logistic regression with geographical origin (Spain or Portugal) as covariate, and assuming additive, recessive, dominant, and 2 degree of freedom (genotypic) models. p-values, odds ratios (ORs), and their 95% confidence intervals (CI) were then calculated, and p-values lower than 0.05 were considered statistically significant. Possible multiple testing effects were evaluated with the Bonferroni method.
In addition, to provide an illustrative picture of the putative functional role of the tested variants, we conducted enrichment analyses of both gene ontology (GO) terms and protein-protein interactions (PPI), considering all predicted transcription factors whose binding sites (TFBS) were altered by the lead SNPs and their proxies according to position weight matrices (PWM), using the tools for that purpose of the Retrieval of Interacting Genes/Proteins (STRING) portal [17].

Results
This study was conducted with an appropriate overall statistical power, as shown in Table S4. No significant deviation from HWE either in cases or controls was observed (p < 0.05). The genotyping success rate for every analyzed SNP was over 98%, and the minor allele frequencies (MAF) of the control groups were consistent with those of both the Iberian subpopulation (IBS) and the European super population (EUR) of the 1KGPh3 [18]. All of this evidence reinforces the reliability of the generated data and the proper implementation of the methodology used.
In a first approach, we compared the allele and genotype frequencies of the five analyzed SNPs between the SpF group (which comprises all the infertile individuals of our study cohort) with those of the unaffected control population. No significant differences between them were observed under any of the tested models (Table 1).

Susceptibility to Non-Obstructive Azoospermia and Specific Histological Manifestations
Subsequently, we compared the NOA group and the different NOA subgroups against the unaffected control group. Significant p-values were observed in the analysis of the USP8-rs7174015 SNP frequencies of NOA cases against controls under both the additive and recessive models (P ADD = 0.0402, OR = 1.18, P REC = 0.0226, OR = 1.33), and a suggestive p-value was obtained in the genotypic model (P GENO = 0.0709) ( Table 1). Consistent with this, similar results were obtained when the NOA group was compared against SO samples as control group (P ADD = 0.0323, OR = 1.29; P REC = 0.0048, OR = 1.78; P GENO = 0.0178) ( Table 2). The association under the recessive model remained significant after adjustment for multiple testing (P REC-BONF = 0.0242).
In addition, a trend towards association was evident for this USP8-rs7174015 SNP when the allele frequencies between the subgroup of NOA patients with a negative TESE outcome (TESE-) were compared against both the unaffected control group (P ADD = 0.0594, OR = 1.28, P REC = 0.0977, OR = 1.38) and the subgroup of NOA patients with a positive TESE outcome (TESE+, P ADD = 0.0865, OR = 1.40) (Tables 1 and 2). Finally, suggestive p-values were also yielded in the HS vs. no HS comparison under both the additive (P ADD = 0.0727, OR = 0.64) and recessive (P REC = 0.0824, OR = 0.48) models ( Table 2).
The subphenotype analysis between NOA cases with and without specific histological patterns/TESE success also reached statistical significance in the analysis of the TUSC1-rs10966811 variant. The minor allele of such SNP showed a significant recessive risk of the HS subphenotype (P REC = 0.0205, OR = 2.88). Consistent with this observation, the TUSC1-rs10966811 genotype frequencies were also significantly different between the NOA subgroup of patients with HS and that without this specific spermatogenic failure (P GENO = 0.0295). Similarly, the comparison between TESE-vs. TESE+ NOA patients also demonstrated that this same minor allele conferred risk of an unsuccessful TESE in a recessive manner (P REC = 0.0407, OR = 0.44) ( Table 2).
The remaining analyzed SNPs (DPF3-rs10129954, EPSTI1-rs12870438 and PSAT1-rs7867029) showed no evidence of association with any of the histological patterns considered (either when the NOA subgroups were compared against the control population or in the intra-disease comparisons).

Susceptibility to Severe Oligospermia
A protective effect for SO predisposition was demonstrated for the minor allele of EPSTI1-rs12870438 in the case-control comparison under both the additive and dominant models (P ADD = 0.0229, OR = 0.75, P DOM = 0.0388, OR = 0.70). The genotype distribution of this SNP was considerably different (albeit not significant) between the SO group and the control one (P GENO = 0.0745) ( Table 1). Suggestive p-values were also found for PSAT1-rs7867029 in the SO vs. controls analysis under both the additive and dominant models (P ADD = 0.0728, OR = 0.71; P DOM = 0.0548, OR = 0.67) ( Table 1).
On the other hand, when the SO group was compared against the NOA one (in order to detect SO-specific associations), significant differences in the allele frequencies were found for PSAT1-rs7867029 considering both additive and dominant effects (P ADD = 0.0351, OR = 0.66; P DOM = 0.0187, OR = 0.61). The genotype distributions between SO and NOA groups for this SNP also differed significantly (P GENO = 0.0487) ( Table 2).
No evidence of association was observed in any of the tests performed between SO versus both NOA and control groups for DPF3-rs10129954 or TUSC1-rs10966811 (Tables 1 and 2).

Evaluation of Functional Annotations
We further searched for functional annotations of the 4 polymorphisms that showed significant associations with male infertility traits in this study and their proxies (r 2 > 0.8) in the European super population (EUR) of the 1KGPh3 (Tables S5-S7). None of the lead or proxy variants were located in coding regions, CpG Islands, or miRNA target sequences according to SNPnexus [15]. Because of that, we decided to focus on other possible regulatory effects that may alter the normal gene expression levels in the testis, exploring first the transcriptome data in the GTEx project (analysis release V8) [11].
As indicated in Figure 1, the lead SNP variant USP8-rs7174015 and 19 of its proxies displayed evidence of functionality in the testicular tissue as eQTL, with 11 of them affecting the expression levels of USP8, USP50, and AP4E1, and the remaining ones influencing also the RP11-562A8.5 transcription levels ( Figure 1). Interestingly, these four genes showed a considerable high expression in the testis according to both the Human Protein Atlas [19] (http://www.proteinatlas.org) and the GTEx database [11] (Figures S1-S4). Indeed, a testis-specific expression was evident for USP50 and RP11-562A8 (Figures S2 and S4). Moreover, the SNPs in this linkage disequilibrium (LD) block were also annotated as eQTLs and sQTLs in multiples tissues, including ovary (Tables S5 and S6).
At the cellular level, recently published data from single-cell RNA-seq experiments on puberty human testes (Figure 2A) [20] showed that: (1) USP8 was mostly expressed in spermatogonia, spermatocytes, spermatids, and Sertoli cells ( Figure 2B), (2) USP50 was detected almost exclusively in spermatocytes and spermatids ( Figure 2C), and (3) AP4E1 had a diffuse expression in multiple cell types ( Figure 2D), thus suggesting a possible role of their encoded proteins in the spermatogenic process. No single-cell transcriptome data was available for RP11-562A8. displayed evidence of functionality in the testicular tissue as eQTL, with 11 of them af-fecting the expression levels of USP8, USP50, and AP4E1, and the remaining ones influencing also the RP11-562A8.5 transcription levels ( Figure 1). Interestingly, these four genes showed a considerable high expression in the testis according to both the Human Protein Atlas [19] (http://www.proteinatlas.org) and the GTEx database [11] (Figures S1-S4). Indeed, a testis-specific expression was evident for USP50 and RP11-562A8 ( Figures  S2 and S4). Moreover, the SNPs in this linkage disequilibrium (LD) block were also annotated as eQTLs and sQTLs in multiples tissues, including ovary (Tables S5 and S6). Overlaps are highlighted with different colors: blue for expression quantitative trait locus (eQTL) effects in the testis (affected genes are shown); green for active enhancers, active promoters, and transcription factor binding sites (TFBS) from chromatin immunoprecipitation flowing by sequencing (ChIP) experiments in the testis (using ENCODE data); orange for other epigenetic marks of the ENCODE and Roadmap Epigenomics projects (such as histone methylation and DNAase hypersensitivity); violet for TFBS modifications related to transcription factors involved in spermatogenesis based on protein weight matrix (PWM) data; and pink for functional prediction scores, in which the heatmap displays the probability of functionality for each tested variant (dark pink indicates higher probability), according to the different calculation methods described in Supplementary Tables S2 and S3. At the cellular level, recently published data from single-cell RNA-seq experiments on puberty human testes (Figure 2A) [20] showed that: (1) USP8 was mostly expressed in spermatogonia, spermatocytes, spermatids, and Sertoli cells ( Figure 2B  Overlaps are highlighted with different colors: blue for expression quantitative trait locus (eQTL) effects in the testis (affected genes are shown); green for active enhancers, active promoters, and transcription factor binding sites (TFBS) from chromatin immunoprecipitation flowing by sequencing (ChIP) experiments in the testis (using ENCODE data); orange for other epigenetic marks of the ENCODE and Roadmap Epigenomics projects (such as histone methylation and DNAase hypersensitivity); violet for TFBS modifications related to transcription factors involved in spermatogenesis based on protein weight matrix (PWM) data; and pink for functional prediction scores, in which the heatmap displays the probability of functionality for each tested variant (dark pink indicates higher probability), according to the different calculation methods described in Supplementary Tables S2 and S3. Moreover, six of the above mentioned linked SNPs (including USP8-rs7174015) overlapped with chromatin marks related to active enhancers (H3K37ac and H3K4me1), active promoters (H3K4me3), and with a TFBS of CTCF (which is involved in the conformation of the topologically associated domains) in the adult testis, according to ChIP-seq ENCODE data [13] (Figure 1 and Table S5). These variants also mapped to loci with several different overlapping regulatory marks in multiple tissues (including ovary) and cell lines according to Roadmap Epigenomics, ENCODE, and Ensembl Regulatory Build databases [13,21,22], thus supporting the putative regulatory relevance for this region. The output data obtained from HaploReg [14] for the USP8-rs7174015 LD block highlighted a large number of TFBS that were predicted to be altered by such linked SNPs based on PWM data (Table S5). We decided to prioritize them according to overlap with putative testis-specific TFBS by querying the GeneCards Suite [23] and by performing a comprehensive bibliographic search. Notably, 8 out of all the tested SNPs were predicted to change the binding motif site of transcription factors potentially involved in testicular function (Figure 1, Tables S5 and S8). For instance, rs3098174 and rs56398519 were predicted to change the TFBS of FOXJ1, a transcription factor specifically required for the formation of motile cilia and which has been reported as an important member of a pathway involved in sperm maturation in murine models [24]. Similarly, the rs3098171 SNP modified the TFBS of HSF1, a stress-inducible and DNA-binding transcription factor that plays a central role in the activation of the heat shock response (HSR), and which has been proposed essential for spermatogenesis [25]. Both rs12593481 and rs3131574 SNPs were annotated to alter the TFBS of PAX5 and NR6A1, respectively. These transcription factors have a known key role in spermatogenesis and are highly related to sperm formation and male infertility [26] (Figure 1, Tables S5 and S8). Different scores indicative of a possible functional effect of the tested variants were also calculated with tools such as RegulomeDB, CADD, deppSEA, EIGEN, FATHMM, fitCons and ReMM (Figure 1 and Table S5). Overall, both USP8-rs7174015 and rs12593481 showed higher scores, thus suggesting that they are the most likely causal variants of this LD block.
The USP8-rs7174015 SNP and its proxies were also annotated as eQTLs and sQTLs in multiples tissues (Tables S5 and S6), which highlights the high relevance of this genomic region in regulatory processes. Moreover, six of the above mentioned linked SNPs (including USP8-rs7174015) ove lapped with chromatin marks related to active enhancers (H3K37ac and H3K4me1), acti promoters (H3K4me3), and with a TFBS of CTCF (which is involved in the conformati of the topologically associated domains) in the adult testis, according to ChIP-seq E CODE data [13] (Figure 1 and Table S5). These variants also mapped to loci with seve different overlapping regulatory marks in multiple tissues (including ovary) and cell lin according to Roadmap Epigenomics, ENCODE, and Ensembl Regulatory Build databas On the other hand, TUSC1-rs10966811, EPSTI1-rs12870438, PSAT1-rs7867029 and their corresponding proxies showed no significant effects on gene expression in the testis according to GTEx [11]. However, rs10812205 (a TUSC1-rs10966811 proxy) as well as rs58357177, rs9590722, rs9594826, and rs9594827 (all of them EPSTI1-rs12870438 proxies) overlapped with an open chromatin state in the testis according to ChIP-seq data from ENCODE [13], and other regulatory marks in multiple tissues. Furthermore, the SNPs rs10966813 and rs11789162 (proxies of TUSC1-rs10966811) were located in predicted target sequences of DMRT2 (rs10966813), DMRT7 and DMRT1 (rs11789162) according to Hap-loReg [14], a family of transcription factors with a key role in male sex determination and spermatogenesis [27]. The RegulomeDB score and the other functional prediction scores also suggested that the SNPs rs10812205, rs62534083, rs1535898, rs9590722, rs9594827, and rs9594829 were more likely to exert the functional effect (Table S7).
Finally, to provide a global overview of the possible pathways involved in male infertility associated with the putative causal variants, we accomplished a PPI and biological pathway enrichment analysis with 199 transcription factors that had target sequences altered by such SNPs (Tables S5 and S7). The molecular network of the selected proteins had significantly more interactions than expected (number of nodes, 98; number of edges, 459; average node degree, 9.37; clustering coefficient, 0.372; expected number of edges, 89; PPI enrichment, p < 1 × 10 16 , Figure S5). Regarding the functional enrichment of the network, biological processes with the highest significant p-values were those related to gene expression regulation processes (Table S9), consistent with the provided evidence described above. Interestingly, spermatogenesis (GO:0007283) was one of the GO terms significantly enriched in the transcription factor set (p = 0.0004). Indeed, some members of this biological process, such as YY1, BCL6, HOXA10, ZBTB16 (PLZF), and PAX5 (highlighted in red in Figure S5) represented relevant nodes in the PPI network.

Discussion
Idiopathic male infertility is expected to have a complex etiology likely influenced by genetic, epigenetic, and environmental factors [3]. Regarding its genetic basis, it has been estimated that the most severe expressions of this condition (NOA and SO) have a high heritability with a polygenic inheritance, in which many loci may exert an additive effect on the pathological phenotype [1]. In the present study, we aimed to perform the first attempt to evaluate the potential implication of five SNPs in the arising of SpF phenotype, previously associated with reduced fertility in men [5], in the largest European case-control cohort included in a genetic study to date. However, it should be noted that no PCA or determination of the human Y-chromosome haplogroup was conducted to confirm the ancestry, which represents a limitation of our study. Indeed, interactions between Y-haplogroups and autosomal variants on spermatogenic impairment, such as NOA, have been reported [28].
Our results suggest that both EPSTI1-rs12870438 and PSAT1-rs7867029 are involved in the pathological mechanisms underlying SO, whereas the intergenic SNP USP8-rs7174015 may contribute to the genetic susceptibility to NOA. Additionally, the minor allele of TUSC1-rs10966811 (A) was associated with a higher predisposition to HS-NOA subphenotype and, consequently, with a higher probability of TESE success. This observation should be highlighted, as it could help to develop reliable predictive panels for the likelihood of sperm retrieval from testicular biopsies of infertile men seeking to father a biological child, thus improving substantially the increasingly demanded counseling about the suitability of undergoing surgery in such cases [29].
Consistent with our observations, Kosova and colleagues [5] described that the risk alleles of the associated variants correlated with a decreased fertility in their study cohort. It could be speculated that the presence of such genetic variants may lead to different phenotypes related to male fertility depending on the specific genetic background of the individual, ranging from mild outcomes (such as slight reduced sperm counts or low birth rates) to more severe conditions such as SO or NOA, which supports the notion of idiopathic male infertility as a complex disease [1]. In addition, PSAT1-rs7867029 and USP8-rs7174015 were significantly associated with SO predisposition and EPSTI1-rs12870438 with NOA risk in a low-powered Japanese population comprising 76 NOA patients, 50 SO patients, and 791 fertile men [30]. However, the authors did not observe a correlation of such SNPs with semen parameters in an independent study cohort of Japanese males composed of 791 fertile men and 1224 young men from the general population [31]. In a subsequent study, the same group also reported significant associations of TUSC1-rs10966811 (associated with HS under an NOA microenvironment in our study cohort) and DPF3-rs10129954 (which did not yield significant p-values in our analyses) with SO and SpF, respectively [32]. The discrepancy of the results could be due to different genetic architectures of the regions encompassing those SNPs between Japanese and Iberian populations, or to a possible type I error affecting their results as a consequence of a reduced power (the case population included only 83 NOA patients and 62 SO patients). Indeed, for DFP3-rs10129954 the authors obtained significant p-values under opposite models (recessive and dominant) [32].
Despite the above considerations, our results clearly suggest that TUSC1-rs10966811 may represent a potential marker of disease outcome of NOA infertility. The TUSC1-rs10966811*G allele is associated with the most severe manifestation of this pathology (complete lack of sperm cells in the testis biopsy and thus TESE-), whereas the presence of the TUSC1-rs10966811*A allele is associated with the HS phenotype, the milder histological pattern of NOA. The functional annotations of this SNP are consistent with this idea. TUSC1-rs10966811 is located in a target sequence for YY1, a transcription factor that has been reported to play a major role in spermatogonial stem cell (SSC) maturation, being expressed in spermatocytes, spermatogonia, and spermatids, but not in mature spermatozoa [33,34]. The TUSC1-rs10966811 polymorphism represents a crucial position in the consensus sequence recognized by YY1, and the presence of the G allele correlates with a drastic decrease of the binding affinity (Table S8 and Figure S6). Other important transcription factors for the spermatogenic process have also predicted target sequences in the flanking regions of different TUSC1-rs10966811 proxies, such as BCL6, a repressor whose depletion causes testicular germ cell apoptosis in murine models [35]. This protein is predicted to bind the genomic region containing rs10966813, showing a lower affinity in the presence of the rs10966813*G allele, which is highly linked to TUSC1-rs10966811*G (the risk allele for unsuccessful TESE). In addition, DMRT proteins are a family of testis-specific transcription factors that play a pivotal role in male sex determination and differentiation by controlling testis development and male germ cell proliferation [27]. In this regard, the TUSC1-rs10966811 proxies rs10966813 and rs11789162 overlap with binding motifs of some members of this family, including DMRT1, DMRT2, and DMRT7. The gene encoding DMRT1 is a confirmed NOA-susceptibility locus [36][37][38][39], and the screening of its sequence to detect point mutations has been recently incorporated by some physicians in the routine clinical practice of idiopathic NOA to increase the diagnostic efficiency [40]. Moreover, it has been reported that open chromatin in SSCs is considerably enriched in TFBS for DMRT1 [41]. Moreover, additional transcription factors involved in spermatogenesis have also predicted binding motifs within the TUSC1-rs10966811 haplotype block (Table S8), suggesting that such block could have a potential interest for the development of prognostic markers of NOA.
On the other hand, our data suggest that the intergenic variant USP8-rs7174015*A confers risk to NOA development acting as recessive allele. This result seems consistent, as the allele frequencies were significantly different between the NOA group and both the unaffected control population and the SO group. The results of our in silico analyses were also concordant with this association. Interestingly, USP8-rs7174015 is annotated as an eQTL in the testis, affecting the expression of USP8, USP50, AP4E1, and RP11-562A8.5. The first of them has been reported to be highly expressed in male germ cells, in which it is involved in acrosome biogenesis [42,43]. Regarding USP50, AP4E1, and RP11-562A8.5, although their possible involvement in spermatogenesis has not been previously described, all three genes show a high expression in the testis [11]. Indeed, USP50 has a testis-specific expression, mostly in spermatocytes ( Figure 2). Therefore, our data suggest that USP8-rs7174015*A could exert its pathogenic influence in NOA predisposition by deregulating the baseline gene expression of USP8, USP50, AP4E1, and RP11-562A8.5. Such deregulation could be a consequence of an alteration of a binding protein motif by USP8-rs7174015*A or any of its proxies (Table S8 and Figure S7). In this context, a proxy of this SNP, rs12593481, is located within a consensus sequence for PAX5 and YY1, which are relevant transcription factors in the regulation of the spermatogenic process [33,34,44].
Another highly linked SNP to USP8-rs7174015 is rs3098171, which maps to a putative TFBS for the stress-inducible protein HSF1. The encoding gene of this transcription factor is located within the azoospermia factor b (AZFb) region of the Y-chromosome, and deletion of this region results in severe male infertility [45,46]. HSF proteins are expressed during mammalian spermatogenesis, mainly in spermatocytes and round spermatids [25]. Disruption of different HSF members, such as HSF1 and HSF2, leads to male sterility and complete lack of mature sperm in mice, as these proteins have been reported to play an essential role in the repression of sex chromatin during meiosis [47]. In this regard, the rs3098171*G risk allele, which significantly reduces the expression of the testis-specific gene USP50, decreases drastically the affinity of HSF1 for the TFBS in which this SNP is located. Finally, it should be also noted that, at least, 5 proxies of USP8-rs7174015 are annotated to map active enhancers, active promoters, and/or TFBS in the testis through ChIP-seq studies according to ENCODE [13] (Figure 1), which strongly support a putative functional implication related to their position in the genome.
In relation to the SO-associated polymorphisms EPSTI1-rs12870438 and PSAT1-rs7867029, their allele frequencies in the SO group differed from those in both the control population and the NOA group, respectively (with the two latter cohorts showing similar allele and genotype frequencies), which could be indicative of a potential implication of such SNPs in the etiology of the SpF phenotype severity. Interestingly, the rs9594826 variant, highly linked to EPSTI1-rs12870438, overlaps the target sequence of the transcription factor SIX5, which has been reported to decrease c-kit levels in adult mice, causing an elevated spermatogenic cell apoptosis and Leydig cell hyperproliferation [48]. In this case, a significant decrease in the binding affinity of SIX5 was also evident when the SO risk allele was present in the motif sequence (Table S8 and Figure S8).

Conclusions
In summary, we believe that our study gives an important contribution to the current knowledge about the molecular mechanisms underlying idiopathic SpF. We have evaluated the possible implication in SpF development of previously reported genetic factors associated with male fertility in a well characterized cohort of infertile men of European ancestry. Our findings may shed light on the putative role of common genetic variants in the development of specific male infertility histological patterns. Therefore, this study can contribute to a solid basis for future approaches aimed at developing more effective panels of genetic markers that could anticipate the probability of unsuccessful surgeries for retrieving viable sperm cells from the testis, which represent around half of the total surgeries currently performed in NOA patients [49]. However, additional independent and well-powered SpF cohorts may be analyzed to confirm our findings.
Supplementary Materials: The following are available online at https://www.mdpi.com/2075-442 6/11/1/22/s1, Table S1: Main clinical features of the 725 infertile men included in the study, Table S2: Tools for generating functional prediction scores, Table S3: RegulomeDB scoring scheme, Table S4: Estimation of the statistical power of our study for 725 patients and 1058 controls, Table S5: Overlap of the rs7174015 variant and its proxies with eQTL annotations and epigenetic marks from GTEx, ENCODE and Roadmap Epigenomic projects, Table S6: Overlap of the rs7174015 variant and its proxies with sQTL annotations from the GTEx project, Table S7: Overlap of the rs10966811, rs12870438, and rs7867029 variants and its proxies with eQTL effects and epigenetic marks from GTEx, ENCODE and Roadmap Epigenomic projects, Table S8: In silico transcription factor binding motif alterations concordantly predicted by HaploReg for SNP proxies of rs7174015, rs10966811, and rs12870438, Table S9: Enrichment of gene ontology (GO) biological processes for the set of transcription factors with changed motifs overlapping with the associated variants, Figure S1: Gene expression pattern of USP8 in human testes, Figure S2: Gene expression pattern of USP50 in human testes, Figure S3: Gene expression pattern of AP4E1 in human testes, Figure S4: Gene expression pattern of RP11-562A8.5 in human testes, Figure S5: Interaction network formed for the 199 transcription factors with predicted changed motifs by rs10966811, rs12870438, rs7174015, rs7867029 risk loci and their proxies, Figure S6: Sequence logos of Position Weight Matrices (PWM) for transcription factor binding sites strongly altered by TUSC1-rs10966811 proxies, Figure S7: Sequence logos of Position Weight Matrices (PWM) for transcription factor binding sites strongly altered by USP8-rs7174015 proxies, Figure S8: Sequence logos of Position Weight Matrices (PWM) for transcription factor binding sites strongly altered by EPSTI1-rs12870438 proxies.