Characterization of Global DNA Methylation in Different Gene Regions Reveals Candidate Biomarkers in Pigs with High and Low Levels of Boar Taint

DNA methylation of different gene components, including different exons and introns, or different lengths of exons and introns is associated with differences in gene expression. To investigate the methylation of porcine gene components associated with the boar taint (BT) trait, this study used reduced representation bisulfite sequencing (RRBS) data from nine porcine testis samples in three BT groups (low, medium and high BT). The results showed that the methylation levels of the first exons and first introns were lower than those of the other exons and introns. The first exons/introns of CpG island regions had even lower levels of methylation. A total of 123 differentially methylated promoters (DMPs), 194 differentially methylated exons (DMEs) and 402 differentially methylated introns (DMIs) were identified, of which 80 DMPs (DMP-CpGis), 112 DMEs (DME-CpGis) and 166 DMIs (DMI-CpGis) were discovered in CpG islands. Importantly, GPX1 contained one each of DMP, DME, DMI, DMP-CpGi, DME-CpGi and DMI-CpGi. Gene-GO term relationships and pathways analysis showed DMP-CpGi-related genes are mainly involved in methylation-related biological functions. In addition, gene–gene interaction networks consisted of nodes that were hypo-methylated GPX1, hypo-methylated APP, hypo-methylated ATOX1, hyper-methylated ADRB2, hyper-methylated RPS6KA1 and hyper-methylated PNMT. They could be used as candidate biomarkers for reducing boar taint in pigs, after further validation in large cohorts.

DNA methylation is a major epigenetic mechanism that directly causes a chemical modification to DNA [6] conferred by the covalent transfer of a methyl group to the C-5 position of cytosine [7]. Since being discovered 40 years ago, methylation has been established as a major epigenetic factor influencing gene activities [8].
DNA methylation in the promoter regions is associated with transcriptional repression at the level of gene regulation [9,10], likely because DNA methylation tends to reinforce chromatin-based silencing [11]. However, to silence genes at transcription, DNA methylation at high cytosine and guanine dinucleotide (CpG)-density promoters is not necessary [12,13]. In fact, unmethylated CpG islands are found in more than one-half of the gene promoter regions, but they are not associated with the gene transcriptional activity [13].

RRBS and Animal Data
This study was conducted using reduced representation bisulfite sequencing (RRBS) sequencing data from nine porcine testis tissue samples that are accessible through Gene Expression Omnibus (GEO) (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE129385). Only cytosines in the cytosine and guanine dinucleotide (CpG) sites were used in our study. Quality control (i.e., removal of adapter and short reads) and alignment of RRBS reads were performed using Trimmomatic software [30] and Bismark software [31], respectively, based on the same parameters used in our previous study [28,29]. Furthermore, the unqualified read coverages of cytosines (i.e., ≤10 counts and ≥99.9th percentile) were also trimmed following previous studies [28,29,32,33]. Based on the publicly available information of experimental animals published in our previous study [28,34], nine testis tissue samples from three BT groups (i.e., low, medium and high BT groups) of pigs were used in this study, so each BT group had three pigs as replicates. Three BT groups for the nine pigs were formed based on the BT estimated breeding values (EBVs) (i.e., sum of the EBVs of skatole concentration and human nose score (HNS)), where pigs of extreme low or high EBVs belonged to low or high BT groups and pigs with EBVs closest to the mean belonged to the medium BT group, which has been presented and described in the previous study [28,34].

Porcine Reference Gene Components
Based on the reference sequence genes (RefSeqGenes) of Sus scrofa (http://genome.ucsc.edu/cgibin/hgTables), a total of 4473 RefSeqGenes with 4412 promoters, 36,286 exons and 31,912 introns Vet. Sci. 2020, 7, 77 3 of 18 were investigated through annotations using R package genomation [35]. Different from our previous study that calculated the differentially methylated cytosines (DMCs) associated with BT trait [28], this study focused on the DMRs of intragenic regions (i.e., promoter, exon and intron regions) as a single entity. In addition, the overlaps of methylated CpG island regions with intragenic regions were also investigated to determine the influences of CpG islands on intragenic methylation. Here, we calculated the methylation levels of the promoter-CpG island/exon-CpG islands/intron-CpG island when cytosines were located in both promoter/exon/intron and CpG island regions, for example, cytosines were located in the promoter-CpG island regions (Figure 1).
Vet. Sci. 2020, 7, x 3 of 17 Based on the reference sequence genes (RefSeqGenes) of Sus scrofa (http://genome.ucsc.edu/cgibin/hgTables), a total of 4473 RefSeqGenes with 4412 promoters, 36,286 exons and 31,912 introns were investigated through annotations using R package genomation [35]. Different from our previous study that calculated the differentially methylated cytosines (DMCs) associated with BT trait [28], this study focused on the DMRs of intragenic regions (i.e., promoter, exon and intron regions) as a single entity. In addition, the overlaps of methylated CpG island regions with intragenic regions were also investigated to determine the influences of CpG islands on intragenic methylation. Here, we calculated the methylation levels of the promoter-CpG island/exon-CpG islands/intron-CpG island when cytosines were located in both promoter/exon/intron and CpG island regions, for example, cytosines were located in the promoter-CpG island regions ( Figure 1). Then, we used R package GeneDMRs (https://github.com/xiaowangCN/GeneDMRs) [36] to obtain weighted methylation levels for each BT group, Q-values by the false discovery rate (FDR) method [37] and methylation differences (i.e., difference in weighted methylation levels between the low and high BT groups) for all promoters, exons, introns and their regions overlapped with CpG islands. Here, if the methylation difference was positive, the region was defined as hyper-methylated; thus, a negative methylation difference represents hypo-methylation.
The weighted methylation levels were calculated by the methylated reads number divided by the total reads number given the weights following the previous study [28]: where and are the methylated and total reads number of the involved cytosine site at a given region of individual , is the total number of cytosine sites involved in this region, is the total individual number of one BT group and is the weight of reads of the involved cytosine site of individual .
The statistical analysis was following the previous study [33] in the logistic regression model: where π i is the weighted methylation levels at the given region, u is the intercept, and T i is the BT group indicator.
The enrichment of GO terms for DMP-CpGi-related genes was determined with the DAVID website (https://david.ncifcrf.gov/) and visualized by the GOplot R package [38]. The pathway enrichments for hypo-methylated and hyper-methylated DMP-CpGi-related genes were performed in clusterprofiler R package [39]. Afterwards, The DMP-CpGi-related genes enriched in significant GO terms and pathways were used to create gene-gene networks by the GeneMANIA tool [40,41]. Then, we used R package GeneDMRs (https://github.com/xiaowangCN/GeneDMRs) [36] to obtain weighted methylation levels for each BT group, Q-values by the false discovery rate (FDR) method [37] and methylation differences (i.e., difference in weighted methylation levels between the low and high BT groups) for all promoters, exons, introns and their regions overlapped with CpG islands. Here, if the methylation difference was positive, the region was defined as hyper-methylated; thus, a negative methylation difference represents hypo-methylation.

Methylation Levels of Promoters, Different Exons and Different Introns
The weighted methylation levels were calculated by the methylated reads number divided by the total reads number given the weights following the previous study [28]: where MR ij and TR ij are the methylated and total reads number of the involved cytosine site j at a given region of individual i, m is the total number of cytosine sites involved in this region, n is the total individual number of one BT group and W ij is the weight of reads of the involved cytosine site j of individual i. The statistical analysis was following the previous study [33] in the logistic regression model: where π i is the weighted methylation levels at the given region, u is the intercept, and T i is the BT group indicator.
The enrichment of GO terms for DMP-CpGi-related genes was determined with the DAVID website (https://david.ncifcrf.gov/) and visualized by the GOplot R package [38]. The pathway enrichments for hypo-methylated and hyper-methylated DMP-CpGi-related genes were performed in clusterprofiler R package [39]. Afterwards, The DMP-CpGi-related genes enriched in significant GO terms and pathways were used to create gene-gene networks by the GeneMANIA tool [40,41].

Methylation Levels of Promoters, Different Exons and Different Introns
In this study, the cytosines in CpG sites were found in only 3029 genes with 2295 promoters, 2725 exons and 4349 introns, based on reduced representation bisulfite sequencing (RRBS) data. According to the distribution of all exon and intron positions, the frequency of positions decreased as the number of ordinal positions increased, so the first ordinal position had the most exons and introns. Additionally, exons and introns were in fewer than 20 ordinal positions; therefore, the number of exons and introns in most genes was less than 20 (Supplementary Figure S1). In fact, the proportions of both the first twenty exons and first twenty introns out of all exons (2590/2725) and introns (4109/4349) were larger than 90% (Figure 2A,C,E). Obviously, the methylation levels of the promoters were lower compared to the methylation levels of the first twenty exons and the first twenty introns ( Figure 2). However, exons/introns at different ordinal positions were found to have different methylation levels, but such differences in the three BT groups were very small (Supplementary Figure S1). The average methylation levels of the first exon (21.64%) and first intron (31.05%) for the three BT groups were lower than the average methylation levels of the other exons and introns (Figure 2A,C,E). However, we investigated the specific genes such as GPX1 and found that methylation levels of intergenic regions of GPX1 were higher than intragenic regions, where low BT groups had higher methylations than other two BT groups ( Figure 3). In this study, the cytosines in CpG sites were found in only 3029 genes with 2295 promoters, 2725 exons and 4349 introns, based on reduced representation bisulfite sequencing (RRBS) data. According to the distribution of all exon and intron positions, the frequency of positions decreased as the number of ordinal positions increased, so the first ordinal position had the most exons and introns. Additionally, exons and introns were in fewer than 20 ordinal positions; therefore, the number of exons and introns in most genes was less than 20 (Supplementary Figure S1). In fact, the proportions of both the first twenty exons and first twenty introns out of all exons (2590/2725) and introns (4109/4349) were larger than 90% (Figure 2A,C,E). Obviously, the methylation levels of the promoters were lower compared to the methylation levels of the first twenty exons and the first twenty introns ( Figure 2). However, exons/introns at different ordinal positions were found to have different methylation levels, but such differences in the three BT groups were very small (Supplementary Figure S1). The average methylation levels of the first exon (21.64%) and first intron (31.05%) for the three BT groups were lower than the average methylation levels of the other exons and introns (Figure 2A,C,E). However, we investigated the specific genes such as GPX1 and found that methylation levels of intergenic regions of GPX1 were higher than intragenic regions, where low BT groups had higher methylations than other two BT groups ( Figure 3).    When considering the CpG island overlapped with intragenic regions, cytosines were found to be involved in 2196 genes with 1942 promoters, 1635 exons and 1796 introns. The methylation of the promoters and the first twenty intragenic regions that were exclusively located in CpG islands was generally lower than the methylation of all promoters and the first twenty gene components, particularly the promoters; the 1st, 13th and 14th exons; and the 1st and 20th introns ( Figure 2). Obviously, the methylation levels interacting with CpG islands of the 13th and 14th exons were significantly (p < 0.001) lower than the same-exon methylations without CpG island interactions (Supplementary Figure S2).
The methylation levels of the first exons were observed to increase as the lengths increased, but this trend was not as notable for the first introns (Supplementary Figure S3). In terms of their overlaps with CpG islands, methylated exons remained stable (Supplementary Figure S3C), but methylated introns tended to have greatly decreased variable lengths (Supplementary Figure S3D).
The methylation differences between the low and high BT groups ranged from −40% to +30% for promoters, first exons and first introns in the same genes, and the first exons had greater methylation differences than were found among the promoters and the first introns. Additionally, most methylation differences in the promoters, first exons and first introns were at the zero point, which indicated that the differential methylation levels of the gene components were generally analogous in the low and high BT groups. The highest Pearson correlation coefficient (PCC) for methylation differences was associated with the comparison of promoters and first exons (PCC = 0.69), while the correlation coefficients were lower for comparisons of promoters with first introns and first exons with first introns (Supplementary Figure S4).    The number of common genes related to DMPs, DMEs and DMIs was three and two of the three genes were related to DMP-CpGis, DME-CpGis and DMI-CpGis ( Figure 5A,B). Among them, TSPAN9 and GPX1 contained one each of DMP, DME, DMI, DMP-CpGi, DME-CpGi and DMI-CpGi ( Table 1). The percentage of DME, DMI, DME-CpGi and DMI-CpGi in the first and other ordinal positions varied. The first exons constituted a relatively small proportion (7.6%~7.8%) of the DMEs regardless of whether they were in CpG islands, while the first introns constituted a relatively larger proportion (14.7%) of the DMIs when they were excluded from CpG islands ( Figure 5C,D). The methylation differences of DMP, DME, DMI, DMP-CpGi, DME-CpGi and DMI-CpGi were close to zero ( Figure 5E,F).

Differentially Methylated Promoters, Exons, Introns and Their Overlaps with CpG Islands
Vet. Sci. 2020, 7, x 6 of 17 The number of common genes related to DMPs, DMEs and DMIs was three and two of the three genes were related to DMP-CpGis, DME-CpGis and DMI-CpGis ( Figure 5A,B). Among them, TSPAN9 and GPX1 contained one each of DMP, DME, DMI, DMP-CpGi, DME-CpGi and DMI-CpGi ( Table 1). The percentage of DME, DMI, DME-CpGi and DMI-CpGi in the first and other ordinal positions varied. The first exons constituted a relatively small proportion (7.6%~7.8%) of the DMEs regardless of whether they were in CpG islands, while the first introns constituted a relatively larger proportion (14.7%) of the DMIs when they were excluded from CpG islands ( Figure 5C,D). The methylation differences of DMP, DME, DMI, DMP-CpGi, DME-CpGi and DMI-CpGi were close to zero ( Figure 5E,F).   Finally, 7 hypo-methylated and 13 hyper-methylated genes were presented as the top 20 DMP-CpGi-related genes.

Methylation Status of Promoters, Different Exons and Different Introns for BT
Most studies have suggested that the DNA methylation levels of introns are lower than those of neighbouring exons in humans and honey bees [42,43]. However, methylation of exons with CpG islands was at a higher level than the methylation of flanking introns in embryonic stem cells and foetal fibroblasts of humans [44]. Our previous results implied that exons in CpG islands had lower methylation levels than the methylations of the intron and other CpG island regions in porcine testis tissue [29]. The findings of this study indicated higher methylation levels of introns at different ordinal positions (Figure 2 and Supplementary Figure S1). Additionally, both our previous study [29] and Chen's study [45] suggested that promoters with lower methylation patterns were specific to

Methylation Status of Promoters, Different Exons and Different Introns for BT
Most studies have suggested that the DNA methylation levels of introns are lower than those of neighbouring exons in humans and honey bees [42,43]. However, methylation of exons with CpG islands was at a higher level than the methylation of flanking introns in embryonic stem cells and foetal fibroblasts of humans [44]. Our previous results implied that exons in CpG islands had lower methylation levels than the methylations of the intron and other CpG island regions in porcine testis tissue [29]. The findings of this study indicated higher methylation levels of introns at different ordinal positions (Figure 2 and Supplementary Figure S1). Additionally, both our previous study [29] and Chen's study [45] suggested that promoters with lower methylation patterns were specific to adult porcine pig testis during spermatogenic cell development. Therefore, such promoter methylation patterns and inversely different methylation trends between exons and introns could be caused by tissue-specific and developmental-specific methylation patterns. Song et al. (2017) [16] revealed different ordinal positions of exons with different methylation, showing that the first exons had the lowest methylation levels, a finding consistent with this study (Figure 2). Furthermore, another study demonstrated the correlations of different methylation levels in different gene positions with the biological features of exons [46]. Such exon-specific methylation levels were considered to be associated with alternative splicing events [14,15,43]; for example, the expression levels of the exons were negatively related to the DNA methylation levels of the first exons [46]. Decreasing methylation levels of the first exons (Figure 2A,C,E), especially the obvious methylation decrease in the first exons in the CpG islands ( Figure 2B,D,F), might be correlated with both promoters and CpG islands. Our results revealed a good relationship (0.689) between promoters and first exons (Supplementary Figure S4A), so first exons can potentially be viewed as integral to the promoter regions [47,48].
It has been reported that the lengths of the exons are strongly linked to exon methylation level [16]; thus, longer genes with many long exons may have higher methylation levels [49]. Based on the length distribution results in this study, methylation levels increased as exon length increased, regardless of whether or not the exons were in CpG islands (Supplementary Figure S3A,C). However, the interaction with CpG islands resulted in the reduced methylation of introns compared to the initial levels (Supplementary Figure S3B,D). Since long introns and high intron variability could cause variable methylation levels in the first introns (Supplementary Figure S3B), CpG islands could play a role in reducing the methylation levels of the first introns, according to their different lengths [26]. Anastasiadi et al. (2018) [26] also indicated that the number of DMRs in the first introns was greater than the number of DMRs in the promoters or first exons. Our study found that 10.9%~14.7% of DMIs were in the first introns ( Figure 5C,D), which means that the DMRs in the first introns constituted a large proportion of DMIs and all DMRs because the number of DMIs (n = 166~402) was higher than the number of DMPs (n = 80~123) or DMEs (n = 112~194) ( Figure 5A,B).

Biological Functions of DMP-CpGis-Related Genes
As reducing methylation in the promoters causes the enhancement of gene expression, the expression levels are negatively correlated with promoter methylation [9]. Generally, promoters in CpG islands are unmethylated, while promoters outside CpG islands are predominantly methylated [13]. Our study showed similar results: reduced methylation levels were observed in the CpG islands of promoters ( Figure 2). Therefore, Weber et al. (2007) [13] proposed that methylation is occasionally not necessary in CpG island promoters, as most of them are in hypo-methylated states, but the existing methylation is enough to inactivate promoter-CpG island regions. Among the 80 DMP-CpGi-related genes, 30 DMP-CpGis were in a hyper-methylated state (Supplementary file 2), and 13 of these were involved in the top 20 DMP-CpGis ( Table 2).
The significant GO terms identified through this study mainly focused on biological functions of methylation ( Figure 6A) and were connected with three hyper-methylated DMP-CpGi-related genes (i.e., BHMT, BHMT2 and GNMT) and one hypo-methylated DMP-CpGi-related gene (i.e., PNMT) ( Table 2). The gene expression patterns of the BHMT and BHMT-2 genes are similar between pigs and humans [50], with BHMT being active in porcine pancreas, kidney and liver cortex [51]. GNMT, a key component that regulates S-adenosyl-methionine (SAM) catabolism, suppresses the increment of SAM to extend lifespan [52], whereas PNMT expression contributes to the mesodermal origin of adrenergic heart cells [53]. In fact, the hyper-methylated genes BHMT, BHMT2 and GNMT were also enriched in the significant pathways, where the other involved DMP-CpGi-related genes ADRA1A, ADRB2, PPP1CB and RPS6KA1 were also hyper-methylated ( Figure 6B). Therefore, the findings of GO terms and pathways could provide a functional understanding of promoter methylation in CpG islands.
Based on gene-gene network results, hyper-methylated genes ADRA1A and PTPRA linking to PEMT were involved in three significant pathways (i.e., the adrenergic signaling in cardiomyocytes (ssc04261), the cGMP-PKG signaling pathway (ssc04022) and salivary secretion (ssc04970)) and one significant GO term (i.e., the protein phosphorylation (GO:0006468)). PTPRA was reported to promote the cell cycle progression and lead to poor prognosis in squamous cell lung cancer [54]. In addition, the hypo-methylated GPX1, one of the DMP, DME, DMI, DMP-CpGi, DME-CpGi and DMI-CpGi related genes, was connected to hypo-methylated APP, hypo-methylated ATOX1, hyper-methylated ADRB2, hyper-methylated RPS6KA1 and hyper-methylated PNMT that were all enriched in significant GO terms and pathways ( Figure 6C). In the other studies, GPX1 was found to be positively correlated with porcine high androstenone and a high activity against lipid peroxidation based on the glutathione metabolism pathway, so they suggested that the interaction partners (e.g., GST families) with GPX1 could be candidate biomarkers in testicular steroid and androstenone biosynthesis of pigs [55].

Implications
To avoid BT odor in porcine meat, selecting low genetic merit of BT with considerable heritability can be effective and efficient [56,57], as surgical castration has implications for animal welfare and results in decreased meat production [58,59]. Currently, multi-omics data analysis was performed to understand complex traits based on systems genomics approaches [60][61][62], in which the epigenome interacted with the genome to affect the transcriptome and subsequent modules such as the proteome and metabolome to a different extent in the design of omics studies [63]. In addition to genomics studies finding significant genetic variants [64][65][66][67] and transcriptomics studies finding differentially expressed genes [68][69][70] associated with BT, an epigenomics study revealed DMCs to decipher the epigenetic regulatory mechanisms of BT [28]. However, this study identified differentially methylated gene components, for example, DMP, first DME and first DMI, that were key components for DNA methylations to regulate gene expressions [9,10,[18][19][20][21][22][23][24]. Thus, the DMRs in gene components are valuably potential epigenetic biomarkers for BT, especially for promoter and first DME showing the similar methylation status [47,48]. Our study used relatively small sample sizes, so it is hard to achieve good quality biomarkers, but we will collect new testis tissue samples in another pig population of the same breed to validate our results in vitro studies by performing additional experiments with the methylated and unmethylated CpGs, and quantitative PCR analysis on target genes.

Conclusions
This study investigated the methylation of porcine gene components (i.e., promoters, different exons and different introns) in BT trait-related genes and identified the related DMRs of the gene components. The results show that the methylation levels of the first exons and first introns were lower than those of the other exons and introns. Moreover, the methylation levels of the first exon/intron CpG islands were even lower. Additionally, as the first exon lengths become longer, their methylation levels increased. According to the differentially methylated analysis, the DMPs/DMEs/DMIs and their overlaps with CpG islands were identified. Analysis of the gene-GO term relationships and pathways of 80 DMP-CpGi-related genes revealed that these genes enriched in significant GO terms and pathways were mainly involved in methylation-related biological functions. The finding that decreasing methylation levels of the first exons were in a strong relationship with promoters, particularly through interactions with CpG islands, caught our attention because they could be considered as a promoter-first exon entity that regulates biological events. Based on the gene-gene network results, hypo-methylated GPX1 linking APP, ATOX1, ADRB2, RPS6KA1 and PNMT could be used as potential candidate biomarkers for boar taint in pigs after further validation in large populations of the same breed.