Local Climate Adaptation in Chinese Indigenous Pig Genomes

Yuqiang Liu; Yang Xu; Guangzhen Li; Wondossen Ayalew; Zhanming Zhong; Zhe Zhang

doi:10.3390/ani15162412

State Key Laboratory of Swine and Poultry Breeding Industry, Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou 510642, China

* Author to whom correspondence should be addressed.

Animals 2025, 15(16), 2412; https://doi.org/10.3390/ani15162412

This article belongs to the Special Issue Livestock Genetic Evaluation and Selection

Simple Summary

Chinese indigenous pigs have evolved in a wide range of climates, making them an excellent model for studying how animals adapt to local environments. In this study, we analyzed whole-genome data from 578 pigs representing 46 native breeds across China and linked genetic differences to environmental conditions, especially precipitation during the wettest season. We found that precipitation plays a key role in shaping the genetic makeup of these pigs. Many of the identified genes are involved in immune function and metabolism, and one gene, MS4A7, stood out as a strong candidate for adaptation to precipitation patterns. Our findings help explain how native pigs have adapted to their environments and provide valuable information for conserving genetic diversity and developing climate-resilient breeding programs.

Abstract

Local adaptation allows animal populations to persist in diverse and changing environments, yet its genomic underpinnings remain poorly characterized in livestock. Chinese indigenous pigs, renowned for their rich phenotypic and ecological diversity, offer a powerful model for investigating environmental adaptation. Here, we integrated whole-genome resequencing data, environmental variables, genotype–environment association (GEA) analyses, and functional annotation to explore the adaptive genomic landscape of 46 native pig breeds across China. Based on 578 individuals and 17.7 million SNPs, we performed genome-wide GEA using latent factor mixed models (LFMMs), identifying 8644 SNPs significantly associated with environmental factors, including 310 linked to precipitation in the wettest quarter (BIO16). Redundancy analysis (RDA) and gradient forest modeling identified BIO16 as a major environmental driver of genomic variation. Functional annotation of BIO16-associated SNPs revealed significant enrichment in regulatory elements and genes highly expressed in the lung, spleen, hypothalamus, and intestine, implicating immune and metabolic pathways in local adaptation. Among the candidate loci, MS4A7 exhibited strong association signals, population differentiation, and tissue-specific regulation, suggesting a role in precipitation-mediated adaptation. This work enhances our understanding of livestock adaptation and informs climate-resilient conservation and breeding strategies.

Keywords: local adaptation; genotype–environment association (GEA); Chinese indigenous pigs

1. Introduction

The domestic pig (Sus scrofa domesticus) is among the most economically and scientifically valuable livestock species globally. As a primary meat source, pigs contribute significantly to global food security and rural economies. Beyond their agricultural importance, pigs also serve as essential biomedical models, owing to their physiological and anatomical similarities to humans. They are widely used in studies of xenotransplantation, metabolic disorders, and immunology [1,2].

Recent advances in high-throughput genotyping and sequencing technologies have greatly accelerated pig population genomics research. These efforts have uncovered complex patterns of domestication, introgression, and local adaptation in both commercial and indigenous pig populations [3,4,5]. Notably, native Chinese pig breeds display exceptional genetic diversity and phenotypic variation, shaped by geographic isolation, distinct breeding practices, and ecological adaptation [6,7,8]. Understanding their local adaptation is crucial for conserving genetic resources and optimizing breeding strategies under changing environments.

Genotype–environment association (GEA) analysis is a powerful approach for identifying genomic loci involved in local adaptation through correlations between allele frequencies and environmental variables [9,10]. GEA methods, such as latent factor mixed models (LFMMs), redundancy analysis (RDA), and gradient forest, have been widely used in plant species like Arabidopsis thaliana [11] and maize [12], and increasingly in animals such as fish, birds, and domestic ruminants [13,14,15,16]. However, applications of GEA in pigs remain limited, and few studies have integrated genomic, environmental, and regulatory layers to elucidate the adaptive landscape of indigenous breeds.

Among climatic variables, BIO16 (precipitation of the wettest quarter) represents a critical ecological factor, particularly in subtropical and monsoon-influenced regions like southern China. It reflects seasonal water availability, which can modulate host–pathogen dynamics, thermoregulation, and hydric stress tolerance [11]. In humid and pathogen-rich environments, increased precipitation may impose selection on immune pathways, mucosal barrier function, and metabolic regulation [10]. Studies in birds and domestic ruminants have reported strong associations between precipitation variables and adaptive divergence in immunity-related genomic regions [16,17]. Therefore, focusing on BIO16 in the context of Chinese indigenous pigs may reveal key mechanisms of precipitation-driven local adaptation, with implications for climate-resilient livestock breeding strategies.

In this study, we investigate the genomic basis of local environmental adaptation in 46 Chinese indigenous pig breeds through an integrative framework combining population genomics, genotype–environment associations, and functional annotations. We identify key environmental drivers of genomic divergence and assess potential adaptive responses under future climate scenarios, providing critical insights for conservation and climate-resilient breeding.

2. Materials and Methods

2.1. Sample Collection and Genotypic Data Processing

Whole-genome resequencing data were retrieved from the Pig Genome Resequencing Project (PGRP v1) [18]. To minimize the confounding effects of recent selection and admixture, commercial and hybrid breeds were excluded. Only indigenous Chinese breeds with clearly documented geographic origins were retained. To ensure breed assignment consistency and eliminate genetically redundant individuals, pairwise identity-by-state (IBS) analysis was conducted using PLINK v1.90 [19]. Individuals exhibiting IBS values greater than 0.90 were excluded from the dataset. After quality control, the final dataset comprised 578 individuals representing 46 indigenous pig breeds (Supplementary Table S1). SNPs with a minor allele frequency (MAF) < 0.1 or a missing rate > 0.1 were removed using PLINK v1.90 [20]. MAF filtering was applied at the whole-dataset level (rather than within individual breeds) to remove low-frequency variants and enhance the robustness of downstream analyses.
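The two SNP-level filters can be illustrated with a minimal Python sketch. It is not the PLINK implementation used in the study; the genotype coding (alternate-allele counts with `None` for missing calls), the function names, and the toy data are all assumptions made for illustration. As in the text, MAF is computed over the pooled dataset rather than within breeds.

```python
from typing import Dict, List, Optional

# Genotypes are coded as alternate-allele counts (0, 1, 2); None marks a missing call.
Genotype = Optional[int]


def missing_rate(calls: List[Genotype]) -> float:
    """Fraction of individuals without a genotype call at this SNP."""
    return sum(g is None for g in calls) / len(calls)


def minor_allele_frequency(calls: List[Genotype]) -> float:
    """MAF over non-missing calls, pooled across all individuals (not per breed)."""
    observed = [g for g in calls if g is not None]
    if not observed:
        return 0.0
    alt_freq = sum(observed) / (2 * len(observed))
    return min(alt_freq, 1.0 - alt_freq)


def passes_qc(calls: List[Genotype], min_maf: float = 0.1, max_missing: float = 0.1) -> bool:
    """Keep a SNP only if MAF >= min_maf and missing rate <= max_missing."""
    return minor_allele_frequency(calls) >= min_maf and missing_rate(calls) <= max_missing


# Hypothetical toy data: three SNPs typed in ten individuals.
toy_snps: Dict[str, List[Genotype]] = {
    "snp_common": [0, 1, 2, 1, 0, 1, 2, 1, 0, 1],        # MAF 0.45, no missing calls
    "snp_rare": [0, 0, 0, 0, 0, 0, 0, 0, 0, 1],          # MAF 0.05, removed
    "snp_sparse": [1, None, None, 1, 2, 0, 1, 1, 0, 2],  # 20% missing, removed
}
kept = [name for name, calls in toy_snps.items() if passes_qc(calls)]
```

In this toy example only `snp_common` survives both filters.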

2.2. Environmental Data Collection and Preprocessing

Breed-level geographic coordinates were determined based on historical records and documented breed distribution data. A single representative latitude–longitude coordinate was selected for each breed to reflect its long-term local adaptation. Nineteen bioclimatic variables (BIO1–BIO19) (Supplementary Table S2) were extracted from the WorldClim v2.1 database (https://www.worldclim.org/, accessed on 14 March 2025) at a 2.5 arc-min (~5 km) spatial resolution, representing long-term climatic averages from 1970 to 2000 [19].

2.3. Population Structure and Genetic Diversity Analysis

Principal component analysis (PCA) and pairwise identity-by-state (IBS) matrices were calculated using PLINK v1.90 [20]. A neighbor-joining (NJ) tree was built using MEGA v11 [21] based on the genetic distance matrix and visualized using the R package ggtree v3.16.3 [22]. To reduce LD redundancy, SNPs were pruned using PLINK (--indep-pairwise 500 50 0.1), resulting in a final set of 226,394 SNPs. ADMIXTURE v1.3.0 [23] was employed to estimate ancestry proportions using K = 2–10. Based on clustering patterns and geographic origin, breeds were categorized into four regional groups: ECN (East Central), SCN (South Central), SWCN (Southwest), and NCN (North Central). VCFtools v0.1.17 [24] was used to calculate nucleotide diversity (π) and Tajima’s D within each group in non-overlapping 50-kb windows (--window-pi 50000; --TajimaD 50000).
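As a rough illustration of what a windowed π value represents, the sketch below sums the per-site expected pairwise differences within non-overlapping windows and divides by the window length. It is a simplified stand-in for the VCFtools calculation (it ignores missing data and assumes biallelic sites); the function names and the input tuples are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def site_pi(alt_count: int, n_chrom: int) -> float:
    """Expected pairwise differences at one biallelic site among n_chrom chromosomes."""
    return 2.0 * alt_count * (n_chrom - alt_count) / (n_chrom * (n_chrom - 1))


def windowed_pi(sites: Iterable[Tuple[int, int, int]], window: int = 50_000) -> Dict[int, float]:
    """Map each window start (1-based) to pi per base pair.

    `sites` yields (position, alternate-allele count, number of chromosomes).
    """
    totals: Dict[int, float] = defaultdict(float)
    for pos, alt_count, n_chrom in sites:
        start = ((pos - 1) // window) * window + 1
        totals[start] += site_pi(alt_count, n_chrom)
    return {start: total / window for start, total in sorted(totals.items())}


# Hypothetical toy input: three sites sampled in four chromosomes.
toy_sites = [(10, 2, 4), (49_999, 1, 4), (50_001, 2, 4)]
toy_pi = windowed_pi(toy_sites)
```

The first two sites fall in the window starting at position 1 and the third in the window starting at 50,001.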

2.4. Genotype–Environment Association Analysis

We performed genome-wide GEA using the latent factor mixed model (LFMM) implemented in the R package LEA [25,26]. To adjust for population structure, we selected the number of latent factors (K) based on PCA and ADMIXTURE results. Specifically, PCA revealed genetic differentiation among geographic groups, and ADMIXTURE was conducted for K = 2–10. Considering the genetic structure, geographic distribution, and domestication history of Chinese indigenous pigs, K = 4 was chosen for subsequent LFMM analysis. The analysis was based on genotypes from 578 individuals representing 46 indigenous pig breeds in China. Nineteen bioclimatic variables (BIO1–BIO19) obtained from the WorldClim v2.1 database were used as environmental predictors. SNPs with p < 1 × 10⁻⁵ were considered to be suggestively associated, following the original LFMM framework, where this threshold corresponds approximately to |z| > 4, offering a conservative control for false positives in genome-wide scans [27].
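The relationship between the p-value threshold and the z-score can be checked with a short sketch. It assumes a two-sided test against a standard normal null, which is an assumption about how the LFMM z-scores are converted and not something stated above. Under that assumption, |z| = 4 corresponds to p of roughly 6.3 × 10⁻⁵, and p = 1 × 10⁻⁵ corresponds to |z| of roughly 4.42, so the p-value threshold is somewhat stricter than |z| > 4.

```python
from math import erfc, sqrt
from statistics import NormalDist


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a z-score under a standard normal null."""
    return erfc(abs(z) / sqrt(2.0))


def z_for_two_sided_p(p: float) -> float:
    """Absolute z-score whose two-sided p-value equals p."""
    return NormalDist().inv_cdf(1.0 - p / 2.0)


p_at_z4 = two_sided_p(4.0)                 # roughly 6.3e-5
z_at_threshold = z_for_two_sided_p(1e-5)   # roughly 4.42
```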

2.5. Environmental Variable Selection

PCA of the 19 bioclimatic variables was performed using the R packages FactoMineR and factoextra [28]. To address multicollinearity, pairwise Pearson correlation coefficients among the 19 variables were computed and visualized with the corrplot package, and one variable was excluded from any pair with |r| ≥ 0.8. Environmental variable importance in explaining genetic variation was assessed using the gradientForest R package [29]. LD-pruned SNPs (--indep-pairwise 500 50 0.1) were used to calculate population-level allele frequencies per breed [29,30]. The gradient forest model was fitted with 500 trees, and variable importance was summarized by split accuracy and cumulative R². The variables retained for downstream analyses were those meeting the joint criteria of low pairwise correlation (|r| < 0.8) and high gradientForest importance. For RDA, we used SNPs that were pruned for LD across all 578 individuals. Model significance and the significance of each constrained axis were assessed using a permutation test with 999 permutations, and the percentage of variance explained by each axis was calculated.
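One way to combine the two criteria is a greedy pass over the variables in order of decreasing importance, keeping a variable only if it is not strongly correlated with any variable already kept. The text does not specify how the member to drop from a correlated pair was chosen, so the greedy rule below is an assumption, and the variable names, values, and importance scores are hypothetical.

```python
from math import sqrt
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def select_variables(
    values: Dict[str, List[float]],
    importance: Dict[str, float],
    max_abs_r: float = 0.8,
) -> List[str]:
    """Greedy selection: visit variables by decreasing importance and keep
    each one only if |r| < max_abs_r with every variable kept so far."""
    kept: List[str] = []
    for name in sorted(values, key=lambda v: importance[v], reverse=True):
        if all(abs(pearson(values[name], values[k])) < max_abs_r for k in kept):
            kept.append(name)
    return kept


# Hypothetical toy data: var_b is nearly collinear with var_a.
toy_values = {
    "var_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "var_b": [2.0, 4.0, 6.0, 8.0, 10.5],
    "var_c": [3.0, 1.0, 4.0, 1.0, 5.0],
}
toy_importance = {"var_a": 0.9, "var_b": 0.5, "var_c": 0.3}
selected = select_variables(toy_values, toy_importance)
```

Here `var_b` is dropped because it is almost perfectly correlated with the more important `var_a`, while `var_c` is retained.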

2.6. Environmental vs. Geographic Contributions to Genetic Differentiation

We employed Mantel and partial Mantel tests to evaluate the relative contributions of geographic isolation-by-distance (IBD) and environmental isolation-by-environment (IBE) to genetic differentiation. Pairwise F_ST values were calculated separately for neutral variants (LD-pruned SNPs) and putatively adaptive variants (LFMM-associated SNPs, p < 1 × 10⁻⁵). Geographic distances were derived from great-circle distances between breed-level coordinates, while environmental distances were calculated as Euclidean distances using standardized values of all 19 bioclimatic variables. Statistical significance was assessed through 999 permutations using the mantel() and mantel.partial() functions in the R package vegan.
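The logic of a simple Mantel test can be sketched in a few lines: correlate the upper triangles of two distance matrices, then permute the rows and columns of one matrix jointly to build a null distribution. This is a didactic sketch, not the vegan implementation, and it does not cover the partial Mantel test; the toy matrices are hypothetical. The p-value uses the (hits + 1) / (permutations + 1) convention, so with 999 permutations the smallest attainable p-value is 0.001.

```python
import random
from math import sqrt
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def upper_triangle(m: Matrix, order: Sequence[int]) -> List[float]:
    """Upper-triangle entries after reordering rows and columns by `order`."""
    n = len(order)
    return [m[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def mantel(a: Matrix, b: Matrix, permutations: int = 999, seed: int = 0) -> Tuple[float, float]:
    """One-sided Mantel test: returns (observed r, permutation p-value)."""
    identity = list(range(len(a)))
    b_vec = upper_triangle(b, identity)
    r_obs = pearson(upper_triangle(a, identity), b_vec)
    rng = random.Random(seed)
    hits = 0
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)  # permute populations, i.e. rows and columns together
        if pearson(upper_triangle(a, order), b_vec) >= r_obs - 1e-12:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)


# Hypothetical toy data: six populations on a line; genetic distance tracks geography.
positions = [0.0, 1.0, 3.0, 6.0, 10.0, 15.0]
geo = [[abs(p - q) for q in positions] for p in positions]
gen = [[0.01 * geo[i][j] + 0.002 * ((i + j) % 2) for j in range(6)] for i in range(6)]
r_obs, p_value = mantel(gen, geo)
```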

2.7. Functional Annotation and Enrichment Analyses

Functional annotation was performed for the SNPs significantly associated with BIO16 (p < 1 × 10⁻⁵): SnpEff v4.3 [31] for genomic feature classification, chromatin state annotations from 14 porcine tissues [32] to define regulatory elements (e.g., promoters, enhancers), and gene expression profiles from the PigGTEx database [18] to identify tissue-specific transcriptional activity. Tissue specificity was quantified using t-statistics as described by Finucane et al. [33], and the top 1000 genes with the highest t-values in each tissue were selected as tissue-specific gene sets. Enrichment of significant SNPs within these sets was assessed using Fisher’s exact test. Enrichment ratios were calculated using the oddsratio() function in the R package fmsb [34], and statistical significance was tested using 10,000 permutation replicates implemented in regioneR. Details of the analytical framework are available in reference [35]. To identify candidate genes, we defined a 50 kb window centered on each significant SNP (±25 kb). Gene coordinates were retrieved from the Sus scrofa 11.1 genome annotation file (GTF format), and intersections between SNP windows and gene bodies were determined using BEDTools v2.25.0 [36]. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were conducted using the clusterProfiler R package [37], applying default settings.
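The window-based gene assignment amounts to an interval-overlap query. The sketch below mirrors that step in plain Python and is not the BEDTools command used in the study; coordinates are assumed to be 1-based and inclusive, and the SNP, gene names, and positions are hypothetical.

```python
from typing import Iterable, NamedTuple, Set


class Snp(NamedTuple):
    chrom: str
    pos: int


class Gene(NamedTuple):
    name: str
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def candidate_genes(snps: Iterable[Snp], genes: Iterable[Gene], flank: int = 25_000) -> Set[str]:
    """Names of genes whose body overlaps the +/- flank window of any SNP."""
    gene_list = list(genes)
    hits: Set[str] = set()
    for snp in snps:
        lo, hi = max(1, snp.pos - flank), snp.pos + flank
        for gene in gene_list:
            if gene.chrom == snp.chrom and gene.start <= hi and gene.end >= lo:
                hits.add(gene.name)
    return hits


# Hypothetical toy data: one SNP whose window spans 75,000-125,000 on chromosome "1".
toy_snps = [Snp("1", 100_000)]
toy_genes = [
    Gene("gene_a", "1", 120_000, 150_000),  # overlaps the window
    Gene("gene_b", "1", 125_001, 130_000),  # starts 1 bp past the window
    Gene("gene_c", "2", 90_000, 110_000),   # different chromosome
]
toy_candidates = candidate_genes(toy_snps, toy_genes)
```

The nested loop is adequate for a few hundred SNPs; an interval tree or sorted sweep would be preferable at genome scale.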

2.8. MS4A7 Locus Analysis and Selection Scan

We focused on the top-ranked SNP, which showed the strongest association with BIO16. To investigate potential selective pressure at this locus, we stratified the dataset into high (top 10%) and low (bottom 10%) BIO16 groups, with 58 individuals in each. Cross-population extended haplotype homozygosity (XP-EHH) scores were computed using selscan v2.0 [38] in non-overlapping 2 kb windows, designating the high-BIO16 group as the test population. Weir and Cockerham’s F_ST, nucleotide diversity (π), and Tajima’s D were also calculated using VCFtools v0.1.17 in matching 2 kb windows. Linkage disequilibrium (LD) decay around the SNP was evaluated using PLINK v1.90, with the parameters --r2, --ld-window-kb 200, and --ld-window-r2 0. Differences in BIO16 values among genotype classes at the top SNP were examined using Student’s t-tests. Chromatin state annotations were obtained for 14 representative porcine tissues, covering 14 chromatin states indicative of key regulatory features (e.g., active promoters, enhancers, repressed regions), based on ENCODE-like epigenomic resources [32]. Meanwhile, gene expression profiles of MS4A7 across 35 tissues were retrieved from the PigGTEx database [18].
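The stratification into extreme BIO16 groups can be sketched as a rank-and-slice operation. Rounding 10% of 578 to the nearest integer gives the 58 individuals per group reported above; the rounding rule and the handling of ties (individuals from the same breed share one BIO16 value, and ties here are broken by sort order) are assumptions, and the sample identifiers and values are hypothetical.

```python
from typing import Dict, List, Tuple


def split_extremes(bio16: Dict[str, float], fraction: float = 0.10) -> Tuple[List[str], List[str]]:
    """Return (low group, high group): the bottom and top `fraction` of samples by BIO16."""
    group_size = round(len(bio16) * fraction)
    ranked = sorted(bio16, key=lambda sample: bio16[sample])
    return ranked[:group_size], ranked[-group_size:]


# Hypothetical toy data: 578 samples with distinct BIO16 values.
toy_bio16 = {f"pig_{i:03d}": float(i) for i in range(578)}
low_group, high_group = split_extremes(toy_bio16)
```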

3. Results

3.1. Population Genetic Structure and Diversity

We conducted genome-wide population genetic analyses on 46 Chinese indigenous pig populations (Figure 1a, Supplementary Tables S1 and S2). Genotype filtering removed SNPs with MAF < 0.1 or a missing rate > 0.1, yielding 17,700,714 high-quality SNPs from an initial total of 33,485,897. After further LD pruning with the --indep-pairwise 500 50 0.1 option, a total of 226,394 SNPs were selected for subsequent analysis. PCA based on these 226,394 LD-pruned SNPs revealed a clear population structure. Populations from the East China subclade (ECN) were distinctly separated along the first principal component (PC1), while populations from South China (SCN), Southwest China (SWCN), and North China (NCN) displayed further separation along PC2 (Figure 1b). ADMIXTURE analysis (K = 2 and K = 3) further supported these findings, showing a consistent genetic structure pattern with ECN populations exhibiting a distinct ancestral component at K = 3 (Figure 1c). Nucleotide diversity (π) analysis indicated that SCN (π = 0.003389) and SWCN (π = 0.003274) populations harbored higher genetic diversity than ECN populations (π = 0.002789), suggesting increased genetic polymorphism in southern populations (Figure 1d). Genome-wide assessments of Tajima’s D values showed consistently positive values across all four population groups (Figure 1e), with SCN and SWCN populations having slightly higher averages (1.47) than ECN (1.43) and NCN (1.18). These values reflect regional differences in the distribution of allele frequencies across Chinese indigenous pig populations.

Figure 1. Population structure and genetic diversity of Chinese indigenous pigs. (a) Geographic distribution of 46 indigenous pig populations grouped into four regions: ECN (blue), NCN (light blue), SCN (red), and SWCN (orange). Circle size reflects sample size. (b) PCA of 578 individuals based on genome-wide SNPs; PC1 and PC2 explain 18.68% and 12.64% of variation, respectively. (c) ADMIXTURE plots at K = 2 and K = 3, illustrating population structure and ancestry components. (d) Nucleotide diversity (π) distributions across population groups. (e) Distribution of Tajima’s D values by group, with dashed lines indicating group means.

3.2. Environmental Gradient Analysis and Variable Selection

We conducted a PCA on the distributions of 46 Chinese indigenous pig populations based on 19 bioclimatic variables. The first three principal components explained a total of 91.7% of the environmental variation, with PC1 and PC2 accounting for 64.7% and 19.3%, respectively. Populations from ECN, NCN, and SWCN exhibited compact distribution patterns in PCA space, suggesting homogeneous environmental backgrounds across these groups. In contrast, SCN populations were more widely scattered, indicating greater ecological diversity or adaptive plasticity in this region (Figure 2a). The biplot revealed distinct loadings and directional contributions of environmental variables in the principal component space. Notably, BIO4, BIO3, and BIO7 contributed strongly to PC1, whereas BIO8 and BIO15 had higher loadings on PC2. Scatterplot analysis based on environmental PCA further revealed fine-scale differentiation among pig populations (Figure 2b).

Figure 2. Environmental PCA and key variable selection. (a) Biplot of the principal component analysis (PCA) based on 19 bioclimatic variables. Arrows indicate the loading directions of environmental variables in the PC1–PC2 space, and arrow color denotes the cos² value (i.e., the quality of representation of each variable on the principal components), ranging from blue-green (low) to red (high). (b) Distribution of 46 Chinese indigenous pig populations in the environmental PCA space defined by PC1 and PC2. Each point represents a population, colored by genetic group: ECN (blue), NCN (light blue), SCN (red), and SWCN (orange). Ellipses indicate 95% confidence intervals of population clustering. (c) Pairwise Pearson correlation matrix among the 19 bioclimatic variables. Color intensity and square size indicate the strength and direction of correlation (red = positive, blue = negative). (d) Bar plot showing the ranked importance of each environmental variable in the PCA. Orange bars denote the variables retained for subsequent analyses.

To reduce redundancy and identify key ecological drivers, we performed pairwise correlation analysis and PCA-based importance ranking of the 19 bioclimatic variables (Figure 2c). The correlation matrix showed that many variable pairs had Pearson correlation coefficients of |r| ≥ 0.8, indicating strong collinearity. Variables exhibiting high pairwise correlations were excluded to avoid multicollinearity in downstream analyses. We further employed gradient forest modeling with the “gradientForest” R package to evaluate and rank the importance of each variable (Figure 2d). Based on contribution scores and low inter-variable correlation, we selected six representative bioclimatic variables: BIO2, BIO3, BIO4, BIO8, BIO15, and BIO16. These variables capture the major ecological gradients in temperature and precipitation and act as informative predictors of environmental heterogeneity across Chinese indigenous pig populations.

3.3. Environmental Gradients Influence Pig Genomic Structure

We performed RDA based on 226,394 SNPs that were pruned for linkage disequilibrium (LD) across 578 individuals from Chinese indigenous pig populations to assess the influence of environmental variables on genomic variation. A permutation test (n = 999) confirmed that all six environmental predictors (BIO2, BIO3, BIO4, BIO8, BIO15, and BIO16) were significantly associated with genomic variation (p < 0.01) (Supplementary Table S6). The first three constrained axes (RDA1–RDA3) explained 44.5%, 18.8%, and 13.3% of the constrained variance, respectively, together accounting for 76.6% of the total explained variation (Supplementary Table S7). In the ordination, pigs from East China (ECN) aligned with gradients of BIO4, while those from Southwest China (SWCN) were mainly associated with precipitation indices (BIO15, BIO16). Additional variables such as BIO3 and BIO8 also exhibited notable effects (Figure 3a,b).

Figure 3. Redundancy analysis (RDA) and distance-based correlation reveal environmental influence on genomic variation in Chinese indigenous pig populations. (a,b) RDA results showing the relationship between genomic variation and environmental variables across 578 individuals from 46 local pig populations. Each point represents an individual, colored by population group (ECN, NCN, SCN, SWCN). Blue arrows indicate the direction and strength of the six selected environmental variables in the constrained ordination space. The projections of arrows on RDA axes reflect the relative contribution of each environmental factor. (c,d) Correlation between pairwise F_ST and geographic or environmental distance. (c) Relationship between F_ST and geographic distance among populations based on background SNPs (blue) and LFMM-identified environment-associated SNPs (pink). The top-left inset shows Mantel correlation coefficients and significance values. (d) Relationship between F_ST and environmental distance after controlling for geographic distance. Partial Mantel test results are shown in the top-left corner. Significant correlations for LFMM SNPs but not background SNPs suggest an independent role of environmental selection in shaping population differentiation.

To disentangle the relative effects of geography and environment on genetic differentiation, we conducted Mantel and partial Mantel tests using F_ST matrices derived from environment-associated SNPs (n = 8644; p < 1 × 10⁻⁵) and putatively neutral SNPs (n = 226,394). Environment-associated SNPs showed a significant correlation with geographic distance (Mantel’s r = 0.46, p = 0.001), while neutral SNPs did not (r = 0.08, p = 0.178). When geographic distance was controlled, a significant correlation between F_ST and environmental distance remained for environment-associated SNPs (partial Mantel’s r = 0.25, p = 0.001), but disappeared for neutral SNPs (r = −0.13, p = 0.930), suggesting an independent role of environmental selection in shaping population differentiation (Figure 3c,d).

3.4. Functional Annotation of BIO16-Associated Loci

To assess the impact of MAF thresholds on GEA outcomes, we compared the results of LFMM analysis using two different filters: the default MAF > 0.1 and a more lenient MAF > 0.05. The topological patterns of significant SNPs remained consistent across these two thresholds, and the association signals, measured as −log10(p), showed an exceptionally high Pearson correlation (r = 0.992) between the two datasets (Supplementary Figure S1). This strong correlation supports the robustness of our findings, indicating that our GEA results are stable across different MAF cutoffs.

To determine an appropriate window size for candidate gene identification, we calculated linkage disequilibrium (LD) decay across all pig populations using the PopLDdecay tool. Pairwise r² values were computed to evaluate the distance at which LD decays significantly. As shown in Supplementary Figure S2, r² values declined rapidly within the first 40–50 kb and plateaued thereafter, suggesting that most LD was confined to a 50 kb window. Therefore, we selected a ±25 kb window around each significant SNP, which allows for the capture of the majority of LD-linked regions while minimizing the inclusion of distant, unrelated genes.

A total of 8644 SNPs showed suggestive associations with environmental variables (p < 1 × 10⁻⁵), including 310 SNPs associated with BIO16, a key environmental variable related to precipitation in the wettest quarter (Figure 4a). Based on prior analyses, we identified BIO16 as a major factor shaping the geographic and genomic landscape of Chinese indigenous pig populations. To explore how local pigs respond at the molecular level to selective pressures from precipitation gradients, we focused on BIO16. We then performed multilayered annotation and functional analysis of the significantly associated genomic loci to elucidate potential adaptive mechanisms. This approach enabled us to identify key loci likely involved in local adaptation to varying precipitation environments.

Figure 4. Multilayer annotation and enrichment analysis of SNPs significantly associated with BIO16. (a) Manhattan plot of genome-wide LFMM association analysis for BIO16. The y-axis represents −log₁₀(p-value); the red dashed line denotes the genome-wide significance threshold (5 × 10⁻⁸), and the gray dashed line marks the suggestive threshold (1 × 10⁻⁵). (b) Enrichment of significant SNPs across genomic functional categories. *** p < 0.001, * p < 0.05; dashed line indicates enrichment fold = 1. (c) Heatmap of chromatin state enrichment across tissues. Each cell represents a combination of chromatin state (rows) and tissue (columns). Red indicates positive enrichment; blue indicates negative enrichment. *** p < 0.001, ** p < 0.01, * p < 0.05. (d) Enrichment analysis of significant SNPs in the top 1000 tissue-specific genes from 34 tissues. The y-axis indicates enrichment fold; ** p < 0.01, * p < 0.05; dashed line indicates enrichment fold = 1.

Genomic feature enrichment analysis revealed that these SNPs were significantly enriched in upstream regulatory regions (fold enrichment = 1.83, p = 2.09 × 10⁻¹¹) and intergenic regions (fold enrichment = 1.4, p = 3.67 × 10⁻¹³), indicating potential cis-regulatory effects (Figure 4b). Conversely, no significant enrichment was detected in other categories such as 5′ and 3′ untranslated regions (UTRs) or synonymous coding sites. Chromatin state enrichment analysis (Figure 4c), based on 14-state chromatin annotations across 14 porcine tissues, revealed significant enrichment of SNPs in active regulatory elements, including enhancers and open chromatin, in tissues such as the cortex, jejunum, and duodenum. These enrichments suggest that BIO16-associated SNPs may influence gene regulation in a tissue-specific manner. Tissue-specific gene enrichment analysis (Figure 4d) further demonstrated that significant SNPs were overrepresented in the top 1000 most specifically expressed genes in tissues including the lung, spleen, hypothalamus, intestine, and adipose tissue. These enrichments were statistically significant (p < 0.05), with fold enrichment consistently above 1, highlighting their potential biological relevance. In total, 147 candidate genes were identified within ±25 kb flanking windows of significant SNPs (Supplementary Table S3). GO (Supplementary Table S4) and KEGG (Supplementary Table S5) pathway enrichment analyses were subsequently performed using the clusterProfiler package. Results revealed significant enrichment in pathways associated with immune responses and intracellular signal transduction. In particular, the “C-type lectin receptor signaling pathway,” involving genes such as CALML4, PPP3CB, and NFKB2, was significantly enriched (p = 0.0065), suggesting a potential role in environmental sensing and immune adaptation. Collectively, these findings highlight precipitation-related selective signatures at both regulatory and expression levels, offering novel insights into the mechanisms of local environmental adaptation in Chinese indigenous pigs.
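For readers unfamiliar with fold-enrichment statistics of this kind, the sketch below shows one common formulation: the fraction of significant SNPs falling in a category divided by the fraction of all tested SNPs in that category, with a one-sided Fisher (hypergeometric) tail probability. This definition is an assumption, since the exact formula is not given above, and it does not reproduce the permutation-based significance described in the Methods; the counts are hypothetical.

```python
from math import comb


def fold_enrichment(hits_in_set: int, n_hits: int, set_size: int, n_total: int) -> float:
    """Observed fraction of hits in the category divided by the background fraction."""
    return (hits_in_set / n_hits) / (set_size / n_total)


def hypergeom_upper_tail(hits_in_set: int, n_hits: int, set_size: int, n_total: int) -> float:
    """P(X >= hits_in_set) when n_hits items are drawn without replacement
    from n_total items, of which set_size belong to the category."""
    max_k = min(n_hits, set_size)
    favourable = sum(
        comb(set_size, k) * comb(n_total - set_size, n_hits - k)
        for k in range(hits_in_set, max_k + 1)
    )
    return favourable / comb(n_total, n_hits)


# Hypothetical toy counts: 20 SNPs tested, 5 in the category, 4 significant, 3 of those in the category.
toy_fold = fold_enrichment(3, 4, 5, 20)
toy_p = hypergeom_upper_tail(3, 4, 5, 20)
```

With these toy counts the fold enrichment is 3.0 and the tail probability is 155/4845, about 0.032.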

3.5. MS4A7 as a Candidate Gene for Precipitation-Driven Local Adaptation

Genome-wide LFMM analysis identified an SNP (2_11304356_T_A), located at approximately 11.30 Mb on chromosome 2, that showed the strongest association with BIO16. This SNP is located in an intronic region of the MS4A7 gene. LD analysis revealed that the top SNP exhibited low LD with neighboring loci, indicating an independent association signal. Multiple lines of selection evidence (nucleotide diversity (π), Tajima’s D, F_ST, and XP-EHH) converged at this locus, suggesting marked population differentiation and potential local adaptation driven by precipitation gradients (Figure 5a).

Figure 5. Multilayered annotation of the MS4A7 gene region associated with BIO16. (a) Local annotation of the MS4A7 region. The top panel shows the local association and LD plot (r² indicating linkage strength); the bottom panels display the distributions of π, Tajima’s D, F_ST, and XP-EHH, highlighting a strong selective signal centered at 2_11304356_T_A. (b) Distribution of BIO16 values among individuals with different genotypes. A/T carriers show significantly higher BIO16 values than A/A and T/T individuals, *** p < 0.001, ** p < 0.01, ns not significant. (c) Chromatin state and tissue-specific regulatory annotation. The upper panel presents regulatory activity (e.g., enhancers, open chromatin) across different tissues in the MS4A7 region; the lower panel shows gene structure and the position of the top SNP relative to exons and UTRs. (d) Expression profile of MS4A7 across 35 pig tissues. TPM data indicate moderate to high expression levels in the lung, hypothalamus, small intestine, spleen, and adipose tissue, suggesting diverse functional roles.

Genotype–environment association analysis demonstrated that individuals with the heterozygous A/T genotype had significantly higher BIO16 values compared to homozygous A/A or T/T genotypes (Figure 5b), supporting a potential regulatory role for this SNP in precipitation responsiveness. Furthermore, breed-level genotype frequency analysis of the top SNP (2_11304356_T_A) revealed that heterozygotes (A/T) were more prevalent in breeds from regions with higher precipitation, while the T/T genotype dominated in drier areas (Supplementary Figure S3). This breed-level pattern reinforces the environmental relevance of the SNP and supports a role for heterozygosity in local adaptation to varying precipitation conditions. Chromatin state annotation across 14 porcine tissues revealed that the top SNP lies within a repressed chromatin region in the cerebral cortex, suggesting transcriptional inactivation at this site in brain tissue. Conversely, active chromatin marks were observed at the 5′UTR and exon 5 of MS4A7 in the cortex and cerebellum, but these regions were strongly repressed in the liver (Figure 5c), indicating distinct tissue-specific regulatory patterns. Transcriptomic analysis using PigGTEx data across 35 tissues showed that MS4A7 is moderately to highly expressed in the lung, hypothalamus, adipose tissue, spleen, and small intestine (Figure 5d), tissues central to immune, neuroendocrine, and metabolic regulation. In addition, data from the PigBioBank resource provide further support for the functional role of MS4A7. Phenome-wide association study (PheWAS) results showed significant associations between MS4A7 variants and reproduction-related traits, such as teat number, as well as carcass and production traits (Supplementary Figure S3; Supplementary Table S8). Transcriptome-wide association study (TWAS) further revealed that MS4A7 expression levels in multiple tissues (e.g., liver and cross-tissue panels) were significantly correlated with teat number, meat-to-fat ratio, and fatty acid composition, with the strongest signal reaching a p-value of 1.58 × 10⁻⁷ (Supplementary Table S9). This integrative evidence from chromatin state, tissue-specific expression, and large-scale genotype–phenotype datasets supports the hypothesis that MS4A7 contributes to precipitation-driven local adaptation and simultaneously influences economically important traits in Chinese indigenous pigs.

4. Discussion

This study presents an integrative analysis of the genomic landscape and ecological adaptation in Chinese indigenous pig populations. For genotype–environment association, we determined a representative coordinate for each breed based on the geographic center of its historical distribution and extracted the long-term environmental mean at that location. This approach better reflects historical selective pressures compared to using contemporary sampling coordinates, and we further subdivided broadly distributed breeds (e.g., Tibetan pigs) into geographically distinct subpopulations to improve environmental matching. Principal component and ADMIXTURE analyses consistently identified the ECN group as a genetically distinct cluster, likely shaped by prolonged geographic isolation and independent domestication history [3,6]. In contrast, SCN and SWCN populations exhibited higher nucleotide diversity and Tajima’s D values. These patterns may suggest that southern pig populations experienced more complex demographic histories, such as gene flow or balancing selection, potentially shaped by their ecologically diverse habitats [39]. However, we note that Tajima’s D is sensitive to population structure, which can inflate D values even in the absence of balancing selection. Specifically, genetic divergence among subpopulations, as revealed by our PCA and ADMIXTURE analyses, can lead to elevated nucleotide diversity (π) without a corresponding increase in segregating sites, thereby biasing D estimates upward. As such, the observed positive Tajima’s D values should be interpreted with caution and considered in conjunction with other statistics and functional evidence to robustly infer selection.
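The caveat about population structure can be made concrete with a small numerical sketch. The code below computes Tajima’s D from its standard definition (the scaled difference between π and Watterson’s estimator S/a₁) for toy haplotypes. The haplotype strings and the function name `tajimas_d` are hypothetical illustrations and are not data or code from this study.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(haplotypes):
    """Tajima's D for equal-length 0/1 haplotype strings.

    Returns None when the sample has no segregating sites.
    """
    n = len(haplotypes)
    n_sites = len(haplotypes[0])
    # S: number of sites at which both alleles are present in the sample
    seg_sites = sum(
        1 for j in range(n_sites) if len({h[j] for h in haplotypes}) > 1
    )
    if seg_sites == 0:
        return None
    # pi: mean number of pairwise differences between haplotypes
    n_pairs = n * (n - 1) / 2
    pi = sum(
        sum(a != b for a, b in zip(h1, h2))
        for h1, h2 in combinations(haplotypes, 2)
    ) / n_pairs
    # Constants from Tajima (1989)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / sqrt(variance)


# Hypothetical subpopulation A: three singleton variants, no shared variation
pop_a = ["100000000", "010000000", "001000000", "000000000", "000000000"]
# Hypothetical subpopulation B: fixed for the alternate allele at six sites
pop_b = ["000111111"] * 5
pooled = pop_a + pop_b

print("D within A:", tajimas_d(pop_a))
print("D within B:", tajimas_d(pop_b))
print("D pooled  :", tajimas_d(pooled))
```

In this toy case, D is negative within subpopulation A and undefined within B, yet becomes positive once the two diverged samples are pooled, because the fixed differences contribute many intermediate-frequency sites. No balancing selection is involved, which is the reason for interpreting positive group-level D values cautiously.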

The ecological basis of this genetic divergence was further supported by environmental PCA, which showed marked differences among groups along major environmental gradients, particularly temperature and precipitation seasonality. Notably, the SCN population displayed a wide distribution in the environmental PCA space, indicating broader habitat variability and higher adaptive plasticity. By integrating principal component loadings and gradient forest analysis, we identified six key environmental variables—BIO2, BIO3, BIO4, BIO8, BIO15, and BIO16—as representative predictors for downstream analyses. These variables encapsulate the core ecological gradients relevant to the habitat differentiation of local pig breeds and align with commonly applied frameworks in adaptation studies of livestock and wild species [30,40,41].

Building on this foundation, we applied a multilayered analytical framework that integrates genome-wide diversity, environmental ordination, RDA, LFMM, and Mantel tests to dissect the adaptive genetic basis of local pig populations. RDA results underscored the central role of BIO4 (temperature seasonality) and precipitation-related factors (BIO15 and BIO16) in shaping genomic variation, particularly highlighting stronger precipitation-related adaptation in the SWCN group. These findings align with broader patterns observed in livestock and wildlife, where environmental gradients often drive localized genomic differentiation [17,42,43]. Partial Mantel tests further demonstrated that environmental distances retained a significant association with genetic differentiation even after controlling for geographic distance, reinforcing the independent role of environmental selection in shaping adaptive divergence. Similar conclusions have been drawn in goats [44] and sheep [39], supporting the utility of joint environmental and genomic analyses in disentangling signals of selection from neutral evolution [45,46].
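The logic of the partial Mantel test can be sketched as follows: correlate genetic and environmental distances after removing the linear effect of geographic distance, then assess significance by jointly permuting the rows and columns of the genetic matrix. This is a minimal illustration under stated assumptions, not the implementation used in this study; the distance matrices, the function names, and the permutation scheme are all hypothetical choices made for the example.

```python
import random
from math import sqrt


def lower_triangle(matrix, order=None):
    """Flatten below-diagonal entries, optionally after permuting rows and columns together."""
    n = len(matrix)
    order = list(range(n)) if order is None else order
    return [matrix[order[i]][order[j]] for i in range(1, n) for j in range(i)]


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def partial_r(x, y, z):
    """Correlation of x and y after controlling for z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / sqrt((1 - rxz**2) * (1 - ryz**2))


def partial_mantel(gen, env, geo, n_perm=999, seed=0):
    """One-sided partial Mantel test: genetic vs. environmental distance given geography."""
    rng = random.Random(seed)
    y, z = lower_triangle(env), lower_triangle(geo)
    r_obs = partial_r(lower_triangle(gen), y, z)
    order = list(range(len(gen)))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(order)  # relabel populations in the genetic matrix only
        if partial_r(lower_triangle(gen, order), y, z) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (n_perm + 1)


# Hypothetical data for eight populations (not values from this study)
env_values = [3, 7, 1, 9, 4, 6, 2, 8]   # e.g., a scaled climate variable
positions = [0, 1, 2, 3, 4, 5, 6, 7]    # positions along a transect
n = len(env_values)
env_dist = [[abs(env_values[i] - env_values[j]) for j in range(n)] for i in range(n)]
geo_dist = [[abs(positions[i] - positions[j]) for j in range(n)] for i in range(n)]
# Toy pairwise differentiation driven mostly by environment, slightly by geography
gen_dist = [
    [
        0.0 if i == j
        else 0.02 * env_dist[i][j] + 0.005 * geo_dist[i][j] + 0.003 * ((i * j) % 3)
        for j in range(n)
    ]
    for i in range(n)
]

r_obs, p_value = partial_mantel(gen_dist, env_dist, geo_dist)
print("partial Mantel r:", r_obs, "p:", p_value)
```

Because the toy genetic distances were constructed to track environmental distance, the partial correlation stays high after conditioning on geography. With real pairwise F_ST matrices the same procedure would apply, although established implementations are preferable for production analyses.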

The SNPs significantly associated with BIO16 were preferentially enriched in upstream regulatory and intergenic regions, suggesting potential roles in gene expression modulation rather than direct coding changes, which aligns with previous findings that regulatory variants contribute substantially to environmental adaptation [47,48,49]. The tissue-specific enrichment analysis further revealed that these SNPs were overrepresented in genes expressed in the lung, intestine, hypothalamus, adipose tissue, and spleen, organs critically involved in immune regulation, metabolic balance, and water homeostasis. These physiological systems are known to mediate adaptive responses to hydric stress and environmental variation [50]. For instance, lung and intestinal epithelia maintain fluid balance, while adipose and hypothalamic tissues modulate energy and stress responses [51,52]. Moreover, GO and KEGG enrichment analyses of candidate genes indicated associations with cilium organization, epithelial function, and immune-related pathways, implying that traits such as mucosal immunity, barrier integrity, and pathogen resistance may play important roles in adaptation to high-precipitation environments. Together, these findings imply that the identified SNPs may underlie regulatory mechanisms enabling local pigs to cope with spatial and seasonal changes in precipitation.
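For readers unfamiliar with enrichment fold, the quantity can be illustrated as the share of significant SNPs falling in an annotation category divided by the share of all tested SNPs in that category, with a one-sided hypergeometric tail probability as one possible significance measure. The sketch below uses hypothetical counts and a hypothetical function name; the exact statistical test used for Figure 4 is not specified here, so the hypergeometric test is an assumption.

```python
from math import comb


def enrichment(n_total, n_in_category, n_significant, n_sig_in_category):
    """Enrichment fold and one-sided hypergeometric p-value.

    n_total            -- all SNPs tested
    n_in_category      -- SNPs in the annotation category (e.g., upstream regions)
    n_significant      -- SNPs significantly associated with the variable
    n_sig_in_category  -- significant SNPs that fall in the category
    """
    fold = (n_sig_in_category / n_significant) / (n_in_category / n_total)
    upper = min(n_in_category, n_significant)
    # P(X >= observed) when drawing n_significant SNPs without replacement
    tail = sum(
        comb(n_in_category, k) * comb(n_total - n_in_category, n_significant - k)
        for k in range(n_sig_in_category, upper + 1)
    )
    p_value = tail / comb(n_total, n_significant)
    return fold, p_value


# Hypothetical counts, chosen only to illustrate the calculation
fold, p_value = enrichment(
    n_total=10_000, n_in_category=500, n_significant=100, n_sig_in_category=15
)
print("enrichment fold:", fold, "p:", p_value)
```

An enrichment fold above 1 indicates overrepresentation relative to the genomic background, which corresponds to the dashed reference line at fold = 1 in the enrichment panels.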

Among these candidate loci, the top SNP was 2_11304356_T_A, located within an intronic region of the MS4A7 gene on chromosome 2. This SNP exhibited the strongest association with BIO16 and showed convergent evidence of selection from multiple statistics, including elevated F_ST, Tajima’s D, and XP-EHH (Figure 5a). Notably, genotype–environment association analysis revealed that individuals with the A/T heterozygous genotype had significantly higher BIO16 values than both homozygotes (A/A and T/T), suggesting a potential case of overdominance or heterozygote advantage (Figure 5b). This pattern is consistent with balancing selection, which may favor heterozygous individuals in ecologically stressful environments, such as those with high precipitation and pathogen exposure. Breed-level genotype frequency analysis further revealed that A/T heterozygotes were more prevalent in breeds from regions with higher precipitation, whereas the T/T genotype dominated in drier areas (Supplementary Figure S3), reinforcing the environmental relevance of this SNP. Regulatory annotation showed that while 2_11304356_T_A lies within a repressive chromatin region in the cortex, active regulatory elements were identified at the 5′UTR and exon 5 in several other tissues, including adipose tissue and cerebellum (Figure 5c). Furthermore, transcriptomic data from PigGTEx demonstrated moderate to high expression of MS4A7 in tissues closely related to immune and barrier functions, such as the lung, small intestine, spleen, hypothalamus, and adipose tissue (Figure 5d), suggesting an immunoregulatory role under humid, pathogen-rich conditions [53,54]. Heterozygote advantage at this locus may result from the combined functional benefits of both alleles, potentially broadening pathogen recognition repertoires or balancing immune activation with metabolic costs [55]. Similar cases of overdominance have been documented in the MHC loci of vertebrates, where increased allelic diversity enhances resistance to a wider range of pathogens [56]. The co-occurrence of balancing selection signals (e.g., heterozygote advantage, positive Tajima’s D) and directional selection signals (e.g., high XP-EHH) at this locus suggests a complex adaptive scenario. It is plausible that balancing selection maintains genetic diversity at the population level, while recent local directional selection acts on specific alleles in populations exposed to high-BIO16 environments. Collectively, these results underscore MS4A7 as a compelling candidate gene underlying precipitation-mediated environmental adaptation in Chinese indigenous pigs. The causal link between MS4A7 function and precipitation adaptation remains to be experimentally validated, and future work incorporating gene expression assays and genome editing under contrasting precipitation environments is warranted.
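The genotype-level comparison of BIO16 values can be illustrated with a simple permutation test on the difference in group means. This is a hedged sketch only: the BIO16 values below are invented for illustration, the function name is hypothetical, and the statistical test actually used for Figure 5b is not stated here. Such a comparison also does not by itself account for population structure.

```python
import random
from statistics import mean


def permutation_test_mean_diff(group_a, group_b, n_perm=4999, seed=1):
    """One-sided permutation test for mean(group_a) > mean(group_b)."""
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign genotype labels at random
        if mean(pooled[:n_a]) - mean(pooled[n_a:]) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical BIO16 values (mm) per individual, grouped by genotype
het_bio16 = [820, 760, 905, 870, 790, 840]            # A/T carriers
hom_bio16 = [610, 580, 700, 650, 560, 630, 690, 600]  # A/A and T/T pooled

diff, p = permutation_test_mean_diff(het_bio16, hom_bio16)
print("mean difference (mm):", diff, "p:", p)
```

A large positive difference with a small permutation p-value would match the pattern described for the top SNP, but establishing heterozygote advantage would additionally require fitness or functional data.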

This study integrates whole-genome variation, genotype–environment association, and multilayer functional annotation to systematically uncover the genomic basis of local adaptation. Our findings highlight BIO16 as a key environmental driver and provide novel insights into the evolutionary dynamics of livestock adaptation, with implications for climate-resilient conservation and breeding strategies. In practical terms, the adaptive loci identified here can serve as molecular markers to prioritize indigenous breeds for conservation, thereby preserving adaptive diversity under climate change [8]. Moreover, incorporating such loci into genomic selection programs may improve resilience-related traits such as pathogen resistance and metabolic stability, supporting sustainable pig breeding [57]. Finally, the adaptive alleles identified here represent a genomic resource that could be introgressed into commercial populations to support pig improvement more broadly. However, we acknowledge that our sampling lacked sufficient representation from arid northwestern regions of China, which may limit the generalizability of our findings across all climatic zones. Future efforts should aim to incorporate additional samples from these underrepresented areas to enhance the ecological coverage and analytical resolution of local adaptation studies.

5. Conclusions

This study reveals the genomic signatures of local climate adaptation in Chinese indigenous pigs, with BIO16 identified as a key environmental factor. By integrating genome-wide association analysis, environmental modeling, and functional annotation, we identified candidate loci and regulatory mechanisms associated with climate adaptation, including the gene MS4A7. These findings improve our understanding of environmental adaptation in livestock and provide a valuable foundation for conservation and climate-resilient breeding programs.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/ani15162412/s1, Table S1: Breed information of 578 Chinese indigenous pigs; Table S2: Geographic coordinates and 19 bioclimatic variables for 46 Chinese indigenous pig breeds; Table S3: Candidate loci associated with the BIO16 variable and their corresponding annotated genes; Table S4: GO enrichment results of candidate genes associated with BIO16; Table S5: KEGG enrichment results of candidate genes associated with BIO16; Table S6: Significance of environmental predictors based on RDA permutation tests; Table S7: Eigenvalues and proportion of constrained genetic variation explained by the first six RDA axes; Table S8: Phenome-wide association study (PheWAS) results for MS4A7 (ENSSSCG00000013102) based on PigBioBank database; Table S9: Transcriptome-wide association study (TWAS) results for MS4A7 (ENSSSCG00000013102) across pig tissues; Figure S1: Robustness of genotype–environment association (GEA) results under different minor allele frequency (MAF) thresholds for BIO16 (precipitation in the wettest quarter); Figure S2: Genome-wide linkage disequilibrium (LD) decay across four Chinese indigenous pig population groups; Figure S3: Genotype frequency distribution of the top BIO16-associated SNP (2_11304356_T_A) across 46 Chinese indigenous pig breeds; Figure S4: Phenome-wide association analysis (PheWAS) of the candidate gene MS4A7 (ENSSSCG00000013102) based on PigBioBank data.

Author Contributions

Study design: Z.Z. (Zhe Zhang); genotype and environmental data processing: Y.L., G.L., Y.X., Z.Z. (Zhanming Zhong) and W.A.; population genetic structure and genetic diversity: Y.L., Y.X., Z.Z. (Zhanming Zhong) and G.L.; environmental and genomic correlation analysis (Env-GWAS): Y.L.; genotype–environment association (GEA) and annotation and enrichment of significantly associated SNPs: Y.L., G.L., Z.Z. (Zhanming Zhong), Y.X. and W.A.; drafting the manuscript: Y.L., Y.X. and W.A. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the China Agriculture Research System (CARS-35) and the Guangxi Science and Technology Program Project (GuikeJB23023003). The funders had no role in study design, data collection, analysis, decision to publish, or preparation of the manuscript.

Data Availability Statement

All the data necessary to evaluate the conclusions of this paper are included within the main manuscript and/or the Supplementary Materials. The 578 resequenced datasets analyzed in this study were derived from our previously published pig genomics reference panel (PGRP v1) [18]. Detailed information on these datasets is provided in Supplementary Table S1.

Acknowledgments

We are very grateful to all the researchers who contributed to the publicly available data used in this research. We also thank the National Supercomputing Center in Wuxi for providing the supercomputer system on which the numerical calculations in this paper were performed.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Perleberg, C.; Kind, A.; Schnieke, A. Genetically engineered pigs as models for human disease. Dis. Model. Mech. 2018, 11, dmm030783.
2. Lunney, J.K.; Van Goor, A.; Walker, K.E.; Hailstock, T.; Franklin, J.; Dai, C. Importance of the pig as a human biomedical model. Sci. Transl. Med. 2021, 13, eabd5758.
3. Ai, H.; Fang, X.; Yang, B.; Huang, Z.; Chen, H.; Mao, L.; Zhang, F.; Zhang, L.; Cui, L.; He, W.; et al. Adaptation and possible ancient interspecies introgression in pigs identified by whole-genome sequencing. Nat. Genet. 2015, 47, 217–225.
4. Zhang, M.; Yang, Q.; Ai, H.; Huang, L. Revisiting the Evolutionary History of Pigs via De Novo Mutation Rate Estimation in A Three-Generation Pedigree. Genom. Proteom. Bioinform. 2022, 20, 1040–1052.
5. Larson, G.; Liu, R.; Zhao, X.; Yuan, J.; Fuller, D.; Barton, L.; Dobney, K.; Fan, Q.; Gu, Z.; Liu, X.H.; et al. Patterns of East Asian pig domestication, migration, and turnover revealed by modern and ancient DNA. Proc. Natl. Acad. Sci. USA 2010, 107, 7686–7691.
6. Li, M.; Tian, S.; Jin, L.; Zhou, G.; Li, Y.; Zhang, Y.; Wang, T.; Yeung, C.K.; Chen, L.; Ma, J.; et al. Genomic analyses identify distinct patterns of selection in domesticated pigs and Tibetan wild boars. Nat. Genet. 2013, 45, 1431–1438.
7. Bosse, M.; Megens, H.J.; Frantz, L.A.; Madsen, O.; Larson, G.; Paudel, Y.; Duijvesteijn, N.; Harlizius, B.; Hagemeijer, Y.; Crooijmans, R.P.; et al. Genomic analysis reveals selection for Asian genes in European pigs following human-mediated introgression. Nat. Commun. 2014, 5, 4392.
8. Wang, Z.; Song, B.; Yao, J.; Li, X.; Zhang, Y.; Tang, Z.; Yi, G. Whole-genome analysis reveals distinct adaptation signatures to diverse environments in Chinese domestic pigs. J. Anim. Sci. Biotechnol. 2024, 15, 97.
9. Rellstab, C.; Gugerli, F.; Eckert, A.J.; Hancock, A.M.; Holderegger, R. A practical guide to environmental association analysis in landscape genomics. Mol. Ecol. 2015, 24, 4348–4370.
10. Capblancq, T.; Luu, K.; Blum, M.G.B.; Bazin, E. Evaluation of redundancy analysis to identify signatures of local adaptation. Mol. Ecol. Resour. 2018, 18, 1223–1233.
11. Hancock, A.M.; Brachi, B.; Faure, N.; Horton, M.W.; Jarymowycz, L.B.; Sperone, F.G.; Toomajian, C.; Roux, F.; Bergelson, J. Adaptation to climate across the Arabidopsis thaliana genome. Science 2011, 334, 83–86.
12. Li, F.; Gates, D.J.; Buckler, E.S.; Hufford, M.B.; Janzen, G.M.; Rellan-Alvarez, R.; Rodriguez-Zapata, F.; Romero Navarro, J.A.; Sawers, R.J.H.; Snodgrass, S.J.; et al. Environmental data provide marginal benefit for predicting climate adaptation. PLoS Genet. 2025, 21, e1011714.
13. Zhang, C.L.; Zhang, J.; Tuersuntuoheti, M.; Zhou, W.; Han, Z.; Li, X.; Yang, R.; Zhang, L.; Zheng, L.; Liu, S. Landscape genomics reveals adaptive divergence of indigenous sheep in different ecological environments of Xinjiang, China. Sci. Total Environ. 2023, 904, 166698.
14. Layton, K.K.S.; Snelgrove, P.V.R.; Dempson, J.B. Genomic evidence of past and future climate-linked loss in a migratory Arctic fish. Nat. Clim. Change 2021, 158, 1758–6798.
15. Campagna, L.; Toews, D.P.L. The genomics of adaptation in birds. Curr. Biol. 2022, 32, R1173–R1186.
16. Passamonti, M.M.; Somenzi, E.; Barbato, M.; Chillemi, G.; Colli, L.; Joost, S.; Milanesi, M.; Negrini, R.; Santini, M.; Vajana, E.; et al. The Quest for Genes Involved in Adaptation to Climate Change in Ruminant Livestock. Animals 2021, 11, 2833.
17. Benjelloun, B.; Alberto, F.J.; Streeter, I.; Boyer, F.; Coissac, E.; Stucki, S.; BenBati, M.; Ibnelbachyr, M.; Chentouf, M.; Bechchari, A.; et al. Characterizing neutral genomic diversity and selection signatures in indigenous populations of Moroccan goats (Capra hircus) using WGS data. Front. Genet. 2015, 6, 107.
18. Teng, J.; Gao, Y.; Yin, H.; Bai, Z.; Liu, S.; Zeng, H.; Pig, G.C.; Bai, L.; Cai, Z.; Zhao, B.; et al. A compendium of genetic regulatory effects across pig tissues. Nat. Genet. 2024, 56, 112–123.
19. Fick, S.E.; Hijmans, R.J. WorldClim 2: New 1-km spatial resolution climate surfaces for global land areas. Int. J. Climatol. 2017, 37, 4302–4315.
20. Chang, C.C.; Chow, C.C.; Tellier, L.C.; Vattikuti, S.; Purcell, S.M.; Lee, J.J. Second-generation PLINK: Rising to the challenge of larger and richer datasets. Gigascience 2015, 4, 7.
21. Tamura, K.; Stecher, G.; Kumar, S. MEGA11: Molecular Evolutionary Genetics Analysis Version 11. Mol. Biol. Evol. 2021, 38, 3022–3027.
22. Yu, G.; Lam, T.T.; Zhu, H.; Guan, Y. Two Methods for Mapping and Visualizing Associated Data on Phylogeny Using Ggtree. Mol. Biol. Evol. 2018, 35, 3041–3043.
23. Alexander, D.H.; Novembre, J.; Lange, K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 2009, 19, 1655–1664.
24. Danecek, P.; Auton, A.; Abecasis, G.; Albers, C.A.; Banks, E.; DePristo, M.A.; Handsaker, R.E.; Lunter, G.; Marth, G.T.; Sherry, S.T.; et al. The variant call format and VCFtools. Bioinformatics 2011, 27, 2156–2158.
25. Caye, K.; Jumentier, B.; Lepeule, J.; Francois, O. LFMM 2: Fast and Accurate Inference of Gene-Environment Associations in Genome-Wide Studies. Mol. Biol. Evol. 2019, 36, 852–860.
26. Frichot, E.; François, O.; O’Meara, B. LEA: An R package for landscape and ecological association studies. Methods Ecol. Evol. 2015, 6, 925–929.
27. Frichot, E.; Schoville, S.D.; Bouchard, G.; Francois, O. Testing for associations between loci and environmental gradients using latent factor mixed models. Mol. Biol. Evol. 2013, 30, 1687–1699.
28. Lê, S.; Josse, J.; Husson, F. FactoMineR: An R Package for Multivariate Analysis. J. Stat. Softw. 2008, 25, 1–18.
29. Sang, Y.; Long, Z.; Dan, X.; Feng, J.; Shi, T.; Jia, C.; Zhang, X.; Lai, Q.; Yang, G.; Zhang, H.; et al. Genomic insights into local adaptation and future climate-induced vulnerability of a keystone forest tree in East Asia. Nat. Commun. 2022, 13, 6541.
30. Ellis, N.; Smith, S.J.; Pitcher, C.R. Gradient forests: Calculating importance gradients on physical predictors. Ecology 2012, 93, 156–168.
31. Cingolani, P.; Platts, A.; Wang, L.L.; Coon, M.; Nguyen, T.; Wang, L.; Land, S.J.; Lu, X.; Ruden, D.M. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly 2012, 6, 80–92.
32. Pan, Z.; Yao, Y.; Yin, H.; Cai, Z.; Wang, Y.; Bai, L.; Kern, C.; Halstead, M.; Chanthavixay, G.; Trakooljul, N.; et al. Pig genome functional annotation enhances the biological interpretation of complex traits and human disease. Nat. Commun. 2021, 12, 5848.
33. Finucane, H.K.; Reshef, Y.A.; Anttila, V.; Slowikowski, K. Heritability enrichment of specifically expressed genes identifies disease-relevant tissues and cell types. Nat. Genet. 2018, 50, 621–629.
34. Nakazawa, M. Functions for Medical Statistics Book with Some Demographic Data. 2024. Available online: https://rdrr.io/cran/fmsb/ (accessed on 28 March 2025).
35. Xu, Z.; Lin, Q.; Cai, X.; Zhong, Z.; Teng, J.; Li, B.; Zeng, H.; Gao, Y.; Cai, Z.; Wang, X.; et al. Integrating large-scale meta-GWAS and PigGTEx resources to decipher the genetic basis of 232 complex traits in pigs. Natl. Sci. Rev. 2025, 12, nwaf048.
36. Patwardhan, M.N.; Wenger, C.D.; Davis, E.S.; Phanstiel, D.H. Bedtoolsr: An R package for genomic data analysis and manipulation. J. Open Source Softw. 2019, 4, 1742.
37. Wu, T.; Hu, E.; Xu, S.; Chen, M.; Guo, P.; Dai, Z.; Feng, T.; Zhou, L.; Tang, W.; Zhan, L.; et al. clusterProfiler 4.0: A universal enrichment tool for interpreting omics data. Innovation 2021, 2, 100141.
38. Szpiech, Z.A. selscan 2.0: Scanning for sweeps in unphased data. Bioinformatics 2024, 40, btae006.
39. Lv, F.H.; Agha, S.; Kantanen, J.; Colli, L.; Stucki, S.; Kijas, J.W.; Joost, S.; Li, M.H.; Ajmone Marsan, P. Adaptations to climate-mediated selective pressures in sheep. Mol. Biol. Evol. 2014, 31, 3324–3343.
40. Benjelloun, B.; Boyer, F.; Streeter, I.; Zamani, W.; Engelen, S.; Alberti, A.; Alberto, F.J.; BenBati, M.; Ibnelbachyr, M.; Chentouf, M.; et al. An evaluation of sequencing coverage and genotyping strategies to assess neutral and adaptive diversity. Mol. Ecol. Resour. 2019, 19, 1497–1515.
41. Fitzpatrick, M.C.; Keller, S.R. Ecological genomics meets community-level modelling of biodiversity: Mapping the genomic landscape of current and future environmental adaptation. Ecol. Lett. 2015, 18, 1–16.
42. Vallejo-Trujillo, A.; Kebede, A.; Lozano-Jaramillo, M.; Dessie, T.; Smith, J.; Hanotte, O.; Gheyas, A.A. Ecological niche modelling for delineating livestock ecotypes and exploring environmental genomic adaptation: The example of Ethiopian village chicken. Front. Ecol. Evol. 2022, 10, 866587.
43. Wu, R.; Qi, J.; Li, W.; Wang, L.; Shen, Y.; Liu, J.; Teng, Y.; Roos, C.; Li, M. Landscape genomics analysis provides insights into future climate change-driven risk in rhesus macaque. Sci. Total Environ. 2023, 899, 165746.
44. Kim, E.S.; Elbeltagy, A.R.; Aboul-Naga, A.M.; Rischkowsky, B.; Sayre, B.; Mwacharo, J.M.; Rothschild, M.F. Multiple genomic signatures of selection in goats and sheep indigenous to a hot arid environment. Heredity 2016, 116, 255–264.
45. Gao, Y.; Feng, X.; Diao, S.; Liu, Y.; Zhong, Z.; Cai, X.; Li, G.; Teng, J.; Liu, X.; Li, J.; et al. Deciphering genetic characteristics of South China and North China indigenous pigs through selection signatures. BMC Genom. 2024, 25, 1191.
46. Wadgymar, S.M.; DeMarche, M.L.; Josephs, E.B.; Sheth, S.N.; Anderson, J.T. Local adaptation: Causal agents of selection and adaptive trait divergence. Annu. Rev. Ecol. Evol. Syst. 2022, 53, 87–111.
47. Marand, A.P.; Eveland, A.L.; Kaufmann, K.; Springer, N.M. cis-Regulatory Elements in Plant Development, Adaptation, and Evolution. Annu. Rev. Plant Biol. 2023, 74, 111–137.
48. Nasser, J.; Bergman, D.T.; Fulco, C.P.; Guckelberger, P.; Doughty, B.R.; Patwardhan, T.A.; Jones, T.R.; Nguyen, T.H.; Ulirsch, J.C.; Lekschas, F.; et al. Genome-wide enhancer maps link risk variants to disease genes. Nature 2021, 593, 238–243.
49. Pena-Martinez, E.G.; Rodriguez-Martinez, J.A. Decoding Non-Coding Variants: Recent Approaches to Studying Their Role in Gene Regulation and Human Diseases. Front. Biosci. (Sch. Ed.) 2024, 16, 4.
50. Cramer, M.N.; Gagnon, D.; Laitano, O.; Crandall, C.G. Human temperature regulation under heat stress in health, disease, and injury. Physiol. Rev. 2022, 102, 1907–1989.
51. Hewitt, R.J.; Lloyd, C.M. Regulation of immune responses by the airway epithelial cell landscape. Nat. Rev. Immunol. 2021, 21, 347–362.
52. Vanhaecke, T.; Perrier, E.T.; Melander, O. A Journey Through the Early Evidence Linking Hydration to Metabolic Health. Ann. Nutr. Metab. 2020, 76 (Suppl. S1), 4–9.
53. Jiang, Y.; Shu, Z.; Cheng, L.; Wang, H.; He, T.; Fu, L.; Zhao, C.; Li, X.; Zeng, W. MS4A7 based metabolic gene signature as a prognostic predictor in lung adenocarcinoma. Front. Mol. Biosci. 2025, 12, 1591446.
54. Zhou, L.; Lu, Y.; Qiu, X.; Chen, Z.; Tang, Y.; Meng, Z.; Yan, C.; Du, H.; Li, S.; Lin, J.D. Lipid droplet efferocytosis attenuates proinflammatory signaling in macrophages via TREM2- and MS4A7-dependent mechanisms. Cell Rep. 2025, 44, 115310.
55. Hedrick, P.W. What is the evidence for heterozygote advantage selection? Trends Ecol. Evol. 2012, 27, 698–704.
56. Penn, D.J.; Potts, W.K. The Evolution of Mating Preferences and Major Histocompatibility Complex Genes. Am. Nat. 1999, 153, 145–164.
57. Liu, C.; Huang, R.; Su, G.; Hou, L.; Zhou, W.; Liu, Q.; Qiu, Z.; Zhao, Q.; Li, P. Introgression of pigs in Taihu Lake region possibly contributed to the improvement of fertility in Danish Large White pigs. BMC Genom. 2023, 24, 733.

Figure 1. Population structure and genetic diversity of Chinese indigenous pigs. (a) Geographic distribution of 46 indigenous pig populations grouped into four regions: ECN (blue), NCN (light blue), SCN (red), and SWCN (orange). Circle size reflects sample size. (b) PCA of 578 individuals based on genome-wide SNPs; PC1 and PC2 explain 18.68% and 12.64% of variation, respectively. (c) ADMIXTURE plots at K = 2 and K = 3, illustrating population structure and ancestry components. (d) Nucleotide diversity (π) distributions across population groups. (e) Distribution of Tajima’s D values by group, with dashed lines indicating group means.

Figure 2. Environmental PCA and key variable selection. (a) Biplot of the principal component analysis (PCA) based on 19 bioclimatic variables. Arrows indicate the loading directions of environmental variables in the PC1–PC2 space, and arrow color denotes the cos² value (i.e., the contribution of each variable to the principal components), ranging from blue-green (low contribution) to red (high contribution). (b) Distribution of 46 Chinese indigenous pig populations in the environmental PCA space defined by PC1 and PC2. Each point represents a population, colored by genetic group: ECN (blue), NCN (light blue), SCN (red), and SWCN (orange). Ellipses indicate 95% confidence intervals of population clustering. (c) Pairwise Pearson correlation matrix among the 19 bioclimatic variables. Color intensity and square size indicate the strength and direction of correlation (red = positive, blue = negative). (d) Bar plot showing the ranked importance of each environmental variable in the PCA. Orange bars denote the variables retained for subsequent analyses.

Figure 3. Redundancy analysis (RDA) and distance-based correlation reveal environmental influence on genomic variation in Chinese indigenous pig populations. (a,b) RDA results showing the relationship between genomic variation and environmental variables across 578 individuals from 46 local pig populations. Each point represents an individual, colored by population group (ECN, NCN, SCN, SWCN). Blue arrows indicate the direction and strength of the six selected environmental variables in the constrained ordination space. The projections of arrows on RDA axes reflect the relative contribution of each environmental factor. (c,d) Correlation between pairwise F_ST and geographic or environmental distance. (c) Relationship between F_ST and geographic distance among populations based on background SNPs (blue) and LFMM-identified environment-associated SNPs (pink). The top-left inset shows Mantel correlation coefficients and significance values. (d) Relationship between F_ST and environmental distance after controlling for geographic distance. Partial Mantel test results are shown in the top-left corner. Significant correlations for LFMM SNPs but not background SNPs suggest an independent role of environmental selection in shaping population differentiation.

Figure 4. Multilayer annotation and enrichment analysis of SNPs significantly associated with BIO16. (a) Manhattan plot of genome-wide LFMM association analysis for BIO16. The y-axis represents −log₁₀(p-value); the red dashed line denotes the genome-wide significance threshold (5 × 10⁻⁸), and the gray dashed line marks the suggestive threshold (1 × 10⁻⁵). (b) Enrichment of significant SNPs across genomic functional categories. *** p < 0.001, * p < 0.05; dashed line indicates enrichment fold = 1. (c) Heatmap of chromatin state enrichment across tissues. Each cell represents a combination of chromatin state (rows) and tissue (columns). Red indicates positive enrichment; blue indicates negative enrichment. *** p < 0.001, ** p < 0.01, * p < 0.05. (d) Enrichment analysis of significant SNPs in the top 1000 tissue-specific genes from 34 tissues. The y-axis indicates enrichment fold; ** p < 0.01, * p < 0.05; dashed line indicates enrichment fold = 1.

Figure 5. Multilayered annotation of the MS4A7 gene region associated with BIO16. (a) Local annotation of the MS4A7 region. The top panel shows the local association and LD plot (r² indicating linkage strength); the bottom panels display the distributions of π, Tajima’s D, F_ST, and XP-EHH, highlighting a strong selective signal centered at 2_11304356_T_A. (b) Distribution of BIO16 values among individuals with different genotypes. A/T carriers show significantly higher BIO16 values than A/A and T/T individuals, *** p < 0.001, ** p < 0.01, ns not significant. (c) Chromatin state and tissue-specific regulatory annotation. The upper panel presents regulatory activity (e.g., enhancers, open chromatin) across different tissues in the MS4A7 region; the lower panel shows gene structure and the position of the top SNP relative to exons and UTRs. (d) Expression profile of MS4A7 across 35 pig tissues. TPM data indicate moderate to high expression levels in the lung, hypothalamus, small intestine, spleen, and adipose tissue, suggesting diverse functional roles.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).