This is an early access version; the complete PDF, HTML, and XML versions will be available soon.
Open Access Article
A Self-Supervised Pre-Trained Transformer Model for Accurate Genomic Prediction of Swine Phenotypes
by Weixi Xiang 1,2, Zhaoxin Li 1,2, Qixin Sun 1,2, Xiujuan Chai 1,2,* and Tan Sun 1,2
1 Agricultural Information Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China
2 Key Laboratory of Agricultural Big Data, Ministry of Agriculture and Rural Affairs, Beijing 100081, China
* Author to whom correspondence should be addressed.
Submission received: 4 July 2025 / Revised: 20 August 2025 / Accepted: 21 August 2025 / Published: 24 August 2025
(This article belongs to the Section Pigs)
Simple Summary
Predicting complex genetic traits is essential for improving swine-breeding programs, but traditional linear methods struggle to capture non-additive genetic effects. This study introduces a novel deep learning framework, based on a Transformer model, to predict swine phenotypes more accurately. The model first learns the fundamental patterns of the pig genome from unlabeled genetic data and is then fine-tuned to predict key economic traits. Our results show that this method outperforms existing approaches such as Genomic Best Linear Unbiased Prediction (GBLUP). This enhanced accuracy provides breeders with a powerful tool for selecting superior animals, potentially accelerating genetic gain and delivering substantial economic benefits to the swine industry.
Abstract
Accurate genomic prediction of complex phenotypes is crucial for accelerating genetic progress in swine breeding. However, conventional methods such as Genomic Best Linear Unbiased Prediction (GBLUP) are limited in their ability to capture the complex non-additive effects that contribute significantly to phenotypic variation, which restricts the achievable accuracy of phenotype prediction. To address this challenge, we introduce a novel framework based on a self-supervised, pre-trained, encoder-only Transformer model. Its core novelty lies in tokenizing SNP sequences into non-overlapping 6-mers (sequences of 6 SNPs), enabling the model to learn local haplotype patterns directly instead of treating SNPs as independent markers. The model first undergoes self-supervised pre-training on the unlabeled version of the same SNP dataset used for subsequent fine-tuning, learning intrinsic genomic representations through a masked 6-mer prediction task. The pre-trained model is then fine-tuned on labeled data to predict phenotypic values for specific economic traits. Experimental validation demonstrates that the proposed model consistently outperforms baseline methods, including GBLUP and a Transformer of the same architecture trained from scratch (without pre-training), in prediction accuracy across key economic traits. This outperformance suggests that the model can capture non-linear genetic signals missed by linear models. This research not only contributes a new, more accurate methodology for genomic phenotype prediction but also validates the potential of self-supervised learning to decipher complex genomic patterns for direct application in breeding programs. Ultimately, this approach offers a powerful new tool to enhance the rate of genetic gain in swine production by enabling more precise selection based on predicted phenotypes.
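To make the tokenization and pre-training objective concrete, the following is a minimal illustrative sketch of splitting an SNP sequence into non-overlapping 6-mers and preparing inputs for masked 6-mer prediction. It is not the authors' implementation. The 0/1/2 genotype coding, the `[MASK]` token name, the 15% masking rate, the dropping of trailing SNPs that do not fill a complete 6-mer, and the function names `tokenize_kmers` and `mask_tokens` are all assumptions made for illustration; the abstract does not specify them.

```python
import random

K = 6            # k-mer length stated in the abstract
MASK = "[MASK]"  # hypothetical mask token name (assumption)


def tokenize_kmers(genotypes, k=K):
    """Split a sequence of SNP genotype codes into non-overlapping k-mers.

    Assumption: trailing SNPs that do not fill a complete k-mer are dropped.
    """
    n_full = len(genotypes) - len(genotypes) % k
    return [
        "".join(str(g) for g in genotypes[i:i + k])
        for i in range(0, n_full, k)
    ]


def mask_tokens(tokens, mask_prob=0.15, rng=None):
    """Build (masked_tokens, labels) for a masked k-mer prediction task.

    labels[i] holds the original token where position i was masked,
    and None where the token was left unchanged.
    Assumption: each token is masked independently with probability mask_prob.
    """
    rng = rng or random.Random()
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)
            labels.append(tok)
        else:
            masked.append(tok)
            labels.append(None)
    return masked, labels


# Toy example: 14 SNPs coded as 0/1/2 allele dosages (assumed coding).
snps = [0, 1, 2, 0, 0, 1, 2, 2, 1, 0, 1, 1, 0, 2]
tokens = tokenize_kmers(snps)
print(tokens)  # ['012001', '221011']

masked, labels = mask_tokens(tokens, mask_prob=0.5, rng=random.Random(0))
```

In a full pipeline, the masked token sequence would be mapped to integer IDs and fed to the encoder, with the loss computed only at positions where `labels` is not `None`. Treating each 6-mer as one token is what lets the model see a local haplotype block as a unit rather than six independent markers.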
Share and Cite
MDPI and ACS Style
Xiang, W.; Li, Z.; Sun, Q.; Chai, X.; Sun, T. A Self-Supervised Pre-Trained Transformer Model for Accurate Genomic Prediction of Swine Phenotypes. Animals 2025, 15, 2485. https://doi.org/10.3390/ani15172485
AMA Style
Xiang W, Li Z, Sun Q, Chai X, Sun T. A Self-Supervised Pre-Trained Transformer Model for Accurate Genomic Prediction of Swine Phenotypes. Animals. 2025; 15(17):2485. https://doi.org/10.3390/ani15172485
Chicago/Turabian Style
Xiang, Weixi, Zhaoxin Li, Qixin Sun, Xiujuan Chai, and Tan Sun. 2025. "A Self-Supervised Pre-Trained Transformer Model for Accurate Genomic Prediction of Swine Phenotypes" Animals 15, no. 17: 2485. https://doi.org/10.3390/ani15172485
APA Style
Xiang, W., Li, Z., Sun, Q., Chai, X., & Sun, T. (2025). A Self-Supervised Pre-Trained Transformer Model for Accurate Genomic Prediction of Swine Phenotypes. Animals, 15(17), 2485. https://doi.org/10.3390/ani15172485
Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.