GIPA: A High-Throughput Computational Toolkit for Genomic Identity and Parentage Analysis in Modern Crop Breeding
Abstract
1. Introduction
- Dual-Functionality: Seamlessly switch between identity verification and parentage discovery.
- Advanced Error Correction: A sliding-window algorithm minimizes the impact of sporadic genotyping errors on final calculations.
- Intelligent Sample Classification: Automatically distinguishes inbred from hybrid lines based on heterozygosity, refining the parentage search space.
- Integrated High-Quality Visualization: Generates chromosome-level heatmaps that show genomic similarity along each chromosome, which is more direct to interpret for this application than a phylogenetic tree.
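The error-correction and sample-classification features listed above can be illustrated with a minimal sketch. It is not GIPA's implementation. The majority-vote rule, the heterozygosity cutoff `het_threshold`, and the function names are assumptions made for illustration. Only the defaults of 5 and 2 mirror the `--filter-window` and `--filter-times` parameters, and reading them as a window size in SNPs and a number of passes is itself an assumption.

```python
from typing import List


def heterozygosity(genotypes: List[str]) -> float:
    """Fraction of called genotypes that are heterozygous ('AB')."""
    called = [g for g in genotypes if g in ("AA", "AB", "BB")]
    if not called:
        return 0.0
    return sum(g == "AB" for g in called) / len(called)


def classify_sample(genotypes: List[str], het_threshold: float = 0.10) -> str:
    """Label a sample as 'inbred' or 'hybrid'.

    het_threshold is an illustrative cutoff, not a value taken from the paper.
    """
    return "hybrid" if heterozygosity(genotypes) > het_threshold else "inbred"


def smooth_matches(matches: List[int], window: int = 5, passes: int = 2) -> List[int]:
    """Majority-vote smoothing of a 0/1 match vector (assumed filter rule).

    Each position is replaced by the majority value of the window centred
    on it; the filter is applied `passes` times. Ties resolve to 0.
    """
    current = list(matches)
    half = window // 2
    for _ in range(passes):
        smoothed = []
        for i in range(len(current)):
            lo = max(0, i - half)
            hi = min(len(current), i + half + 1)
            neighbourhood = current[lo:hi]
            smoothed.append(1 if 2 * sum(neighbourhood) > len(neighbourhood) else 0)
        current = smoothed
    return current


# An isolated mismatch inside a matching run is treated as a sporadic error.
print(smooth_matches([1, 1, 1, 0, 1, 1, 1]))
print(classify_sample(["AA", "BB", "AA", "AA"]))
```

Under this assumed rule, a single mismatching SNP surrounded by matching SNPs is overwritten, while a long run of mismatches survives the filter.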
2. Materials and Methods
2.1. Software Architecture and Implementation
2.2. Identity Analysis Module
2.3. Parentage Analysis Module
2.4. Visualization Module
2.5. Usage and Parameters
3. Results
3.1. Case Study 1: Identity Analysis for Soybean Backcross Breeding
3.2. Case Study 2: Parentage Analysis of Commercial Maize Hybrids
4. Discussion
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
GIPA | Genomic Identity and Parentage Analysis |
SNP | Single Nucleotide Polymorphism |
VCF | Variant Call Format |
Parent 1 Genotype | Parent 2 Genotype | Expected Offspring Genotype(s) |
---|---|---|
AA | AA | AA |
BB | BB | BB |
AA | BB | AB |
AA | AB | AA, AB |
BB | AB | BB, AB |
AB | AB | AA, AB, BB |
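The expected offspring genotypes in the table above follow from taking one allele from each parent. The sketch below reproduces the table and uses it to score a candidate parental pair. The function names are hypothetical, and treating every site where all three genotypes are called as informative is an assumption; the paper's own definition of an informative SNP may be stricter.

```python
from itertools import product
from typing import List, Optional, Set


def expected_offspring(parent1: str, parent2: str) -> Set[str]:
    """All biallelic genotypes formed by one allele from each parent."""
    return {"".join(sorted(a + b)) for a, b in product(parent1, parent2)}


def is_consistent(offspring: str, parent1: str, parent2: str) -> bool:
    """True if the offspring genotype is possible for this parental pair."""
    return offspring in expected_offspring(parent1, parent2)


def parentage_match_percent(
    offspring: List[Optional[str]],
    parent1: List[Optional[str]],
    parent2: List[Optional[str]],
) -> float:
    """Percentage of usable sites where the offspring fits the parental pair.

    Assumption: a site is used whenever all three genotypes are called
    (None marks a missing call).
    """
    informative = matched = 0
    for o, a, b in zip(offspring, parent1, parent2):
        if o is None or a is None or b is None:
            continue
        informative += 1
        matched += is_consistent(o, a, b)
    return 100.0 * matched / informative if informative else 0.0


print(sorted(expected_offspring("AA", "BB")))  # ['AB']
print(sorted(expected_offspring("AB", "AB")))  # ['AA', 'AB', 'BB']
```

Ranking all candidate pairs by this percentage gives the kind of ordered list reported for the maize hybrids, although the actual scoring in GIPA may differ.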
Parameter | Short | Description |
---|---|---|
--vcf | -v | Path to the input VCF file (required). |
--sample | -s | Name of the query sample (required). |
--refs | -r | Path to a text file listing the reference samples (required). |
--out | -o | Prefix for all output files (default: output). |
--chr | -c | Restrict analysis to a specific chromosome. |
--threads | -t | Number of threads to use (default: 1). |
--heatmap-window | -hw | Window size for heatmaps (kb) (default: 50). |
--filter-times | -ft | Number of times the sliding-window filter is applied (default: 2). |
--filter-window | -fw | Sliding window size (default: 5). |
--find_parents | | Activates the parentage analysis module. |
--generate-heatmaps | | Generates heatmap visualizations. |
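To make the parameter table concrete, the following sketch declares the same options with Python's standard `argparse` module. It is an illustration of how such an interface could be defined, not GIPA's actual source: the program name `gipa`, the integer types, and the help strings are assumptions, while the option names and defaults are copied from the table.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Command-line interface mirroring the parameter table (sketch only)."""
    parser = argparse.ArgumentParser(prog="gipa")  # program name is hypothetical
    parser.add_argument("--vcf", "-v", required=True, help="Input VCF file")
    parser.add_argument("--sample", "-s", required=True, help="Query sample name")
    parser.add_argument("--refs", "-r", required=True,
                        help="Text file listing the reference samples")
    parser.add_argument("--out", "-o", default="output", help="Output prefix")
    parser.add_argument("--chr", "-c", default=None,
                        help="Restrict analysis to one chromosome")
    parser.add_argument("--threads", "-t", type=int, default=1)
    parser.add_argument("--heatmap-window", "-hw", type=int, default=50,
                        help="Heatmap window size in kb")
    parser.add_argument("--filter-times", "-ft", type=int, default=2)
    parser.add_argument("--filter-window", "-fw", type=int, default=5)
    parser.add_argument("--find_parents", action="store_true",
                        help="Run the parentage analysis module")
    parser.add_argument("--generate-heatmaps", action="store_true",
                        help="Produce heatmap visualizations")
    return parser


# Example invocation with placeholder file and sample names.
args = build_parser().parse_args(
    ["-v", "cohort.vcf", "-s", "QuerySample", "-r", "refs.txt",
     "-hw", "100", "--find_parents"]
)
print(args.heatmap_window, args.threads, args.find_parents)
```

The file names and sample name in the example call are placeholders chosen for the sketch.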
Sample | Chromosome | Identity (%) | Compared_SNPs | Matched_SNPs |
---|---|---|---|---|
TianLong1 | Whole genome | 98.02 | 3,141,257 | 3,078,990 |
TianLong1 | Chr01 | 97.86 | 164,761 | 161,229 |
ZhongH13 | Whole genome | 69.17 | 3,193,227 | 2,208,701 |
ZhongH13 | Chr01 | 72.58 | 167,232 | 121,371 |
HuaXia1Hao | Whole genome | 67.64 | 3,297,907 | 2,230,761 |
HuaXia1Hao | Chr01 | 51.97 | 169,733 | 88,216 |
WanDou28 | Whole genome | 67.52 | 3,204,985 | 2,163,943 |
WanDou28 | Chr01 | 73.84 | 169,055 | 124,833 |
KenFeng16 | Whole genome | 66.23 | 3,195,894 | 2,116,533 |
KenFeng16 | Chr01 | 79.27 | 170,140 | 134,878 |
ZhongH35 | Whole genome | 65.81 | 3,281,181 | 2,159,290 |
ZhongH35 | Chr01 | 63.13 | 174,396 | 110,092 |
KeShan1Hao | Whole genome | 65.14 | 3,078,978 | 2,005,742 |
KeShan1Hao | Chr01 | 69.37 | 166,057 | 115,192 |
KenDou40 | Whole genome | 65.10 | 3,181,010 | 2,070,929 |
KenDou40 | Chr01 | 77.60 | 169,144 | 131,257 |
HeiKe60Hao | Whole genome | 64.80 | 3,086,118 | 1,999,768 |
HeiKe60Hao | Chr01 | 85.83 | 163,723 | 140,523 |
HeiHe45 | Whole genome | 63.87 | 3,118,058 | 1,991,467 |
HeiHe45 | Chr01 | 75.31 | 163,857 | 123,397 |
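The identity values in the table above are consistent with Identity (%) = 100 × Matched_SNPs / Compared_SNPs. The sketch below shows that calculation on genotype lists. The function name is hypothetical, and skipping sites with a missing call in either sample (marked `None`) is an assumption about how compared SNPs are counted.

```python
from typing import List, Optional, Tuple


def identity_percent(
    query: List[Optional[str]],
    reference: List[Optional[str]],
) -> Tuple[float, int, int]:
    """Return (identity %, compared SNPs, matched SNPs) for two samples.

    Assumption: only sites called in both samples are compared.
    """
    compared = matched = 0
    for q, r in zip(query, reference):
        if q is None or r is None:
            continue
        compared += 1
        matched += (q == r)
    pct = 100.0 * matched / compared if compared else 0.0
    return pct, compared, matched


# Small toy example: three comparable sites, two of which agree.
print(identity_percent(["AA", "AB", None, "BB"], ["AA", "BB", "AA", "BB"]))

# Cross-check against the whole-genome row for TianLong1.
print(round(100.0 * 3_078_990 / 3_141_257, 2))  # 98.02
```

The same ratio applies per chromosome when the genotype lists are restricted to one chromosome, as with the `--chr` option.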
Sample | Parental Combination | Match (%) | Informative_SNPs | Matched_SNPs |
---|---|---|---|---|
JK968 | Jing724 × Jing92 | 98.46 | 5,565,837 | 5,480,162 |
JK968 | CT3354 × Jing92 | 86.65 | 5,396,030 | 4,675,800 |
JK968 | Chang7-2 × Jing724 | 78.39 | 5,459,145 | 4,279,460 |
JK968 | CT1669 × Jing92 | 69.46 | 5,631,796 | 3,911,826 |
JK968 | CT3354 × Chang7-2 | 67.91 | 5,280,258 | 3,585,815 |
YF303 | CT1669 × CT3354 | 97.60 | 5,639,112 | 5,503,557 |
YF303 | CT1669 × Jing724 | 87.48 | 5,730,026 | 5,012,759 |
YF303 | CT3354 × Jing724 | 77.92 | 5,802,146 | 4,520,876 |
YF303 | CT3354 × Zheng58 | 68.29 | 5,427,176 | 3,706,144 |
YF303 | CT1669 × Zheng58 | 67.28 | 5,625,454 | 3,785,022 |
ZD958 | Chang7-2 × Zheng58 | 97.32 | 5,308,431 | 5,166,072 |
ZD958 | Zheng58 × Jing92 | 80.94 | 5,299,261 | 4,289,174 |
ZD958 | Chang7-2 × Jing92 | 70.33 | 5,251,192 | 3,693,262 |
ZD958 | Chang7-2 × Jing724 | 66.45 | 5,078,815 | 3,374,692 |
ZD958 | CT3354 × Chang7-2 | 65.72 | 4,999,678 | 3,285,750 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Yu, Y.-F.; Ma, X.-Y.; Wan, Y.; Shen, Z.-C.; Ye, Y.-X. GIPA: A High-Throughput Computational Toolkit for Genomic Identity and Parentage Analysis in Modern Crop Breeding. Agronomy 2025, 15, 2441. https://doi.org/10.3390/agronomy15102441