A Tool for the Design of the Minimal Fingerprinting SNP Set: Use Case for Barley
Abstract
1. Introduction
2. Materials and Methods
2.1. The MDSearch Algorithm
2.2. Testing Dataset
2.3. Illumina Reads Preparation and Mapping
2.4. SNP-Calling and Filtration for Minimal Discriminatory SNP Set Selection
3. Results and Discussion
3.1. Filtration and Identification of Highly Polymorphic SNPs
- Initially, from a set of 850,174 detected mutations, 148,108 indels and complex mutations were removed (--snps-only), retaining only SNPs.
- Subsequently, of the 702,066 SNPs, 7252 multiallelic SNPs (those with more than two alleles) were removed (--min-alleles 2 --max-alleles 2), leaving 694,814 biallelic SNPs.
- The SNPs were then filtered by call rate (--geno 0): any SNP with a missing genotype call in at least one of the 254 studied varieties (SNP call rate < 100%) was removed, so that every retained SNP could be compared across all cultivars. After this step, 41,112 SNPs remained.
- The remaining variants were filtered by minor allele frequency (MAF; --maf 0.01). MAF is the frequency of the less common of the two alleles of a SNP, so by definition it cannot exceed 0.5. For a SNP with only homozygous genotypes in a diploid species, a MAF of 0.5 means that 50% of varieties carry the reference allele and 50% carry the alternative one. After this evaluation, 37,450 SNPs with MAF < 1% were removed because such a low polymorphism frequency may indicate a genotyping error. Before the next stage, the set included 3662 SNPs.
- Homozygous SNPs are the most suitable for creating the SNP barcode, so 3166 SNPs that were heterozygous in at least one of the cultivars were removed from the set (--keep <filename>). After this procedure, 496 homozygous SNPs remained in the set.
- For certification, it is important to use SNPs that are not inherited together (that is, not genetically linked), because linkage may affect the accuracy of the passport during variety cultivation. Therefore, pairwise linkage disequilibrium among the 496 SNPs was analyzed in a sliding window of 50 SNPs with a step of 5 SNPs (--indep-pairwise 50 5 0.2). SNPs were pruned so that no pair within a window had a squared genotype correlation (r²) greater than 0.2. After removing 319 SNPs, 177 SNPs remained for further analysis.
- The optimal set of markers is considered to be one in which the markers are evenly distributed throughout the genome, which also reduces the risk of having linked markers in the set. Therefore, at the final stage of filtering, SNPs located less than 10 Mbp from another retained SNP were removed (--bp-space 10000000). As a result, 89 SNPs were excluded from the analysis, and the remaining 88 SNPs were used to identify the minimum discriminatory set of SNPs.
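
The counts reported in the steps above can be cross-checked with a short script. The sketch below is only a bookkeeping aid, not part of the authors' pipeline: `REPORTED_STEPS`, `tally`, and `minor_allele_frequency` are hypothetical names introduced here, and the MAF helper assumes genotypes coded as alternative-allele dosages (0, 1, 2) for a diploid sample.

```python
# Counts as stated in the filtration steps: (filter, removed, remaining).
# None marks a figure the text does not state directly.
REPORTED_STEPS = [
    ("--snps-only", 148_108, 702_066),
    ("--min-alleles 2 --max-alleles 2", 7_252, None),
    ("--geno 0", None, 41_112),
    ("--maf 0.01", 37_450, 3_662),
    ("heterozygous SNPs removed", 3_166, 496),
    ("--indep-pairwise 50 5 0.2", 319, 177),
    ("--bp-space", 89, 88),
]


def tally(start, steps):
    """Walk through the steps, filling in unstated figures and
    raising ValueError if a stated pair of figures is inconsistent."""
    rows = []
    current = start
    for label, removed, remaining in steps:
        if removed is None:
            removed = current - remaining
        expected = current - removed
        if remaining is not None and remaining != expected:
            raise ValueError(f"{label}: expected {expected}, text says {remaining}")
        current = expected
        rows.append((label, removed, current))
    return rows


def minor_allele_frequency(dosages):
    """MAF of one biallelic SNP from alt-allele dosages (0, 1, 2);
    None marks a missing call and is ignored (hypothetical coding)."""
    called = [d for d in dosages if d is not None]
    alt_freq = sum(called) / (2 * len(called))
    # The minor allele is whichever allele is rarer, so MAF <= 0.5.
    return min(alt_freq, 1 - alt_freq)


ROWS = tally(850_174, REPORTED_STEPS)
for label, removed, remaining in ROWS:
    print(f"{label:35s} removed {removed:>8,d}  remaining {remaining:>8,d}")
```

Running the tally fills in the two figures the text leaves implicit (the count after the biallelic filter and the number of SNPs dropped by the call-rate filter) and confirms that the stated numbers are mutually consistent down to the final 88 SNPs.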
3.2. A Minimal SNP Set Development
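The 88 filtered SNPs are the input to the search for a minimal discriminatory set. The MDSearch algorithm itself is not described in this section, so the following is not the authors' method; it is a generic greedy heuristic, shown only to illustrate what "discriminatory" means for a marker set. The function name `greedy_discriminating_set` and the toy genotype coding are assumptions made for this sketch, and a greedy search is not guaranteed to find the smallest possible set.

```python
from itertools import combinations


def greedy_discriminating_set(genotypes):
    """Pick SNP indices greedily until every pair of varieties differs
    at one or more chosen SNPs.

    genotypes: dict mapping variety name -> list of allele codes,
    one code per SNP, all lists the same length (hypothetical layout).
    """
    names = list(genotypes)
    n_snps = len(genotypes[names[0]])
    # Every pair of varieties starts out undistinguished.
    unresolved = set(combinations(names, 2))
    chosen = []
    while unresolved:
        best, best_split = None, set()
        for j in range(n_snps):
            if j in chosen:
                continue
            # Pairs that SNP j would newly separate.
            split = {(a, b) for a, b in unresolved
                     if genotypes[a][j] != genotypes[b][j]}
            if len(split) > len(best_split):
                best, best_split = j, split
        if best is None:
            raise ValueError("some varieties have identical genotypes")
        chosen.append(best)
        unresolved -= best_split
    return chosen


# Toy data: four varieties, three homozygous SNPs coded 0 (ref) / 1 (alt).
toy = {
    "A": [0, 0, 1],
    "B": [0, 1, 1],
    "C": [1, 0, 0],
    "D": [1, 1, 0],
}
selected = greedy_discriminating_set(toy)
barcodes = {name: tuple(calls[j] for j in selected) for name, calls in toy.items()}
print(selected, barcodes)
```

In the toy data the third SNP carries the same information as the first, so it is never selected; two SNPs are enough to give each of the four varieties a unique barcode.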
3.3. Developed SNP Set Comparison with Known Sets
4. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
SNP No. | Chromosome:Position | Ref. Allele | Alt. Allele | MAF | PIC |
---|---|---|---|---|---|
1 | 1H:5,550,434 | T | A | 0.11 | 0.20 |
2 | 1H:402,453,297 | G | A | 0.36 | 0.46 |
3 | 2H:39,017,947 | T | C | 0.11 | 0.20 |
4 | 2H:81,911,835 | G | T | 0.19 | 0.31 |
5 | 2H:560,080,229 | A | G | 0.38 | 0.47 |
6 | 3H:205,461,025 | C | G | 0.24 | 0.36 |
7 | 3H:447,564,794 | C | A | 0.23 | 0.35 |
8 | 3H:525,419,066 | G | C | 0.38 | 0.47 |
9 | 3H:566,880,335 | T | C | 0.27 | 0.39 |
10 | 3H:615,767,065 | T | C | 0.28 | 0.40 |
11 | 4H:453,072,008 | T | C | 0.15 | 0.26 |
12 | 5H:456,058,599 | G | C | 0.43 | 0.49 |
13 | 5H:519,706,238 | A | G | 0.13 | 0.23 |
14 | 5H:545,182,553 | T | C | 0.11 | 0.20 |
15 | 6H:13,191,080 | T | G | 0.44 | 0.49 |
16 | 6H:71,795,477 | A | T | 0.45 | 0.50 |
17 | 7H:208,590,826 | A | G | 0.12 | 0.21 |
18 | 7H:457,261,245 | G | C | 0.38 | 0.47 |
19 | 7H:616,514,313 | G | A | 0.38 | 0.47 |
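
The definition of PIC used for the table is not given in this section. As an observation only, the tabulated PIC values agree (within rounding) with the expected-heterozygosity form 1 − p² − q² for a biallelic SNP with minor allele frequency p and q = 1 − p. The sketch below checks this against the MAF and PIC columns; the function names are hypothetical, and the Botstein-style formula is included purely for comparison.

```python
# (MAF, PIC) pairs copied from the 19 table rows, in order.
TABLE = [
    (0.11, 0.20), (0.36, 0.46), (0.11, 0.20), (0.19, 0.31), (0.38, 0.47),
    (0.24, 0.36), (0.23, 0.35), (0.38, 0.47), (0.27, 0.39), (0.28, 0.40),
    (0.15, 0.26), (0.43, 0.49), (0.13, 0.23), (0.11, 0.20), (0.44, 0.49),
    (0.45, 0.50), (0.12, 0.21), (0.38, 0.47), (0.38, 0.47),
]


def diversity_pic(maf):
    """1 - p^2 - q^2 for a biallelic SNP (equal to 2pq)."""
    q = 1 - maf
    return 1 - maf ** 2 - q ** 2


def botstein_pic(maf):
    """Botstein-style PIC: the value above minus 2 * p^2 * q^2."""
    q = 1 - maf
    return diversity_pic(maf) - 2 * maf ** 2 * q ** 2


# Largest gap between the tabulated PIC and each candidate formula.
max_dev_diversity = max(abs(diversity_pic(m) - pic) for m, pic in TABLE)
max_dev_botstein = max(abs(botstein_pic(m) - pic) for m, pic in TABLE)
print(f"max deviation, 1 - p^2 - q^2: {max_dev_diversity:.4f}")
print(f"max deviation, Botstein form: {max_dev_botstein:.4f}")
```

Because both columns are rounded to two decimals, small deviations are expected; the first formula stays within that rounding margin for every row, while the Botstein form does not.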
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Ermolaev, A.; Samarina, M.; Strembovskiy, I.; Kroupin, P.; Karlov, G.; Kharchenko, P.; Voronov, S.; Eroshenko, L.; Kryuchenko, E.; Laptina, Y.; Avdeev, S.; Shirnin, S.; Igonin, V.; Divashuk, M. A Tool for the Design of the Minimal Fingerprinting SNP Set: Use Case for Barley. Agronomy 2024, 14, 1802. https://doi.org/10.3390/agronomy14081802