Search Results (15)

Search Parameters:
Keywords = linkage disequilibrium pruning

16 pages, 2734 KB  
Article
Using Whole-Genome Sequencing Data Reveals the Population Structure and Selection Signatures for Reproduction Traits in Duolang Sheep
by Keyao Wang, Qianjun Li, Zhigang Niu, Zhengfen Xue, Shiyuan Li, Jiabao Yan, Yang Chen, Yanlong Zhang, Hongcai Shi and Xiangdong Ding
Animals 2025, 15(23), 3466; https://doi.org/10.3390/ani15233466 - 1 Dec 2025
Viewed by 486
Abstract
Duolang sheep, a meat–fat dual-purpose breed indigenous to Xinjiang, China, has been cultivated traditionally by the local Uyghur people for its prolificacy and precocious sexual maturity, while little research on the population structure and trait inheritance characteristics of Duolang sheep is available. This study employed whole-genome resequencing data from a cohort of 60 Duolang sheep to dissect their genetic population structure and genes related to reproductive traits. A total of 1565 Gb of high-quality data with an average depth of 14.06× was generated. After SNP calling and quality control, 31,300,060 SNPs were identified. Following linkage disequilibrium (LD)-based pruning, a total of 4,479,177 high-quality SNPs were retained for subsequent analyses. Based on these SNPs, the internal genetic structure of the Duolang sheep population was elucidated, with 14 kinship outliers detected through principal component analysis (PCA). Furthermore, LD decay analysis revealed that the r² declined below 0.1 at approximately 10 kb, indicating a relatively low level of selection pressure in the population. Within the population, Tajima’s D and iHS methods detected 517,218 and 82,534 candidate SNPs under selection, respectively, with 24,453 SNPs overlapping between the two methods. By splitting Duolang sheep into single-lamb (n = 29) and multiple-lamb (n = 12) subgroups according to litter size, 267,654 SNPs were identified by XP-CLR, while 184,179 SNPs under selection were detected by FST and 62,150 by XP-EHH. Functional enrichment analysis of selected genes reveals the selection directions (domestication, growth, and reproduction) and related candidate genes in the Duolang sheep population, including ESRRA, ESRRB, OXT, FSHR, ESR2, GNRHR, and BMPR1B. This study provides the first comprehensive genomic landscape of Duolang sheep, elucidating genetic signatures of its adaptive traits. Full article
(This article belongs to the Section Animal Genetics and Genomics)

17 pages, 580 KB  
Article
Association of BMAL1 and CLOCK Gene Polymorphisms with Preeclampsia Risk with Subtype Analysis
by Fan Xia, Peiwen Wang, Ziye Li, Jiehua Wei, Jianhui Wei, Yuhang Wu, Chu Liu, Shanyu Lin, Suyan Guo, Linbin He, Mengshi Chen, Lizhang Chen and Tingting Wang
Int. J. Mol. Sci. 2025, 26(21), 10797; https://doi.org/10.3390/ijms262110797 - 6 Nov 2025
Cited by 1 | Viewed by 761
Abstract
Preeclampsia (PE), a major cause of maternal and perinatal morbidity, is a hypertensive pregnancy disorder with poorly defined pathogenesis. While dysregulation of core circadian genes including brain and muscle ARNT-like 1 (BMAL1; also termed ARNTL) and circadian locomotor output cycles kaput (CLOCK) has been implicated in PE, the contribution of their genetic polymorphisms to PE remains unclear. In this case–control study, polymorphisms in BMAL1 and CLOCK were genotyped using MassARRAY in 202 PE patients (97 early-onset [eoPE], 105 late-onset [loPE]) and 400 controls. Following genotyping and linkage disequilibrium pruning (r² > 0.8) to retain representative tag SNPs, the final set for association analysis comprised three non-redundant BMAL1 SNPs (rs4757144, rs11022780, rs969485) and one CLOCK SNP (rs1048004). After confounder adjustment, no significant associations were detected for CLOCK variants, whereas the BMAL1 rs11022780 variant demonstrated a significant protective effect against PE (TT vs. CC: OR = 0.26 [95% CI 0.09–0.78]; recessive model: OR = 0.25 [95% CI 0.09–0.74]), particularly in the eoPE subgroup. Expression quantitative trait locus (eQTL) analysis confirmed that this SNP correlated with BMAL1 mRNA expression in whole blood, and protein–protein interaction analysis highlighted BMAL1's central role in circadian networks, implying a genetically influenced regulatory mechanism of PE through BMAL1 expression. Full article
(This article belongs to the Special Issue Molecular Research on Reproductive Physiology and Endocrinology)

15 pages, 2928 KB  
Article
Genome-Wide Genetic Diversity and Population Structure of Sillago sinica (Perciformes, Sillaginidae) from the Coastal Waters of China: Implications for Phylogeographic Pattern and Fishery Management
by Tianyan Yang, Yan Sun and Peiyi Xiao
Biology 2025, 14(10), 1329; https://doi.org/10.3390/biology14101329 - 26 Sep 2025
Viewed by 708
Abstract
The ability to detect population structure and determine the extent of genetic variation among populations is critical for understanding genetic background and effective fishery management. Fifty-eight individuals of S. sinica were resequenced with an average depth of 24× based on the Illumina sequencing platform. A total of 7,409,691 high-quality single nucleotide polymorphisms (SNPs) and 327,698 linkage disequilibrium-pruned SNPs were detected by comparing with the reference genome, and the average nucleotide diversity (π) and polymorphism information content (PIC) for all SNPs were 0.0036 ± 0.0023 and 0.2358 ± 0.1013, respectively, indicating the relatively low level of genetic diversity caused by limited gene flow and small effective population size (Ne). Integrated analyses of principal component analysis (PCA), ADMIXTURE, fixation index (Fst), and cladogram showed a significant genetic divergence between the north group (Dongying and Rushan populations) and the south group (Wenzhou and Zhoushan populations), which might be related to the differences in natural and geographical environments. The comprehensive results confirmed the genetic heterogeneity of S. sinica populations from the northern and southern sea areas of China, and suggested that regionalization fishery management should be adopted for further resource protection and utilization of S. sinica. Full article
(This article belongs to the Special Issue Genetic Variability within and between Populations)

17 pages, 1743 KB  
Article
Prioritized SNP Selection from Whole-Genome Sequencing Improves Genomic Prediction Accuracy in Sturgeons Using Linear and Machine Learning Models
by Hailiang Song, Wei Wang, Tian Dong, Xiaoyu Yan, Chenfan Geng, Song Bai and Hongxia Hu
Int. J. Mol. Sci. 2025, 26(14), 7007; https://doi.org/10.3390/ijms26147007 - 21 Jul 2025
Cited by 2 | Viewed by 1683
Abstract
Genomic prediction has emerged as a powerful tool in aquaculture breeding, but its effectiveness depends on the careful selection of informative single nucleotide polymorphisms (SNPs) and the application of appropriate prediction models. This study aimed to enhance genomic prediction accuracy in Russian sturgeon (Acipenser gueldenstaedtii) by optimizing SNP selection strategies and exploring the performance of linear and machine learning models. Three economically important traits—caviar yield, caviar color, and body weight—were selected due to their direct relevance to breeding goals and market value. Whole-genome sequencing (WGS) data were obtained from 971 individuals with an average sequencing depth of 13.52×. To reduce marker density and eliminate redundancy, three SNP selection strategies were applied: (1) genome-wide association study (GWAS)-based prioritization to select trait-associated SNPs; (2) linkage disequilibrium (LD) pruning to retain independent markers; and (3) random sampling as a control. Genomic prediction was conducted using both linear (e.g., GBLUP) and machine learning models (e.g., random forest) across varying SNP densities (1 K to 50 K). Results showed that GWAS-based SNP selection consistently outperformed other strategies, especially at moderate densities (≥10 K), improving prediction accuracy by up to 3.4% compared to the full WGS dataset. LD-based selection at higher densities (30 K and 50 K) achieved comparable performance to full WGS. Notably, machine learning models, particularly random forest, exceeded the performance of linear models, yielding an additional 2.0% increase in accuracy when combined with GWAS-selected SNPs. In conclusion, integrating WGS data with GWAS-informed SNP selection and advanced machine learning models offers a promising framework for improving genomic prediction in sturgeon and holds promise for broader applications in aquaculture breeding programs. Full article

16 pages, 1589 KB  
Article
GWAS Enhances Genomic Prediction Accuracy of Caviar Yield, Caviar Color and Body Weight Traits in Sturgeons Using Whole-Genome Sequencing Data
by Hailiang Song, Tian Dong, Wei Wang, Xiaoyu Yan, Chenfan Geng, Song Bai and Hongxia Hu
Int. J. Mol. Sci. 2024, 25(17), 9756; https://doi.org/10.3390/ijms25179756 - 9 Sep 2024
Cited by 5 | Viewed by 2719
Abstract
Caviar yield, caviar color, and body weight are crucial economic traits in sturgeon breeding. Understanding the molecular mechanisms behind these traits is essential for their genetic improvement. In this study, we performed whole-genome sequencing on 673 Russian sturgeons, renowned for their high-quality caviar. With an average sequencing depth of 13.69×, we obtained approximately 10.41 million high-quality single nucleotide polymorphisms (SNPs). Using a genome-wide association study (GWAS) with a single-marker regression model, we identified SNPs and genes associated with these traits. Our findings revealed several candidate genes for each trait: caviar yield: TFAP2A, RPS6KA3, CRB3, TUBB, H2AFX, morc3, BAG1, RANBP2, PLA2G1B, and NYAP1; caviar color: NFX1, OTULIN, SRFBP1, PLEK, INHBA, and NARS; body weight: ACVR1, HTR4, fmnl2, INSIG2, GPD2, ACVR1C, TANC1, KCNH7, SLC16A13, XKR4, GALR2, RPL39, ACVR2A, ADCY10, and ZEB2. Additionally, using the genomic feature BLUP (GFBLUP) method, which combines linkage disequilibrium (LD) pruning markers with GWAS prior information, we improved genomic prediction accuracy by 2%, 1.9%, and 3.1% for caviar yield, caviar color, and body weight traits, respectively, compared to the GBLUP method. In conclusion, this study enhances our understanding of the genetic mechanisms underlying caviar yield, caviar color, and body weight traits in sturgeons, providing opportunities for genetic improvement of these traits through genomic selection. Full article

11 pages, 1131 KB  
Article
A Diverging Species within the Stewartia gemmata (Theaceae) Complex Revealed by RAD-Seq Data
by Hanyang Lin, Wenhao Li and Yunpeng Zhao
Plants 2024, 13(10), 1296; https://doi.org/10.3390/plants13101296 - 8 May 2024
Cited by 1 | Viewed by 1812
Abstract
Informed species delimitation is crucial in diverse biological fields; however, it can be problematic for species complexes. Showing a peripatric distribution pattern, Stewartia gemmata and S. acutisepala (the S. gemmata complex) provide us with an opportunity to study species boundaries among taxa undergoing nascent speciation. Here, we generated genomic data from representative individuals across the natural distribution ranges of the S. gemmata complex using restriction site-associated DNA sequencing (RAD-seq). Based on the DNA sequence of assembled loci containing 41,436 single-nucleotide polymorphisms (SNPs) and invariant sites, the phylogenetic analysis suggested strong monophyly of both the S. gemmata complex and S. acutisepala, and the latter was nested within the former. Among S. gemmata individuals, the one sampled from Mt. Tianmu (Zhejiang) showed the closest evolutionary affinity with S. acutisepala (which is endemic to southern Zhejiang). Estimated from 2996 high-quality SNPs, the genetic divergence between S. gemmata and S. acutisepala was relatively low (an Fst of 0.073 on a per-site basis). Nevertheless, we observed a proportion of genomic regions showing relatively high genetic differentiation on a windowed basis. Up to 1037 genomic bins showed an Fst value greater than 0.25, accounting for 8.31% of the total. After SNPs subject to linkage disequilibrium were pruned, the principal component analysis (PCA) showed that S. acutisepala diverged from S. gemmata along the first and the second PCs to some extent. By applying phylogenomic analysis, the present study determines that S. acutisepala is a variety of S. gemmata and is diverging from S. gemmata, providing empirical insights into the nascent speciation within a species complex. Full article
(This article belongs to the Special Issue Plant Molecular Phylogenetics and Evolutionary Genomics III)
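
The abstract above reports Fst both per site and per genomic window. As a rough illustration of how those two summaries relate, the following is a minimal sketch that assumes a simple heterozygosity-based Fst, (Ht - Hs) / Ht, for two equally weighted populations. The abstract does not say which estimator or software the authors used, so the function names and the allele frequencies below are hypothetical.

```python
def fst_components(p1, p2):
    """Return (numerator, denominator) of a heterozygosity-based Fst
    for one biallelic site, given allele frequencies in two populations.
    Assumption: both populations are weighted equally."""
    hs = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population heterozygosity
    p_bar = (p1 + p2) / 2                              # pooled allele frequency
    ht = 2 * p_bar * (1 - p_bar)                       # total expected heterozygosity
    return ht - hs, ht


def fst_window(sites):
    """Windowed Fst as a ratio of sums over the sites in the window.
    `sites` is a list of (p1, p2) tuples; returns 0.0 if no site is variable."""
    parts = [fst_components(p1, p2) for p1, p2 in sites]
    num = sum(n for n, _ in parts)
    den = sum(d for _, d in parts)
    return num / den if den > 0 else 0.0


# Hypothetical window: one strongly differentiated site, one undifferentiated site.
per_site = fst_window([(0.2, 0.8)])
windowed = fst_window([(0.2, 0.8), (0.5, 0.5)])
print(round(per_site, 3), round(windowed, 3))
```

Summing numerators and denominators before dividing keeps low-diversity sites from dominating the window average, which is one common design choice; other estimators weight sites and sample sizes differently.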

12 pages, 1447 KB  
Article
Genetic Diversity, Linkage Disequilibrium, and Population Structure in a Common Bean Reference Collection
by Daniel Ambachew, Jorge Mario Londoño, Nohra Rodriguez Castillo, Asrat Asfaw and Matthew Wohlgemuth Blair
Agronomy 2024, 14(5), 985; https://doi.org/10.3390/agronomy14050985 - 8 May 2024
Cited by 6 | Viewed by 2747
Abstract
An in-depth understanding of the extent and pattern of genetic diversity and population structure in crop populations is of paramount importance for any crop improvement program to efficiently promote the translation of genetic diversity into genetic gain. A reference collection of 150 common bean genotypes selected from the International Center for Tropical Agriculture’s global core collection was evaluated using single-nucleotide polymorphism (SNP) markers to quantify the amount of genetic diversity, linkage disequilibrium, and population structure. The cultivars and landraces of the collection were diverse and originated from 14 countries, and wild accessions were used as controls for each gene pool. The collection was genotyped using an SNP array, generating a total of 5398 locus calls distributed across the entire bean genome. The SNP data quality was checked, and two datasets were generated. The first dataset (Dataset_1) comprised a set of 5108 SNPs and 150 genotypes after filtering for 10% missing alleles and an MAF < 0.05. The second dataset (Dataset_2) comprised a set of 2300 SNPs that remained after removing any null-allele SNPs and LD pruning for a criterion of r² < 0.2. Dataset_1 was used for a principal coordinate analysis (PCoA), phylogenetic relationship determination, an analysis of molecular variance (AMOVA), and a discriminant analysis of principal components. Dataset_2 was used for a population structure analysis using STRUCTURE software and is proposed for a genome-wide association study (GWAS). The population structure analysis split the reference collection into two subpopulations according to an Andean or Mesoamerican gene pool. The Mesoamerican populations displayed higher genetic differentiation and tended to split into more groups that were somewhat aligned with common bean races.
Andean beans were characterized by a larger average LD but lower LD percentage, a small average genetic distance between members of the population, and a higher major allele frequency, which suggested narrower genetic diversity compared to the Mesoamerican gene pool. In conclusion, the results indicated the presence of high genetic diversity, which is useful for a GWAS. However, the presence of significant linkage disequilibrium requires that genetic distance be considered as a co-factor for any further genetic studies. Overall, the molecular variation observed in the genotypes shows that this reference collection is valuable as a genebank-derived diversity panel which is useful for marker trait association studies. Full article
(This article belongs to the Special Issue Marker Assisted Selection and Molecular Breeding in Major Crops)

17 pages, 2819 KB  
Article
Selective Genotyping and Phenotyping for Optimization of Genomic Prediction Models for Populations with Different Diversity
by Marina Ćeran, Vuk Đorđević, Jegor Miladinović, Marjana Vasiljević, Vojin Đukić, Predrag Ranđelović and Simona Jaćimović
Plants 2024, 13(7), 975; https://doi.org/10.3390/plants13070975 - 28 Mar 2024
Cited by 1 | Viewed by 2316
Abstract
To overcome the different challenges to food security caused by a growing population and climate change, soybean (Glycine max (L.) Merr.) breeders are creating novel cultivars that have the potential to improve productivity while maintaining environmental sustainability. Genomic selection (GS) is an advanced approach that may accelerate the rate of genetic gain in breeding using genome-wide molecular markers. The accuracy of genomic selection can be affected by trait architecture and heritability, marker density, linkage disequilibrium, statistical models, and training set. The selection of a minimal and optimal marker set with high prediction accuracy can lower genotyping costs, computational time, and multicollinearity. Selective phenotyping could reduce the number of genotypes tested in the field while preserving the genetic diversity of the initial population. This study aimed to evaluate different methods of selective genotyping and phenotyping on the accuracy of genomic prediction for soybean yield. The evaluation was performed on three populations: recombinant inbred lines, multifamily diverse lines, and germplasm collection. Strategies adopted for marker selection were as follows: SNP (single nucleotide polymorphism) pruning, estimation of marker effects, randomly selected markers, and genome-wide association study. Reduction of the number of genotypes was performed by selecting a core set from the initial population based on marker data, yet maintaining the original population’s genetic diversity. Prediction ability using all markers and genotypes was different among examined populations. The subsets obtained by the model-based strategy can be considered the most suitable for marker selection for all populations. 
The selective phenotyping based on markers in all cases had higher values of prediction ability compared to minimal values of prediction ability of multiple cycles of random selection, with the highest values of prediction obtained using AN approach and 75% population size. The obtained results indicate that selective genotyping and phenotyping hold great potential and can be integrated as tools for improving or retaining selection accuracy by reducing genotyping or phenotyping costs for genomic selection. Full article

22 pages, 2600 KB  
Article
Genome-Wide SNP and Indel Discovery in Abaca (Musa textilis Née) and among Other Musa spp. for Abaca Genetic Resources Management
by Cris Francis C. Barbosa, Jayson C. Asunto, Rhosener Bhea L. Koh, Daisy May C. Santos, Dapeng Zhang, Ernelea P. Cao and Leny C. Galvez
Curr. Issues Mol. Biol. 2023, 45(7), 5776-5797; https://doi.org/10.3390/cimb45070365 - 12 Jul 2023
Cited by 7 | Viewed by 4597
Abstract
Abaca (Musa textilis Née) is an economically important fiber crop in the Philippines. Its economic potential, however, is hampered by biotic and abiotic stresses, which are exacerbated by insufficient genomic resources for varietal identification vital for crop improvement. To address these gaps, this study aimed to discover genome-wide polymorphisms among abaca cultivars and other Musa species and analyze their potential as genetic marker resources. This was achieved through whole-genome Illumina resequencing of abaca cultivars and variant calling using BCFtools, followed by genetic diversity and phylogenetic analyses. A total of 20,590,381 high-quality single-nucleotide polymorphisms (SNP) and DNA insertions/deletions (InDels) were mined across 16 abaca cultivars. Filtering based on linkage disequilibrium (LD) yielded 130,768 SNPs and 13,620 InDels, accounting for 0.396 ± 0.106 and 0.431 ± 0.111 of gene diversity across these cultivars. LD-pruned polymorphisms across abaca, M. troglodytarum, M. acuminata and M. balbisiana enabled genetic differentiation within abaca and across the four Musa spp. Phylogenetic analysis revealed the registered varieties Abuab and Inosa to accumulate a significant number of mutations, eliciting further studies linking mutations to their advantageous phenotypes. Overall, this study pioneered in producing marker resources in abaca based on genome-wide polymorphisms vital for varietal authentication and comparative genotyping with the more studied Musa spp. Full article
(This article belongs to the Special Issue Molecular Breeding and Genetics Research in Plants)

15 pages, 2133 KB  
Article
Establishing a Prediction Model for the Efficacy of Platinum—Based Chemotherapy in NSCLC Based on a Two Cohorts GWAS Study
by Qi Xiao, Chenxue Mao, Ying Gao, Hanxue Huang, Bing Yu, Lulu Yu, Xi Li, Xiaoyuan Mao, Wei Zhang, Jiye Yin and Zhaoqian Liu
J. Clin. Med. 2023, 12(4), 1318; https://doi.org/10.3390/jcm12041318 - 7 Feb 2023
Cited by 1 | Viewed by 2729
Abstract
Platinum drugs combined with other agents have been the first-line treatment for non-small cell lung cancer (NSCLC) in the past decades. To better evaluate the efficacy of platinum-based chemotherapy in NSCLC, we establish a platinum chemotherapy response prediction model. Here, a total of 217 samples from Xiangya Hospital of Central South University were selected as the discovery cohort for a genome-wide association analysis (GWAS) to select SNPs. Another 216 samples were genotyped as a validation cohort. In the discovery cohort, using linkage disequilibrium (LD) pruning, we extract a subset that does not contain correlated SNPs. The SNPs with p < 10⁻³ and p < 10⁻⁴ are selected for modeling. Subsequently, we validate our model in the validation cohort. Finally, clinical factors are incorporated into the model. The final model includes four SNPs (rs7463048, rs17176196, rs527646, and rs11134542) as well as two clinical factors that contributed to the efficacy of platinum chemotherapy in NSCLC, with an area under the receiver operating characteristic (ROC) curve (AUC) of 0.726. Full article
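
The abstract above summarizes model performance as an area under the ROC curve. For readers unfamiliar with that number, here is a minimal sketch of how an AUC can be computed from predicted scores, using the rank-based definition: the probability that a randomly chosen responder scores higher than a randomly chosen non-responder, with ties counted as one half. The labels and scores are invented for illustration and are not data from the study.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.
    `labels` are 1 (responder) or 0 (non-responder); ties count as 0.5."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical predictions for six patients (not from the paper).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives some context for the reported value of 0.726.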

10 pages, 2607 KB  
Article
Screening Discriminating SNPs for Chinese Indigenous Pig Breeds Identification Using a Random Forests Algorithm
by Jun Gao, Lingwei Sun, Shushan Zhang, Jiehuan Xu, Mengqian He, Defu Zhang, Caifeng Wu and Jianjun Dai
Genes 2022, 13(12), 2207; https://doi.org/10.3390/genes13122207 - 25 Nov 2022
Cited by 13 | Viewed by 2659
Abstract
Chinese indigenous pig breeds have unique genetic characteristics and a rich diversity; however, effective breed identification methods have not yet been well established. In this study, a genotype file of 62,822 single-nucleotide polymorphisms (SNPs), which were obtained from 1059 individuals of 18 Chinese indigenous pig breeds and 5 cosmopolitan breeds, were used to screen the discriminating SNPs for pig breed identification. After linkage disequilibrium (LD) pruning filtering, this study excluded 396 SNPs on non-constant chromosomes and retained 20.92–27.84% of SNPs for each of the 18 autosomes, leaving a total of 14,823 SNPs. The principal component analysis (PCA) showed the largest differences between cosmopolitan and Chinese pig breeds (PC1 = 10.452%), while relatively small differences were found among the 18 indigenous pig breeds from the Yangtze River Delta region of China. Next, a random forest (RF) algorithm was used to filter these SNPs and obtain the optimal number of decision trees (ntree = 1000) using corresponding out-of-bag (OOB) error rates. By comparing two different SNP ranking methods in the RF analysis, the mean decreasing accuracy (MDA) and mean decreasing Gini index (MDG), the effects of panels with different numbers of SNPs on the assignment accuracy, and the statistics of SNP distribution on each chromosome in the panels, a panel of 1000 of the most breed-discriminative tagged SNPs were finally selected based on the MDA screening method. A high accuracy (>99.3%) was obtained by the breed prediction of 318 samples in the RF test set; thus, a machine learning classification method was established for the multi-breed identification of Chinese indigenous pigs based on a low-density panel of SNPs. Full article
(This article belongs to the Special Issue Pig Genomics, Quantitative Traits and Breeding)

18 pages, 3691 KB  
Article
Identification of Target Chicken Populations by Machine Learning Models Using the Minimum Number of SNPs
by Dongwon Seo, Sunghyun Cho, Prabuddha Manjula, Nuri Choi, Young-Kuk Kim, Yeong Jun Koh, Seung Hwan Lee, Hyung-Yong Kim and Jun Heon Lee
Animals 2021, 11(1), 241; https://doi.org/10.3390/ani11010241 - 19 Jan 2021
Cited by 26 | Viewed by 5481
Abstract
A marker combination capable of classifying a specific chicken population could improve commercial value by increasing consumer confidence with respect to the origin of the population. This would facilitate the protection of native genetic resources in the market of each country. In this study, a total of 283 samples from 20 lines, which consisted of Korean native chickens, commercial native chickens, and commercial broilers with a layer population, were analyzed to determine the optimal marker combination comprising the minimum number of markers, using a 600 k high-density single nucleotide polymorphism (SNP) array. Machine learning algorithms, a genome-wide association study (GWAS), linkage disequilibrium (LD) analysis, and principal component analysis (PCA) were used to distinguish a target (case) group for comparison with control chicken groups. In the processing of marker selection, a total of 47,303 SNPs were used for classifying chicken populations; 96 LD-pruned SNPs (50 SNPs per LD block) served as the best marker combination for target chicken classification. Moreover, 36, 44, and 8 SNPs were selected as the minimum numbers of markers by the AdaBoost (AB), Random Forest (RF), and Decision Tree (DT) machine learning classification models, which had accuracy rates of 99.6%, 98.0%, and 97.9%, respectively. The selected marker combinations increased the genetic distance and fixation index (Fst) values between the case and control groups, and they reduced the number of genetic components required, confirming that efficient classification of the groups was possible by using a small number of marker sets. In a verification study including additional chicken breeds and samples (12 lines and 182 samples), the accuracy did not significantly change, and the target chicken group could be clearly distinguished from the other populations. 
The GWAS, PCA, and machine learning algorithms used in this study can be applied efficiently, to determine the optimal marker combination with the minimum number of markers that can distinguish the target population among a large number of SNP markers. Full article
(This article belongs to the Section Animal Genetics and Genomics)

12 pages, 2562 KB  
Article
A Bioinformatics Pipeline to Identify a Subset of SNPs for Genomics-Assisted Potato Breeding
by Catja Selga, Alexander Koc, Aakash Chawade and Rodomiro Ortiz
Plants 2021, 10(1), 30; https://doi.org/10.3390/plants10010030 - 24 Dec 2020
Cited by 16 | Viewed by 4608
Abstract
Modern potato breeding methods following a genomic-led approach provide means for shortening breeding cycles and increasing breeding efficiency across selection cycles. Acquiring genetic data for large breeding populations remains expensive. We present a pipeline to reduce the number of single nucleotide polymorphisms (SNPs) to lower the cost of genotyping. First, we reduced the number of individuals to be genotyped with a high-throughput method according to the multi-trait variation as defined by principal component analysis of phenotypic characteristics. Next, we reduced the number of SNPs by pruning for linkage disequilibrium. By adjusting the square of the correlation coefficient between two adjacent loci, we obtained reduced subsets of SNPs. We subsequently tested these SNP subsets by two methods: (1) a genome-wide association study (GWAS) for marker identification, and (2) genomic selection (GS) to predict genomic estimated breeding values. The results indicate that both GWAS and GS can be done without loss of information after SNP reduction. The pipeline allows for creating custom SNP subsets to cover all variation found in any particular breeding population. Low-throughput genotyping will reduce the genotyping cost associated with large populations, thereby making genomic breeding methods applicable to large potato breeding populations. Full article
(This article belongs to the Special Issue Plant Genetic Resources and Breeding of Clonally Propagated Crops)
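The abstract above describes thinning SNPs by a threshold on the squared correlation coefficient between adjacent loci. A minimal sketch of that idea follows; it is not the authors' pipeline. The function names, the greedy keep-or-drop rule, the toy dosage matrix, and the 0.2 threshold are all assumptions made for illustration.

```python
import numpy as np


def r_squared(a, b):
    """Squared Pearson correlation between two genotype dosage vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.std() == 0 or b.std() == 0:
        # Monomorphic locus: correlation is undefined, treat as unlinked.
        return 0.0
    return float(np.corrcoef(a, b)[0, 1] ** 2)


def prune_adjacent(genotypes, threshold):
    """Greedy LD pruning over position-ordered SNPs (hypothetical helper).

    genotypes: (n_individuals, n_snps) dosage matrix.
    A SNP is dropped when its r^2 with the most recently kept SNP
    exceeds `threshold`; otherwise it is kept.
    """
    genotypes = np.asarray(genotypes, dtype=float)
    kept = [0]
    for j in range(1, genotypes.shape[1]):
        if r_squared(genotypes[:, kept[-1]], genotypes[:, j]) <= threshold:
            kept.append(j)
    return kept


# Toy data (assumed): 5 individuals x 4 SNPs, dosages 0-4.
toy = np.array([
    [0, 0, 4, 1],
    [1, 1, 3, 0],
    [2, 2, 2, 3],
    [3, 3, 1, 1],
    [4, 4, 0, 2],
])

# SNPs 1 and 2 are perfectly correlated with SNP 0 and are dropped;
# SNP 3 has r^2 = 9/52 (about 0.17) with SNP 0 and is kept.
kept_strict = prune_adjacent(toy, threshold=0.2)
print(kept_strict)
```

Lowering the threshold yields a smaller subset and raising it a larger one, which is the adjustment the abstract refers to. Tools that do this in practice usually compare SNPs within a sliding window rather than only neighbours.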

14 pages, 1988 KB  
Article
Genotyping-by-Sequencing Reveals Molecular Genetic Diversity in Italian Common Bean Landraces
by Lucia Lioi, Diana L. Zuluaga, Stefano Pavan and Gabriella Sonnante
Diversity 2019, 11(9), 154; https://doi.org/10.3390/d11090154 - 3 Sep 2019
Cited by 17 | Viewed by 4830
Abstract
The common bean (Phaseolus vulgaris L.) is one of the main legumes worldwide and represents a valuable source of nutrients. Independent domestication events in the Americas led to the formation of two cultivated genepools, namely Mesoamerican and Andean, to which European material has been brought back. In this study, Italian common bean landraces were analyzed for their genetic diversity and structure, using single nucleotide polymorphism (SNP) markers derived from genotyping-by-sequencing (GBS) technology. After filtering, 11,866 SNPs were obtained and 798 markers, pruned for linkage disequilibrium, were used for structure analysis. The most probable number of subpopulations (K) was two, consistent with the presence of the two genepools, identified through the phaseolin diagnostic marker. Some landraces were admixed, suggesting probable hybridization events between Mesoamerican and Andean material. When increasing the number of possible Ks, the Andean germplasm appeared to be structured in two or three subgroups. The subdivision within the Andean material was also observed in a principal coordinate analysis (PCoA) plot and a dendrogram based on genetic distances. The Mesoamerican landraces showed a higher level of genetic diversity compared to the Andean landraces. Calculation of the fixation index (FST) at individual SNPs between the Mesoamerican and Andean genepools and within the Andean genepool evidenced clusters of highly divergent loci in specific chromosomal regions. This work may help to preserve landraces of the common bean from genetic erosion, and could represent a starting point for the identification of interesting traits that determine plant adaptation.
(This article belongs to the Section Plant Diversity)
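The abstract above mentions computing the fixation index (FST) at individual SNPs between two genepools. The abstract does not say which estimator was used. The sketch below assumes the textbook heterozygosity-based form, FST = (HT - HS) / HT, for one biallelic SNP with two equally weighted populations; the function name and the example frequencies are hypothetical.

```python
def fst_per_snp(p1, p2):
    """Heterozygosity-based F_ST at one biallelic SNP (illustrative helper).

    p1, p2: frequency of the same allele in population 1 and population 2.
    Populations are weighted equally, which is an assumption of this sketch.
    """
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)  # expected heterozygosity of the pooled population
    if h_t == 0:
        return 0.0  # allele fixed in both populations: no differentiation
    # Mean expected heterozygosity within populations.
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    return (h_t - h_s) / h_t


# Assumed example frequencies for three SNPs in two genepools.
fst_fixed = fst_per_snp(1.0, 0.0)      # alternate alleles fixed
fst_same = fst_per_snp(0.5, 0.5)       # identical frequencies
fst_diverged = fst_per_snp(0.2, 0.8)   # strongly diverged
```

Scanning such per-SNP values along each chromosome is one way clusters of highly divergent loci like those reported in the abstract could be located. Sample-size-corrected estimators are generally preferred for real data.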

11 pages, 2966 KB  
Article
Genetic Diversity of Field Pennycress (Thlaspi arvense) Reveals Untapped Variability and Paths Toward Selection for Domestication
by Katherine Frels, Ratan Chopra, Kevin M. Dorn, Donald L. Wyse, M. David Marks and James A. Anderson
Agronomy 2019, 9(6), 302; https://doi.org/10.3390/agronomy9060302 - 11 Jun 2019
Cited by 22 | Viewed by 7548
Abstract
Evaluation of genetic diversity within wild populations is an essential process for improvement and domestication of new crop species. This process involves evaluation of population structure and individual accessions based on genetic markers, growth habits, and geographic collection area. In this study, accessions of field pennycress were analyzed to identify population structure and variation in germplasm available for breeding. A total of 9157 genome-wide single nucleotide polymorphisms (SNPs) were identified among the 121 accessions analyzed, and linkage disequilibrium based pruning resulted in 3497 SNPs. Bayesian cluster analysis was implemented in STRUCTURE v2.3.4 to identify four population groups. These groups were confirmed based on principal components analysis and geographic origins. Pairwise diversity among accessions was evaluated and revealed considerable genetic variation. Notably, a subset of accessions from Armenia with exceptional genetic variation was identified. This survey is the first to report significant genetic diversity among pennycress accessions and explain some of the phenotypic differences previously observed in the germplasm. Understanding variation in pennycress accessions will be a crucial step for selection, breeding, and domestication of a new cash cover crop for cold climates.
(This article belongs to the Special Issue Crop Domestication and Evolution)