Open Access Article

Identification of Target Chicken Populations by Machine Learning Models Using the Minimum Number of SNPs

1 Division of Animal and Dairy Science, Chungnam National University, Daejeon 34134, Korea
2 Bio-AI Convergence Research Center, Chungnam National University, Daejeon 34134, Korea
3 SELS Center, Division of Biotechnology, Advanced Institute of Environment and Bioscience, Chonbuk National University, Iksan 54596, Korea
4 Department of Computer Science and Engineering, Chungnam National University, Daejeon 34134, Korea
5 Insilicogen Inc., Yongin 16954, Korea
* Authors to whom correspondence should be addressed.
These authors contributed equally to this work.
Animals 2021, 11(1), 241; https://doi.org/10.3390/ani11010241
Received: 21 December 2020 / Revised: 13 January 2021 / Accepted: 15 January 2021 / Published: 19 January 2021
(This article belongs to the Section Animal Genetics and Genomics)
Classifying a target population at the genetic level can provide important information for the preservation and commercial use of a breed. In this study, we searched for the smallest combination of markers that can distinguish target populations, based on high-density single nucleotide polymorphism (SNP) array data. A genome-wide association study was used to filter target-population-specific SNPs between the case and control groups, and principal component analysis combined with machine learning algorithms was used to explore various combinations with the minimum number of markers. In addition, the optimal combination of SNP markers produced stable results for the target population in verification studies that analyzed additional breeds and samples.
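The summary above mentions a GWAS step that filters SNPs differing between the case (target) and control groups. As a minimal sketch, assuming genotypes coded as 0/1/2 alternative-allele dosages, a basic allelic chi-square test on the 2x2 case/control allele-count table could serve as such a filter. The association model, the dosage coding, the helper names, and the cut-off alpha = 5e-8 are all assumptions for illustration, not the settings used in the study.

```python
import math

def allele_counts(genotypes):
    """Return (alt, ref) allele counts from 0/1/2 alt-allele dosages (assumed coding)."""
    alt = sum(genotypes)
    return alt, 2 * len(genotypes) - alt

def allelic_chi2(case_genotypes, control_genotypes):
    """Pearson chi-square (1 df) on the 2x2 case/control allele-count table.

    Returns (chi2, p_value)."""
    a, b = allele_counts(case_genotypes)     # case: alt, ref
    c, d = allele_counts(control_genotypes)  # control: alt, ref
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:  # monomorphic SNP or empty group: no evidence of association
        return 0.0, 1.0
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Upper tail of chi-square with 1 df equals P(|Z| > sqrt(chi2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

def filter_snps(case, control, alpha=5e-8):
    """Hypothetical filter: keep SNP ids with p < alpha, most significant first.

    case/control map a SNP id to a list of per-sample dosages."""
    hits = []
    for snp_id in case:
        _, p_value = allelic_chi2(case[snp_id], control[snp_id])
        if p_value < alpha:
            hits.append((p_value, snp_id))
    return [snp_id for _, snp_id in sorted(hits)]

# Toy data: snp_a is fixed for different alleles, snp_b has identical frequencies.
case = {"snp_a": [2] * 20, "snp_b": [0, 1, 2, 1] * 5}
control = {"snp_a": [0] * 20, "snp_b": [0, 1, 2, 1] * 5}
print(filter_snps(case, control))  # ['snp_a']
```

A real GWAS would usually also account for population structure and multiple testing; this sketch only shows the shape of a case/control SNP filter.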
A marker combination capable of classifying a specific chicken population could improve commercial value by increasing consumer confidence in the origin of the population, which would facilitate the protection of native genetic resources in the market of each country. In this study, a total of 283 samples from 20 lines, consisting of Korean native chickens, commercial native chickens, commercial broilers, and a layer population, were analyzed using a 600 k high-density single nucleotide polymorphism (SNP) array to determine the optimal marker combination comprising the minimum number of markers. Machine learning algorithms, a genome-wide association study (GWAS), linkage disequilibrium (LD) analysis, and principal component analysis (PCA) were used to distinguish a target (case) group from control chicken groups. During marker selection, a total of 47,303 SNPs were used for classifying chicken populations; 96 LD-pruned SNPs (50 SNPs per LD block) served as the best marker combination for target chicken classification. Moreover, 36, 44, and 8 SNPs were selected as the minimum numbers of markers by the AdaBoost (AB), Random Forest (RF), and Decision Tree (DT) machine learning classification models, with accuracy rates of 99.6%, 98.0%, and 97.9%, respectively. The selected marker combinations increased the genetic distance and fixation index (Fst) values between the case and control groups and reduced the number of genetic components required, confirming that the groups could be classified efficiently with a small set of markers. In a verification study including additional chicken breeds and samples (12 lines and 182 samples), the accuracy did not change significantly, and the target chicken group could be clearly distinguished from the other populations.
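The abstract reports that the selected marker combinations increased Fst between the case and control groups. One common way to quantify this for a single biallelic SNP is Nei's G_ST, (H_T - H_S) / H_T, computed from allele frequencies. The sketch below assumes two equally weighted groups and 0/1/2 dosage coding; the study may have used a different Fst estimator, and the helper names and toy data are hypothetical.

```python
def alt_freq(genotypes):
    """Alternative-allele frequency from 0/1/2 dosages (assumed coding)."""
    return sum(genotypes) / (2 * len(genotypes))

def nei_fst(case_genotypes, control_genotypes):
    """Nei's G_ST for one biallelic SNP, two equally weighted groups."""
    p1, p2 = alt_freq(case_genotypes), alt_freq(control_genotypes)
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-group heterozygosity
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)                      # heterozygosity of the pooled groups
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t      # monomorphic SNP: define Fst as 0

def mean_fst(case, control, snp_ids):
    """Hypothetical panel summary: unweighted mean of per-SNP Fst."""
    return sum(nei_fst(case[s], control[s]) for s in snp_ids) / len(snp_ids)

# Toy data: snp_a is fixed for different alleles, snp_b differs partly, snp_c not at all.
case = {"snp_a": [2, 2], "snp_b": [2, 1], "snp_c": [1, 1]}
control = {"snp_a": [0, 0], "snp_b": [0, 1], "snp_c": [1, 1]}
print(round(mean_fst(case, control, ["snp_a", "snp_b", "snp_c"]), 3))  # 0.417
print(round(mean_fst(case, control, ["snp_a", "snp_b"]), 3))           # 0.625
```

Dropping the uninformative snp_c raises the panel mean, which is the qualitative effect the abstract describes for the selected marker sets.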
The GWAS, PCA, and machine learning algorithms used in this study can be applied efficiently to determine the optimal marker combination with the minimum number of markers that can distinguish the target population among a large number of SNP markers.
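The abstract reports minimum marker sets of 36, 44, and 8 SNPs for the AB, RF, and DT models but does not describe the search procedure. To illustrate the general idea of a minimal-marker search, the sketch below uses a deliberately simpler, swapped-in approach: greedy forward selection of SNP columns, scored by leave-one-out accuracy of a nearest-centroid classifier. It is not the authors' pipeline; the function names, stopping rule, and toy genotype matrix are hypothetical.

```python
def centroid(rows):
    """Column-wise mean of a non-empty list of equal-length rows."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def loo_accuracy(X, y, features):
    """Leave-one-out accuracy of a nearest-centroid classifier on the given SNP columns."""
    correct = 0
    for i, (x_test, y_test) in enumerate(zip(X, y)):
        train = [(X[j], y[j]) for j in range(len(X)) if j != i]
        best_label, best_dist = None, float("inf")
        for label in sorted({lab for _, lab in train}):  # only labels present in training
            c = centroid([[x[f] for f in features] for x, lab in train if lab == label])
            dist = sum((x_test[f] - cf) ** 2 for f, cf in zip(features, c))
            if dist < best_dist:
                best_label, best_dist = label, dist
        correct += best_label == y_test
    return correct / len(X)

def forward_select(X, y, target=1.0, max_markers=None):
    """Hypothetical greedy search: add the SNP column that most improves LOO accuracy.

    Stops once `target` accuracy is reached, no column improves accuracy,
    or `max_markers` columns have been chosen."""
    remaining = list(range(len(X[0])))
    limit = len(remaining) if max_markers is None else max_markers
    chosen, best_acc = [], 0.0
    while remaining and len(chosen) < limit and best_acc < target:
        scores = {f: loo_accuracy(X, y, chosen + [f]) for f in remaining}
        f_best = max(remaining, key=scores.get)  # ties go to the earliest column
        if scores[f_best] <= best_acc:
            break
        chosen.append(f_best)
        remaining.remove(f_best)
        best_acc = scores[f_best]
    return chosen, best_acc

# Toy data: rows are samples, columns are SNP dosages; 1 = target (case), 0 = control.
X = [[0, 2, 1], [1, 2, 0], [2, 2, 1], [0, 0, 1], [1, 0, 0], [2, 0, 1]]
y = [1, 1, 1, 0, 0, 0]
print(forward_select(X, y))  # ([1], 1.0)
```

Greedy forward selection is cheap but can miss marker combinations that only work jointly; in practice the classifier would be swapped for models such as those named in the abstract and evaluated on held-out samples.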
Keywords: single nucleotide polymorphism (SNP); principal component analysis (PCA); genome-wide association study (GWAS); linkage disequilibrium (LD); machine learning

Figure 1

MDPI and ACS Style

Seo, D.; Cho, S.; Manjula, P.; Choi, N.; Kim, Y.-K.; Koh, Y.J.; Lee, S.H.; Kim, H.-Y.; Lee, J.H. Identification of Target Chicken Populations by Machine Learning Models Using the Minimum Number of SNPs. Animals 2021, 11, 241. https://doi.org/10.3390/ani11010241

AMA Style

Seo D, Cho S, Manjula P, Choi N, Kim Y-K, Koh YJ, Lee SH, Kim H-Y, Lee JH. Identification of Target Chicken Populations by Machine Learning Models Using the Minimum Number of SNPs. Animals. 2021; 11(1):241. https://doi.org/10.3390/ani11010241

Chicago/Turabian Style

Seo, Dongwon; Cho, Sunghyun; Manjula, Prabuddha; Choi, Nuri; Kim, Young-Kuk; Koh, Yeong J.; Lee, Seung H.; Kim, Hyung-Yong; Lee, Jun H. 2021. "Identification of Target Chicken Populations by Machine Learning Models Using the Minimum Number of SNPs" Animals 11, no. 1: 241. https://doi.org/10.3390/ani11010241

Note that from the first issue of 2016, MDPI journals use article numbers instead of page numbers.
