High-Accuracy Chicken Breed Identification Using Microsatellite Genotype Data and AutoGluon Framework
Simple Summary
Abstract
1. Introduction
2. Materials and Methods
2.1. Microsatellite Marker Genotype Dataset
2.2. Baseline Model: RF Model
2.3. Optimized Model: AutoGluon Approach
2.4. Optimized Model: AutoGluon
3. Results
3.1. Performance Evaluation of the Hyperparameter-Tuned Random Forest Model
3.2. Cross-Validation Results
3.3. Loci Importance in Prediction Process
3.4. Using AutoGluon to Optimize Predictive Models in Chicken Breed Classification
4. Discussion
4.1. Performance Metrics, Impurity Criteria, and Marker Optimization for RF Models
4.2. Model Performance, Computational Considerations, and Locus Importance
4.3. Computational Trade-Offs in Model Selection
4.4. Contributions to the Identification and Conservation of Breeds
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest

| Population | Abbreviation | Number of Individuals | Training Dataset | Testing Dataset |
|---|---|---|---|---|
| Betong | Bt | 30 | 26 | 4 |
| Chaiyaphum (G. g. spadiceus) | Chaiya Ggs | 30 | 24 | 6 |
| Chaiyaphum (G. g. gallus) | Chatha Ggg | 30 | 25 | 5 |
| Fighting chicken | fight | 30 | 26 | 4 |
| Huai Yang Pan (G. g. spadiceus) | HYP Ggs | 30 | 26 | 4 |
| Khao Kho (G. g. spadiceus) | KK Ggs | 30 | 23 | 7 |
| Khok Mai Rua (G. g. gallus) | KMR Ggg | 30 | 25 | 5 |
| Mae Hong Son | MHS | 70 | 59 | 11 |
| Petchaburi (G. g. spadiceus) | Petch Ggs | 30 | 29 | 1 |
| Roi Et (G. g. gallus) | RE Ggg | 30 | 27 | 3 |
| Sa Kaeo (G. g. gallus) | SK Ggg | 30 | 24 | 6 |
| Si Sa Ket (G. g. gallus) | SSK Ggg | 30 | 25 | 5 |
| Uthai Thani (Samae Dam) | UT | 33 | 29 | 4 |
| Total | | 433 | 368 | 65 |
| True Label | First Class: Membership Probability (%) | First Class: Population | Second Class: Membership Probability (%) | Second Class: Population | Third Class: Membership Probability (%) | Third Class: Population | Final Prediction |
|---|---|---|---|---|---|---|---|
| KK Ggs | 52.81 | KK Ggs | 14.16 | KMR Ggg | 7.64 | SK Ggg | KK Ggs |
| MHS | 97.98 | MHS | 0.90 | UT | 0.45 | KMR Ggg | MHS |
| SK Ggg | 44.72 | SK Ggg | 15.96 | Chaiya Ggs | 10.56 | Chatha Ggg | SK Ggg |
| fight | 36.40 | fight | 13.71 | KMR Ggg | 12.36 | SSK Ggg | fight |
| KMR Ggg | 16.40 | fight | 14.16 | Petch Ggs | 12.58 | Chatha Ggg | fight |
| fight | 50.34 | fight | 13.71 | RE Ggg | 8.76 | KMR Ggg | fight |
| MHS | 92.81 | MHS | 3.60 | UT | 0.90 | Chatha Ggg | MHS |
| SSK Ggg | 48.54 | SSK Ggg | 19.33 | RE Ggg | 17.53 | fight | SSK Ggg |
| fight | 26.97 | fight | 22.47 | KMR Ggg | 13.93 | SSK Ggg | fight |
| Chaiya Ggs | 63.60 | Chaiya Ggs | 10.11 | KK Ggs | 8.76 | KMR Ggg | Chaiya Ggs |
| Method | Accuracy (%) | Accuracy Std | 95% CI | Kappa | NIR |
|---|---|---|---|---|---|
| Fixed data split | 95.38 | - | (0.9028, 1.0000) | 0.9492 | 0.1692 |
| R10FCVT | 91.44 | 0.0408 | (0.8904, 0.9384) | 0.9065 | 0.1617 |
| LOOCV | 90.99 | 0.2866 | (0.8830, 0.9369) | 0.9016 | 0.1617 |
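
The NIR and 95% CI columns above can be cross-checked against the dataset table. The following is a minimal sketch, not the authors' code. It assumes that the NIR is the proportion of the largest class, that the fixed-split accuracy of 95.38% corresponds to 62 correct calls out of 65 test individuals, and that the interval is a normal-approximation (Wald) interval clipped to [0, 1]. The function names are hypothetical.

```python
import math

# Test-set sizes per population, copied from the dataset table (fixed split).
test_counts = {
    "Bt": 4, "Chaiya Ggs": 6, "Chatha Ggg": 5, "fight": 4, "HYP Ggs": 4,
    "KK Ggs": 7, "KMR Ggg": 5, "MHS": 11, "Petch Ggs": 1, "RE Ggg": 3,
    "SK Ggg": 6, "SSK Ggg": 5, "UT": 4,
}


def no_information_rate(counts):
    """Accuracy reached by always predicting the most frequent class."""
    return max(counts.values()) / sum(counts.values())


def wald_interval(accuracy, n, z=1.96):
    """Normal-approximation 95% CI for a proportion, clipped to [0, 1]."""
    half_width = z * math.sqrt(accuracy * (1.0 - accuracy) / n)
    return max(0.0, accuracy - half_width), min(1.0, accuracy + half_width)


n_test = sum(test_counts.values())      # 65 individuals in the test set
nir_fixed = no_information_rate(test_counts)

# ASSUMPTION: 62 of 65 correct, inferred from the reported 95.38% accuracy.
accuracy_fixed = 62 / n_test
ci_low, ci_high = wald_interval(accuracy_fixed, n_test)

# For the cross-validated rows the whole dataset is used:
# the largest population (MHS) has 70 of 433 individuals.
nir_full = 70 / 433

print(f"NIR (fixed split): {nir_fixed:.4f}")
print(f"Accuracy (fixed split): {accuracy_fixed:.4f}")
print(f"95% CI: ({ci_low:.4f}, {ci_high:.4f})")
print(f"NIR (full dataset): {nir_full:.4f}")
```

Under these assumptions the sketch returns values that agree with the fixed-split row (NIR 0.1692, CI 0.9028 to 1.0000) and with the NIR of 0.1617 reported for both cross-validation rows.
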
| Population | Precision | Recall | F1-Score |
|---|---|---|---|
| Bt | 1.00 | 1.00 | 1.00 |
| Chaiya Ggs | 1.00 | 0.83 | 0.91 |
| Chatha Ggg | 1.00 | 1.00 | 1.00 |
| fight | 0.80 | 1.00 | 0.89 |
| HYP Ggs | 1.00 | 1.00 | 1.00 |
| KK Ggs | 0.88 | 1.00 | 0.93 |
| KMR Ggg | 1.00 | 0.60 | 0.75 |
| MHS | 1.00 | 1.00 | 1.00 |
| Petch Ggs | 0.50 | 1.00 | 0.67 |
| RE Ggg | 1.00 | 1.00 | 1.00 |
| SK Ggg | 1.00 | 1.00 | 1.00 |
| SSK Ggg | 1.00 | 1.00 | 1.00 |
| UT | 1.00 | 1.00 | 1.00 |
| macro average | 0.94 | 0.96 | 0.93 |
| weighted average | 0.97 | 0.95 | 0.95 |
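
The two summary rows of the per-population table can be recomputed from the rows above them. The following is a minimal sketch under two assumptions: the macro average is the unweighted mean over the 13 populations, and the weighted average uses the test-set size of each population (from the dataset table) as the weight. Because the inputs are the rounded two-decimal values shown in the table, only agreement to two decimals should be expected.

```python
# (precision, recall, F1-score, test-set size) per population.
# Metrics are copied from the table above; sizes come from the dataset table.
per_class = {
    "Bt":         (1.00, 1.00, 1.00, 4),
    "Chaiya Ggs": (1.00, 0.83, 0.91, 6),
    "Chatha Ggg": (1.00, 1.00, 1.00, 5),
    "fight":      (0.80, 1.00, 0.89, 4),
    "HYP Ggs":    (1.00, 1.00, 1.00, 4),
    "KK Ggs":     (0.88, 1.00, 0.93, 7),
    "KMR Ggg":    (1.00, 0.60, 0.75, 5),
    "MHS":        (1.00, 1.00, 1.00, 11),
    "Petch Ggs":  (0.50, 1.00, 0.67, 1),
    "RE Ggg":     (1.00, 1.00, 1.00, 3),
    "SK Ggg":     (1.00, 1.00, 1.00, 6),
    "SSK Ggg":    (1.00, 1.00, 1.00, 5),
    "UT":         (1.00, 1.00, 1.00, 4),
}


def macro_average(values):
    """Unweighted mean: every population counts equally."""
    return sum(values) / len(values)


def weighted_average(values, weights):
    """Mean weighted by the number of test individuals per population."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


supports = [row[3] for row in per_class.values()]
macro = {}
weighted = {}
for index, name in enumerate(("precision", "recall", "f1")):
    column = [row[index] for row in per_class.values()]
    macro[name] = round(macro_average(column), 2)
    weighted[name] = round(weighted_average(column, supports), 2)

print("macro average:   ", macro)
print("weighted average:", weighted)
```

The gap between the two averages comes mainly from Petch Ggs, which has a single test individual: its precision of 0.50 pulls the macro average down but contributes little to the weighted average.
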
| Model | Score (Accuracy) | Prediction Time (s) | Fit Time (s) | Pred Time Marginal (s) | Fit Time Marginal (s) | Stack Level | Fit Order |
|---|---|---|---|---|---|---|---|
| WeightedEnsemble_L3 | 0.992000 | 3.831986 | 277.421132 | 0.000882 | 0.280986 | 3 | 17 |
| WeightedEnsemble_L2 | 0.991429 | 0.367359 | 2.854856 | 0.000486 | 0.125387 | 2 | 13 |
| ExtraTreesGini_BAG_L1 | 0.989143 | 0.110687 | 1.107739 | 0.110687 | 1.107739 | 1 | 9 |
| NeuralNetFastAI_BAG_L2 | 0.988571 | 3.831105 | 277.140146 | 0.204727 | 7.725666 | 2 | 14 |
| ExtraTreesEntr_BAG_L1 | 0.988000 | 0.163950 | 0.799778 | 0.163950 | 0.799778 | 1 | 10 |
| LightGBMXT_BAG_L2 | 0.988000 | 4.605415 | 361.044465 | 0.979038 | 91.629985 | 2 | 15 |
| LightGBMXT_BAG_L1 | 0.987429 | 1.184527 | 14.656198 | 1.184527 | 14.656198 | 1 | 4 |
| RandomForestGini_BAG_L1 | 0.986286 | 0.086773 | 0.875183 | 0.086773 | 0.875183 | 1 | 6 |
| LightGBM_BAG_L2 | 0.986286 | 3.823396 | 302.657472 | 0.197019 | 33.242992 | 2 | 16 |
| CatBoost_BAG_L1 | 0.985714 | 0.070553 | 189.957427 | 0.070553 | 189.957427 | 1 | 8 |
| RandomForestEntr_BAG_L1 | 0.985143 | 0.092236 | 0.821953 | 0.092236 | 0.821953 | 1 | 7 |
| NeuralNetFastAI_BAG_L1 | 0.972571 | 0.072332 | 4.908889 | 0.072332 | 4.908889 | 1 | 3 |
| NeuralNetTorch_BAG_L1 | 0.970857 | 0.174236 | 27.011140 | 0.174236 | 27.011140 | 1 | 12 |
| LightGBM_BAG_L1 | 0.970857 | 1.020726 | 19.746374 | 1.020726 | 19.746374 | 1 | 5 |
| XGBoost_BAG_L1 | 0.965714 | 0.387963 | 9.517267 | 0.387963 | 9.517267 | 1 | 11 |
| KNeighborsDist_BAG_L1 | 0.925143 | 0.109600 | 0.003412 | 0.109600 | 0.003412 | 1 | 2 |
| KNeighborsUnif_BAG_L1 | 0.860571 | 0.152794 | 0.009120 | 0.152794 | 0.009120 | 1 | 1 |
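
The leaderboard above can also be read as an accuracy versus training-cost trade-off. The following is an illustrative sketch only: the selection rule (take the model with the shortest fit time among those whose score lies within a chosen tolerance of the best score) and the function name `cheapest_within_tolerance` are assumptions made for this example, not a procedure described by the authors or provided by AutoGluon. Scores and fit times are copied from the table.

```python
# (model, validation accuracy, total fit time in seconds) from the leaderboard.
leaderboard = [
    ("WeightedEnsemble_L3", 0.992000, 277.421132),
    ("WeightedEnsemble_L2", 0.991429, 2.854856),
    ("ExtraTreesGini_BAG_L1", 0.989143, 1.107739),
    ("NeuralNetFastAI_BAG_L2", 0.988571, 277.140146),
    ("ExtraTreesEntr_BAG_L1", 0.988000, 0.799778),
    ("LightGBMXT_BAG_L2", 0.988000, 361.044465),
    ("LightGBMXT_BAG_L1", 0.987429, 14.656198),
    ("RandomForestGini_BAG_L1", 0.986286, 0.875183),
    ("LightGBM_BAG_L2", 0.986286, 302.657472),
    ("CatBoost_BAG_L1", 0.985714, 189.957427),
    ("RandomForestEntr_BAG_L1", 0.985143, 0.821953),
    ("NeuralNetFastAI_BAG_L1", 0.972571, 4.908889),
    ("NeuralNetTorch_BAG_L1", 0.970857, 27.011140),
    ("LightGBM_BAG_L1", 0.970857, 19.746374),
    ("XGBoost_BAG_L1", 0.965714, 9.517267),
    ("KNeighborsDist_BAG_L1", 0.925143, 0.003412),
    ("KNeighborsUnif_BAG_L1", 0.860571, 0.009120),
]


def cheapest_within_tolerance(rows, tolerance):
    """Return the fastest-to-fit model scoring within `tolerance` of the best.

    HYPOTHETICAL selection rule, used here only to illustrate the trade-off.
    """
    best_score = max(score for _, score, _ in rows)
    candidates = [row for row in rows if row[1] >= best_score - tolerance]
    return min(candidates, key=lambda row: row[2])


strict = cheapest_within_tolerance(leaderboard, tolerance=0.001)
relaxed = cheapest_within_tolerance(leaderboard, tolerance=0.005)

print(f"Within 0.001 of best: {strict[0]} ({strict[2]:.2f} s)")
print(f"Within 0.005 of best: {relaxed[0]} ({relaxed[2]:.2f} s)")
```

With a tolerance of 0.001 the rule selects WeightedEnsemble_L2, which is about 0.0006 below the top score at roughly 1% of the fit time of WeightedEnsemble_L3. With a tolerance of 0.005 it selects a single bagged extra-trees model. Suitable tolerances depend on the application and are not specified in the source.
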
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Toky, R.F.M.; Sukhamsri, S.; Medhasi, S.; Budi, T.; Panthum, T.; Singchat, W.; Srikulnath, K. High-Accuracy Chicken Breed Identification Using Microsatellite Genotype Data and AutoGluon Framework. Biology 2026, 15, 21. https://doi.org/10.3390/biology15010021

