Benchmarking Feature Selection Methods and Prediction Models for Flowering Time Prediction in Maize
Abstract
1. Introduction
2. Results
2.1. Pipeline of Feature Selection Methods and Prediction Models for Complex Trait Prediction
2.2. The Performance of Linear and Nonlinear Feature Selection on Flowering Time Prediction
2.3. Machine Learning Models Achieve Higher Accuracy for Flowering Time Prediction
2.4. Performance Evaluation of Omics Data Integration for Flowering Time Prediction
2.5. Potential to Predict Flowering Time Genes Through SHAP Values
3. Discussion
3.1. Molecular Basis and Key Regulators of Flowering Time in Maize
3.2. The Potential and Challenges of Machine Learning in Plant Genomic Prediction
3.3. The Critical Role of Feature Selection in High-Dimensional Omics Prediction
3.4. Limited Predictive Performance of Transcriptome-Based Models
3.5. Model Interpretability and Gene Mining Enhanced Through SHAP
4. Materials and Methods
4.1. Plant Materials
4.2. Methods
4.3. Feature Selection Methods
4.3.1. Feature Selection: Mutual Information
4.3.2. Feature Selection: Lasso
4.3.3. Feature Selection: ElasticNet
4.3.4. Feature Selection: RF
4.3.5. Feature Selection: LightGBM
4.3.6. Feature Selection: XGBoost
4.3.7. Feature Selection: Boruta
4.4. Machine Learning Algorithm Models and Traditional Linear Model
4.5. Interpretation of Candidate Genes via Explainable Artificial Intelligence (SHAP)
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Du, Y.; Jia, N.; Wang, Y.; Li, R.; Lu, Y.; Würschum, T.; Zhu, X.; Liu, W. Benchmarking Feature Selection Methods and Prediction Models for Flowering Time Prediction in Maize. Int. J. Mol. Sci. 2026, 27, 1635. https://doi.org/10.3390/ijms27041635

