Using Machine Learning to Classify Capsicum Genotypes Based on Agronomic Traits
Abstract
1. Introduction
2. Materials and Methods
2.1. Experimental Design
2.2. Statistical Analyses
- Y is the matrix of dependent variables (e.g., agronomic traits such as stem diameter, plant height, and pH);
- X is the design matrix representing the independent variables or experimental factors (e.g., Capsicum varieties);
- B is the matrix of coefficients associated with the effects of the explanatory variables;
- E is the matrix of residual errors, representing the unexplained variation.
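The definitions above describe the multivariate linear model Y = XB + E. The sketch below fits it by ordinary least squares for a one-way layout. It assumes cell-means (one-hot) coding of the genotype factor, so each row of B is a genotype mean vector. The function name and the toy data are illustrative and are not taken from the study.

```python
import numpy as np

def fit_multivariate_linear_model(groups, Y):
    """Least-squares fit of Y = XB + E for a one-way layout (hypothetical helper).

    groups: length-n array of genotype labels; Y: n x p matrix of traits.
    X uses cell-means (one-hot) coding, so each row of B is a group mean vector.
    """
    levels = np.unique(groups)
    X = (groups[:, None] == levels[None, :]).astype(float)  # n x g indicator matrix
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)              # g x p coefficient matrix
    E = Y - X @ B                                          # n x p residual matrix
    return X, B, E

# Toy data: two genotypes, two traits (values are illustrative only)
groups = np.array(["T1", "T1", "T2", "T2"])
Y = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 10.0]])
X, B, E = fit_multivariate_linear_model(groups, Y)
print(B)  # rows are the T1 and T2 mean vectors
```

With this coding every trait is modeled jointly, and the residual matrix E carries the within-genotype variation used by the MANOVA tests.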
- H (Hypothesis matrix): represents the variation explained by the model (i.e., between-group variation);
- E (Error matrix): represents the residual or within-group variation.
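For a one-way design, H and E can be built as sums of squares and cross-products (SSCP) matrices. The sketch below is a minimal version under that assumption; `sscp_matrices` is a hypothetical name. The between-group and within-group parts add up to the total SSCP matrix of the centered data.

```python
import numpy as np

def sscp_matrices(groups, Y):
    """Return (H, E): between-group and within-group SSCP matrices (hypothetical helper)."""
    grand_mean = Y.mean(axis=0)
    p = Y.shape[1]
    H = np.zeros((p, p))
    E = np.zeros((p, p))
    for level in np.unique(groups):
        Yg = Y[groups == level]
        d = Yg.mean(axis=0) - grand_mean
        H += len(Yg) * np.outer(d, d)   # group size times outer product of mean deviation
        R = Yg - Yg.mean(axis=0)        # deviations from the group mean
        E += R.T @ R                    # within-group cross-products
    return H, E

rng = np.random.default_rng(0)
groups = np.array(["a", "a", "b", "b", "c", "c"])
Y = rng.normal(size=(6, 3))             # toy data: 6 plants, 3 traits
H, E = sscp_matrices(groups, Y)
```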
- Λ (Lambda) is the test statistic;
- |E| and |H| denote the determinants of the error and hypothesis matrices, respectively.
- Lower values of Λ indicate greater differences between groups.
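In its usual form, Wilks' statistic compares the determinant of E with that of the total matrix H + E, which keeps Λ between 0 and 1:

```latex
\Lambda = \frac{|E|}{|H + E|}
```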
- U is the test statistic;
- tr(·) denotes the trace of a matrix (sum of diagonal elements);
- E⁻¹ is the inverse of the error matrix.
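With these definitions, the Lawley–Hotelling trace takes its standard form:

```latex
U = \operatorname{tr}\left(E^{-1} H\right)
```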
- V is the test statistic;
- (H + E)⁻¹ represents the inverse of the total variation matrix.
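With these definitions, Pillai's trace takes its standard form:

```latex
V = \operatorname{tr}\left[H (H + E)^{-1}\right]
```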
- θ (theta) is the largest eigenvalue;
- each eigenvalue represents the ratio of between-group (H) to within-group (E) variation along a given discriminant direction.
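All four criteria can be written in terms of the eigenvalues λ of E⁻¹H. The sketch below assumes that Roy's θ is the largest eigenvalue of E⁻¹H, which is consistent with the value above 1 reported in the MANOVA table. The helper name is hypothetical. Under that assumption, Λ = ∏ 1/(1 + λ), U = Σ λ, and V = Σ λ/(1 + λ).

```python
import numpy as np

def manova_statistics(H, E):
    """Wilks, Lawley-Hotelling, Pillai and Roy statistics from H and E (hypothetical helper)."""
    E_inv_H = np.linalg.solve(E, H)                         # E^{-1} H without an explicit inverse
    eig = np.sort(np.linalg.eigvals(E_inv_H).real)[::-1]    # real, non-negative for H psd, E pd
    return {
        "wilks": np.linalg.det(E) / np.linalg.det(H + E),   # |E| / |H + E|
        "lawley_hotelling": np.trace(E_inv_H),              # tr(E^{-1} H)
        "pillai": np.trace(H @ np.linalg.inv(H + E)),       # tr(H (H + E)^{-1})
        "roy": eig[0],                                      # largest eigenvalue of E^{-1} H
        "eigenvalues": eig,
    }

rng = np.random.default_rng(1)
A = rng.normal(size=(3, 2))
H = A @ A.T                                  # toy hypothesis matrix (rank 2)
Bm = rng.normal(size=(3, 6))
E = Bm @ Bm.T + np.eye(3)                    # toy positive-definite error matrix
stats = manova_statistics(H, E)
```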
- D(s,t) is the distance between cluster s and cluster t;
- nₛ and nₜ are the numbers of observations in clusters s and t, respectively;
- x̄ₛᵢ and x̄ₜᵢ are the means of the i-th variable for clusters s and t, respectively;
- p is the number of variables.
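These definitions match Ward's minimum-variance distance:

D(s,t) = nₛnₜ/(nₛ + nₜ) · Σᵢ (x̄ₛᵢ − x̄ₜᵢ)²

The sketch below assumes that this is the linkage used. It computes the distance and checks that it equals the increase in within-cluster sum of squares caused by merging the two clusters. The function names and toy points are hypothetical.

```python
import numpy as np

def ward_distance(cluster_s, cluster_t):
    """D(s,t) = n_s n_t / (n_s + n_t) * sum over the p variables of (mean_s - mean_t)^2."""
    n_s, n_t = len(cluster_s), len(cluster_t)
    diff = cluster_s.mean(axis=0) - cluster_t.mean(axis=0)  # p-vector of mean differences
    return n_s * n_t / (n_s + n_t) * float(np.sum(diff ** 2))

def within_ss(cluster):
    """Sum of squared deviations of the observations from their cluster mean."""
    return float(np.sum((cluster - cluster.mean(axis=0)) ** 2))

s = np.array([[0.0, 0.0], [2.0, 0.0]])
t = np.array([[4.0, 0.0]])
d = ward_distance(s, t)
merge_cost = within_ss(np.vstack([s, t])) - within_ss(s) - within_ss(t)
```

Both quantities come out as 6 here. This is why Ward's method can be read as merging, at each step, the pair of clusters that least increases total within-cluster variance.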
- are the coordinates on the confidence ellipse;
- represents the means of the two variables (the center of the ellipse);
- and are the eigenvalues of the variance–covariance matrix of the variables;
- are the elements of the eigenvector matrix that orients the ellipse in space;
- c is a scaling factor given by the critical value of the chi-square distribution (χ²) at level α with p degrees of freedom; it adjusts the size of the ellipse to the desired confidence level.
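The construction above can be sketched for two variables as follows. The eigenvectors of the covariance matrix orient the ellipse, the square roots of the eigenvalues set the semi-axes, and c rescales them. This sketch builds the ellipse for the observations using the sample covariance. It assumes p = 2, where the chi-square quantile has the closed form c² = −2 ln(1 − confidence), so SciPy is not needed. The function name is hypothetical.

```python
import math
import numpy as np

def confidence_ellipse(points, confidence=0.95, n_vertices=100):
    """Boundary coordinates (n_vertices x 2) of a chi-square scaled ellipse for 2-D data."""
    center = points.mean(axis=0)                       # ellipse center (means of the two variables)
    cov = np.cov(points, rowvar=False)                 # 2 x 2 variance-covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)             # eigenvalues and orienting eigenvectors
    # Closed-form chi-square quantile, valid only for 2 degrees of freedom
    c = math.sqrt(-2.0 * math.log(1.0 - confidence))
    t = np.linspace(0.0, 2.0 * np.pi, n_vertices, endpoint=False)
    unit_circle = np.stack([np.cos(t), np.sin(t)])     # 2 x n points on the unit circle
    axes = eigvecs @ np.diag(c * np.sqrt(eigvals))     # scale by c*sqrt(lambda), then rotate
    return (center[:, None] + axes @ unit_circle).T

rng = np.random.default_rng(0)
pts = rng.normal(size=(50, 2)) @ np.array([[2.0, 0.0], [1.0, 1.0]])  # toy correlated data
ellipse = confidence_ellipse(pts, confidence=0.95)
```

Every boundary point lies at squared Mahalanobis distance c² from the center. For more than two variables, the chi-square quantile would have to come from a statistics library instead of the closed form.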
2.3. Machine Learning Models
3. Results
3.1. Phenotypic Variability and Data Distribution
3.2. Multivariate Analysis and Pattern Recognition
3.3. Machine Learning Model Performance
4. Discussion
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| ANOVA | Analysis of Variance |
| MANOVA | Multivariate Analysis of Variance |
| PCA | Principal Component Analysis |
| KNN | K-Nearest Neighbors |
| RF | Random Forest |
| XGBoost | Extreme Gradient Boosting |
| ROC | Receiver Operating Characteristic |
| AUC | Area Under the Curve |
| NMR | Nuclear Magnetic Resonance |
| HS-SPME | Headspace Solid Phase Microextraction |
| GC–MS | Gas Chromatography–Mass Spectrometry |
| RGB | Red Green Blue |
| IoT | Internet of Things |









| Scientific Name | Common Name in Brazil | Translation to English | Code |
|---|---|---|---|
| Capsicum frutescens | Grisu | Grisu | T1 |
| Capsicum baccatum | Dedo de moça | Lady’s Finger | T2 |
| Capsicum chinense | BRS Seriema | BRS Seriema | T3 |
| Capsicum annuum | Vulcão | Volcano | T4 |
| Capsicum chinense | Biquinho amarela | Little Beak Yellow | T5 |
| Capsicum annuum | Pimentão amarelo | Yellow Bell Pepper | T6 |
| Capsicum chinense | Guaraci Cumari do Pará | Guaraci Cumari from Pará | T7 |
| Capsicum chinense | BRS Moema | BRS Moema | T8 |
| Capsicum annuum | Doce Italiana | Sweet Italian | T9 |
| Capsicum frutescens | Tabasco | Tabasco | T10 |
| Capsicum frutescens | Peter | Peter | T11 |
| Capsicum frutescens | Malagueta | Malagueta | T12 |
| Capsicum annuum | Jalapenho | Jalapeño | T13 |
| Capsicum annuum | Jamaican red | Jamaican Red | T14 |
| Capsicum annuum | Jamaican yellow | Jamaican Yellow | T15 |
| Capsicum chinense | Bode Amarela Arari | Yellow Goat Arari | T16 |

| Measured Variable | Statistic | T1 | T2 | T3 | T4 | T5 | T6 | T7 | T8 | T9 | T10 | T11 | T12 | T13 | T14 | T15 | T16 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| stem diameter (mm) | mean | 9.83 | 14.076 | 12.814 | 11.608 | 11.926 | 11.284 | 14.256 | 12.076 | 11.372 | 11.142 | 12.954 | 10.834 | 8.276 | 9.972 | 12.278 | 9.432 |
| | std | 2.107 | 2.248 | 1.371 | 0.898 | 1.109 | 0.891 | 1.18 | 1.248 | 0.535 | 0.905 | 0.39 | 1.167 | 1.231 | 0.729 | 1.065 | 1.022 |
| plant height (cm) | mean | 41.8 | 66.4 | 66.6 | 47.4 | 47.5 | 42.2 | 42.16 | 37.7 | 46.04 | 54 | 45.4 | 47.2 | 41.9 | 40.9 | 50 | 36.8 |
| | std | 4.817 | 13.126 | 5.079 | 3.912 | 4.359 | 6.211 | 6.301 | 3.718 | 2.955 | 9.407 | 0.822 | 5.495 | 4.006 | 2.074 | 2.915 | 5.45 |
| canopy diameter (cm) | mean | 41.774 | 78.5 | 58.98 | 49.624 | 65.758 | 51.09 | 69.776 | 58.83 | 45.458 | 43.95 | 45.264 | 47.75 | 35.556 | 44.302 | 52.62 | 32.398 |
| | std | 4.55 | 6.586 | 5.092 | 4.879 | 4.343 | 4.358 | 4.157 | 2.398 | 2.579 | 3.743 | 4.911 | 4.373 | 2.084 | 6.368 | 3.111 | 3.476 |
| stem length (cm) | mean | 9.4 | 13 | 41.6 | 11.06 | 16.8 | 18.1 | 8.9 | 11.7 | 15.4 | 30.2 | 6.54 | 13.4 | 11.7 | 16.8 | 18.4 | 18.56 |
| | std | 1.395 | 1.581 | 3.435 | 1.479 | 0.972 | 2.408 | 1.673 | 1.754 | 1.673 | 1.44 | 0.74 | 1.134 | 1.857 | 1.605 | 2.903 | 1.992 |
| seeds per fruit | mean | 112.8 | 75.2 | 33 | 38.2 | 47.4 | 123.2 | 37 | 36 | 143.6 | 45.4 | 93 | 19.2 | 44.8 | 126 | 103.8 | 22.2 |
| | std | 12.317 | 14.237 | 2.739 | 4.266 | 3.362 | 13.18 | 2.55 | 3.606 | 9.813 | 3.05 | 5.244 | 2.49 | 5.45 | 7.874 | 2.168 | 4.438 |
| pericarp thickness (mm) | mean | 1.988 | 2.058 | 2.456 | 1.294 | 1.476 | 4.66 | 1.202 | 1.642 | 3.034 | 1.03 | 1.914 | 0.988 | 4.254 | 2.224 | 1.434 | 1.722 |
| | std | 0.396 | 0.239 | 0.339 | 0.084 | 0.095 | 0.346 | 0.077 | 0.114 | 0.505 | 0.137 | 0.454 | 0.07 | 0.267 | 0.411 | 0.316 | 0.354 |
| fruit weight (g) | mean | 0.266 | 0.96 | 0.457 | 0.264 | 0.705 | 0.369 | 0.664 | 0.817 | 0.441 | 0.562 | 0.3 | 0.252 | 0.367 | 0.376 | 0.982 | 0.661 |
| | std | 0.043 | 0.114 | 0.082 | 0.037 | 0.097 | 0.105 | 0.048 | 0.129 | 0.093 | 0.082 | 0.062 | 0.018 | 0.064 | 0.034 | 0.116 | 0.074 |
| fruit diameter (mm) | mean | 14.2 | 16.02 | 14.6 | 7.8 | 15.2 | 51.2 | 10.772 | 16.2 | 45.1 | 8 | 22 | 6.8 | 27.2 | 50 | 48.2 | 14 |
| | std | 1.304 | 1.689 | 0.548 | 0.837 | 0.837 | 5.933 | 0.872 | 1.483 | 4.478 | 0.707 | 4.243 | 0.447 | 1.789 | 1.581 | 2.049 | 0.707 |
| weight5f (g) | mean | 13.48 | 7.92 | 1.446 | 1.728 | 1.886 | 42.386 | 1.13 | 2.246 | 42.66 | 3.018 | 13.83 | 0.432 | 20.052 | 13.408 | 12.038 | 0.96 |
| | std | 0.69 | 0.952 | 0.058 | 0.107 | 0.11 | 8.102 | 0.113 | 0.284 | 2.015 | 4.297 | 0.787 | 0.089 | 4.369 | 1.295 | 1.435 | 0.089 |
| soluble solids (°Brix) | mean | 8.44 | 8.042 | 7.844 | 8.91 | 9.052 | 7.64 | 8.94 | 9.61 | 7.23 | 8.99 | 8.33 | 12.8 | 7.376 | 8.26 | 6.874 | 6.262 |
| | std | 2.06 | 0.915 | 1.069 | 1.69 | 0.807 | 0.727 | 0.532 | 0.151 | 0.148 | 0.27 | 0.504 | 0.308 | 0.438 | 0.691 | 0.952 | 0.262 |
| pH | mean | 4.518 | 4.596 | 4.162 | 5.484 | 4.394 | 4.596 | 3.952 | 4.454 | 4.082 | 6.96 | 4.728 | 4.816 | 5.198 | 4.848 | 4.886 | 4.486 |
| | std | 0.236 | 0.347 | 0.223 | 0.33 | 0.215 | 0.322 | 0.544 | 0.268 | 0.841 | 0.23 | 0.266 | 0.249 | 0.232 | 0.44 | 0.426 | 0.172 |

| Criterion | Test Statistic | Approx. F | Num DF | Denom DF | p |
|---|---|---|---|---|---|
| Wilks’ | 0.00000 | 35.070 | 165 | 503 | 0.000 |
| Lawley–Hotelling | 200.28449 | 63.341 | 165 | 574 | 0.000 |
| Pillai’s | 8.11467 | 12.000 | 165 | 704 | 0.000 |
| Roy’s | 93.03567 | - | 165 | - | 0.000 |
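The approximate F statistics and degrees of freedom in the MANOVA table can be checked with the standard F approximations for Pillai's and Lawley–Hotelling traces, as given in common statistical software documentation. The sketch below uses p = 11 traits and g = 16 genotypes, both taken from the descriptive table. It assumes a total of N = 80 plants (5 per genotype). That value is not stated here; it is inferred because it reproduces the reported degrees of freedom. The function name is hypothetical.

```python
def manova_f_approximations(p, g, n_total, pillai_v, lawley_u):
    """Standard F approximations for Pillai's and Lawley-Hotelling traces (hypothetical helper)."""
    q = g - 1                          # hypothesis (between-genotype) degrees of freedom
    nu_e = n_total - g                 # error (within-genotype) degrees of freedom
    s = min(p, q)
    m = (abs(p - q) - 1) / 2
    n = (nu_e - p - 1) / 2
    num_df = s * (2 * m + s + 1)       # numerator DF shared by both traces
    pillai_den_df = s * (2 * n + s + 1)
    pillai_f = (2 * n + s + 1) / (2 * m + s + 1) * pillai_v / (s - pillai_v)
    lh_den_df = 2 * (s * n + 1)
    lh_f = lh_den_df * lawley_u / (s ** 2 * (2 * m + s + 1))
    return {
        "num_df": num_df,
        "pillai": (pillai_f, pillai_den_df),
        "lawley_hotelling": (lh_f, lh_den_df),
    }

# p = 11 traits and g = 16 genotypes from the tables; N = 80 is an assumption
result = manova_f_approximations(p=11, g=16, n_total=80, pillai_v=8.11467, lawley_u=200.28449)
print(result)
```

With these inputs the sketch gives a numerator DF of 165, denominator DFs of 704 (Pillai) and 574 (Lawley–Hotelling), and F values of about 12.00 and 63.34. These agree with the table. The numerator DF is simply 11 traits × 15 between-genotype degrees of freedom.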
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Freire, A.I.; Souza, A.F.d.; Leal, G.d.S.; Souza, F.B.M.d.; Verri, F.A.N.; Balestrassi, P.P.; Paiva, A.P.d.; Júnior, J.J.d.S.; Silva, L.F.d.; Garcia, F.H.S.; Fonseca, G.G. Using Machine Learning to Classify Capsicum Genotypes Based on Agronomic Traits. Horticulturae 2026, 12(5), 623. https://doi.org/10.3390/horticulturae12050623

