Comparative Evaluation of Random Forest, XGBoost and Long Short-Term Memory Models for Weekly Banana Production Estimation on a Commercial Farm in Naranjal, Ecuador
Abstract
1. Introduction
2. Materials and Methods
2.1. Experimental Design
2.2. Study Site and Dataset
- Meteorological predictors (six variables): mean temperature (°C), total weekly precipitation (mm), mean relative humidity (%), mean wind speed (km/h), dominant wind direction (categorical: Noroeste [northwest], Oeste [west], Sur [south], Sureste [southeast], Suroeste [southwest]), and mean solar radiation (W/m²).
- Edaphological predictors (12 variables): soil pH and concentrations of , P, K, Ca, Mg, S, Zn, Cu, Fe, Mn, and B. Soil nutrient concentrations were obtained from periodic laboratory analyses conducted every six months. For each 5-ha sampling unit, one composite soil sample was prepared from 10 subsamples, providing a representative estimate of soil chemical conditions at the production-unit scale. The semiannual laboratory values were aligned with the weekly production records by carrying the most recent soil analysis forward until the next sampling date. This procedure allowed the edaphological information to be incorporated into the weekly modeling dataset while recognizing that short-term within-semester nutrient fluctuations were not directly measured.
- Operational predictors (two variables): the number of bagged bunches (enfundes) and the number of harvested bunches. Calendar identifiers (year, month, week-of-year) and the categorical bagging-color code (color de enfunde) were additionally retained in the feature matrix as auxiliary variables; categorical fields were converted to dummy variables via one-hot encoding before model fitting.
- Target variable (one variable): the number of banana boxes processed each week, registered at the packing facility once the week has been completed.
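
The carry-forward alignment of the semiannual soil analyses and the one-hot encoding of the categorical fields can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' code. The function names `carry_forward` and `one_hot` are hypothetical. Weeks that precede the first soil analysis are assumed to receive no value (`None`), and one dummy column is kept for every category, with no reference level dropped. The dates and pH values are invented for illustration.

```python
from bisect import bisect_right
from datetime import date


def carry_forward(week_starts, sample_dates, sample_values):
    """Return, for each week, the most recent soil value sampled on or before it.

    `sample_dates` must be sorted in ascending order and aligned with
    `sample_values`. Weeks that precede the first analysis get None
    (assumption: the source does not say how such weeks were handled).
    """
    aligned = []
    for week in week_starts:
        # Index of the last sample dated on or before this week.
        idx = bisect_right(sample_dates, week) - 1
        aligned.append(sample_values[idx] if idx >= 0 else None)
    return aligned


def one_hot(values, categories):
    """Encode each categorical value as 0/1 dummies, one column per category."""
    return [[1 if value == category else 0 for category in categories] for value in values]


# Hypothetical example: two semiannual pH analyses and four weekly records.
sample_dates = [date(2022, 1, 10), date(2022, 7, 11)]
ph_values = [5.8, 6.1]
weeks = [date(2022, 1, 3), date(2022, 1, 10), date(2022, 3, 7), date(2022, 7, 18)]
print(carry_forward(weeks, sample_dates, ph_values))  # [None, 5.8, 5.8, 6.1]

directions = ["Noroeste", "Oeste", "Sur", "Sureste", "Suroeste"]
print(one_hot(["Sur", "Oeste"], directions))  # [[0, 0, 1, 0, 0], [0, 1, 0, 0, 0]]
```

The same as-of join is available in pandas as `merge_asof` with `direction="backward"`, which may be more convenient when all 12 edaphological columns are aligned at once.
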
2.3. Evaluated Models
LSTM Architecture and Training Configuration
2.4. Model Evaluation Metrics
2.5. Calibration and Hold-Out Evaluation of XGBoost
- Calibration partition: the 104 weekly records of 2022 and 2023, used exclusively for training and hyperparameter selection of the final XGBoost model.
- Hold-out test partition: the 52 weekly records of 2024, kept aside as a chronological test period used for model comparison and final reporting.
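
A chronological split of this kind can be sketched as below. The sketch assumes that each weekly record is a dict carrying `year` and `week` fields (hypothetical names) and uses a hypothetical helper, `chronological_split`. Records are ordered in time and never shuffled, so no 2024 week can inform calibration.

```python
def chronological_split(records, calibration_years=(2022, 2023), test_year=2024):
    """Split weekly records by calendar year without shuffling.

    Each record is assumed to be a dict with 'year' and 'week' keys.
    Sorting by (year, week) guarantees that every calibration week
    precedes every hold-out week.
    """
    ordered = sorted(records, key=lambda r: (r["year"], r["week"]))
    calibration = [r for r in ordered if r["year"] in calibration_years]
    hold_out = [r for r in ordered if r["year"] == test_year]
    return calibration, hold_out


# Hypothetical records: 52 weeks for each of 2022, 2023 and 2024.
records = [{"year": y, "week": w} for y in (2022, 2023, 2024) for w in range(1, 53)]
calibration, hold_out = chronological_split(records)
print(len(calibration), len(hold_out))  # 104 52
```

Splitting by year, rather than drawing random rows, keeps the evaluation faithful to how the model would be used: trained on past seasons and applied to a future one.
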
2.6. Hyperparameter Sensitivity Analysis
- learning_rate
- max_depth
- n_estimators
- colsample_bytree
- reg_alpha
- reg_lambda
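
The section lists the tuned hyperparameters but not the search design or the candidate values. The sketch below assumes a one-at-a-time design, in which each hyperparameter is varied around a baseline while the others stay fixed. The helper `one_at_a_time_grid` and all baseline and candidate values are hypothetical placeholders, not the settings used in the study.

```python
def one_at_a_time_grid(baseline, candidates):
    """Yield (name, config) pairs that change one hyperparameter at a time.

    Every other hyperparameter stays at its baseline value; candidate values
    equal to the baseline are skipped so the baseline is evaluated only once.
    """
    yield "baseline", dict(baseline)
    for name, values in candidates.items():
        for value in values:
            if value == baseline[name]:
                continue
            config = dict(baseline)
            config[name] = value
            yield name, config


# Hypothetical baseline and candidate values (not the study's settings).
baseline = {
    "learning_rate": 0.05,
    "max_depth": 4,
    "n_estimators": 300,
    "colsample_bytree": 0.8,
    "reg_alpha": 0.0,
    "reg_lambda": 1.0,
}
candidates = {
    "learning_rate": [0.01, 0.05, 0.1, 0.3],
    "max_depth": [2, 4, 6, 8],
    "n_estimators": [100, 300, 500, 1000],
    "colsample_bytree": [0.6, 0.8, 1.0],
    "reg_alpha": [0.0, 0.1, 1.0],
    "reg_lambda": [0.5, 1.0, 5.0],
}
configs = list(one_at_a_time_grid(baseline, candidates))
print(len(configs))  # 16
```

The dictionary keys match keyword arguments of `xgboost.XGBRegressor`, so each configuration could be passed as `XGBRegressor(**config)`. It could then be scored on time-ordered validation folds drawn from the 2022 and 2023 calibration records, for example those produced by scikit-learn's `TimeSeriesSplit`. A one-at-a-time design isolates each parameter's effect but ignores interactions between parameters. A joint grid or random search would capture those interactions at a higher computational cost.
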
3. Results
3.1. Algorithm Comparison on the 2024 Hold-Out Test Set
3.2. XGBoost Hyperparameter Sensitivity and Validation
3.2.1. Learning Rate
3.2.2. Number of Trees
3.2.3. Feature Importance
3.2.4. Hold-Out Validation of the Final XGBoost Model
4. Discussion
Limitations
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations

| Abbreviation | Definition |
|---|---|
| AI | Artificial intelligence |
| LSTM | Long Short-Term Memory |
| MAE | Mean absolute error |
| MAG | Ministerio de Agricultura y Ganadería (Ministry of Agriculture and Livestock, Ecuador) |
| ME | Mean error (bias) |
| ML | Machine learning |
| RF | Random Forest |
| RMSE | Root mean square error |
| SHAP | SHapley Additive exPlanations |
| TR4 | Tropical Race 4 (Fusarium oxysporum f. sp. cubense) |
| XGBoost | Extreme Gradient Boosting |






| Variable/Domain | Sampling Frequency | Same-Week Value Used | Target-Derived Autoregressive Features |
|---|---|---|---|
| Meteorological predictors (6 variables) | Daily, aggregated weekly | Yes | None |
| Edaphological predictors (12 variables) | Semiannual laboratory analyses, carried forward | Yes (most recent sample) | None |
| Bagged bunches (enfundes) | Recorded weekly at bagging time | Yes | None |
| Harvested bunches | Recorded after weekly harvest is completed | Yes (nowcasting) | None |
| Calendar/categorical helpers (year, month, week-of-year, bagging color) | Recorded weekly | Yes | None |
| Processed banana boxes (target, used as autoregressive source) | Recorded at the packing facility after the week ends | Not used directly | 1-, 3-, 4-, 12-, 26- and 52-week lags; 4- and 12-week rolling mean and SD (shifted 1 week); 4- and 52-week differences; log of 1-week lag |
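
The target-derived features in the last row can be sketched as a function of the weekly box series that looks only at weeks before the one being predicted. Several details are assumptions rather than statements from the source. The 4- and 52-week differences are taken on the 1-week-lagged series (y[t−1] − y[t−1−k]) so that they cannot include the current week. The rolling SD is the sample standard deviation. The log is the natural log. Features without enough history are returned as `None`. The function name `autoregressive_features` and the example series are hypothetical.

```python
import math
import statistics


def autoregressive_features(boxes, t):
    """Target-derived features for week index t, built only from weeks before t.

    `boxes` is the chronologically ordered weekly series of processed boxes.
    Features whose required history is unavailable are returned as None.
    """
    def lag(k):
        return boxes[t - k] if t - k >= 0 else None

    features = {f"lag_{k}": lag(k) for k in (1, 3, 4, 12, 26, 52)}

    # Rolling statistics over the w weeks ending at t-1 (shifted one week).
    for w in (4, 12):
        window = boxes[t - w:t] if t - w >= 0 else None
        features[f"roll_mean_{w}"] = statistics.mean(window) if window else None
        features[f"roll_sd_{w}"] = statistics.stdev(window) if window else None  # sample SD

    # Differences on the 1-week-lagged series: y[t-1] - y[t-1-k] (assumed definition).
    for k in (4, 52):
        earlier = lag(1 + k)
        features[f"diff_{k}"] = lag(1) - earlier if earlier is not None else None

    # Natural log of the 1-week lag; undefined for missing or non-positive values.
    last = lag(1)
    features["log_lag_1"] = math.log(last) if last is not None and last > 0 else None
    return features


# Hypothetical weekly series with a steady upward trend.
boxes = [1000 + 10 * i for i in range(60)]
feats = autoregressive_features(boxes, t=55)
print(feats["lag_1"], feats["lag_52"], feats["diff_52"])  # 1540 1030 520
```

Because every window ends at t−1, the feature vector for week t never contains that week's box count. This is the property the table's "Not used directly" entry requires.
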

| Model | MAE (Boxes) [95% CI] | RMSE (Boxes) [95% CI] | R² [95% CI] |
|---|---|---|---|
| Random Forest | 747.55 [616.6, 885.9] | 896.68 [756.2, 1032.1] | 0.678 [0.473, 0.794] |
| XGBoost | 712.56 [580.2, 855.2] | 862.85 [719.8, 997.9] | 0.702 [0.522, 0.805] |
| LSTM (baseline) | 1202.66 [919.5, 1515.4] | 1638.38 [1266.2, 1978.9] | −0.075 [−0.579, 0.261] |
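
The bracketed values are 95% confidence intervals, but the resampling scheme behind them is not described in this section. One common way to obtain such intervals is a percentile bootstrap over the hold-out weeks, sketched below under that assumption. The function `bootstrap_ci`, the number of resamples, the seed and the example data are all illustrative choices.

```python
import random


def mae(observed, predicted):
    """Mean absolute error between paired observations and predictions."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def bootstrap_ci(observed, predicted, metric, n_resamples=2000, alpha=0.05, seed=42):
    """Percentile bootstrap interval for `metric`, resampling weeks with replacement."""
    rng = random.Random(seed)
    n = len(observed)
    estimates = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(metric([observed[i] for i in idx], [predicted[i] for i in idx]))
    estimates.sort()
    lower = estimates[int(n_resamples * alpha / 2)]
    upper = estimates[int(n_resamples * (1 - alpha / 2)) - 1]
    return lower, upper


# Hypothetical observed and predicted weekly box counts.
observed = [5200, 4800, 6100, 5900, 5300, 4700, 6400, 5600]
predicted = [5600, 5100, 6300, 6500, 5200, 5300, 6900, 5800]
low, high = bootstrap_ci(observed, predicted, mae)
print(f"MAE = {mae(observed, predicted):.1f} boxes, 95% CI [{low:.1f}, {high:.1f}]")
```

Resampling individual weeks treats them as independent. Weekly production is autocorrelated, so a moving-block bootstrap would be a more conservative alternative.
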

| Metric | Symbol | Value [95% CI] |
|---|---|---|
| Coefficient of Determination | R² | 0.702 [0.522, 0.805] |
| Willmott Concordance Index | d | 0.910 [0.861, 0.941] |
| Mean Absolute Error | MAE | 712.56 boxes [580.2, 855.2] |
| Root Mean Square Error | RMSE | 862.85 boxes [719.8, 997.9] |
| Mean Error (Bias) | ME | +495.72 boxes [309.4, 689.9] |
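
The five metrics in the table can be computed from paired observed and predicted weekly box counts, as in the sketch below. R² is taken as 1 − SS_res/SS_tot, which is consistent with the negative value reported for the LSTM baseline; a squared correlation could not be negative. d follows Willmott's index of agreement. ME is assumed to be the mean of predicted minus observed, so a positive value would indicate overestimation. That sign convention is an assumption because the section does not state it. The function name `evaluation_metrics` and the example values are hypothetical.

```python
import math


def evaluation_metrics(observed, predicted):
    """R², Willmott's d, MAE, RMSE and ME for paired weekly observations and predictions."""
    n = len(observed)
    mean_obs = sum(observed) / n
    errors = [p - o for o, p in zip(observed, predicted)]  # predicted minus observed (assumed sign)
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    # Willmott's potential error: sum of (|P_i - mean(O)| + |O_i - mean(O)|)^2.
    potential = sum((abs(p - mean_obs) + abs(o - mean_obs)) ** 2
                    for o, p in zip(observed, predicted))
    return {
        "R2": 1 - ss_res / ss_tot,
        "d": 1 - ss_res / potential,
        "MAE": sum(abs(e) for e in errors) / n,
        "RMSE": math.sqrt(ss_res / n),
        "ME": sum(errors) / n,
    }


# Hypothetical values chosen only to make the arithmetic easy to follow.
metrics = evaluation_metrics([10, 20, 30, 40], [12, 18, 33, 41])
print(metrics)
```

Here the errors are 2, −2, 3 and 1. This gives MAE = 2.0, ME = 1.0, RMSE = √4.5, R² = 1 − 18/500 and d = 1 − 18/2058.
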

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Aguirre-Munizaga, M.; Vásquez-Bermúdez, M.; Hidalgo-Larrea, J.; García, Y.; Avilés-Vera, M. Comparative Evaluation of Random Forest, XGBoost and Long Short-Term Memory Models for Weekly Banana Production Estimation on a Commercial Farm in Naranjal, Ecuador. Agriculture 2026, 16, 1182. https://doi.org/10.3390/agriculture16111182

