Article

Performance Analysis of Machine Learning Techniques in Predicting Maize Crop Yield: Case Study of Kayonza District—Rwanda

1 African Center of Excellence in Internet of Things, University of Rwanda, Kigali P.O. Box 3900, Rwanda
2 School of ICT, College of Science and Technology, University of Rwanda, Kigali P.O. Box 3900, Rwanda
3 Regional Center of Excellence in Biomedical Engineering and E-Health, University of Rwanda, Kigali P.O. Box 3900, Rwanda
* Author to whom correspondence should be addressed.
Algorithms 2026, 19(6), 448; https://doi.org/10.3390/a19060448
Submission received: 16 January 2026 / Revised: 7 April 2026 / Accepted: 9 April 2026 / Published: 1 June 2026

Abstract

Climate change presents significant challenges to agriculture worldwide, leading to food insecurity and impacting rural livelihoods. Maize farming is especially vulnerable to adverse conditions such as heavy rainfall, high temperatures, soil acidity, humidity, and poor irrigation, which reduce crop yields and raise concerns about food security. This study aimed to develop a reliable and accurate machine learning method to predict maize crop yields from historical climate data, so that farmers and agronomists can forecast maize production and plan adaptation measures. Weather data from Meteo Rwanda and maize yield data from the Kayonza district, Rwanda, were used for training and testing. The weather data included annual mean temperature, maximum temperature, minimum temperature, rainfall, and soil temperature over thirteen years. The data were analyzed using the Random Forest regressor, the Extreme Gradient Boosting (XGBoost) regressor, the Support Vector Machine, and LASSO (Least Absolute Shrinkage and Selection Operator). The results show that achieving high crop yields depends on anticipating and accounting for climate variables, especially temperature and rainfall. Overall, Random Forest, Support Vector Machine, and XGBoost outperformed LASSO, with R² values of 0.957, 0.955, and 0.953, respectively, compared to 0.256 for LASSO.

1. Introduction

The world population is expected to reach 9.8 billion by 2050, making it essential to maintain ecosystem balance for community well-being. One factor that could disrupt this balance is climate degradation, which directly impacts human lives and leads to issues like hunger, malnutrition, deforestation, water shortages, and soil erosion. Africa will also face challenges, with maize yields projected to decline by 5% and wheat by 17% [1]. In Rwanda, climate change has affected agriculture and people’s livelihoods. Meteorological data indicate that rainfall variability in Rwanda is projected to increase from 5% to 10%, which will directly affect people’s lives and trigger disasters such as floods and landslides, ultimately impacting the country’s economy. Rwanda is committed to addressing these climate challenges through initiatives like the Paris Agreement [2], the Green Growth and Climate Resilience Strategy (GGCRS) [3], and the United Nations Framework Convention on Climate Change (UNFCCC). These commitments are incorporated into the Rwanda National Strategies for Transformation (NST2) for the five-year government planning cycle. According to the 2022 census by the National Institute of Statistics of Rwanda, 63% of households are involved in crop farming, with maize accounting for 56% of crop production after beans [2]. The crop intensification program [3], launched by the Rwandan government in 2018, identified maize as a suitable crop for the Kayonza district; however, in 2008, maize yields declined by 37% in the Eastern Province and 26% in the Southern Province. In 2017, over 3000 families in the Eastern Province, including Kayonza, faced food insecurity due to prolonged drought [4]. To address these issues, various studies suggest solutions for crop yield prediction, emphasizing the types of machine learning to be used for high performance and the environmental variables to be considered. 
Study [5] analyzed different machine learning algorithms for crop yield prediction and found that the Random Forest regressor was effective at handling complex datasets. Agriculture contributes to gross value addition in the Rwandan economy with an increase of 3.1% per year on average for the period between 2018 and 2023, compared to the overall GDP of 8.3% [6]. In [7], a review of one hundred studies on crop yield prediction identified the strengths and weaknesses of various techniques and noted that the Random Forest regressor, Support Vector Machine, and KNN are reliable and suitable for medium-sized datasets. Studies also identified key environmental variables impacting agriculture; for example, the review [8] highlighted that machine learning techniques increased yields for crops like cotton and wheat, serving as powerful tools to help farmers respond to climate variability by boosting productivity. Ref. [9] pointed out that rainfall and temperature are crucial in predicting crop yields and recommended the use of a Random Forest regressor for early yield predictions. This same study emphasized the importance of relevant climatic parameters for accurate predictions. Additionally, a deep neural network optimized with a genetic algorithm was used to maximize the R2 score [10]. In this study, several machine learning methods, including the Random Forest regressor, Support Vector Machine, Extreme Gradient Boosting, and LASSO, were explored. The focus crop is maize, and the study area is Kayonza. Weather data were obtained from Meteo Rwanda, while maize production data came from the National Institute of Statistics of Rwanda and Kayonza District. The key environmental variables considered include temperature and rainfall. 
The main objectives of this study are: (1) to examine the relationship between crop production and meteorological parameters such as temperature and rainfall; (2) to evaluate the importance of each meteorological parameter in crop production; (3) to assess the performance of machine learning models in predicting crop yield. The findings will help in designing a crop yield prediction system that incorporates emerging technologies such as the Internet of Things and is supported by effective machine learning. This system aims to assist farmers and policymakers in developing adaptation strategies. The paper is organized as follows: Section 2 details the materials and research methodology, including the study area, methods used, and sample distribution for both meteorological data and crop production. Section 3 discusses the results and the performance of the applied machine learning models. Section 4 provides a detailed discussion of the results, and Section 5 concludes the paper.

2. Materials and Methods

2.1. Study Area

The research was carried out in the Kayonza district, one of the seven districts of the Eastern Province of the Republic of Rwanda. The district covers an area of 1935 km² and has a population density of 236.26 inhabitants per km². Its total population as of 2022 is 457,156. The district is composed of twelve administrative sectors, seven of which were considered in this study; these sectors are equipped with meteorological facilities where data are collected for analysis.
Figure 1 shows a map of the Republic of Rwanda, indicating the area of study.
The total land area reserved for agriculture is 93,685.41 ha, including livestock (57.71%). The agricultural land covers only 61,149.45 ha, accounting for 37.6% of the total district land use.

2.2. Methodology

This study uses an integrated dataset that combines agricultural records with meteorological parameters. The climate variables considered are rainfall and temperature, which are regarded as having the greatest influence on crop yield. Among soil properties, soil temperature was included. For the agronomic data, maize yield records from 2011 to 2024 were used. Missing values were treated using mean imputation for continuous variables and mode imputation for categorical variables, an approach that preserves the dataset structure and limits the bias introduced by missing records. Outliers were handled with the interquartile range method to improve model stability. For consistency, the numerical features were normalized using the Z-score. The dataset was divided into training and testing sets using an 80:20 ratio. To assess the robustness of the proposed models, k-fold cross-validation (k = 5) was applied during model training, so that performance estimates do not depend on a single partition of the data.
Four machine learning algorithms were implemented because of their proven effectiveness in regression tasks and agricultural prediction [10]. LASSO regression performs both regression and feature selection, making it suitable for higher-dimensional datasets. The Support Vector Machine (SVM) uses a radial basis function kernel to capture the nonlinear relationship between input features and maize yield. Random Forest is an ensemble learning method that constructs multiple decision trees and aggregates their outputs to improve accuracy, and XGBoost is a gradient boosting algorithm known for its efficiency and strong performance on structured datasets. These models were selected based on their strong performance in recent agricultural machine learning studies.
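The pre-processing steps described above (mean imputation, interquartile-range outlier filtering, and Z-score normalization) can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' implementation: the rainfall values are hypothetical, and the 1.5 × IQR fence and the population standard deviation are assumptions, since the text does not specify them.

```python
import statistics

def impute_mean(values):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    mean = statistics.fmean(observed)
    return [mean if v is None else v for v in values]

def iqr_mask(values, k=1.5):
    """Return True for values inside [Q1 - k*IQR, Q3 + k*IQR] (k is an assumption)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [low <= v <= high for v in values]

def zscore(values):
    """Standardize to zero mean and unit (population) standard deviation."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return [(v - mu) / sigma for v in values]

# Hypothetical annual rainfall values (mm); None marks a missing record.
rainfall = [820.0, 905.0, None, 870.0, 2500.0, 790.0, 880.0, 845.0]

filled = impute_mean(rainfall)                           # step 1: imputation
keep = iqr_mask(filled)                                  # step 2: outlier flags
cleaned = [v for v, ok in zip(filled, keep) if ok]
scaled = zscore(cleaned)                                 # step 3: normalization
```

Note that imputing before outlier removal lets an extreme value inflate the imputed mean; with a dataset this small, the order of the two steps is a design choice worth checking.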
To ensure the integrity, accuracy, and authenticity of the data, seasonal reports on the maize crop from the National Institute of Statistics of Rwanda and the Kayonza district were reviewed. The methodology shown in Figure 2 provides a standardized flow for data collection, pre-processing, model evaluation, and decision-making. The graphical user interface of the system was developed in Python (version 3.10), and its components were implemented using Tkinter, the standard Python library for graphical user interfaces.

2.2.1. Distribution of the Sample

The dependent variable (output) of the proposed models is the maize crop yield in tons per hectare (t/ha). The dataset consists of thirteen years of weather data and maize crop yield observations, with two seasons per year. Twenty observations were used to train the machine learning models, and six observations were reserved as the testing dataset to evaluate predictive performance. The dataset contains meteorological variables (temperature, rainfall) and maize crop yield (t/ha). Data collected from the respective agencies were cleaned, structured, and organized, and outliers were identified and removed to ensure consistency and uniformity. The dataset was divided using a chronological train–test split, as indicated in Table 1. To support reliability and robustness despite the limited sample size, three measures were taken: k-fold cross-validation (k = 5) to maximize data utilization, regularized models such as LASSO and SVM to control overfitting, and ensemble methods such as Random Forest and XGBoost. Although a larger sample size is desirable, these measures strengthen the methodological robustness of the models employed, and dataset expansion is left for future work.
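A chronological split of the 26 seasonal observations into 20 training and 6 testing records can be sketched as below. The record layout (`year_index`, `season`) is hypothetical, and ordering Season A before Season B within each agricultural year is an assumption; the actual assignment is the one given in Table 1.

```python
def chronological_split(records, n_train):
    """Sort records in time order and cut once, so the test set is strictly later."""
    ordered = sorted(records, key=lambda r: (r["year_index"], r["season"]))
    return ordered[:n_train], ordered[n_train:]

# Hypothetical layout: 13 agricultural years x 2 seasons = 26 observations.
records = [
    {"year_index": year, "season": season}
    for year in range(13)
    for season in ("A", "B")
]

train, test = chronological_split(records, n_train=20)
```

Unlike a random split, this keeps every test observation later than every training observation, which avoids training on seasons that follow the ones being predicted.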

2.2.2. Rainfall Data

Rainfall data constitute a critical input variable in maize yield prediction because of the crop’s high water requirements throughout its growth cycle. In this study, rainfall data were obtained from Meteo Rwanda and cover daily, monthly, and annual records over a historical period from 2011 to 2024. These data were aggregated into annual rainfall values to align with the temporal resolution of the maize yield data. Maize typically requires between 500 mm and 800 mm of water during the growing season, with water demand peaking during the vegetative growth and reproductive stages. Rainfall variability therefore directly influences yield outcomes. Excessive rainfall or waterlogging can also be detrimental, as maize is susceptible to root diseases in saturated soil conditions [11]. The observed daily rainfall reveals an uneven rainfall distribution in the study area, where the minimum daily rainfall ranged between 20 mm and 150 mm, which directly affects the expected maize productivity. In the dataset, rainfall was included as the primary climatic feature, and feature importance analysis showed that annual rainfall had the strongest influence (≈0.44 in RF and ≈0.39 in XGBoost models), indicating its dominant role in yield prediction.

2.2.3. Temperature Data

The temperature data were collected from Meteo Rwanda and include daily, monthly, and annual measurements, which were subsequently aggregated into three key indicators: annual mean temperature, annual minimum temperature, and annual maximum temperature. In general, the optimum temperature for maize growth ranges between 20 °C and 30 °C. Temperatures below 10 °C or above 35 °C can adversely affect growth and yield, although these thresholds can vary with overall soil quality and nutrient content. Extremely high temperatures (above 40 °C) during the pollination and grain-filling stages can lead to significant yield losses [12,13]. These variables were selected because temperature significantly affects physiological processes such as germination, photosynthesis, and grain-filling. The temperature data used in this study cover the period from 2011 to 2024.
Feature importance analysis revealed that temperature variables collectively contributed substantially to model performance, with annual mean temperature contributing approximately 0.276 in Random Forest and 0.299 in XGBoost models, highlighting its strong relationship with yield variability. Temperature variability influences maize yield both positively and negatively; optimal temperature ranges enhance growth, while extreme temperature values can reduce productivity [14]. The inclusion of minimum and maximum temperature values enables models to capture these nonlinear effects, improving prediction accuracy.

2.2.4. Soil Temperature

Soil temperature is a key factor influencing maize germination, root development, and nutrient uptake. In this study, soil temperature data were obtained from Meteo Rwanda and measured at a depth of approximately 10 cm, which is relevant for root zone processes. Maize typically grows in regions with abundant sunlight, and it is sensitive to variations in solar radiation. Insufficient solar radiation leads to poor growth, delays maturity, and reduces yields [15,16]. The analysis shows that the optimal soil temperature for maize germination ranges between 20 °C and 30 °C, with deviation from this range leading to delayed germination or poor crop establishment. The dataset indicated a relatively low variability in soil temperature over the study period, with values ranging between 21.21 °C and 23.39 °C. Despite its agronomic importance, soil temperature exhibited a low feature importance (≈0.003 in RF and ≈0.043 in XGBoost) in the predictive models. This suggests that, within the study area, soil temperature remained relatively stable and therefore contributed less to yield variability compared to rainfall and atmospheric temperature. However, soil temperature was retained in the model as it provides complementary information on soil conditions and supports a more comprehensive representation of environmental factors affecting maize growth.

2.2.5. Maize Crop Data

Maize crop yield data were collected from district agricultural offices, the National Institute of Statistics of Rwanda (NISR), and agronomist reports. The dataset includes annual maize production records for the period 2011–2024, expressed in tons per hectare (t/ha). The dataset was used as the target variable (dependent variable) in the machine learning models, while climatic and soil variables served as predictors.
The maize crop has a relatively high water requirement, especially during the vegetative and reproductive stages. In general, the total water requirement for the entire growing season ranges from 500 to 800 mm, depending on the climate and growing season length. Moisture stress during the silking and grain-filling stages can severely reduce yield. Excessive rainfall or waterlogging can also be detrimental, as maize is susceptible to root diseases in saturated soil conditions [17,18]. The crop intensification program described in the Introduction aimed to transform subsistence farming into commercial agriculture to enhance food security. Based on soil type, maize is among the crops identified for cultivation in the Kayonza district, with the intention of shifting from rudimentary practices to mechanized agriculture to enhance productivity and promote exports. Maize is cultivated in two seasons a year: Season A, from September to December, and Season B, from March to June. In the dataset, productivity between 2011 and 2016 was 2.64 t/ha on 13,642 hectares of cultivated land. In 2017, a drop in production of 0.9 t/ha was observed on 13,145 hectares. Between 2018 and 2024, productivity was 3 t/ha on 12,053 hectares. The data were obtained from the National Institute of Statistics of Rwanda and Kayonza district. The study demonstrated that machine learning models, particularly Random Forest, achieved high predictive performance (R² ≈ 0.957), indicating strong relationships between the selected environmental variables and maize yield.
The integration of rainfall, temperature, soil temperature, and maize yield data ensured a comprehensive representation of the agro-climatic conditions influencing maize production, consistent with the methodology adopted in the referenced study.

2.2.6. Hyperparameter Selection and Justification

Hyperparameter tuning was conducted systematically to optimize the performance and generalization capability of the proposed machine learning models for maize yield prediction. Grid search and random search were combined to explore the hyperparameter space efficiently, while k-fold cross-validation (k = 5) was used to ensure robustness given the limited dataset. The models were evaluated using standard regression metrics, including R², RMSE, MSE, NRMSE, and NMSE, for consistency with current best practices in crop yield prediction studies [19]. For the Support Vector Machine (SVM), key hyperparameters such as the regularization parameter (C), the kernel coefficient, and epsilon were tuned, with the radial basis function (RBF) kernel used to capture the nonlinear relationship between climate variables and crop yield, as indicated in [20]. For the LASSO regression model, the regularization parameter was optimized to enable effective feature selection and to mitigate redundancy among climate variables, as indicated in [21,22] for interpretable crop modeling. For the Random Forest, hyperparameters such as the number of trees, maximum tree depth, and minimum samples for splitting were tuned to enhance model stability; RF is able to handle complex nonlinear interactions, as reported in [23]. For the Extreme Gradient Boosting (XGBoost) model, the key parameters were the learning rate, number of estimators, and maximum depth, tuned to balance bias and variance. The studies in [24] highlighted that, in crop yield prediction, data quality and feature relevance are more critical than dataset size for achieving reliable predictions.
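The tuning loop described above, a grid search scored by 5-fold cross-validated RMSE, can be sketched in plain Python. This is an illustration only: a hypothetical one-feature ridge regressor with a single hyperparameter `alpha` stands in for the SVM, Random Forest, and XGBoost models (which would normally come from a library), the folds are contiguous rather than shuffled, and the data are synthetic.

```python
import itertools
import statistics

def kfold_indices(n, k):
    """Yield (train_idx, val_idx) pairs for k contiguous folds."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        val = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, val
        start += size

def fit_ridge_1d(x, y, alpha):
    """Closed-form ridge fit for one feature; returns (slope, intercept)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / (sxx + alpha)
    return slope, my - slope * mx

def cv_rmse(x, y, alpha, k=5):
    """Mean validation RMSE over k folds for a given alpha."""
    scores = []
    for train, val in kfold_indices(len(x), k):
        slope, intercept = fit_ridge_1d(
            [x[i] for i in train], [y[i] for i in train], alpha
        )
        squared = [(y[i] - (slope * x[i] + intercept)) ** 2 for i in val]
        scores.append(statistics.fmean(squared) ** 0.5)
    return statistics.fmean(scores)

def grid_search(x, y, grid, k=5):
    """Evaluate every combination in the grid; return (best_params, best_score)."""
    names = list(grid)
    best = None
    for combo in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, combo))
        score = cv_rmse(x, y, k=k, **params)
        if best is None or score < best[1]:
            best = (params, score)
    return best

# Synthetic data: 20 training observations with a roughly linear trend.
x = [float(i) for i in range(20)]
y = [2.0 + 0.1 * v + (0.05 if i % 2 else -0.05) for i, v in enumerate(x)]

best_params, best_score = grid_search(x, y, {"alpha": [0.0, 1.0, 100.0, 10000.0]})
```

With 20 training observations and k = 5, each validation fold holds only four points, so fold-to-fold scores can vary considerably; reporting the spread across folds alongside the mean is advisable.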
Figure 3 gives an overview of the Random Forest regression model.
Extreme Gradient Boosting was selected for its ability to handle nonlinear interactions between weather variables and agronomic factors. The model minimizes residual errors iteratively and is regularized to limit overfitting. The input features enter the first decision tree, and the residual errors from that tree are computed; a new tree is then trained to predict these residuals. This process is repeated for multiple trees, and the tree outputs are summed to obtain the final yield prediction. Figure 4 shows the general architecture of the XGBoost algorithm.
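The residual-fitting loop described above can be illustrated with a small sketch. This is plain gradient boosting with squared error and one-split trees (stumps), not XGBoost itself, which adds regularization terms and second-order gradient information; the data, the number of trees, and the learning rate are hypothetical.

```python
import statistics

def fit_stump(x, residuals):
    """Find the single threshold split on x that minimizes squared error."""
    best = None
    for threshold in sorted(set(x))[:-1]:
        left = [r for v, r in zip(x, residuals) if v <= threshold]
        right = [r for v, r in zip(x, residuals) if v > threshold]
        left_mean, right_mean = statistics.fmean(left), statistics.fmean(right)
        sse = sum((r - left_mean) ** 2 for r in left)
        sse += sum((r - right_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda v: left_mean if v <= threshold else right_mean

def boost(x, y, n_trees=50, learning_rate=0.3):
    """Fit stumps sequentially to the current residuals; return a predictor."""
    base = statistics.fmean(y)            # initial prediction: the mean yield
    trees = []
    preds = [base] * len(y)
    for _ in range(n_trees):
        residuals = [t - p for t, p in zip(y, preds)]
        tree = fit_stump(x, residuals)    # new tree predicts what is still wrong
        trees.append(tree)
        preds = [p + learning_rate * tree(v) for p, v in zip(preds, x)]
    return lambda v: base + learning_rate * sum(tree(v) for tree in trees)

# Hypothetical rainfall (mm) and yield (t/ha) pairs with a nonlinear relation.
rainfall = [600.0, 700.0, 800.0, 900.0, 1000.0, 1100.0]
yields = [1.0, 2.0, 3.0, 3.2, 2.5, 1.5]

predict = boost(rainfall, yields)
mean_yield = statistics.fmean(yields)
baseline_mse = statistics.fmean((t - mean_yield) ** 2 for t in yields)
boosted_mse = statistics.fmean((t - predict(v)) ** 2 for v, t in zip(rainfall, yields))
```

The learning rate scales each tree's contribution; smaller values need more trees but usually generalize better, which is why the two are tuned together.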
The Support Vector Machine was selected for its ability to perform effective regression in a multidimensional feature space. The model can handle limited datasets and is suitable for historical data. The input features are mapped into a high-dimensional space via a kernel, where the model identifies a hyperplane that best predicts yield within a tolerance margin and outputs the predicted yield for the given input. Figure 5 illustrates this process.
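Two ingredients of this description, the RBF kernel and the tolerance margin, can be written out directly. The sketch below shows the standard definitions; the feature vectors and the values of `gamma` and `epsilon` are hypothetical and are not the tuned values from this study.

```python
import math

def rbf_kernel(a, b, gamma):
    """RBF similarity exp(-gamma * ||a - b||^2): 1 for identical inputs, near 0 far apart."""
    squared_distance = sum((p - q) ** 2 for p, q in zip(a, b))
    return math.exp(-gamma * squared_distance)

def epsilon_insensitive(y_true, y_pred, epsilon):
    """Errors inside the +/- epsilon tube cost nothing; larger errors cost the excess."""
    return max(0.0, abs(y_true - y_pred) - epsilon)

# Hypothetical standardized feature vectors (rainfall, mean temperature).
season_a = [0.2, -0.5]
season_b = [0.3, -0.4]
season_c = [2.0, 1.5]

similar = rbf_kernel(season_a, season_b, gamma=0.5)
dissimilar = rbf_kernel(season_a, season_c, gamma=0.5)

inside_tube = epsilon_insensitive(3.0, 3.05, epsilon=0.1)   # within tolerance
outside_tube = epsilon_insensitive(3.0, 3.5, epsilon=0.1)   # penalized excess
```

The kernel coefficient controls how quickly similarity decays with distance, and epsilon sets the width of the tolerance margin, which is why both appear among the tuned hyperparameters together with C.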
The Least Absolute Shrinkage and Selection Operator was selected for its efficiency in modeling linear relationships between predictors and response variables. The model assesses the relationship between climatic variables and maize crop yield and can examine the influence of predictor variables on the response variable.
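$$\hat{\beta} = \underset{\beta}{\arg\min}\; \frac{1}{2n} \sum_{i=1}^{n} \Bigl( y_i - \sum_{j=1}^{p} \beta_j x_{ij} \Bigr)^2 \quad \text{subject to} \quad \lVert \beta \rVert_1 \le t$$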
The mathematical equation [26] describes a regression method that performs both variable selection and regularization, intended to enhance prediction accuracy.
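For reference, the constrained LASSO problem is commonly written in an equivalent penalized (Lagrangian) form. In the expression below, λ ≥ 0 is the regularization parameter, a symbol not used in the original text; each bound t on the L1 norm corresponds to some value of λ, with a larger λ matching a smaller t and driving more coefficients exactly to zero. This is the quantity that is tuned when the regularization parameter of LASSO is optimized.

```latex
\hat{\beta}_{\lambda} = \underset{\beta}{\arg\min}\; \frac{1}{2n} \sum_{i=1}^{n} \Bigl( y_i - \sum_{j=1}^{p} \beta_j x_{ij} \Bigr)^2 + \lambda \lVert \beta \rVert_1, \qquad \lambda \ge 0
```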

2.3. Model Evaluation

To evaluate the proposed models, the following metrics were considered: the Mean Absolute Error (MAE), the Root Mean Squared Error (RMSE), the Coefficient of Determination (R²), the Normalized Mean Squared Error (NMSE), and the Mean Squared Error (MSE). These metrics make it possible to assess the reliability and validity of each model and support the choice of the final model. The formula for the mean absolute error is
$$\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|$$
where n is the number of data points; yi is the actual (observed) value for the i-th data point; and ŷi is the predicted value for the i-th data point.
The Root Mean Squared Error (RMSE) is the square root of the mean squared difference between the predicted and actual values.
$$\text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}$$
where n is the number of data points; yi is the actual (observed) value for the i-th data point; and ŷi is the predicted value for the i-th data point.
The Coefficient of Determination (R²) is a statistical measure of the proportion of the variance in the dependent variable that is explained by the regression model.
$$R^2 = 1 - \frac{\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}{\sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2}$$
where yi is the observed value, ŷi is the predicted value, and ȳ is the mean of the observed values.
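The Normalized Mean Squared Error (NMSE) is a performance metric used in regression models. It measures the deviation between predicted and measured values, normalized by the variance of the measured values, and is expressed as follows:
$$\text{NMSE} = \frac{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2}$$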
The Mean Squared Error (MSE) is a metric used to evaluate regression models. It measures the average of the squared differences between the actual and predicted values and is expressed as follows:
$$\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2$$
where n is the number of observations, yi is the actual value of the i-th observation, and ŷi is the corresponding predicted value.
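The metrics defined in this section can be computed together, as in the sketch below. The yield values are hypothetical. NRMSE is reported later in the paper but is not defined in the text, so the version here (RMSE divided by the mean observed yield) is an assumption. Under the definition of NMSE given above, NMSE equals 1 − R², because both use the same two sums of squares.

```python
import math
import statistics

def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE, R2, NMSE, and NRMSE for paired observations."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mean_obs = statistics.fmean(y_true)
    ss_res = sum(e ** 2 for e in errors)                  # residual sum of squares
    ss_tot = sum((t - mean_obs) ** 2 for t in y_true)     # total sum of squares
    mse = ss_res / n
    rmse = math.sqrt(mse)
    return {
        "MAE": sum(abs(e) for e in errors) / n,
        "MSE": mse,
        "RMSE": rmse,
        "R2": 1.0 - ss_res / ss_tot,
        "NMSE": ss_res / ss_tot,          # the (1/n) factors cancel
        "NRMSE": rmse / mean_obs,         # assumed normalization: mean observed yield
    }

# Hypothetical observed and predicted yields (t/ha).
observed = [2.0, 3.0, 4.0, 3.0]
predicted = [2.5, 2.5, 4.0, 3.0]

metrics = regression_metrics(observed, predicted)
```

Other NRMSE conventions divide by the range or the standard deviation of the observations instead of the mean, so the normalization should always be stated next to the reported value.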

2.4. Cross-Validation

To assess the performance of the proposed models, k-fold cross-validation with k = 5 was used [22]. Averaging performance over the folds reduces the dependence of the estimate on any single partition of the data and makes the reported results more consistent and reliable.

3. Results

3.1. Prediction Results

The results in Table 2 were validated with 5-fold cross-validation (k = 5) to assess the reliability and robustness of the proposed models. For RF, SVM, and XGBoost, the high R² and low MAE/RMSE values in Table 2 show that the predicted maize yield is close to the actual production values. The LASSO model shows limitations in capturing the complex interactions between climatic variables and crop yield. The standard deviation values indicate that the errors are consistent across the folds.

Graphical Representation

The plots in Figures 6, 7, and 8 are similar: the points cluster closely around the diagonal line, which reaffirms the high performance of the Random Forest, Support Vector Machine, and Extreme Gradient Boosting models. Points above the diagonal line indicate occasional overestimation, which can be tolerated as minor bias. In Figure 9, however, a large number of points lie above and below the diagonal line, indicating that many crop yields are both underestimated and overestimated. The predictions of this model cannot be trusted, which implies that a linear regression model is not advisable for this task.

3.2. Variable Importance

Understanding features that contribute to machine learning models is crucial for accurate prediction. As indicated in Figure 10 and Table 3, rainfall, temperature, and soil temperature contribute differently to the prediction of the maize crop.

3.2.1. Random Forest Feature Importance

From the results, the contribution of rainfall is 44.4%, the annual mean temperature is 27.6%, the annual maximum temperature is 12.6%, the annual minimum temperature is 15.1%, and soil temperature is 0.3%. This provides a clear understanding of the contributions of rainfall and temperature to the prediction of maize crop yield, compared with soil temperature.

3.2.2. Extreme Gradient Boost Regressor

As indicated in Figure 11 and Table 4, rainfall is the most influential predictor, with an importance of 38.8%, which highlights the contribution of water to maize growth. The annual mean temperature and the annual minimum temperature make moderate contributions, at 29.9% and 15.5%, respectively. Soil temperature makes only a small contribution, at 4.3%, indicating its minimal influence in this prediction model. Overall, these results emphasize rainfall and temperature as the critical factors in the proposed model's predictions.

3.2.3. Performance Evaluation of the Proposed Machine Learning

The proposed models were evaluated using R², RMSE, MSE, NRMSE, and NMSE. The Random Forest algorithm performed best, with R² = 0.957, RMSE = 1.279 t/ha, MSE = 1.636 t²/ha², NRMSE = 49.2%, and NMSE = 0.242, that is, the lowest prediction error relative to the mean yield. The Support Vector Machine and XGBoost models also performed well, with slightly higher error values (SVM: RMSE = 1.311 t/ha, NRMSE = 50.4%, NMSE = 0.254; XGBoost: RMSE = 1.334 t/ha, NRMSE = 51.3%, NMSE = 0.263). These results indicate that nonlinear and ensemble-based approaches capture complex climate interactions. In comparison, the LASSO regression model performed poorly (R² = 0.256, RMSE = 5.302 t/ha, NRMSE = 203.9%, NMSE = 4.16).

4. Discussion

The findings of this study show that maize yield is greatly affected by climate factors, especially rainfall and temperature. Using machine learning techniques to predict crop yield has great potential for farmers, researchers, and policymakers; this aligns with existing research highlighting the role of machine learning in crop prediction. The authors of [27] analyze various machine learning techniques and key influencing factors such as climate data and soil conditions, arguing that machine learning-driven crop yield predictions can optimize resource use, improve food security, and help practitioners adapt to climate change despite extreme environmental conditions. Study [28] compared the performance of supervised and unsupervised machine learning, finding that Random Forest and Artificial Neural Networks performed well, with R² values ranging from 0.75 to 0.92. Study [29] pointed out that ensemble models such as Random Forest, Gradient Boost, and CNN capture nonlinear, spatial, and temporal patterns in agricultural data, and that combining climate variables and soil data improves prediction accuracy. In [30], a comprehensive review of emerging machine learning techniques for crop yield prediction highlighted the increasing use of algorithms like Random Forest, Support Vector Machines, and Artificial Neural Networks; the study showed Random Forest achieving R² values between 0.75 and 0.90, while Artificial Neural Networks often exceeded 0.80 in various crop yield studies. In [31], supervised machine learning models, such as RF, were used for crop recommendation and yield prediction based on climate data, reporting R² = 0.89, RMSE = 0.31, and MAE = 0.22. By comparison, the Support Vector Machine (SVM) in the present study reached R² = 0.955 and RMSE = 1.311 t/ha, showing its ability to handle complex relationships in small datasets.
In [32], researchers used UAV-derived multispectral data and showed that models like SVM and K-Nearest Neighbor could predict multi-stage corn yields, with R² values up to 0.84 and RMSE as low as 0.69 mg/ha for specific treatments. In the present study, XGBoost also provided reliable predictions, with R² = 0.953 and RMSE = 1.334 t/ha, indicating its robustness in handling structured agricultural data. In [33], the author evaluated various machine learning techniques for crop yield prediction, finding that XGBoost achieved the highest performance, with R² = 0.974, RMSE = 15,203.15, and MAE = 2681.32, outperforming other regressors like K-Nearest Neighbors in an agricultural setting. In the present study, however, LASSO regression showed weaker performance, with R² = 0.256, likely due to its inability to model nonlinear relationships in agricultural yield prediction. The review paper [34] highlights the use of machine learning in crop yield prediction, with rainfall and temperature identified as key factors; among the techniques reviewed, AI combined with remote sensing data outperforms traditional statistical methods. In [35], the author proposes integrating network science with machine learning to study crop growth, considering nodes, edges, and relationships. Features based on time series, such as lag, rolling mean, and volatility, were used, showing that temporal features explain 50–20% of the model’s performance; the lag-1 yield accounts for 30–50% of the variance, while network features contribute less than 1%. In [36], a decision support system using a hybrid approach was proposed, showing that hybrid and data-driven models improve accuracy for sustainable agriculture. In [37], the author combines tuned nonlinear autoregression neural networks (ENLARNN) and generalized regression neural networks (GRNN) for error detection, finding that the RMSE was reduced by 9.6–19.5% compared to the ensemble model NLARNN. The model offers consistent performance and supports irrigation planning.
Ref. [38] evaluates the performance of machine learning models at a 10 m resolution using multi-temporal Sentinel data. The models were trained on spectral reflectance and vegetation indices at different growth stages, and the authors concluded that they are more precise and better at capturing crop phenology. In [39], remote sensing, computer vision, and multi-source data fusion were used to detect crop diseases, identify pests, and predict yields. The study showed that Convolutional Neural Networks are more effective for crop prediction and disease detection. Lastly, in [40], a combination of economic models and machine learning was used to explore how these concepts interact over time and through nonlinear relationships.

5. Conclusions

This study demonstrates that maize yield is influenced by many factors, including climatic conditions and soil temperature. Applying machine learning techniques to maize yield prediction highlights their ability to handle complex interactions among predictors and to improve predictive performance and adaptability. Evaluation metrics such as R2, RMSE, NRMSE, and NMSE support the choice of which machine learning model to apply. The feature importance analysis revealed that rainfall and the temperature variables contribute the most to the predicted maize yield. Cross-validation was used to assess the reliability of the models. Among the proposed models, Random Forest, Support Vector Machine, and Extreme Gradient Boosting (XGBoost), with R2 values of 0.957, 0.955, and 0.953, respectively, performed better than the LASSO model, which reached an R2 of 0.256.
This shows the value of nonlinear and ensemble learning methods for modeling complex interactions. The findings have implications for smallholder farmers, enabling them to anticipate risks related to climate variability and to adopt appropriate mitigation strategies. Despite the promising findings, several limitations should be acknowledged. First, the models used only a limited set of climatic variables (temperature, rainfall, and soil temperature); other factors such as soil fertility, humidity, wind speed, and radiation were not considered, which may constrain the comprehensiveness of the predictive models. Second, because the data cover a limited area, the models may not transfer to other regions with different climatic conditions. Finally, this study relied only on historical data and did not consider future climate change scenarios, which may limit the models' capability for long-term forecasting.

Author Contributions

Conceptualization, B.M.L. and R.M.; methodology, O.G.; software, B.M.L.; validation, R.M., O.G., and C.T.; formal analysis, B.M.L.; investigation, B.M.L.; data curation, B.M.L.; writing—original draft preparation, B.M.L.; writing—review and editing, R.M.; visualization, O.G.; supervision, R.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon request.

Acknowledgments

We thank the African Center of Excellence in Internet of Things (ACEIoT), University of Rwanda, for providing a conducive platform to perform this research.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
MAE | Mean Absolute Error
RMSE | Root Mean Squared Error
RF | Random Forest
SVM | Support Vector Machine
LASSO | Least Absolute Shrinkage and Selection Operator
XGBoost | Extreme Gradient Boosting
AI | Artificial Intelligence
ML | Machine Learning

References

  1. Shevchenko, V.; Lukashevich, A.; Taniushkina, D.; Bulkin, A.; Grinis, R.; Kovalev, K.; Narozhnaia, V.; Sotiriadi, N.; Krenke, A.; Maximov, Y. Climate Change Impact on Agricultural Land Suitability. IEEE Access 2024, 12, 15748–15763.
  2. Updated Nationally Determined Contribution, Page 9. Available online: https://unfccc.int/NDCREG (accessed on 5 May 2020).
  3. Perevedentsev, Y.P.; Vasil’ev, A.A. Climate Change and Its Impact on Agriculture. Russ. Meteorol. Hydrol. 2023, 48, 739–744.
  4. Pereira, L. Climate Change Impacts on Agriculture Across Africa; Oxford University Press: Oxford, UK, 2017.
  5. Bello, O.B.; Ganiyu, O.T.; Wahab, M.K.A.; Afolabi, M.S.; Oluleye, F.; Ig, S.A.; Mahmud, J.; Azeez, M.A.; Abdulmaliq, S.Y. Evidence of Climate Change Impacts on Agriculture and Food Security in Nigeria. Int. J. Agric. For. 2012, 2, 49–55.
  6. Fifth Strategic Plan Agriculture Transformation PSTA 5. Building Resilient and Sustainable Agri-Food Systems; Ministry of Agriculture: Kigali, Rwanda, 2024.
  7. Mahin, A.; Adnan, N.; Khondoker, R. Precision Agriculture Using Machine Learning and Deep Learning Algorithms. A Comprehensive Study. J. Agric. Educ. Res. 2026. Available online: https://www.researchgate.net/publication/391483355_Precision_Agriculture_using_Machine_Learning_and_Deep_Learning_Algorithms_A_Comprehensive_Study (accessed on 8 April 2026).
  8. Lionel, B.M.; Musabe, R.; Gatera, O.; Twizere, C. A comparative study of machine learning models in predicting crop yield. Discov. Agric. 2025, 3, 151.
  9. Kuradusenge, M.; Hitimana, E.; Hanyurwimfura, D.; Rukundo, P.; Mtonga, K.; Mukasine, A.; Uwitonze, C.; Ngabonziza, J.; Uwamahoro, A. Crop Yield Prediction using machine learning models: Case of Potato and Maize. Agriculture 2023, 13, 225.
  10. Malashin, I.; Tynchenko, V.; Gantimurov, A.; Nelyub, V.; Borodulin, A.; Tynchenko, Y. Predicting Sustainable crop yields: Deep learning and Explainable AI Tools. Sustainability 2024, 16, 9437.
  11. Ahmed, S.; Raza, B.; Hussain, L.; Aldweesh, A.; Omar, A.; Khan, M.S.; Eldin, E.T.; Nadim, M.A. The Deep learning Resnet 101 and ensemble XGBoost algorithm with hyperparameter optimization accurately predict lung cancer. Appl. Artif. Intell. 2023, 37, 2166222.
  12. Zhang, N.; Qu, Y.; Song, Z.; Chen, Y.; Jiang, J. Responses and sensitivities of maize phenology to climate change from 1971 to 2020 in Henan Province, China. PLoS ONE 2022, 17, e0262289.
  13. Dwamena, H.A.; Tawiah, K.; Kodua, A.S.A. The Effect of Rainfall, Temperature, and Relative Humidity on the Yield of Cassava, Yam, and Maize in the Ashanti Region of Ghana. Int. J. Agron. 2022, 2022, 9077383.
  14. Waqas, M.A.; Wang, X.; Zafar, S.A.; Noor, M.A.; Hussain, H.A.; Azher Nawaz, M.; Farooq, M. Thermal Stresses in Maize: Effects and Management Strategies. Plants 2021, 10, 293.
  15. Li, Y.; Lang, J.; Ji, L.; Zhong, J.; Wang, Z.; Guo, Y.; He, S. Weather Forecasting Using an Ensemble of Spatial-Temporal Attention Networks and Multi-Layer Perceptron. Asia-Pac. J. Atmos. Sci. 2021, 57, 533–546.
  16. Jabel, M.A.; Azmi Murad, M.A. Crop yield prediction in agriculture: A comprehensive review of machine learning and deep learning approaches with insight for future research and sustainability. Heliyon 2024, 10, e40836.
  17. Mishra, S.; Mishra, D.; Santra, G.H. Application of machine learning techniques in agricultural crop production: A review paper. Indian J. Sci. Technol. 2016, 9, 1–14.
  18. Morales, A.; Villalobos, F.J. Using machine learning for crop yield production in the past or for the future. Front. Plant Sci. 2023, 14, 1128388.
  19. Bali, N.; Singla, A. Emerging Trends in Machine Learning to Predict Crop Yield and Study Its Influential Factors: A Survey. Arch. Computat. Methods Eng. 2021, 29, 95–112.
  20. Abbasi, M.; Rahman, M.M.; Chen, D. Machine learning approaches for predicting maize biomass yield using environmental variables. Agric. Syst. 2025, 210, 103705.
  21. Sharma, R.K.; Kaur, J.; Feng, G.; Huang, Y.; Kumar, C.; Wang, Y.; Sharma, S.; Jenkins, J.; Dhillon, J. Maize and soybean yield prediction using machine learning methods: A systematic literature review. Discov. Agric. 2025, 3, 64.
  22. Sapkota, B.R.; Baath, G.S.; Flynn, K.C.; Adhikari, K.; Hajda, C.; Smith, D.R. Machine learning algorithms for maize yield prediction using multispectral imagery. Sci. Remote Sens. 2025, 11, 100123.
  23. Kok, Z.H.; Shariff, A.R.M.; Alfatni, M.S.M.; Khairunniza-Bejo, S. Support vector machine in Precision Agriculture: A review. Comput. Electron. Agric. 2021, 191, 106546.
  24. Patil, P.; Athavale, P.; Bothara, M.; Tambolkar, S.; More, A. Crop selection and yield prediction using machine learning approach. Curr. Agric. Res. J. 2023, 11, 968–980.
  25. Bhagat, M.; Bakariya, B. A Comprehensive Review of Cross-Validation Techniques in Machine Learning. Int. J. Sci. Technol. 2025. Available online: https://www.ijsat.org/papers/2025/1/1305.pdf (accessed on 8 April 2026).
  26. Emmart, F.; Dehmer, M. High Dimension LASSO—Based computational Regression Model; Regularization, Shrinkage and Selection. Mach. Learn. Knowl. 2019, 1, 359–383.
  27. Islam, M.M.; Alharthi, M.; Alkadi, R.S.; Islam, R.; Masum, A.K.M. Crop yield prediction through machine learning. A path towards sustainable agriculture and climate resilience in Saudi Arabia. AIMS Agric. Food 2024, 9, 980–1003.
  28. Menon, A.G.; Prabhakar, M. Smart Agriculture Monitoring Rover for Small-Scale Farms in Rural Areas using IoT. In Proceedings of the 2021 IEEE International Conference on Innovative Computing, Intelligent Communication and Smart Electrical Systems, ICSES 2021; Institute of Electrical and Electronics Engineers Inc.: Piscataway, NJ, USA, 2021.
  29. Nikhil, U.V.; Pandiyan, A.M.; Raja, S.P.; Stamenkovic, Z. Machine learning-based crop yield prediction in South India: Performance Analysis of various models. Computers 2024, 13, 137.
  30. Badshah, A.; Alkazemi, B.Y.; Din, F.; Zamli, K.Z.; Haris, M. Crop classification and yield prediction using robust machine learning models for agricultural sustainability. IEEE Access 2025, 12, 162799–162813.
  31. Hernández, G.C.; Gómez Gómez, J.; Jiménez-Cabas, J. Predictive Models Based on Artificial Intelligence to estimate crop yield. Agriculture 2025, 15, 2438.
  32. Fashoto, S.G.; Mbunge, E.; Ogunleye, G.; Van den Burg, J. Implementation of machine learning for predicting maize crop yield using multiple linear regression and backward elimination. Malays. J. Comput. 2021, 6, 679.
  33. Chitradurga, M.; Palayyan, B.P. An efficiency crop yield prediction framework using a hybrid machine learning model. Rev. D Intell. Artif. 2023, 37, 1157–1167.
  34. Yan, Y.; Wang, Y.; Li, J.; Zhang, J.; Mo, X. Crop Yield Time-Series Data Prediction Based on Multiple Hybrid Machine Learning Models. Appl. Comput. Eng. 2025, 133, 217–223.
  35. Arizo-García, P.; Castiñeira-Ibáñez, S.; Cruzado-Campos, E.; San Bautista, A.; Rubio, C. High resolution wheat and Barley yield forecasting using multi-temporal satellite time series and machine learning. Agriculture 2026, 16, 516.
  36. Shinyclimensa, C.; Parthiban, A. Network-enhanced machine learning framework for multi-crop yield prediction: A comprehensive analysis of Indian agriculture data. Front. Agron. 2026, 8, 1767878.
  37. Zhang, R.; Wu, X.; Li, J.; Zhao, P.; Zhang, Q.; Wuri, L.; Zhang, D.; Zhang, Z.; Yang, L. A bibliometric review of deep learning in crop monitoring: Trends, challenges and future perspectives. Front. Artif. Intell. 2025, 8, 1636898.
  38. Singh, K.; Yadav, M.; Barak, D.; Bansal, S.; Moreira, F. Machine learning-based frameworks for reliable and sustainable crop forecasting. Sustainability 2025, 17, 4711.
  39. Zhao, X.; Deng, X.; Xiang, D.; Wang, S. Analysis and prediction of the coupling coordination relationship between digital economy and agricultural new quality productivity. In Proceedings of the 2026 International Conference on Digital Economy and Agricultural Development; IEEE: Piscataway, NJ, USA, 2026.
  40. Rahman, A.A.; Rathipriya, R.; Meero, A.; Hamdan, H. Hybrid Neural Networks for improved crop yield prediction and water demand estimation. Arab Gulf J. Sci. Res. 2025, 43, 99–115.
Figure 1. Map of the Republic of Rwanda with a blue arrow indicating the area under study.
Figure 2. Methodology for crop prediction using ML.
Figure 3. Structure of the Random Forest model [8].
Figure 4. Block diagram of XGBoost Gradient [25].
Figure 5. SVM block diagram concept [9].
Figure 6. Graphical visual prediction of the Random Forest model.
Figure 7. Graphical visual prediction of the XGBoost prediction model.
Figure 8. Graphical visual prediction of the SVM model.
Figure 9. Graphical visual of the LASSO prediction model.
Figure 10. The contribution of feature importance of the RF model.
Figure 11. The contribution of feature importance for the XGBoost model.
Table 1. Training and testing sets.
Dataset | Years Covered | Seasons per Year | Number of Samples
Training set | Year 1–10 | 2 | 20
Testing set | Year 11–13 | 2 | 6
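The split in Table 1 is chronological: the test years all come after the training years. The following is a minimal sketch of such a split, assuming one record per season. The record layout, the season labels "A" and "B", and the function name `chronological_split` are hypothetical and are not taken from the study's code.

```python
def chronological_split(records, last_train_year):
    """Split season records by year so the test set is strictly later."""
    train = [r for r in records if r["year"] <= last_train_year]
    test = [r for r in records if r["year"] > last_train_year]
    return train, test


# Placeholder records: 13 years with 2 seasons each.
# The season labels "A" and "B" are hypothetical.
records = [{"year": y, "season": s} for y in range(1, 14) for s in ("A", "B")]

train, test = chronological_split(records, last_train_year=10)
print(len(train), len(test))  # 20 6
```

Splitting by year rather than at random keeps later seasons out of the training data, which matches how a forecast would be used in practice.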
Table 2. Model performance.
Model | R2 (±) | MAE (t/ha) (±) | RMSE (t/ha) (±) | MSE (t2/ha2) | NRMSE (%) | NMSE
RF | 0.957 ± 0.0019 | 1.018 ± 0.026 | 1.279 ± 0.023 | 1.626 | 49.19 | 0.242
SVM | 0.955 ± 0.0016 | 1.047 ± 0.023 | 1.311 ± 0.015 | 1.718 | 50.42 | 0.254
XGBoost | 0.953 ± 0.0016 | 1.058 ± 0.019 | 1.334 ± 0.015 | 1.78 | 51.31 | 0.263
LASSO | 0.256 ± 0.010 | 4.026 ± 0.110 | 5.302 ± 0.120 | 28.11 | 203.92 | 4.16
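As a rough illustration of how the metrics in Table 2 relate to one another, the sketch below computes them from observed and predicted values. The formulas for R2, MAE, MSE, and RMSE are the standard ones. The normalization used for NRMSE and NMSE is an assumption: this section does not state the normalizer, so the mean of the observed values is used here. The NMSE values in the table do equal (NRMSE/100)^2 to rounding, which is what a shared normalizer would produce. The function name and the sample values are hypothetical.

```python
from math import sqrt


def regression_metrics(y_true, y_pred):
    """Return R2, MAE, MSE, RMSE, NRMSE (%), and NMSE for two sequences."""
    n = len(y_true)
    mean_obs = sum(y_true) / n
    errors = [t - p for t, p in zip(y_true, y_pred)]

    mse = sum(e * e for e in errors) / n
    rmse = sqrt(mse)
    mae = sum(abs(e) for e in errors) / n

    # R2 = 1 - (residual sum of squares) / (total sum of squares)
    ss_res = mse * n
    ss_tot = sum((t - mean_obs) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot

    # Assumption: normalize by the mean observed value.
    nrmse = 100.0 * rmse / mean_obs
    nmse = mse / mean_obs ** 2

    return {"r2": r2, "mae": mae, "mse": mse,
            "rmse": rmse, "nrmse": nrmse, "nmse": nmse}


# Hypothetical yields in t/ha, for illustration only (not data from the study).
y_true = [2.0, 4.0, 6.0, 8.0]
y_pred = [3.0, 3.0, 7.0, 7.0]
m = regression_metrics(y_true, y_pred)
```

Under this convention MSE is the square of RMSE and NMSE is the square of NRMSE expressed as a fraction, so any two of these columns can be checked against each other.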
Table 3. Importance of features.
Feature | Importance in %
annual_mean_temp | 2.76
annual_max_temp | 12.6
annual_min_temp | 15.1
annual_rainfall | 44.4
soil_temp | 0.3
Table 4. Contribution of feature importance for the XGBoost model.
Feature | Importance in %
annual_mean_temp | 29.9
annual_max_temp | 15.5
annual_min_temp | 11.6
annual_rainfall | 38.8
soil_temp | 4.3
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Lionel, B.M.; Musabe, R.; Gatera, O.; Twizere, C. Performance Analysis of Machine Learning Techniques in Predicting Maize Crop Yield: Case Study of Kayonza District—Rwanda. Algorithms 2026, 19, 448. https://doi.org/10.3390/a19060448
