Predicting Chickpea Yield Using Artificial Neural Networks with Explainable AI

Karakoy, Tolga; Yelmen, Ilkay; Zontul, Metin; Yildirim, Fazli

doi:10.3390/agronomy16070768

¹ Department of Plant Protection, Faculty of Agricultural Sciences and Technologies, Sivas University of Science and Technology, Sivas 58000, Türkiye

² Department of Software Engineering, Faculty of Engineering and Natural Sciences, Istinye University, Istanbul 34396, Türkiye

³ Department of Computer Engineering, Faculty of Engineering and Natural Sciences, Sivas University of Science and Technology, Sivas 58000, Türkiye

⁴ Faculty of Economics, Administrative and Social Sciences, Department of Management Information Systems, Istanbul Topkapı University, Istanbul 34087, Türkiye

* Authors to whom correspondence should be addressed.

Agronomy 2026, 16(7), 768; https://doi.org/10.3390/agronomy16070768

Submission received: 27 February 2026 / Revised: 24 March 2026 / Accepted: 31 March 2026 / Published: 7 April 2026

(This article belongs to the Section Precision and Digital Agriculture)

Abstract

Chickpea (Cicer arietinum L.) is a globally important legume crop whose grain yield is strongly influenced by environmental and agronomic variability. This study aimed to predict chickpea grain yield using artificial neural networks (ANNs) and to identify key traits associated with yield formation across different genotypes under semi-arid conditions. The dataset consisted of 96 chickpea genotypes evaluated over two growing seasons (2022–2023) in Sivas, Türkiye. The results demonstrated that reproductive traits, particularly seed weight per plant, number of pods per plant, and number of seeds per plant, were the most influential factors determining grain yield. Environmental variability also contributed significantly to yield prediction, highlighting the importance of genotype–environment interactions. The developed ANN model showed high predictive accuracy, indicating its robustness in capturing complex relationships among yield-related traits. Beyond prediction, the model provides biologically meaningful insights into trait prioritization, supporting its application in chickpea breeding programs. Overall, the findings suggest that ANN-based approaches can serve as effective decision-support tools in precision agriculture by enabling accurate yield estimation, facilitating the selection of high-performing genotypes, and identifying key breeding traits for sustainable crop improvement.

Keywords: chickpea yield prediction; artificial neural networks; explainable AI; SHAP; precision agriculture

1. Introduction

Chickpea (Cicer arietinum L.) is one of the most widely grown and consumed grain legumes in the world. It is valued for its nutritional importance, its positive role in soil fertility through symbiotic nitrogen fixation, and its ability to grow under different environmental conditions. Even so, factors such as low water availability and unfavorable temperatures can seriously affect its growth, yield, and seed quality [1,2]. Chickpea production is strongly influenced by both genotype and growing conditions, which can lead to marked variation in agronomic traits such as seed yield and 100-seed weight, particularly under drought stress [3]. This variability is also reflected in disease response and planting time effects, as previous studies have shown that Ascochyta blight infection and grain weight per plant may differ considerably among chickpea genotypes under different sowing periods [4]. Chickpea remains an important crop in both traditional and developing agricultural systems across many countries. Because its productivity depends so strongly on environmental conditions, especially in water-limited environments, the development of short-duration and stress-tolerant varieties has become a major focus of chickpea improvement programs [5]. In addition to varietal improvement, agronomic practices such as inoculation with Rhizobium and Azotobacter strains have also been reported to improve plant growth and grain yield under challenging agroecological conditions.

Breeding projects aim to improve crop output, particularly by creating resilient, high-quality chickpea varieties [5,6,7,8]. Path coefficient analysis is a crucial tool for optimizing agricultural productivity by understanding the correlation between yield-associated characteristics and seed yield [9]. Linear regression models identify critical characteristics for enhancing yield in fodder pea, including plant height (61–70 cm), beans (7–9), seeds per plant (35–40), and 1000-seed weight (150–260 g) [10]. Yield capacity prediction is complex and influenced by multiple interacting factors, including soil physicochemical properties, environmental conditions, and genotype [11,12].

Regression models, in both linear and nonlinear forms, are used to predict crop productivity by relating independent variables to a dependent variable, with multiple linear regression being particularly useful [13]. Furthermore, principal component analysis (PCA) reduces the dimensionality of a dataset while retaining most of its variance, thereby clarifying the determinants that influence yield [14]. Correlation analysis of yield-related variables is likewise central to breeding programs, and regression models are widely used for selecting chickpea lines. However, these conventional statistical approaches may have limited ability to represent the complex and nonlinear interactions among agronomic, environmental, and genotype-related variables affecting chickpea yield. In contrast, artificial neural networks (ANNs) are more suitable for modeling such multidimensional relationships, as they can capture nonlinear patterns and variable interactions without requiring strict parametric assumptions. Therefore, ANN-based approaches may provide a more flexible and powerful framework for chickpea grain yield prediction under variable field conditions.

Machine learning is a powerful tool for building accurate prediction models using algorithms such as artificial neural networks, decision trees, genetic algorithms, fuzzy logic, and regression [15]. These algorithms have been shown to handle intricate input-output relationships and to improve the selection and elimination of relevant characteristics [16]. In agriculture, they are frequently used to identify descriptive attributes for the evaluation of product quality [17]. Among ANNs, multilayer perceptrons (MLPs) are widely used; they are feed-forward networks in which data flow in one direction, from the input layer to the output layer [18,19]. Among several machine learning techniques, ANNs and random forests (RF) are recognized for their effectiveness in evaluating food characteristics [20]. The RF technique enhances classification by generating multiple decision trees from bootstrap samples of the original training data, thereby providing a robust approach for feature distinction. Numerous researchers have studied the physical qualities of chickpea seeds, including form, size, and color, as well as various agronomic variables related to biological yield [21,22].

A comparative analysis was conducted between ARIMA and ANN models to forecast chickpea yield in the Prayagraj region of Uttar Pradesh, India. Using climatic data (temperature and rainfall) and yield records from 1996 to 2020, projections were made for the period between 2021 and 2025. The results showed that the ANN model outperformed ARIMA, achieving a lower root mean square error (RMSE = 66.716) and a higher coefficient of determination (R² = 0.96). These findings suggest that ANN-based approaches offer more accurate predictions in agricultural yield forecasting, especially in regions affected by climate variability [23].

In another study, rainfed chickpea yield was predicted by integrating remote sensing indices, such as NDVI, EVI, LAI, FPAR, GPP, and ET, with meteorological variables including temperature, humidity, sunshine hours, and precipitation. Support Vector Regression (SVR), Random Forest (RF), and K-Nearest Neighbors (KNN) algorithms were applied to data from 11 counties in Kermanshah, Iran, a region notable for its chickpea cultivation. Model accuracy was assessed using leave-one-out cross-validation (LOOCV), and among the tested methods, Random Forest demonstrated the highest predictive performance, with deviations from official yield statistics limited to 7–8%. The findings highlight the potential of machine learning approaches in improving yield forecasts, particularly in semi-arid environments where climatic variability strongly influences agricultural output [12,24].

Recent studies have demonstrated that machine learning algorithms such as Random Forest, Neural Networks, and XGBoost are widely used in chickpea yield prediction. Among these, XGBoost has shown superior performance due to its ability to manage feature interactions and multicollinearity effectively. Moreover, incorporating outputs from mechanistic models, such as biomass and soil moisture, into data-driven models has further improved prediction quality and interpretability, making hybrid approaches particularly promising for precision agriculture applications [25].

A study investigated the potential of machine learning techniques for predicting chickpea seed mass, a trait strongly associated with overall yield, using image-derived physical and color attributes. By applying non-destructive image processing methods, the study evaluated the performance of four algorithms: Random Forest (RF), Multilayer Perceptron (MLP), Support Vector Regression (SVR), and k-Nearest Neighbors (k-NN). Among these, the Random Forest algorithm provided the most accurate predictions, highlighting its robustness in handling complex input features and its suitability for agricultural trait modeling. These findings underscore the value of integrating image analysis and machine learning to support efficient yield estimation and cultivar selection in chickpea breeding programs [26].

In another study related to chickpea yield prediction, the APSIM-chickpea model was evaluated and enhanced to improve its accuracy in simulating grain yield under variable environmental conditions. After incorporating the effects of soil moisture and frost risk into the thermal time calculation, the model showed significantly improved performance. The updated model achieved a coefficient of determination (R²) of 0.70 and a root mean square deviation (RMSD) of 293 kg/ha, demonstrating better agreement between predicted and observed yields. These results confirm the model’s enhanced capability in capturing the impact of cold temperatures during the reproductive stage on chickpea productivity, making it a reliable tool for yield forecasting and agronomic decision-making [1].

Although machine learning methods are increasingly used in agricultural studies, chickpea yield prediction under field conditions remains relatively underexplored, particularly with respect to genotype-dependent variation and seasonal environmental effects. Previous studies have largely emphasized predictive performance, while giving less attention to the biological interpretation of model outputs and the relative importance of yield-related traits. In this context, the present study aimed to evaluate the applicability of an ANN-based model for predicting chickpea grain yield under semi-arid conditions and to identify the agronomic traits with the greatest contribution to model performance. The novelty of the study lies in the integration of ANN-based modeling with SHAP analysis, which enables both accurate prediction and interpretable insight into the factors driving yield variation.

This study contributes to the literature by strengthening data-driven approaches in agriculture through the application of ANN-based models for chickpea yield prediction:

The application of ANNs enabled reliable and accurate prediction of chickpea grain yield, demonstrating strong predictive capability in modeling complex relationships among agronomic and environmental variables.
The modeling process involved the simultaneous evaluation of numerous agronomic and environmental inputs, allowing for a more comprehensive analysis of the parameters affecting yield.
The ANN models employed in the study provided an effective alternative for modeling complex relationships arising from genetic diversity and environmental variability, offering a more flexible and adaptive approach than classical statistical methods.
The research presents a practical application that can support the development of data-based agricultural planning and early warning systems, particularly in regions heavily affected by climate variability and drought.
The predictive analysis of different genotypes’ performances laid the groundwork for the development of tools that can aid in identifying high-yielding lines.

2. Materials and Methods

In this study, 86 wild chickpea (Cicer reticulatum) genotypes obtained from the International Center for Agricultural Research in the Dry Areas (ICARDA genebank) and various regions of Turkey, along with 10 registered chickpea cultivars (İnci, Seçkin, Azkan, Işık, Aksu, Ubet, Tolga 01, Çakır, Arda and Hasanbey), were used as plant materials.

2.1. Experimental Site and Duration

The study was conducted over two consecutive chickpea growing seasons in 2022 and 2023 at the Agricultural R&D Center of the Faculty of Agricultural Sciences and Technology, Sivas University of Science and Technology, Sivas (39°43′09.13″ N 36°55′15.71″ E), Turkey. The field experiment followed an augmented experimental design consisting of four blocks. Sowing was carried out in rows of 1 m length with 45 cm spacing between rows. Throughout the growing period, all necessary cultural practices were performed, and fertilization was applied at a rate of 4 kg nitrogen (N) and 6 kg phosphorus (P₂O₅) per decare. At the end of the maturity stage, five plants were randomly selected from each plot, and the following traits were recorded: days to emergence (days), days to flowering (days), days to first pod set (days), days to maturity (days), plant height (cm), first pod height (cm), number of pods per plant (count), number of seeds per plant (count), seed weight per plant (g), 100-seed weight (g), biological yield (kg ha⁻¹), and grain yield (kg da⁻¹). Descriptive statistics, including minimum, maximum, and mean values and standard errors, were calculated using the JMP 7 statistical software package.

2.2. Climatic Characteristics of the Experimental Site

The climate of Sivas is classified as a continental climate, characterized by hot and dry summers and cold, snowy winters. The key climatic parameters, including relative humidity, total precipitation, and temperature during the study period, are presented in Table 1.

During the experimental period in 2022, the total precipitation was at its lowest in July (0 mm) and peaked in June (116.6 mm). In 2023, the lowest total precipitation was recorded in July (3.0 mm), while the highest was observed in April (74.8 mm) (Table 1).

Regarding average temperature values, the lowest temperature in 2022 was recorded in April (12.2 °C), whereas the highest was measured in August (23.7 °C). Similarly, in 2023, the lowest average temperature was observed in April (9.1 °C), while the highest was recorded in August (23.4 °C) (Table 1).

The average relative humidity in 2022 was at its lowest in April (44.5%) and reached its highest in June (55.8%). In 2023, the lowest relative humidity was observed in August (76.6%), while the highest was recorded in June (95.3%) (Table 1).

2.3. Soil Characteristics of the Experimental Site

A total of 96 chickpea genotypes, comprising 86 wild genotypes and 10 registered cultivars, were used as plant material in this study, and their identification numbers, origins, collection provinces, and species information are summarized in Table 2.

The soil in the experimental area was classified as silty clay loam with a pH of 7.28, and it contained 19.6% lime, 0.33% salt, 34.0 kg ha⁻¹ available phosphorus (P₂O₅), 935.9 kg ha⁻¹ available potassium (K₂O), and 1.7% organic matter, as shown in Table 3.

3. Artificial Neural Networks

ANNs are widely used modeling tools capable of capturing complex nonlinear relationships between agronomic variables and crop yield. In this study, an MLP architecture was employed to model chickpea grain yield based on phenological, morphological, and environmental inputs.

The most popular ANN model for multivariable regression analysis is the multilayer feed-forward network, which consists of input, hidden, and output layers, each with its own number of neurons. Input neurons correspond to the independent variables, while output neurons correspond to the dependent variables of the regression. The number of hidden layers and the number of neurons in each hidden layer are generally determined experimentally so as to give the best output prediction [27,28].

Backpropagation is a fundamental algorithm used to train ANNs by minimizing prediction error through gradient descent. It works by comparing the network’s output to the correct answer, calculating the error, and then propagating this error backward through the layers to update the connection weights. This process helps each neuron understand its contribution to the overall error and adjust accordingly. Over time, these adjustments improve the network’s ability to make accurate predictions [29,30,31].
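The forward pass, error calculation, backward propagation, and weight update described above can be sketched for a very small network. The following is a minimal illustration in plain Python, not the implementation used in this study (which relied on scikit-learn); the function name `train_tiny_mlp`, the toy data, the network size, and the learning rate are all assumptions chosen for readability.

```python
import math
import random


def train_tiny_mlp(xs, ys, hidden=3, lr=0.1, epochs=300, seed=0):
    """Full-batch gradient descent on a 1-input, 1-output MLP with one tanh hidden layer.

    Returns the mean squared error recorded at the start of every epoch.
    """
    rng = random.Random(seed)
    w1 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]  # input -> hidden weights
    b1 = [0.0] * hidden                                    # hidden biases
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]  # hidden -> output weights
    b2 = 0.0                                               # output bias
    n = len(xs)
    history = []

    for _ in range(epochs):
        gw1, gb1, gw2, gb2 = [0.0] * hidden, [0.0] * hidden, [0.0] * hidden, 0.0
        loss = 0.0
        for x, y in zip(xs, ys):
            # Forward pass: weighted sums followed by the tanh activation.
            h = [math.tanh(w1[j] * x + b1[j]) for j in range(hidden)]
            y_hat = sum(w2[j] * h[j] for j in range(hidden)) + b2
            err = y_hat - y
            loss += err * err
            # Backward pass: propagate d(MSE)/d(y_hat) back through the layers.
            d_out = 2.0 * err / n
            gb2 += d_out
            for j in range(hidden):
                gw2[j] += d_out * h[j]
                d_hidden = d_out * w2[j] * (1.0 - h[j] ** 2)  # tanh'(z) = 1 - tanh(z)^2
                gw1[j] += d_hidden * x
                gb1[j] += d_hidden
        history.append(loss / n)
        # Gradient-descent update of every weight and bias.
        for j in range(hidden):
            w1[j] -= lr * gw1[j]
            b1[j] -= lr * gb1[j]
            w2[j] -= lr * gw2[j]
        b2 -= lr * gb2
    return history


# Hypothetical toy data: learn y = sin(x) on five points.
xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
ys = [math.sin(x) for x in xs]
history = train_tiny_mlp(xs, ys)
print(f"MSE at first epoch: {history[0]:.4f}, at last epoch: {history[-1]:.4f}")
```

The recorded error falls over the epochs because each update moves the weights against the gradient of the error, which is the behavior the paragraph above describes.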

Before applying the backpropagation algorithm, all input and output variables must be normalized because they can span very different ranges, and features with large numerical values would otherwise dominate the weight updates. The most common normalization functions are standardization (z-score; Equation (1)) and min-max normalization (Equation (2)), as follows [32]:

x_{i, n}' = \frac{x_{i, n} - \mu_{i}}{\sigma_{i}}

(1)

where \mu_{i} and \sigma_{i} represent the mean and standard deviation of the i-th feature, respectively.

x_{i, n}' = \frac{x_{i, n} - \min (x_{i})}{\max (x_{i}) - \min (x_{i})} (\mathrm{nMax} - \mathrm{nMin}) + \mathrm{nMin}

(2)

where \min (x_{i}) and \max (x_{i}) represent the minimum and maximum values of the i-th feature, respectively, and nMin and nMax are the lower and upper bounds of the range to which the data are rescaled.
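As a small illustration of Equations (1) and (2), the sketch below applies both normalizations to a single feature. The helper names `z_score` and `min_max` and the three sample values are hypothetical, and the use of the population standard deviation is an assumption, since the text does not say which estimator was used.

```python
import statistics


def z_score(values):
    """Equation (1): subtract the feature mean and divide by its standard deviation."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # population standard deviation (assumption)
    return [(v - mu) / sigma for v in values]


def min_max(values, n_min=0.0, n_max=1.0):
    """Equation (2): rescale the feature linearly to the interval [n_min, n_max]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) * (n_max - n_min) + n_min for v in values]


# Hypothetical plant heights in cm, used only to exercise the formulas.
heights = [30.0, 40.0, 50.0]
standardized = z_score(heights)
unit_range = min_max(heights)
symmetric_range = min_max(heights, n_min=-1.0, n_max=1.0)
print(standardized, unit_range, symmetric_range)
```

After standardization the feature has zero mean and unit standard deviation, whereas min-max scaling maps the smallest value to nMin and the largest to nMax.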

During training, the values arriving at each neuron are multiplied by the connection weights and summed, and an activation function is then applied to map the result into a specific range. The most commonly used activation functions are the sigmoid (logistic) function (Equation (3)) and the hyperbolic tangent (tanh) function (Equation (4)).

\mathrm{Sigmoid}(x) = \frac{1}{1 + e^{-x}}

(3)

with the properties:

Output range: (0, 1).
Commonly used in classification problems.
Advantage: Output can easily be interpreted as a probability.
Disadvantage: For large positive or negative input values, the derivative becomes very small (vanishing gradient problem).

\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}

(4)

with the properties:

Output range: (−1, 1)
Has a more centered distribution compared to sigmoid (mean closer to zero).
Can lead to better learning in deeper networks since gradients tend to be larger.
Still susceptible to the vanishing gradient problem for large absolute input values.
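The gradient behavior listed for both functions can be checked numerically. The sketch below evaluates the derivatives of Equations (3) and (4) at a few inputs; the function names and the chosen input values are illustrative assumptions.

```python
import math


def sigmoid(x):
    """Equation (3): logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    """Derivative of the sigmoid: s(x) * (1 - s(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)


def tanh_grad(x):
    """Derivative of tanh (Equation (4)): 1 - tanh(x)^2."""
    return 1.0 - math.tanh(x) ** 2


for x in (0.0, 2.0, 5.0, 10.0):
    print(f"x={x:5.1f}  sigmoid'={sigmoid_grad(x):.6f}  tanh'={tanh_grad(x):.6f}")
```

At x = 0 the tanh derivative is 1 while the sigmoid derivative is 0.25, which is the sense in which tanh gradients tend to be larger; for large absolute inputs both derivatives approach zero, which is the vanishing gradient problem noted above.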

The backpropagation algorithm updates the weights between neurons iteratively according to the difference between the actual and predicted output values. Evaluation metrics are required to decide when to stop the algorithm and to judge the resulting model. For these purposes, error metrics (MSE, MAE, RMSE, etc.) and the coefficient of determination (R²) are generally used.

MAE measures the average absolute deviation of the predicted values from the actual ones, while RMSE expresses the typical error magnitude in the units of the target and penalizes large errors more heavily; in neither metric do positive and negative errors offset each other. MSE reflects the average squared difference between predicted and actual values, indicating the degree of deviation. R² shows how much of the variation in the dependent variable can be explained by the independent variables. The closer the R² value is to 1, the better the model fits the data [33,34]. The MSE, MAE, RMSE, and R² formulations used in this study are given in Equations (5)–(8), respectively.

\mathrm{MSE} = \frac{1}{n} \sum_{i = 1}^{n} (y_{i} - \hat{y}_{i})^{2}

(5)

\mathrm{MAE} = \frac{1}{n} \sum_{i = 1}^{n} |y_{i} - \hat{y}_{i}|

(6)

\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} (y_{i} - \hat{y}_{i})^{2}}

(7)

R^{2} = 1 - \frac{\sum_{i = 1}^{n} (y_{i} - \hat{y}_{i})^{2}}{\sum_{i = 1}^{n} (y_{i} - \bar{y})^{2}}

(8)

where y_{i}, \hat{y}_{i}, and \bar{y} refer to the actual value, the predicted value, and the mean of the output, respectively.
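Equations (5)–(8) translate directly into code. The sketch below is a plain-Python rendering with hypothetical function names and four made-up value pairs; it is meant only to make the formulas concrete and is not taken from the study's analysis scripts.

```python
import math


def mse(y_true, y_pred):
    """Equation (5): mean squared error."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Equation (6): mean absolute error."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Equation (7): root mean squared error."""
    return math.sqrt(mse(y_true, y_pred))


def r2(y_true, y_pred):
    """Equation (8): coefficient of determination."""
    y_bar = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - y_bar) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up actual and predicted values, used only to exercise the functions.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 4.0]
print(mse(y_true, y_pred), mae(y_true, y_pred), rmse(y_true, y_pred), r2(y_true, y_pred))
```

For these values the squared errors sum to 0.5 and the total sum of squares is 5.0, so MSE = 0.125, MAE = 0.25, and R² = 0.9.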

Considering the nonlinear nature of yield formation processes and the evaluation framework described above, an MLP-based ANN was selected in this study to capture the complex relationships among phenological, morphological, and yield-related traits influencing chickpea grain yield. Crop yield formation results from interactions between genotype characteristics and environmental variability, which may not always be adequately represented using strictly linear modeling approaches. MLP networks have been widely applied in agricultural yield prediction studies due to their ability to model multidimensional relationships derived from experimental field data.

The objective of this study was not to compare different machine learning algorithms but to evaluate the applicability and interpretability of an MLP-based modeling framework for chickpea yield prediction under field conditions. Model reliability and generalization performance were therefore assessed using 10-fold cross-validation. Accordingly, the model was designed to evaluate yield estimation performance under field conditions using observed agronomic traits, rather than to serve as a purely early-stage forecasting system based only on indirect predictors.
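The mechanics of 10-fold cross-validation can be sketched as an index-splitting routine. The helper `k_fold_indices`, the shuffling seed, and the sample count of 100 are assumptions for illustration; the study's own fold assignment is not described in the text.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for shuffled k-fold cross-validation."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k folds of nearly equal size
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


# Hypothetical dataset size; every observation is used for testing exactly once.
splits = list(k_fold_indices(100, k=10))
for train_idx, test_idx in splits[:2]:
    print(len(train_idx), len(test_idx))
```

Each observation appears in exactly one test fold, so the averaged score reflects predictions on data that the corresponding model did not see during training.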

4. Experimental Results

4.1. Model Performance Evaluation

Chickpea (Cicer arietinum L.) is a globally significant legume crop that plays a strategic role in agricultural production due to its high nutritional value and nitrogen fixation capacity. One of the most critical criteria in chickpea production is grain yield, which is determined by various agronomic and physiological factors. For this reason, identifying the traits most closely associated with yield is essential for both crop improvement and predictive modeling.

The primary yield components determining chickpea grain yield include the number of pods per plant, the number of grains per pod, thousand-grain weight, biological yield, and harvest index. The number of pods per plant is influenced by genetic traits as well as environmental factors, whereas the number of grains per pod and thousand-grain weight are directly related to genetic factors along with water and nutrient management. Thousand-grain weight is particularly crucial in terms of the commercial value of chickpea, as it is significantly affected by genotype and environmental conditions. Consistent with this, the SHAP results obtained in the present study identified seed weight per plant, number of pods per plant, and number of seeds per plant as the main contributors to grain yield prediction.

Each yield component can be optimized through different agronomic practices. For instance, sowing time and plant density directly impact the number of pods per plant and, consequently, grain yield. Irrigation and fertilization strategies, especially during the flowering and grain-filling stages, enhance the number of grains per pod and grain size, thereby contributing to total yield. Additionally, effective disease and pest management help minimize physiological losses during the reproductive growth phase, mitigating the negative effects on yield components. Moreover, the contribution of the year variable showed that yield estimation was also affected by environmental variability between growing seasons.

In conclusion, chickpea grain yield is a complex trait determined by the interaction between genetic and environmental factors. Balancing yield components is crucial for achieving high and sustainable yields, and appropriate cultivation techniques and modern agricultural practices are key to enhancing both productivity and quality in chickpea production. The present findings support this interpretation by showing that the developed ANN model was mainly driven by biologically relevant reproductive traits, while also reflecting the influence of seasonal conditions. Accordingly, Table 4 presents the descriptive statistics of the phenological, morphological, and yield-related traits included in the modeling framework.

Before model training, missing observations were removed from the dataset. The genotype variable was converted into numerical form using label encoding. All input features and the target variable were standardized using z-score normalization (mean = 0, standard deviation = 1). The dataset was randomly divided into training (70%) and test (30%) subsets. Scaling parameters were calculated using only the training data and subsequently applied to the test data to prevent data leakage.
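The preprocessing order described above (encode, split, then fit the scaling parameters on the training portion only) can be sketched as follows. All helper names and the numeric values are hypothetical, the two cultivar names are used purely as example labels, and the population standard deviation is an assumption.

```python
import random
import statistics


def label_encode(labels):
    """Map each distinct label to an integer code (codes follow sorted label order)."""
    mapping = {name: code for code, name in enumerate(sorted(set(labels)))}
    return [mapping[name] for name in labels], mapping


def split_rows(rows, test_fraction=0.3, seed=0):
    """Randomly split rows into training and test subsets."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    n_test = round(len(rows) * test_fraction)
    return [rows[i] for i in idx[n_test:]], [rows[i] for i in idx[:n_test]]


def fit_scaler(train_column):
    """Estimate z-score parameters from the training data only."""
    return statistics.fmean(train_column), statistics.pstdev(train_column)


def apply_scaler(column, mu, sigma):
    """Apply previously fitted parameters to any subset."""
    return [(v - mu) / sigma for v in column]


codes, mapping = label_encode(["Aksu", "Arda", "Aksu"])

values = [float(v) for v in range(10, 30)]  # 20 made-up measurements
train, test = split_rows(values)
mu, sigma = fit_scaler(train)               # the test data never influence mu or sigma
train_z = apply_scaler(train, mu, sigma)
test_z = apply_scaler(test, mu, sigma)
print(codes, len(train), len(test))
```

Because the mean and standard deviation come from the training rows alone, the standardized test values generally do not have exactly zero mean, which is the expected consequence of avoiding leakage.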

To predict chickpea grain yield, the features listed in Table 4 were used as inputs to the ANN models. The MLPRegressor model from the Python scikit-learn library (version 1.8.0) was used with the parameter settings given in Table 5. The solver parameter (lbfgs or adam) specifies the optimization algorithm used to adjust the weights during training. Networks with two hidden layers, each containing 2, 5, or 10 neurons, were evaluated to obtain the lowest MSE and the highest R² on the test data (30% of the total dataset). To validate the test results, 10-fold cross-validation (CV = 10) was applied to each configuration in Table 5: in each of the 10 iterations, one fold served as test data and the remaining nine folds served as training data. The cross-validation results indicate that the prediction model for chickpea grain yield performs robustly.
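A search over such settings can be enumerated as a list of keyword dictionaries in the form accepted by scikit-learn's MLPRegressor (`solver`, `activation`, `hidden_layer_sizes`). The sketch below is an assumption about how the grid could be laid out: Table 5 is not reproduced here, so pairing every layer size with every other and including both the `logistic` and `tanh` activations are illustrative choices rather than the study's exact configuration list.

```python
from itertools import product

solvers = ["lbfgs", "adam"]         # solvers named in the text
activations = ["logistic", "tanh"]  # assumption: sigmoid ("logistic") and tanh
layer_sizes = [2, 5, 10]            # neurons per hidden layer, from the text

# One dictionary per candidate network; all ordered size pairs are an assumption.
grid = [
    {"solver": s, "activation": a, "hidden_layer_sizes": sizes}
    for s, a, sizes in product(solvers, activations, product(layer_sizes, repeat=2))
]

print(len(grid))
print(grid[0])
```

Each dictionary could be unpacked into the estimator constructor (for example `MLPRegressor(**params)`) and scored with the same cross-validation folds, so that configurations are compared on equal terms.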

Based on the cross-validation results, the best prediction performance was achieved using the lbfgs solver with the tanh activation function and two hidden layers consisting of 5 and 2 neurons, which yielded the highest cross-validated R² (0.9461) together with the lowest cross-validated MSE (0.0173).

4.2. SHAP-Based Explainability Analysis

SHAP analysis was applied to interpret the contribution of input variables to the prediction performance of the developed machine learning model. The ranking of feature importance and their relative contributions based on SHAP values are presented in Table 6.

According to Table 6, seed weight per plant was identified as the most influential variable, contributing 64.9% to the overall model prediction. This result indicates that plant-level grain productivity plays a dominant role in determining model outputs. However, the high contribution of seed weight per plant does not indicate a one-to-one equivalence with grain yield, as the remaining variables together also contributed substantially to the overall model prediction. Grain yield was therefore influenced not by a single trait alone, but by the combined effects of reproductive, agronomic, and environmental factors. The number of pods per plant ranked second with a contribution of 9.9%, followed closely by the number of seeds per plant (9.4%), highlighting the strong influence of reproductive yield components on prediction performance. The year variable contributed 4.5%, emphasizing the impact of environmental and annual variability. Moderate contributions were observed for 100-seed weight (3.8%) and days to maturity (2.6%), whereas the remaining phenological and morphological traits, including days to first emergence (1.4%), days to first pod setting (0.9%), first pod height (0.9%), days to flowering (0.7%), plant height (0.6%), and genotype (0.3%), exhibited relatively lower contributions to the overall model importance.
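The percentages quoted above can be tabulated and summed as a quick consistency check. In the sketch below the shares are copied from the text, while the helper `to_percent`, which converts mean absolute SHAP values into shares of their total, reflects an assumption about how such percentages are typically derived rather than a documented step of this study.

```python
# SHAP-based shares (%) as reported in the text for Table 6.
reported_share = {
    "seed weight per plant": 64.9,
    "number of pods per plant": 9.9,
    "number of seeds per plant": 9.4,
    "year": 4.5,
    "100-seed weight": 3.8,
    "days to maturity": 2.6,
    "days to first emergence": 1.4,
    "days to first pod setting": 0.9,
    "first pod height": 0.9,
    "days to flowering": 0.7,
    "plant height": 0.6,
    "genotype": 0.3,
}


def to_percent(mean_abs_shap):
    """Convert mean |SHAP| values into percentage shares of their total."""
    total = sum(mean_abs_shap.values())
    return {name: 100.0 * value / total for name, value in mean_abs_shap.items()}


total_share = sum(reported_share.values())
reproductive = ["seed weight per plant", "number of pods per plant", "number of seeds per plant"]
top3_share = sum(reported_share[name] for name in reproductive)
print(f"sum of all shares: {total_share:.1f}%, three reproductive traits: {top3_share:.1f}%")
```

The twelve reported shares add up to 99.9%, consistent with rounding to one decimal place, and the three reproductive traits together account for 84.2% of the total.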

The global distribution and directional effects of feature contributions across all observations are illustrated in the SHAP summary plot shown in Figure 1. In this plot, each dot represents one observation, and the color of the dot indicates the relative value of the corresponding feature, where blue denotes low feature values and red denotes high feature values. The plot indicates that seed weight per plant has the widest spread of SHAP values and the strongest positive impact on model predictions, demonstrating its dominant contribution across the dataset. Higher values of seed weight per plant (represented by red points) are generally associated with increased predicted grain yield, whereas lower values (represented by blue points) contribute negatively to the model output.

Reproductive traits such as number of pods per plant and number of seeds per plant also show noticeable variability in SHAP values, indicating their meaningful influence on prediction performance, although their impact is considerably smaller than that of seed weight per plant. The year variable exhibits moderate dispersion, confirming the role of environmental and seasonal variability in yield prediction. In contrast, traits including 100-seed weight, days to maturity, emergence time, flowering duration, first pod height, plant height, and genotype display relatively narrow SHAP value distributions centered around zero, suggesting limited individual influence on the model output. Overall, the SHAP summary plot confirms that yield-related reproductive traits primarily drive model predictions, while phenological and morphological variables contribute secondary effects [35].

SHAP values were computed using the KernelSHAP method exclusively on out-of-fold validation samples (X_fold_test) within each cross-validation iteration to prevent information leakage and attribution bias. The background reference dataset consisted of 50 randomly sampled observations drawn from the corresponding training fold. SHAP values obtained from all folds were pooled across out-of-fold predictions to derive global feature importance estimates. Because SHAP explanations were derived from standardized inputs, importance rankings represent relative predictive influence rather than direct agronomic effect magnitude.
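KernelSHAP estimates Shapley values by sampling feature coalitions and replacing absent features with rows from a background dataset. For a handful of features the same quantities can be computed exactly by enumerating every coalition, which is what the sketch below does in place of the sampling-based estimator. The toy linear model, the two background rows, and the function name `shapley_values` are hypothetical and serve only to show the role of the background data.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, background):
    """Exact Shapley values of model(x); absent features are filled from background rows."""
    n = len(x)

    def value(subset):
        # Average model output when only the features in `subset` are taken from x.
        total = 0.0
        for row in background:
            mixed = [x[i] if i in subset else row[i] for i in range(n)]
            total += model(mixed)
        return total / len(background)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contribution = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                contribution += weight * (value(s | {i}) - value(s))
        phi.append(contribution)
    return phi


def toy_model(z):
    """Hypothetical linear model standing in for the trained network."""
    return 2.0 * z[0] - 1.0 * z[1] + 0.5 * z[2]


background = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]  # stand-in for the sampled training rows
x = [1.0, 2.0, 3.0]
phi = shapley_values(toy_model, x, background)
baseline = sum(toy_model(row) for row in background) / len(background)
print(phi, baseline)
```

The attributions sum to the difference between the prediction for `x` and the average prediction over the background rows, so the choice of background set determines the reference point against which every contribution is measured.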

The overall magnitude of feature importance was quantified using mean absolute SHAP values as presented in Figure 2. In this bar plot, the blue bars represent the mean absolute SHAP value of each feature across all observations, indicating the average magnitude of that feature’s contribution to the model output regardless of direction. The feature importance plot clearly indicates that seed weight per plant exhibits substantially higher importance than all other variables, demonstrating its dominant contribution to chickpea grain yield prediction.

Reproductive traits, particularly the number of pods per plant and the number of seeds per plant, represent the next most influential predictors, although their contributions remain considerably lower than that of seed weight per plant. The year variable and 100-seed weight show moderate importance, reflecting the influence of environmental variability and seed characteristics on model predictions.

In contrast, phenological and morphological traits, including days to maturity, days to first emergence, days to first pod setting, first pod height, days to flowering, plant height, and genotype, display relatively low importance values, indicating a comparatively smaller individual contribution to the prediction process. Overall, the distribution of feature importance confirms that yield-related reproductive traits primarily drive model predictions, whereas structural and developmental characteristics provide secondary predictive information.

The SHAP analysis showed that the prediction model was mainly influenced by yield-related reproductive traits, particularly seed weight per plant, while environmental and other agronomic variables had smaller contributions. These results help explain how different traits affect chickpea grain yield prediction and support the biological relevance of the developed model.

4.3. Principal Component Analysis of Chickpea Genotypes

Principal component analysis (PCA) was conducted to evaluate the relationships among phenological, morphological, and yield-related traits and to characterize the multivariate structure of the chickpea dataset prior to artificial neural network (ANN) modeling. The PCA biplot (Figure 3) revealed substantial variability among genotypes, indicating a broad genetic base and considerable phenotypic diversity suitable for yield prediction and breeding applications. This variability is particularly important for machine learning approaches, as broader variation improves model generalization and predictive capability [36].

The first two principal components accounted for a large proportion of total variance, suggesting that chickpea yield formation is primarily governed by a limited number of key agronomic traits. Yield-related variables, including seed weight per plant, number of pods per plant, number of seeds per plant, biological yield, and grain yield, were positioned in similar directions on the PCA biplot, indicating strong positive correlations among these traits. This result confirms that reproductive components play a dominant role in determining grain yield, which is consistent with the biological structure of yield formation in chickpea [36].
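To make the notions of explained variance and biplot direction concrete, the following sketch runs a PCA on a synthetic trait matrix. The data, the column layout, and the choice to standardize traits before decomposition are assumptions for illustration; the section does not report the preprocessing or the actual variance shares.

```python
import numpy as np

rng = np.random.default_rng(0)
n_genotypes = 96

# Hypothetical trait matrix: three correlated "reproductive" columns plus two unrelated ones.
latent = rng.normal(size=n_genotypes)
traits = np.column_stack([
    latent + 0.3 * rng.normal(size=n_genotypes),
    latent + 0.3 * rng.normal(size=n_genotypes),
    latent + 0.3 * rng.normal(size=n_genotypes),
    rng.normal(size=n_genotypes),
    rng.normal(size=n_genotypes),
])

# Standardize so that traits measured in different units contribute equally.
z = (traits - traits.mean(axis=0)) / traits.std(axis=0, ddof=1)

# Eigen-decomposition of the correlation matrix yields component variances and loadings.
corr = np.cov(z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

explained = eigvals / eigvals.sum()   # share of total variance per component
scores = z @ eigvecs[:, :2]           # genotype coordinates on the biplot
loadings = eigvecs[:, :2]             # trait directions on the biplot
```

In this toy setting the three correlated columns load on the first component with the same sign, which is the pattern a biplot shows as vectors pointing in similar directions.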

The strong association among reproductive traits observed in PCA aligns closely with the ANN-based feature importance results, where seed weight per plant, number of pods per plant, and number of seeds per plant were identified as the most influential predictors of grain yield. This agreement between multivariate statistical analysis and machine learning outputs enhances the biological reliability and interpretability of the developed predictive model. The convergence of these independent analytical approaches further supports the robustness of the identified yield determinants.

In contrast, phenological traits such as days to emergence, days to flowering, days to first pod set, and days to maturity were located at greater distances from yield-related variables in the PCA space, indicating relatively weaker direct relationships with grain yield. Although phenological development is important for plant adaptation and stress avoidance, these results suggest that their influence on final yield is largely indirect. Similarly, morphological traits including plant height and first pod height exhibited moderate associations with yield components, indicating secondary contributions to productivity.

The distribution of genotypes across the PCA biplot demonstrated clear differentiation among chickpea lines, reflecting significant genetic diversity within the evaluated material. Genotypes positioned near yield-related vectors can be considered as high-yield potential candidates, whereas those located closer to phenological traits may represent genotypes with stress adaptation potential rather than high productivity. This differentiation highlights the usefulness of PCA for identifying promising parental material in chickpea breeding programs [37].

Furthermore, the wide dispersion of genotypes across principal components suggests the presence of genotype × environment interactions, particularly under semi-arid conditions such as those prevailing in Sivas. This variability supports the application of ANN-based modeling approaches, which are particularly suitable for capturing complex nonlinear interactions among agronomic traits and environmental factors.

Overall, the PCA results indicate that chickpea grain yield is primarily driven by reproductive traits, while phenological and morphological variables contribute indirectly to yield formation. These findings complement the ANN modeling results and demonstrate that integrating multivariate statistical approaches with machine learning methods provides a more comprehensive understanding of yield determinants. Such integrated analytical frameworks offer valuable decision-support tools for chickpea breeding and precision agriculture applications, particularly under variable environmental conditions.

5. Discussion

The results indicate that ANN-based modeling is effective for capturing the complex relationships among agronomic traits and chickpea grain yield. The high predictive performance obtained in this study suggests that MLP architectures can successfully model nonlinear interactions that are difficult to represent with conventional linear approaches. This is particularly relevant for chickpea, where yield is controlled by the combined effects of genotype, reproductive development, and environmental conditions. Similar findings have been reported in previous chickpea yield prediction studies using ANN-based approaches [23], where ANN models demonstrated superior performance compared to traditional time-series methods.

Compared with previous chickpea yield prediction studies, the performance obtained in the present study appears strong, although direct numerical comparisons should be interpreted cautiously because of differences in datasets, predictors, and validation strategies. The MLP-based ANN achieved an R²CV of 0.9461, indicating high predictive ability under field conditions that included genotype-related and seasonal variability. This level of predictive performance is comparable to studies based on climatic and remote sensing variables [24]; however, unlike those studies, the present work integrates detailed agronomic and reproductive traits, providing a more biologically grounded and interpretable modeling framework for yield prediction.
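For readers unfamiliar with the reported metrics, the sketch below shows how MSE and R² can be computed on held-out predictions and averaged over folds. It assumes that the cross-validated value is the mean of per-fold scores, which the section does not specify, and the numbers are hypothetical rather than taken from the study.

```python
def mse(y_true, y_pred):
    """Mean squared error between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical (observed, predicted) pairs for three held-out folds.
folds = [
    ([1.0, 2.0, 3.0], [1.1, 1.9, 3.2]),
    ([0.5, 1.5, 2.5], [0.4, 1.7, 2.4]),
    ([2.0, 4.0, 6.0], [2.5, 3.5, 6.0]),
]

fold_r2 = [r2(y, p) for y, p in folds]
fold_mse = [mse(y, p) for y, p in folds]

# Cross-validated scores taken as the average over folds (an assumption).
r2_cv = sum(fold_r2) / len(fold_r2)
mse_cv = sum(fold_mse) / len(fold_mse)
```

Because R² is normalized by the variance of the observed values in each evaluation set while MSE is not, the two metrics need not rank models identically when averaged over folds.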

A major finding of the study is the dominant contribution of seed weight per plant, followed by number of pods per plant and number of seeds per plant, in the prediction of grain yield. These traits are biologically closely linked to final yield formation, and their prominence in the SHAP analysis supports the agronomic consistency of the model.

The dominant contribution of seed weight per plant should nevertheless be interpreted carefully. This trait is directly associated with final grain yield and therefore represents a near-yield component rather than an independent upstream predictor. For this reason, the proposed ANN framework is more appropriately interpreted as a yield estimation model based on integrated agronomic observations than as a strict early-season forecasting tool. The strong influence of this variable remains biologically meaningful, as it reflects the central role of reproductive productivity in final yield formation. The model comparison supports this interpretation: excluding seed weight per plant reduced the predictive performance to an R² of approximately 0.67. This decrease confirms that the variable contains substantial predictive information, but also shows that model performance was not determined exclusively by this single feature.
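The ablation comparison mentioned above, in which the model is refitted without seed weight per plant, can be sketched as follows. This is a hedged illustration only: it swaps in ordinary least squares for the MLP, uses synthetic data, and the column labels and coefficients are hypothetical, so the resulting scores bear no relation to the reported value of about 0.67.

```python
import numpy as np

def holdout_r2(X, y, train, test):
    """Fit ordinary least squares on `train` rows and return R² on `test` rows."""
    A_train = np.column_stack([np.ones(len(train)), X[train]])
    coef, *_ = np.linalg.lstsq(A_train, y[train], rcond=None)
    A_test = np.column_stack([np.ones(len(test)), X[test]])
    resid = y[test] - A_test @ coef
    return 1.0 - resid @ resid / np.sum((y[test] - y[test].mean()) ** 2)

rng = np.random.default_rng(42)
n = 192  # 96 genotypes x 2 seasons, mirroring the study layout
names = ["seed_weight", "pods", "seeds"]  # hypothetical column labels
X = rng.normal(size=(n, 3))

# Synthetic target dominated by the first column, loosely mimicking a near-yield trait.
y = 3.0 * X[:, 0] + 1.5 * X[:, 1] + 1.5 * X[:, 2] + rng.normal(scale=0.5, size=n)

idx = rng.permutation(n)
train, test = idx[:134], idx[134:]  # roughly a 70/30 split

r2_full = holdout_r2(X, y, train, test)

# Drop the dominant column and refit on the remaining features.
keep = [j for j, name in enumerate(names) if name != "seed_weight"]
r2_ablated = holdout_r2(X[:, keep], y, train, test)
```

Comparing `r2_full` with `r2_ablated` on the same held-out rows isolates how much predictive information the removed feature carried beyond what the remaining features provide.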

The contribution of the year variable also highlights the effect of environmental variability on yield prediction. This finding emphasizes the importance of genotype × environment interaction in chickpea and shows that yield performance cannot be explained only by plant traits. Under semi-arid conditions, annual differences in temperature and moisture availability can substantially affect reproductive development and final grain yield. This observation is consistent with previous studies highlighting the importance of climatic variability and environmental factors in chickpea yield prediction models [23,24].

In contrast, phenological and morphological traits such as flowering time, maturity, plant height, and first pod height showed lower individual contributions. Although these traits remain important from an agronomic and breeding perspective, their predictive influence in the present model was smaller than that of the direct reproductive components. This suggests that reproductive efficiency had a more immediate role in determining yield variation within the studied material.

An important strength of the study is that the ANN model was supported by SHAP-based interpretation, which improves transparency and helps identify the traits with the greatest influence on model output. This increases the practical value of the approach for genotype evaluation, trait prioritization, and data-driven breeding decisions. In contrast to earlier studies that primarily emphasized predictive performance [21,22], the present study combines high predictive accuracy with an interpretable, feature-based modeling approach, enabling a clearer understanding of the biological drivers of yield formation. This approach is also consistent with recent studies showing that incorporating meaningful input features can improve the effectiveness of data-driven models in crop yield prediction [25].

However, the study is limited by its dataset structure: the data were derived from a single experimental location and two growing seasons, which may restrict the generalizability of the model under different agroecological conditions. In addition, no external validation on independent datasets was performed, which should be considered when interpreting the robustness of the model. Broader validation under different environments would therefore be necessary to confirm its general applicability.

Overall, the findings show that explainable ANN models can provide not only accurate yield estimation but also biologically meaningful insight into the relative importance of key agronomic traits in chickpea. In this respect, the study contributes to the literature by moving beyond prediction alone and offering an interpretable, trait-based modeling framework for chickpea yield analysis under field conditions.

6. Conclusions

This study demonstrates the effectiveness of ANNs, specifically a multilayer perceptron (MLP), in predicting chickpea grain yield from agronomic traits together with a growing-season (year) variable. The integration of ANN modeling with SHAP-based explainability represents a key contribution of this study, enabling both high predictive accuracy and interpretable insight into yield formation. By analyzing data from 96 genotypes across two growing seasons, the ANN model captured the complex, nonlinear relationships influencing chickpea productivity. The best-performing model, which used the lbfgs solver, the tanh activation function, and two hidden layers (5 and 2 neurons), achieved a cross-validated coefficient of determination of R² = 0.9461 (MSE = 0.0173), indicating good generalization within the evaluated dataset. These findings support the suitability of ANN-based modeling for representing genotype–environment interactions in chickpea grain yield prediction.

The SHAP-based explainability analysis further clarified the relative importance of input variables, identifying seed weight per plant as the primary factor influencing model predictions, followed by the number of pods per plant and the number of seeds per plant, and then by environmental variability (year). This interpretability supports the biological consistency of the developed prediction framework. These findings emphasize the critical role of reproductive traits in yield formation and highlight their importance for breeding and selection strategies.

The results indicate that explainable machine learning approaches can support genotype evaluation, trait prioritization, and data-driven breeding strategies under variable environmental conditions. In practical terms, the proposed framework can assist researchers and breeders in identifying high-performing genotypes and optimizing trait-based selection under semi-arid conditions. Because it links predictive performance with interpretable trait contributions, it also offers insights for chickpea yield assessment that conventional statistical approaches do not readily provide.

However, because certain reproductive traits, particularly seed weight per plant, represent direct yield-related components measured close to harvest, the developed model should primarily be interpreted as a yield estimation framework rather than a strict early-season forecasting tool.

Future studies may focus on integrating additional environmental descriptors, remote sensing information, or hybrid modeling approaches to further improve prediction reliability and enhance decision-support tools for precision agriculture, sustainable chickpea production and crop improvement programs.

Author Contributions

Conceptualization, T.K., I.Y. and M.Z.; methodology, I.Y. and M.Z.; software, I.Y. and M.Z.; validation, I.Y., M.Z. and F.Y.; formal analysis, I.Y., M.Z. and F.Y.; investigation, T.K.; resources, T.K.; data curation, T.K.; writing—original draft preparation, T.K., I.Y. and M.Z.; writing—review and editing, I.Y., M.Z. and F.Y.; visualization, I.Y. and M.Z.; supervision, M.Z.; project administration, M.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Acknowledgments

During the preparation of this manuscript, the authors used ChatGPT (GPT-5.3) for purposes such as text refinement, language editing, and structural improvement. The authors have reviewed and edited the output and take full responsibility for the content of this publication.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Zhang, J.; Wang, J.; Zhu, C.; Singh, R.P.; Chen, W. Chickpea: Its origin, distribution, nutrition, benefits, breeding, and symbiotic relationship with Mesorhizobium species. Plants 2024, 13, 429.
2. Księżak, J.; Bojarszczuk, J. The effect of cropping method and botanical form on seed yielding and chemical composition of chickpeas (Cicer arietinum L.) grown under organic system. Agronomy 2020, 10, 801.
3. Sachdeva, S.; Bharadwaj, C.; Patil, B.S.; Pal, M.; Roorkiwal, M.; Varshney, R.K. Agronomic performance of chickpea affected by drought stress at different growth stages. Agronomy 2022, 12, 995.
4. Nakhalbaev, J.T.; Khamdamov, I.K. Estimation of ascochytosis infection of samples and lines of chickpea sorts in natural field conditions in Uzbekistan. Agrar. Sci. 2022, 6, 74–77.
5. Tripathi, S.; Dixit, G.P. Chickpea (Cicer arietinum L.) Breeding. In Fundamentals of Legume Breeding: A Text for Students and Practitioners; Springer Nature: Singapore, 2025; pp. 33–50.
6. Mitache, M.; Baidani, A.; Bencharki, B.; Idrissi, O. Exploring the impact of light intensity under speed breeding conditions on the development and growth of lentil and chickpea. Plant Methods 2024, 20, 30.
7. Jha, U.C.; Nayyar, H.; Thudi, M.; Beena, R.; Vara Prasad, P.V.; Siddique, K.H. Unlocking the nutritional potential of chickpea: Strategies for biofortification and enhanced multinutrient quality. Front. Plant Sci. 2024, 15, 1391496.
8. Özer, S.; Karaköy, T.; Toklu, F.; Baloch, F.S.; Kilian, B.; Özkan, H. Nutritional and physicochemical variation in Turkish kabuli chickpea (Cicer arietinum L.) landraces. Euphytica 2010, 175, 237–249.
9. Wright, S. Correlation and causation. J. Agric. Res. 1921, 20, 557–585.
10. Kosev, V. Model of high-productive varieties in forage pea. J. Cent. Eur. Agric. 2015, 2, 172–180.
11. Khoshro, H.H.; Maleki, H.H. Clarifying interactions between genotype and environment and management in chickpea by focusing on plant and soil attributes. Sci. Rep. 2025, 15, 11401.
12. Khazaei, J.; Naghavi, M.R.; Jahansouz, M.R.; Salimi-Khorshidi, G. Yield estimation and clustering of chickpea genotypes using soft computing techniques. Agron. J. 2008, 100, 1077–1087.
13. Matsumura, K.; Gaitan, C.F.; Sugimoto, K.; Cannon, A.J.; Hsieh, W.W. Maize yield forecasting by linear regression and artificial neural networks in Jilin, China. J. Agric. Sci. 2015, 153, 399–410.
14. Claudia, A. Regression analysis. In Encyclopedia of Bioinformatics and Computational Biology; Elsevier: Amsterdam, The Netherlands, 2019.
15. Aasim, M.; Ali, S.A.; Altaf, M.T.; Ali, A.; Nadeem, M.A.; Baloch, F.S. Artificial neural network and decision tree facilitated prediction and validation of cytokinin-auxin induced in vitro organogenesis of sorghum (Sorghum bicolor L.). Plant Cell Tissue Organ Cult. 2023, 153, 611–624.
16. Ewuzie, U.; Bolade, O.P.; Egbedina, A.O. Application of deep learning and machine learning methods in water quality modeling and prediction: A review. In Current Trends and Advances in Computer-Aided Intelligent Environmental Data Engineering; Elsevier: Amsterdam, The Netherlands, 2022; pp. 185–218.
17. Araújo, S.O.; Peres, R.S.; Ramalho, J.C.; Lidon, F.; Barata, J. Machine learning applications in agriculture: Current trends, challenges, and future perspectives. Agronomy 2023, 13, 2976.
18. Karray, F.O.; De Silva, C.W. Soft Computing and Intelligent Systems Design: Theory, Tools and Applications; Pearson Education: London, UK, 2004.
19. Omid, M.; Khojastehnazhand, M.; Tabatabaeefar, A. Estimating volume and mass of citrus fruits by image processing technique. J. Food Eng. 2010, 100, 315–321.
20. Mollazade, K.; Omid, M.; Arefi, A. Comparing Data Mining Classifiers for Grading Raisins based on Visual Features. Comput. Electron. Agric. 2012, 84, 124–131.
21. Demir, B. Application of data mining and adaptive neuro-fuzzy structure to predict color parameters of walnuts (Juglans regia L.). Turk. J. Agric. For. 2018, 42, 216–225.
22. Cetin, N.; Karaman, K.; Beyzi, E.; Saglam, C.; Demirel, B. Comparative evaluation of some quality characteristics of sunflower oilseeds (Helianthus annuus L.) through machine learning classifiers. Food Anal. Methods 2021, 14, 1666–1681.
23. Vennela, B.; Mishra, E.P.; Gautam, S.; Mishra, A.R.; Rawat, S. Regional time series forecasting of chickpea using ARIMA and neural network models in central plains of Uttar Pradesh (India). Int. J. Environ. Clim. Change 2022, 12, 2879–2889.
24. Rezapour, S.; Jooyandeh, E.; Ramezanzade, M.; Mostafaeipour, A.; Jahangiri, M.; Issakhov, A.; Chowdhury, S.; Techato, K. Forecasting rainfed agricultural production in arid and semi-arid lands using learning machine methods: A case study. Sustainability 2021, 13, 4607.
25. Al-Shammari, D.; Chen, Y.; Wimalathunge, N.S.; Wang, C.; Han, S.Y.; Bishop, T.F. Incorporation of mechanistic model outputs as features for data-driven models for yield prediction: A case study on wheat and chickpea. Precis. Agric. 2024, 25, 2531–2553.
26. Çetin, N.; Ozaktan, H.; Uzun, S.; Uzun, O.; Ciftci, C.Y. Machine learning based mass prediction and discrimination of chickpea (Cicer arietinum L.) cultivars. Euphytica 2023, 219, 20.
27. Coskuner, G.; Jassim, M.S.; Zontul, M.; Karateke, S. Application of artificial intelligence neural network modeling to predict the generation of domestic, commercial and construction wastes. Waste Manag. Res. 2021, 39, 499–507.
28. Linaza, M.T.; Posada, J.; Bund, J.; Eisert, P.; Quartulli, M.; Döllner, J.; Pagani, A.; Olaizola, I.G.; Barriguinha, A.; Moysiadis, T.; et al. Data-driven artificial intelligence applications for sustainable precision agriculture. Agronomy 2021, 11, 1227.
29. Wythoff, B.J. Backpropagation neural networks: A tutorial. Chemom. Intell. Lab. Syst. 1993, 18, 115–155.
30. Aasim, M.; Katırc, R.; Akgur, O.; Yildirim, B.; Mustafa, Z.; Nadeem, M.A.; Baloch, F.S.; Karakoy, T.; Yılmaz, G. Machine learning (ML) algorithms and artificial neural network for optimizing in vitro germination and growth indices of industrial hemp (Cannabis sativa L.). Ind. Crops Prod. 2022, 181, 114801.
31. Hernández Hernández, G.C.; Gómez Gómez, J.; Jiménez-Cabas, J. Predictive models based on artificial intelligence to estimate crop yield: A literature review. Agriculture 2025, 15, 2438.
32. Sola, J.; Sevilla, J. Importance of input data normalization for the application of neural networks to complex industrial problems. IEEE Trans. Nucl. Sci. 1997, 44, 1464–1468.
33. Saglam, M.; Spataru, C.; Karaman, O.A. Electricity demand forecasting with use of artificial intelligence: The case of Gokceada Island. Energies 2022, 15, 5950.
34. Zhang, W.; Zhang, L.; Wang, J.; Niu, X. Hybrid system based on a multi-objective optimization and kernel approximation for multi-scale wind speed forecasting. Appl. Energy 2020, 277, 115561.
35. Li, Y.; Li, R.; Ji, R.; Wu, Y.; Chen, J.; Wu, M.; Yang, J. Research on factors affecting global grain legume yield based on explainable artificial intelligence. Agriculture 2024, 14, 438.
36. Talekar, S.C.; Viswanatha, K.P.; Lohithaswa, H.C.; Rathod, S. Multivariate analysis and selection indices to identify superior cultivars and influential yield components in chickpea (Cicer arietinum L.). Plant Genet. Resour. 2022, 20, 348–354.
37. Mahmood, M.T.; Ahmad, M.; Ali, I.; Hussain, M.; Latif, A.; Zubrair, M. Evaluation of chickpea genotypes for genetic diversity through multivariate analysis. J. Environ. Agric. Sci. 2018, 15, 11–17.

Figure 1. SHAP summary plot.

Figure 2. SHAP feature importance plot.

Figure 3. Principal component analysis (PCA) biplot of the chickpea genotypes.

Table 1. Rainfall, temperature, and relative humidity values for the trial years *.

	Total Precipitation (mm)			Temperature (°C)			Relative Humidity (%)
Months	2022	2023	Long-Term Mean	2022	2023	Long-Term Mean	2022	2023	Long-Term Mean
April	4.3	74.8	33.7	12.2	9.1	8.9	44.5	92.8	62.3
May	5.6	56.4	54.7	12.5	13.0	13.5	53.1	93.6	61.1
June	116.6	51.4	43.4	18.8	17.3	17.0	55.8	95.3	58.3
July	0	3.0	6.2	19.1	20.1	20.0	51.9	82.8	54.0
August	11.4	3.6	4.5	23.7	23.4	20.3	47.5	76.6	53.0
Total/Average	137.9	189.2	142.5	17.3	16.6	15.9	50.6	88.2	57.7

* Sivas Provincial Meteorology Directorate.

Table 2. Origin and collection locations of the 96 chickpea genotypes used in this study (86 Turkish accessions and 10 varieties).

No	Identification Number	Origin	Provinces	Species	No	Identification Number	Origin	Provinces	Species
1	69946	TUR	Mardin	Cicer reticulatum	49	ILWC 42	TUR	Mardin	Cicer reticulatum
2	69947	TUR	Diyarbakır	Cicer reticulatum	50	ILWC 43	TUR	Mardin	Cicer reticulatum
3	69948	TUR	Elazig	Cicer reticulatum	51	ILWC 44	TUR	Mardin	Cicer reticulatum
4	69960	TUR	Mardin	Cicer reticulatum	52	ILWC 45	TUR	Mardin	Cicer reticulatum
5	69961	TUR	Elazig	Cicer reticulatum	53	ILWC 46	TUR	Mardin	Cicer reticulatum
6	69971	TUR	Gaziantep	Cicer reticulatum	54	ILWC 47	TUR	Mardin	Cicer reticulatum
7	69972	TUR	Adana	Cicer reticulatum	55	ILWC 48	TUR	Mardin	Cicer reticulatum
8	69973	TUR	Diyarbakır	Cicer reticulatum	56	ILWC 49	TUR	Mardin	Cicer reticulatum
9	69974	TUR	Urfa	Cicer reticulatum	57	ILWC 50	TUR	Mardin	Cicer reticulatum
10	69975	TUR	Mardin	Cicer reticulatum	58	ILWC 51	TUR	Mardin	Cicer reticulatum
11	69978	TUR	Urfa	Cicer reticulatum	59	ILWC 52	TUR	Mardin	Cicer reticulatum
12	70001	TUR	Mardin	Cicer reticulatum	60	ILWC 53	TUR	Mardin	Cicer reticulatum
13	70002	TUR	Mardin	Cicer reticulatum	61	ILWC 54	TUR	Mardin	Cicer reticulatum
14	70003	TUR	Mardin	Cicer reticulatum	62	ILWC 55	TUR	Mardin	Cicer reticulatum
15	70004	TUR	Mardin	Cicer reticulatum	63	ILWC 56	TUR	Mardin	Cicer reticulatum
16	70024	TUR	Elazig	Cicer reticulatum	64	ILWC 57	TUR	Mardin	Cicer reticulatum
17	70025	TUR	Elazig	Cicer reticulatum	65	ILWC 58	TUR	Mardin	Cicer reticulatum
18	70026	TUR	Elazig	Cicer reticulatum	66	ILWC 59	TUR	Mardin	Cicer reticulatum
19	70027	TUR	Elazig	Cicer reticulatum	67	ILWC 60	TUR	Mardin	Cicer reticulatum
20	70028	TUR	Elazig	Cicer reticulatum	68	ILWC 61	TUR	Mardin	Cicer reticulatum
21	72933	TUR	Mardin	Cicer reticulatum	69	ILWC 62	TUR	Mardin	Cicer reticulatum
22	72934	TUR	Mardin	Cicer reticulatum	70	ILWC 63	TUR	Mardin	Cicer reticulatum
23	72935	TUR	Mardin	Cicer reticulatum	71	ILWC 64	TUR	Mardin	Cicer reticulatum
24	72937	TUR	Mardin	Cicer reticulatum	72	ILWC 65	TUR	Mardin	Cicer reticulatum
25	72938	TUR	Mardin	Cicer reticulatum	73	ILWC 66	TUR	Mardin	Cicer reticulatum
26	72939	TUR	Mardin	Cicer reticulatum	74	ILWC 67	TUR	Mardin	Cicer reticulatum
27	72941	TUR	Mardin	Cicer reticulatum	75	ILWC 68	TUR	Mardin	Cicer reticulatum
28	72942	TUR	Mardin	Cicer reticulatum	76	ILWC 69	TUR	Mardin	Cicer reticulatum
29	72943	TUR	Mardin	Cicer reticulatum	77	ILWC 70	TUR	Mardin	Cicer reticulatum
30	72944	TUR	Mardin	Cicer reticulatum	78	ILWC 71	TUR	Mardin	Cicer reticulatum
31	72945	TUR	Mardin	Cicer reticulatum	79	109207	TUR	Mardin	Cicer reticulatum
32	72946	TUR	Mardin	Cicer reticulatum	80	109208	TUR	Mardin	Cicer reticulatum
33	72948	TUR	Mardin	Cicer reticulatum	81	109209	TUR	Siirt	Cicer reticulatum
34	72984	TUR	Gaziantep	Cicer reticulatum	82	109211	TUR	Diyarbakır	Cicer reticulatum
35	72985	TUR	Gaziantep	Cicer reticulatum	83	109212	TUR	Mardin	Cicer reticulatum
36	72986	TUR	Gaziantep	Cicer reticulatum	84	109213	TUR	Mardin	Cicer reticulatum
37	72997	TUR	Elazig	Cicer reticulatum	85	73083	TUR	Mardin	Cicer reticulatum
38	72998	TUR	Elazig	Cicer reticulatum	86	73086	TUR	Hakkari	Cicer reticulatum
39	72999	TUR	Elazig	Cicer reticulatum	87	İnci	TUR		Cicer arietinum L.
40	73000	TUR	Elazig	Cicer reticulatum	88	Seçkin	TUR		Cicer arietinum L.
41	73001	TUR	Elazig	Cicer reticulatum	89	Azkan	TUR		Cicer arietinum L.
42	73002	TUR	Elazig	Cicer reticulatum	90	Işık	TUR		Cicer arietinum L.
43	73003	TUR	Elazig	Cicer reticulatum	91	Aksu	TUR		Cicer arietinum L.
44	73004	TUR	Elazig	Cicer reticulatum	92	Ubet	TUR		Cicer arietinum L.
45	73005	TUR	Elazig	Cicer reticulatum	93	Tolga 01	TUR		Cicer arietinum L.
46	73006	TUR	Diyarbakır	Cicer reticulatum	94	Çakır	TUR		Cicer arietinum L.
47	ILWC 32	TUR	Diyarbakır	Cicer reticulatum	95	Arda	TUR		Cicer arietinum L.
48	ILWC 41	TUR	Mardin	Cicer reticulatum	96	Hasanbey	TUR		Cicer arietinum L.

Table 3. Physical and chemical properties of the soil in the experimental area.

Depth	Texture	pH	Lime (% CaCO₃)	Salt (%)	Phosphorus (P₂O₅ kg ha⁻¹)	Potassium (K₂O kg ha⁻¹)	Organic Matter (%)
0–30 cm	Silty clay loam	7.28	19.6	0.33	34.0	935.9	1.7

Table 4. Descriptive statistics of phenological, morphological and yield-related traits of chickpea genotypes used in the study.

Trait	Mean	Std Dev	Minimum	Maximum
Days to Emergence (days)	22.45	3.17	15.50	29.00
Days to Flowering (days)	59.33	4.69	51.50	71.50
Days to First Pod Set (days)	67.69	5.57	57.50	80.00
Days to Maturity (days)	101.84	3.67	96.50	114.00
Plant Height (cm)	30.64	6.03	21.55	52.05
First Pod Height (cm)	8.31	3.89	4.20	23.10
Number of Pods per Plant (count)	86.73	28.03	32.30	178.20
Number of Seeds per Plant (count)	81.41	26.74	35.80	159.20
Seed Weight per Plant (g)	10.08	5.69	3.54	33.65
100-Seed Weight (g)	13.31	8.47	7.21	41.75
Biological Yield (kg ha⁻¹)	5549.90	1185.50	3340.00	9320.00
Grain Yield (kg ha⁻¹)	1005.50	567.90	354.00	3364.90

Table 5. Chickpea grain yield prediction performance of MLP-based ANN models (test size = 30%; 10-fold cross-validation).

Solver	Activation	Hidden Layers	MSE Test	R² Test	MSE CV	R² CV
lbfgs	logistic	10, 5	0.0045	0.9931	0.0377	0.9349
lbfgs	logistic	5, 2	0.0035	0.9946	0.0200	0.9206
lbfgs	tanh	10, 5	0.0028	0.9956	0.0197	0.9433
lbfgs	tanh	5, 2	0.0012	0.9982	0.0173	0.9461
adam	logistic	10, 5	0.0471	0.9273	0.1380	0.7923
adam	logistic	5, 2	0.5298	0.1814	0.9189	0.2453
adam	tanh	10, 5	0.0477	0.9264	0.1803	0.7224
adam	tanh	5, 2	0.1511	0.7666	0.4324	0.5924
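The eight configurations in Table 5 form a 2 × 2 × 2 grid over solver, activation, and hidden-layer layout. The sketch below shows one way such a grid could be evaluated, assuming a scikit-learn `MLPRegressor`, whose option names match those in the table, although the section does not confirm the library. The data are synthetic, and the scaling step, `max_iter`, random seeds, and the choice to cross-validate on the training portion only are assumptions.

```python
import itertools
import warnings

import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import KFold, cross_val_score, train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Small networks on small data often stop at max_iter; silence that warning here.
warnings.filterwarnings("ignore", category=ConvergenceWarning)

rng = np.random.default_rng(0)
X = rng.normal(size=(192, 12))  # synthetic placeholder for 12 input variables
y = X[:, 0] + 0.3 * X[:, 1] + 0.1 * rng.normal(size=192)

# 70/30 split, with 10-fold cross-validation on the training portion (an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0
)
cv = KFold(n_splits=10, shuffle=True, random_state=0)

results = []
for solver, activation, layers in itertools.product(
    ["lbfgs", "adam"], ["logistic", "tanh"], [(10, 5), (5, 2)]
):
    model = make_pipeline(
        StandardScaler(),  # the scaler is refitted inside every fold
        MLPRegressor(hidden_layer_sizes=layers, activation=activation,
                     solver=solver, max_iter=500, random_state=0),
    )
    r2_cv = cross_val_score(model, X_train, y_train, cv=cv, scoring="r2").mean()
    r2_test = model.fit(X_train, y_train).score(X_test, y_test)
    results.append((solver, activation, layers, r2_test, r2_cv))
```

Wrapping the scaler and the network in one pipeline keeps the standardization statistics inside each training fold, so the held-out fold does not influence preprocessing.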

Table 6. Ranking of input variables based on SHAP importance and relative contribution.

Rank	Feature Name	SHAP Importance	Relative Contribution
1	Seed Weight per Plant	0.6957	64.9%
2	Pods per Plant	0.1066	9.9%
3	Seeds per Plant	0.1010	9.4%
4	Year	0.0486	4.5%
5	100-Seed Weight	0.0410	3.8%
6	Days to Maturity	0.0281	2.6%
7	Days to First Emergence	0.0149	1.4%
8	Days to First Pod Setting	0.0099	0.9%
9	First Pod Height	0.0092	0.9%
10	Days to Flowering	0.0080	0.7%
11	Plant Height	0.0068	0.6%
12	Genotype	0.0028	0.3%
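The relative contributions in Table 6 are consistent with dividing each SHAP importance by the sum over all twelve features. The short check below reproduces that arithmetic from the tabulated values; the normalization rule is inferred from the numbers rather than stated in the text, and the variable names are illustrative.

```python
# SHAP importance values copied from Table 6.
shap_importance = {
    "Seed Weight per Plant": 0.6957,
    "Pods per Plant": 0.1066,
    "Seeds per Plant": 0.1010,
    "Year": 0.0486,
    "100-Seed Weight": 0.0410,
    "Days to Maturity": 0.0281,
    "Days to First Emergence": 0.0149,
    "Days to First Pod Setting": 0.0099,
    "First Pod Height": 0.0092,
    "Days to Flowering": 0.0080,
    "Plant Height": 0.0068,
    "Genotype": 0.0028,
}

total = sum(shap_importance.values())

# Relative contribution in percent, rounded to one decimal as in the table.
relative = {name: round(100 * value / total, 1)
            for name, value in shap_importance.items()}

# Combined share of the three reproductive traits at the top of the ranking.
top3_share = sum(sorted(shap_importance.values(), reverse=True)[:3]) / total
```

Under this normalization the three leading reproductive traits together account for roughly 84% of the summed importance, which matches the sum of their tabulated percentages (64.9 + 9.9 + 9.4).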

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
