AI-Driven Wheat Crop Optimization and Yield Prediction Tool

Ayub, Wareesha; Sameer, Muhammad; Ali, Muhammad; Hussain, Sharaf

doi:10.3390/blsf2025054009

Open Access Proceeding Paper

AI-Driven Wheat Crop Optimization and Yield Prediction Tool †

by Wareesha Ayub ¹,*, Muhammad Sameer ¹, Muhammad Ali ¹ and Sharaf Hussain ²

¹ Department of Computer Science Engineering and Technology, Iqra University, Karachi 75300, Pakistan

² Department of Computer Science, Faculty of Engineering, Sciences & Technology, Iqra University, Karachi 75300, Pakistan

* Author to whom correspondence should be addressed.

† Presented at the 3rd International Online Conference on Agriculture (IOCAG 2025), 22–24 October 2025; Available online: https://sciforum.net/event/IOCAG2025.

Biol. Life Sci. Forum 2025, 54(1), 9; https://doi.org/10.3390/blsf2025054009

Published: 16 January 2026

(This article belongs to the Proceedings of The 3rd International Online Conference on Agriculture)

Abstract

Precise prediction of wheat yield plays a crucial role in food security and resource management in Pakistan. This study proposes an artificial intelligence-driven framework that predicts wheat production from 23 years of agro-meteorological and yield data. Several machine learning models were compared, and a two-layer LSTM model performed best because it was able to capture temporal dependencies. The model achieved high accuracy (R² = 0.979) and low prediction errors, supporting the applicability of deep learning to agricultural forecasting in climate-sensitive regions and suggesting that the approach could extend to other staple crops.

Keywords: wheat; yield prediction; LSTM; precision agriculture; machine learning; deep learning; agricultural optimization; weather data; soil data

1. Introduction

Agriculture is a major part of Pakistan's economy, employing a significant share of the population and underpinning national food security. Wheat is the primary staple crop, with output of about 24–25 million tons per year and an estimated turnover of over Rs. 6078 billion [1,2]. Despite its significance, wheat cultivation in Pakistan faces substantial risks, such as unpredictable weather, poorly applied irrigation methods, and limited access to practical, data-driven information for crop management [3,4]. Traditional farming techniques, which often rely on intuition or custom, frequently fail to make the best use of available resources.

Recent developments in artificial intelligence (AI), especially machine learning, have shown strong potential to enhance agricultural productivity by predicting yield from a range of environmental factors [5,6,7]. Building on this potential, this paper proposes an AI-based, web-based decision support system built on a Long Short-Term Memory (LSTM) deep learning framework. The system uses 23 years of historical weather and soil data from Meteoblue, together with yield data provided by the Pakistan Bureau of Statistics, to forecast wheat yield and support informed decision-making at the farm level.

The research targets major agricultural districts in the province of Punjab because of the economic significance of wheat and because Punjab accounts for almost 70 percent of total national wheat production [8,9]. The proposed system provides a bilingual user interface and integrates predictive modeling, irrigation scheduling, and agronomic guidance, with the aim of improving yield prediction accuracy, optimizing resource management, and supporting sustainable wheat production in Pakistan.

2. Materials and Methods

2.1. Overview of System Implementation

The system was developed as a full-stack, bilingual, responsive web application to assist wheat farmers in the major districts of Punjab, Pakistan. It incorporates real-time weather analytics, historical weather–soil data, rule-based irrigation logic, and a deep learning-based wheat yield prediction model.

2.2. Technology Stack

The frontend was built with React.js, with React-Bootstrap used for modular, responsive components. Bilingual (English and Urdu) functionality was provided by the i18next library. The backend was built with FastAPI (Python, v0.104.1), a high-performance asynchronous web framework well suited to serving machine learning models. SQLite provided lightweight data storage. The backend was deployed on Microsoft Azure and the frontend on Vercel, with Progressive Web App functionality to enable offline access.

2.3. Data Sources

The 23-year (2000–2023) meteorological dataset obtained from Meteoblue comprised evapotranspiration, soil moisture, soil temperature, relative humidity, sea-level pressure, and minimum temperature. To provide a reliable ground truth, annual district-level wheat yield statistics were acquired from the Pakistan Bureau of Statistics. Other agronomic and advisory data, such as fertilizer guidelines, pest management, and phenology, were obtained from national agricultural extension sources.

2.4. Data Preprocessing and Feature Engineering

Min–Max scaling was used to normalize all numerical input variables to improve training stability and convergence. To avoid data leakage, the scaling parameters were fit on the training data only and then applied to the test data, so that no information about future observations affected the learning process.
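The leakage-free scaling step can be illustrated with a minimal sketch in plain Python. The function names and the sample readings are hypothetical and are not taken from the authors' code; the only point shown is that the minimum and maximum come from the training portion alone, so later (test) values may legitimately fall outside [0, 1].

```python
from typing import List, Tuple


def fit_min_max(train: List[float]) -> Tuple[float, float]:
    """Learn the scaling bounds (min, max) from the training values only."""
    return min(train), max(train)


def apply_min_max(values: List[float], lo: float, hi: float) -> List[float]:
    """Scale values with previously fitted bounds; no refitting on new data."""
    span = hi - lo
    if span == 0:
        # Constant training feature: map everything to 0.0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


# Hypothetical soil-moisture readings, oldest first (illustrative numbers only).
train_values = [0.20, 0.25, 0.30, 0.40]
test_values = [0.35, 0.45]

lo, hi = fit_min_max(train_values)          # bounds come from training data only
train_scaled = apply_min_max(train_values, lo, hi)
test_scaled = apply_min_max(test_values, lo, hi)  # second value exceeds 1.0
```

Fitting the bounds on the full series instead would let the most recent years influence how the earlier years are scaled, which is the leakage the text describes.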

To preserve temporal dependence, a one-year look-back window was applied to the time-series data, which was then converted into supervised learning sequences. Minor gaps in the meteorological values were filled by linear interpolation, whereas records with significant discontinuities were omitted.
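A minimal sketch of these two steps follows, under stated assumptions: gaps are marked with None and occur only in the interior of a series, and each yield value is paired with the features of the preceding year. The paper does not specify the exact pairing, and the function names and numbers are hypothetical.

```python
from typing import List, Optional, Tuple


def interpolate_gaps(series: List[Optional[float]]) -> List[float]:
    """Linearly fill interior None gaps between the nearest known neighbours."""
    if not series or series[0] is None or series[-1] is None:
        raise ValueError("first and last values must be present")
    filled = list(series)
    i = 0
    while i < len(filled):
        if filled[i] is None:
            start = i - 1              # last known value before the gap
            end = i
            while filled[end] is None:  # first known value after the gap
                end += 1
            step = (filled[end] - filled[start]) / (end - start)
            for k in range(start + 1, end):
                filled[k] = filled[start] + step * (k - start)
            i = end
        else:
            i += 1
    return filled


def make_sequences(features: List[List[float]], targets: List[float],
                   look_back: int = 1) -> Tuple[list, list]:
    """Pair the previous `look_back` years of features with the current year's yield."""
    X, y = [], []
    for t in range(look_back, len(targets)):
        X.append(features[t - look_back:t])
        y.append(targets[t])
    return X, y


# Hypothetical annual rainfall (mm) with one missing year, and hypothetical yields.
rain_filled = interpolate_gaps([100.0, None, 140.0, 160.0])
features = [[r] for r in rain_filled]
yields = [2.0, 2.1, 2.3, 2.4]
X, y = make_sequences(features, yields, look_back=1)
```

Each element of X has shape (look_back, n_features), which is the layout recurrent layers typically expect for one sample.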

The final model input comprised eight features: six main agro-meteorological variables (minimum temperature, relative humidity, soil moisture, soil temperature, evapotranspiration, and rainfall) and two derived temporal features representing seasonal and lagged climatic effects. The target variable was the annual district-level wheat yield.

2.5. Development of Machine Learning Model

Several machine learning and deep learning models were compared, including Random Forest, Artificial Neural Networks (ANNs), XGBoost, Gated Recurrent Units (GRUs), and Long Short-Term Memory (LSTM).

For consistency across experiments, the baseline machine learning models were trained with standard hyperparameters commonly used in related agricultural forecasting research. The comparative analysis aimed to evaluate how sequential and non-sequential modeling methods perform on agro-climatic time-series data, rather than to carry out exhaustive hyperparameter optimization.

The chosen LSTM architecture consisted of two stacked LSTM layers with 30 hidden units, followed by a single dense output unit. The model was trained with the Adam optimizer for up to 100 epochs, with early stopping applied to reduce overfitting.
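The early-stopping rule can be sketched independently of any deep learning library. The example below is only an illustration: the patience value is an assumption (the paper does not report one), the loss history is made up, and the function name is hypothetical. It replays a recorded validation-loss history and reports the best epoch and the epoch at which training would halt.

```python
from typing import List, Tuple


def early_stopping_epoch(val_losses: List[float], patience: int = 10,
                         max_epochs: int = 100) -> Tuple[int, int]:
    """Return (best_epoch, stop_epoch), both 1-indexed.

    Training halts once `patience` consecutive epochs pass without a new
    best validation loss, or when `max_epochs` is reached.
    """
    best_loss = float("inf")
    best_epoch = 0
    stop_epoch = 0
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        stop_epoch = epoch
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            break
    return best_epoch, stop_epoch


# Made-up validation losses: improvement stalls after the third epoch.
history = [0.9, 0.5, 0.3, 0.31, 0.32, 0.33]
best_epoch, stop_epoch = early_stopping_epoch(history, patience=2)
```

In practice the weights from the best epoch would be restored after stopping, so that the evaluated model is the one with the lowest validation loss.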

Because the data are time-ordered, an 80:20 chronological train–test split was used, with the older data used for training and the more recent data used for testing. Model performance was evaluated using the mean absolute error (MAE), mean squared error (MSE), root mean squared error (RMSE), and coefficient of determination (R²).
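The split and the four metrics can be written out in a few lines of plain Python. This is a sketch using the standard metric definitions, not the authors' evaluation code; the function names and the toy numbers are illustrative assumptions.

```python
import math
from typing import Dict, List, Sequence, Tuple


def chronological_split(rows: Sequence, train_fraction: float = 0.8) -> Tuple[list, list]:
    """Split time-ordered rows without shuffling: oldest part trains, newest part tests."""
    cut = int(len(rows) * train_fraction)
    return list(rows[:cut]), list(rows[cut:])


def regression_metrics(actual: List[float], predicted: List[float]) -> Dict[str, float]:
    """Compute MAE, MSE, RMSE, and R² (actual values must not all be identical)."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1.0 - (mse * n) / ss_tot  # 1 - SS_res / SS_tot
    return {"MAE": mae, "MSE": mse, "RMSE": math.sqrt(mse), "R2": r2}


# Ten hypothetical years: the first eight train the model, the last two test it.
train_years, test_years = chronological_split(list(range(2000, 2010)))

# Toy values chosen so the metrics are easy to verify by hand.
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
```

Because the split is positional rather than random, the test set always lies strictly after the training set in time, which matches the forecasting setting described above.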

2.6. Rule-Based Irrigation System

An irrigation advisory module was implemented as a rule-based system to aid decision-making at the farm level. The system uses deterministic thresholds computed from rainfall, evapotranspiration, and maximum temperature, based on nationally accepted agronomic guidelines and the FAO's irrigation scheduling requirements for wheat production in semi-arid areas.

Rather than performing dynamic optimization, this module is designed to offer actionable and interpretable irrigation guidance, helping farmers determine irrigation timing and estimated water needs from the current weather and crop development stage.
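A deterministic rule of this kind might look like the sketch below. The three inputs are those named above, but the threshold values, the rule order, and all identifiers are placeholders chosen for illustration; the paper does not publish the system's actual thresholds, and crop-stage handling is omitted.

```python
from dataclasses import dataclass


@dataclass
class DailyWeather:
    rainfall_mm: float
    evapotranspiration_mm: float
    max_temp_c: float


# Placeholder thresholds for illustration only; NOT the values used by the system.
RAIN_SKIP_MM = 10.0         # enough rain that irrigation can be skipped
DEFICIT_IRRIGATE_MM = 4.0   # evapotranspiration minus rainfall that triggers irrigation
HEAT_STRESS_C = 32.0        # maximum temperature treated as heat stress


def irrigation_advice(day: DailyWeather) -> str:
    """Return a simple advisory string from fixed, human-readable rules."""
    if day.rainfall_mm >= RAIN_SKIP_MM:
        return "skip: sufficient rainfall"
    deficit = day.evapotranspiration_mm - day.rainfall_mm
    if deficit >= DEFICIT_IRRIGATE_MM or day.max_temp_c >= HEAT_STRESS_C:
        return "irrigate"
    return "monitor"


advice_wet = irrigation_advice(DailyWeather(12.0, 5.0, 25.0))
advice_dry = irrigation_advice(DailyWeather(0.0, 5.0, 25.0))
advice_hot = irrigation_advice(DailyWeather(2.0, 3.0, 35.0))
advice_mild = irrigation_advice(DailyWeather(2.0, 3.0, 25.0))
```

Keeping the thresholds as named constants makes each recommendation traceable to a specific rule, which is the interpretability benefit the module aims for.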

2.7. Application Workflow

1. The user registers or logs in;
2. The user enters the district, land area, and sowing date;
3. The system checks the sowing timing;
4. Real-time weather is displayed;
5. Irrigation recommendations are generated;
6. The fertilizer planner provides stage-specific guidance;
7. The user requests a yield prediction;
8. The request is sent to the backend, which returns the LSTM-based prediction;
9. The dashboard is updated dynamically.

2.8. Functional Requirements

Table 1 lists the key features of the proposed smart agriculture system and describes their functions, covering weather analysis, crop management, and decision support tools.

2.9. Non-Functional Requirements

Table 2 summarizes the most important system quality attributes and technical features, with a focus on performance, security, scalability, reliability, and usability, to ensure robust and accessible operation for end users.

3. Results

The experimental analysis aimed to establish the predictive performance of various machine learning and deep learning models applied to 23 years of historical meteorological and soil data. To ensure a strict and unbiased comparison, the same preprocessing pipelines, normalization procedures, feature inputs, and data partitions were used for all models. This section presents the quantitative performance of the LSTM model, which is the main focus of the study, and the comparative results of the other models, supported by statistical and visual analysis.

3.1. Performance of the LSTM Model

The LSTM model was assessed on 23 years (2000–2023) of historical agro-meteorological and soil data, arranged as an annual time series with a univariate target and multivariate inputs. Using a one-year look-back window, the model was trained to capture cumulative climatic effects on wheat yield across consecutive growing seasons.

Table 3 reports the quantitative performance measures calculated on the test data. The LSTM achieved a coefficient of determination (R²) of 0.979, indicating close agreement between predicted and observed yields. The small values of MAE, MSE, and RMSE likewise indicate that the model fit the overall yield-driving patterns present in the historical data.
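For reference, the four measures in Table 3 are assumed here to follow their standard definitions, which the paper does not state explicitly. With observed yields y_i, predictions ŷ_i, the mean observed yield ȳ, and n test samples:

```latex
\begin{align}
\mathrm{MAE}  &= \frac{1}{n}\sum_{i=1}^{n}\lvert y_i-\hat{y}_i\rvert, &
\mathrm{MSE}  &= \frac{1}{n}\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^2, \\
\mathrm{RMSE} &= \sqrt{\mathrm{MSE}}, &
R^2 &= 1-\frac{\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i-\bar{y}\right)^2}.
\end{align}
```

As a consistency check on the reported values, √0.0004 = 0.0200, which is close to the reported RMSE of 0.0201; the small gap is what one would expect if the MSE was rounded to four decimal places (0.0201² ≈ 0.000404).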

The training and validation loss curves as a function of epoch are shown in Figure 1. The smooth, monotonic convergence of the two curves reflects stable learning behavior and shows that the model minimized prediction error without signs of divergence or oscillatory overfitting. The close correspondence between training and validation loss also suggests that the model generalized to unseen data despite the small sample size.

Figure 2 compares the predicted and actual wheat yields. The data points lie very close to the 45-degree reference line, which indicates that the model can track interannual yield changes under varying climatic conditions. This visualization is not intended as a quantitative performance measure in itself, but as qualitative confirmation that the predictions follow the observed trend.

Taken together, the quantitative evaluations and visual inspection show that the LSTM model successfully learned the long-term temporal dependencies in the agro-climatic data. Because long-term yield observations are necessarily scarce, these results should be viewed as evidence of model viability and stability over time rather than of predictive certainty.

As a further check on the trained model, the actual-versus-predicted plot in Figure 2 shows almost all data points lying very close to the ideal line, signifying close correspondence between predicted and actual values. This behavior suggests that the model is not only accurate but also consistent across the full range of values, including the most extreme data.

Overall, the numeric and graphical results indicate that the LSTM network generalized well, remaining consistent across both the training and testing phases. This is especially useful for agricultural decision-support applications.

3.2. Comparison with Other Models

To put the LSTM results into perspective, its performance was compared with that of other baseline machine learning and deep learning methods: Random Forest, Artificial Neural Networks (ANNs), XGBoost, and Gated Recurrent Units (GRUs). These models were chosen to represent the non-sequential and sequential learning approaches that are widely applied in agricultural yield prediction research.

Table 4 shows that the non-sequential models (Random Forest, ANN, and XGBoost) achieved relatively low predictive performance on the annual time-series data. This may be because they cannot model long-term temporal dependencies and cumulative climatic impacts, which are essential to crop yield formation. Although such models are useful for static nonlinear relationships, they lack the explicit memory mechanisms needed to learn interannual dependencies.

Sequential deep learning architectures, on the other hand, performed better. The improvement of the GRU model over the non-sequential baselines indicates the advantage of gated recurrent structures in learning sequential patterns. Nonetheless, the LSTM outperformed the GRU, producing smaller error values and higher explanatory power. This is consistent with the LSTM's gating mechanisms, which enable it to retain and update long-term information more effectively.

It should be noted that differences in the absolute magnitude of error between models partly reflect differences in sensitivity to feature scaling and temporal representation. The comparative analysis therefore focuses on relative modeling capability and temporal learning performance rather than on absolute metric superiority. Overall, the findings indicate that LSTM-based architectures are well suited to modeling climate-influenced agricultural systems with limited yet time-structured datasets.
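The effect of target scaling on error magnitudes can be made concrete with a short sketch. It assumes, without confirmation from the paper, that the GRU and LSTM errors in Table 4 were measured on a Min–Max-scaled target; the yield range below is a placeholder, and the function name is hypothetical. Under that assumption, errors convert back to yield units by multiplying by the target range, while R² is unaffected by such a linear rescaling.

```python
from typing import Dict


def to_original_units(mae_scaled: float, rmse_scaled: float,
                      y_min: float, y_max: float) -> Dict[str, float]:
    """Convert errors measured on a Min-Max-scaled target back to yield units."""
    span = y_max - y_min  # scaled = (y - y_min) / span, so errors shrink by 1/span
    rmse = rmse_scaled * span
    return {"MAE": mae_scaled * span, "RMSE": rmse, "MSE": rmse ** 2}


# Hypothetical yield range in the training data (placeholder, not from the paper).
Y_MIN, Y_MAX = 2000.0, 3000.0

# Error values taken from the LSTM row of Table 4, assumed to be on the scaled target.
lstm_original = to_original_units(mae_scaled=0.0111, rmse_scaled=0.0201,
                                  y_min=Y_MIN, y_max=Y_MAX)
```

With these placeholder bounds, an MAE of 0.0111 on the scaled target corresponds to roughly 11 yield units, which shows why errors reported on different scales should not be compared directly.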

3.3. Interpretation of Model Behavior in the Context of Predictive Agriculture

The findings are consistent with the agronomy of wheat production. Wheat yields are shaped by accumulated environmental stresses and resource availability over long growth periods. The ability of LSTM networks to learn long-range nonlinear temporal dependencies therefore allows them to model and forecast such biological processes more closely than models based on instantaneous or independent observations.

Moreover, the very low error rates obtained by the LSTM model suggest that the input variables used in this study, namely evapotranspiration, relative humidity, soil moisture, soil temperature, minimum temperature, mean sea level pressure, and rainfall, were appropriate and sufficient to represent the main climatic determinants of wheat yield in the target area.

The LSTM results also support the preprocessing approach, normalization strategy, and look-back structure adopted in this study. The consistency of the predictions across both ordinary and abnormal climatic years further suggests that the model did not simply learn central tendencies but also captured deviations and exceptions in the historical data.

Overall, the findings suggest that the LSTM model predicts wheat yield with high accuracy and stable results. Its strong performance relative to the other models, including advanced tree-based models and the GRU architecture, shows how effective recurrent deep learning methods can be for agricultural applications, particularly in areas characterized by high climatic variability.

The results not only support the technical approach but also justify integrating the LSTM model into the decision support system. By providing accurate yield forecasts, the system can improve planning, resource allocation, and strategic decision-making for wheat farmers in Punjab.

4. Discussion

The results of this work support the effectiveness of deep learning-based methods, especially LSTM, for wheat yield forecasting under Pakistan's complex and dynamic agro-climatic conditions, which is consistent with recent wheat-focused machine learning studies conducted under climate change scenarios [10]. The strong performance of the LSTM model is in line with the literature outlining the benefits of deep recurrent models in agricultural forecasting [5,6]. The high coefficient of determination (R² = 0.979) and low prediction errors indicate that the model can capture both the cumulative and lagged impacts of climatic variables, which are among the most important factors influencing wheat yield variability in Pakistan [3,4].

Classical machine learning models generalized poorly on the time-series agricultural data; the GRU model outperformed these traditional methods but still fell short of the LSTM, possibly because of its more limited long-term memory capacity [11,12]. The high correlation between predicted and measured yields supports the relevance of key climatic inputs, such as evapotranspiration, temperature, humidity, and soil moisture [1,13,14,15]. The findings demonstrate the suitability of sequential deep learning architectures for forecasting cumulative agro-climatic effects on wheat yield. Rather than emphasizing absolute prediction accuracy alone, the results highlight that the LSTM model learns time-dependent information better than its non-sequential counterparts.

Because few long-term historical yield observations are available, the findings should be taken as indicators of model feasibility and of consistency of trends over time. Nevertheless, deep learning-based forecasting combined with decision-support systems has real potential for climate-resilient agricultural planning in data-scarce areas.

Author Contributions

Conceptualization, W.A.; methodology, W.A.; software, W.A.; validation, W.A. and M.S.; formal analysis, W.A. and S.H.; investigation, W.A. and M.S.; resources, W.A.; data curation, S.H.; writing (original draft preparation), W.A.; writing (review and editing), W.A.; visualization, W.A. and S.H.; supervision, M.A.; project administration, S.H. All authors have read and agreed to the published version of the manuscript.

Funding

No external funds were granted to this research. No external organization made any contributions to the article processing charges.

Data Availability Statement

The data supporting the results of this paper were obtained from Meteoblue under licensed access and from the Pakistan Bureau of Statistics. The datasets are not publicly available because of usage and permission restrictions but can be obtained from the corresponding author upon reasonable request.

Acknowledgments

The authors wish to extend their gratitude to the Department of Computer Science, Engineering, and Technology, Iqra University, Karachi, for the academic guidance and research facilities provided during this study. All content generated by the authors was reviewed, confirmed, and edited by them, and the authors take complete responsibility for the integrity and accuracy of this publication.

Conflicts of Interest

The authors have no conflicts of interest. The funders did not participate in study design; collection, analysis, or interpretation of data; the writing of the manuscript; or the decision to publish the results.

References

1. FAO. Wheat Production and Outlook in Pakistan; Food and Agriculture Organization of the United Nations: Rome, Italy, 2022; Available online: https://www.fao.org/common-pages/search/en/?q=Wheat+production+and+outlook+in+Pakistan (accessed on 13 December 2025).
2. PBS. Pakistan Bureau of Statistics: Wheat Statistics. 2023. Available online: https://na.data.gov.pk/Crops/Home (accessed on 13 December 2025).
3. Hussain, J.; Khaliq, T.; Asseng, S.; Saeed, U.; Ahmad, A.; Ahmad, B.; Ahmad, I.; Fahad, M.; Awais, M.; Ullah, A.; et al. Climate change impacts and adaptations for wheat employing multiple climate and crop models in Pakistan. Clim. Chang. 2020, 163, 253–266.
4. Mahmood, N.; Arshad, M.; Kächele, H.; Ma, H.; Ullah, A.; Müller, K. Wheat yield response to input and socioeconomic factors under changing climate: Evidence from rainfed environments of Pakistan. Sci. Total Environ. 2019, 688, 1275–1285.
5. Kamilaris, A.; Prenafeta-Boldú, F.X. Deep learning in agriculture: A survey. Comput. Electron. Agric. 2018, 147, 70–90.
6. Saleem, M.H.; Potgieter, J.; Arif, K.M. Automation in agriculture by machine and deep learning techniques: A review of recent developments. Precis. Agric. 2021, 22, 2053–2091.
7. Chlingaryan, A.; Sukkarieh, S.; Whelan, B. Machine learning approaches for crop yield prediction and nitrogen status estimation in precision agriculture: A review. Comput. Electron. Agric. 2018, 151, 61–69.
8. Qazi, T.F.; Niazi, A.A.; Basit, A. Assessment of agricultural performance of districts of Punjab. Glob. Soc. Sci. Rev. 2021, 6, 158–172.
9. GOP. Pakistan Economic Survey 2022-23: Agriculture Chapter. Government of Pakistan, Ministry of Finance. 2023. Available online: https://www.finance.gov.pk/survey/chapters_23/02_Agriculture.pdf (accessed on 4 August 2025).
10. Iqbal, N.; Shahzad, M.U.; Sherif, E.S.; Tariq, M.U.; Rashid, J.; Le, T.V.; Ghani, A. Analysis of wheat-yield prediction using machine learning models under climate change scenarios. Sustainability 2024, 16, 6976.
11. Shen, Y.; Mercatoris, B.; Cao, Z.; Kwan, P.; Guo, L.; Yao, H.; Cheng, Q. Improving wheat yield prediction accuracy using LSTM-RF framework based on UAV thermal infrared and multispectral imagery. Agriculture 2022, 12, 892.
12. Di, Y.; Gao, M.; Feng, F.; Li, Q.; Zhang, H. A new framework for winter wheat yield prediction integrating deep learning and Bayesian optimization. Agronomy 2022, 12, 3194.
13. PARC. Pakistan Agricultural Research Council: Wheat Sowing Guidelines for Punjab. 2023. Available online: https://www.parc.gov.pk/ (accessed on 4 August 2025).
14. Kaur, L.; Mahal, A.K.; Kaur, S.; Singh, P. Forecasting wheat productivity in Punjab, India: A weather-based model approach using detrended data and regression analysis. MAUSAM 2024, 75, 1095–1110.
15. Sher, F.; Ahmad, E. Forecasting wheat production in Pakistan. Lahore J. Econ. 2008, 13, 57–85.

Figure 1. Performance metrics of the Long Short-Term Memory model applied for wheat yield prediction.

Figure 2. Predicted vs. actual values (LSTM).

Table 1. Key Features and Functional Description of the Proposed Smart Agriculture System.

Feature	Description
User Authentication	Allows secure registration and login
Location Selection	Provides tailored services for four districts
Bilingual Interface	Toggles between English and Urdu
Weather Analytics Dashboard	Displays real-time weather parameters
Rainfall Prediction Module	Alerts if rain is expected in the upcoming days
Sowing Time Calculator	Determines whether sowing is timely
Fertilizer Planner	Advises on chemical usage by stage and area
Irrigation Scheduler	Suggests watering frequency based on weather and crop stage
Crop Calendar & Guide	Static information on phenological stages and diseases
Yield Prediction	Uses an ML model to predict wheat yield
Responsive UI	Works across mobile, tablet, and desktop devices

Table 2. Key Features and Non-Functional Description of the Proposed Smart Agriculture System.

Feature	Description
Performance	Optimized backend ensures response time < 2 s
Availability	Hosted on Azure with >99.9% uptime
Scalability	Backend can support more districts, crops, and users.
Security	Passwords hashed, API protected, HTTPS used
Localization	Full bilingual support via the i18n framework
Responsiveness	UI adapts to multiple screen sizes.
Maintainability	Modular frontend/backend for easy updates
Offline Support	Progressive Web App support for cached features
Accessibility	Simple navigation and visual icons for ease of use by rural users
Reliability	Integrated error handling, retry mechanisms, and user alerts on failure

Table 3. Long Short-Term Memory (LSTM) performance.

Metric	Value
MAE	0.0111
MSE	0.0004
RMSE	0.0201
R² Score	0.979
Accuracy	~98%

Table 4. Model performance comparison summary.

Model	MAE	MSE	RMSE	R² Score	Accuracy
Random Forest	59.41	6298.20	79.36	0.31	95.23%
ANN	67.39	7733.34	87.94	0.15	94.53%
XGBoost	58.82	5986.46	77.37	0.34	95.25%
GRU	0.0425	0.0037	0.0611	0.8728	~87.23%
LSTM	0.0111	0.0004	0.0201	0.979	~98%
