1. Introduction
The increased demand for global food production inevitably leads to increased irrigation and water use across farming practices, accounting for the consumption of 70% of the available freshwater globally [1]. Conventional irrigation scheduling is mostly based on farmers' estimations and experience, which results in a reduction of the water available for irrigation. It usually takes place in an open-loop fashion, where the supplied irrigation volume and the prevailing soil water status are only indirectly connected; consequently, such schedules are often ineffective, mostly because they cannot take into account site-specific, spatially variable factors and the weather conditions occurring in the irrigated area. For these reasons, it is crucial to implement sustainable yet effective water management practices that optimize crop health status and yield while minimizing water consumption and cost.
Precision irrigation practices are based on assessing the exact water volume and irrigation period required by each specific field, so as to enhance yield productivity while also reducing the farmers' labor costs. These tasks can be achieved by monitoring soil moisture levels. Soil moisture is a critical variable in hydrological, climatic, and agricultural processes, influencing everything from plant growth to land-atmosphere interactions. Adequate soil moisture is beneficial for agricultural productivity, as it has the potential to enhance several physiological and biochemical processes associated with crop development and, subsequently, crop yield. Therefore, by acquiring knowledge about the soil moisture content, farmers can obtain valuable insights on the optimal timing for sowing and growing crops, the adequacy of soil infiltration, and the sufficiency of the water supply needed to facilitate crop root growth. Understanding these processes increases the need for novel approaches that are capable of precisely estimating the level of soil moisture content (MC) in a specific location [2]. For these reasons, continuous and accurate soil moisture records are very important for research and practical applications. However, obtaining such records can be challenging due to sensor malfunctions, data transmission errors, or environmental interferences [3].
For several decades, the assessment of irrigation needs in agricultural systems has relied on conventional deterministic and empirical models. These models, rooted in established principles and equations, have served as the foundational framework for understanding and quantifying the water requirements essential for optimal crop growth. This approach has been instrumental in shaping irrigation strategies and resource management practices within the agricultural domain, contributing significantly to the sustainable development of crop cultivation methodologies. The applied models are typically based on known relationships between key variables such as soil moisture, evapotranspiration (ET) rates, environmental conditions, and crop growth stages. However, they often fail to take into consideration other important variables affecting irrigation needs, including spatial soil variability, which is attributed to the natural variation in soil properties and characteristics even within the same investigated field. To overcome these limitations, the effective combination of deterministic and empirical models with advanced computational technologies such as machine learning (ML) can provide more accurate and adaptive irrigation management approaches.
ML models can effectively predict key variables in irrigation, including ET and MC, with minimal human intervention by capturing their complex relationships, adapting to changing conditions, and incorporating nonlinear and interactive effects, which gives them the potential to revolutionize the prediction of soil properties [4]. Wu et al. [5] developed a two-level ensemble model using ML models for the estimation of daily evapotranspiration (ETo). The first level included Random Forest (RF), Support Vector Regression (SVR), Multilayer Perceptron (MLP), and K-Nearest Neighbors (KNN), while the second level contained a Linear Regression (LR) model, employed as the meta-learner that produced the final estimates. The applied model demonstrated high accuracy, with coefficient of determination (R²) values ranging from 0.66 to 0.99. Other common ML approaches for the investigation and prediction of MC dynamics include support vector machines (SVMs) [6] and adaptive neuro-fuzzy inference systems (ANFIS) [7] for simulating time series of soil MC, using weather, precipitation, and crop coefficient data as input. These traditional ML approaches performed satisfactorily compared to the deterministic and physical models; however, they appeared sensitive to different environmental conditions, unstable in offering reliable predictions over the entire range of soil MC levels, and weak in providing insightful conclusions when employed in regions other than those in which they were initially calibrated and developed.

Beyond these traditional ML models, common artificial neural networks (ANNs) have been widely applied, but they are also limited in modelling dynamic data and approximating complex processes because they tend to lose previously attained and processed information [8]. Several techniques are commonly applied to enhance the performance of traditional ANNs for time series prediction, including additional pre-processing steps [9] and parameter adjustment with the help of genetic algorithms [10]. However, preprocessing techniques are time-consuming due to their high dependency on time and frequency, and they negatively affect the models' adaptability to unknown environments. These limitations can severely degrade the models' prediction performance, especially when dealing with highly causal systems. A further technique for improving the performance of traditional ANNs is the use of sliding windows: sequential data are arranged into windows of defined length, which may or may not overlap, and these windows are then used as input for the model. However, such models lack inherent memory mechanisms to retain information from past inputs. Each window is treated independently, and the network does not naturally capture long-term dependencies in sequential data [11].

For the above reasons, robust and scalable data-driven models in the form of Deep Learning (DL) models, such as convolutional neural networks (CNNs), radial basis function networks (RBFNs), long short-term memory networks (LSTMs), recurrent neural networks (RNNs), and deep belief networks (DBNs), are employed. Their efficacy lies in a design that recognizes complex patterns by capturing and retaining information across large sequential data and that manages high-dimensional and spatiotemporal data, resulting in accurate predictions. Among these algorithms, the RBFN and LSTM are regarded as highly effective computational tools for learning sequential data and are more suitable for predicting time series. More specifically, the efficiency of LSTM models can be attributed to their architecture: they consist of recurrently connected block units, each composed of at least one memory cell that stores and gives access to information for a certain period. The existence of at least one memory cell helps to raise their efficiency while also lowering the gradient error [12]. The self-looped cells of the LSTM model are capable of learning long-term temporal dependencies in sequential data while retaining the information from previous time steps [8]. Agyeman et al. [13] developed a predictive control model with discrete actuators for irrigation scheduling. In this approach, the soil–water–atmosphere system is evaluated with the help of an LSTM model, aiming to assess the optimal water uptake in crops. Jetitha and Rajesh [14] proposed a novel approach for irrigation scheduling based on a deep bi-directional LSTM (DBLSTM) model, the Deep Data Logger and Irrigation Activator Unit (D²LIAU). The results indicated that the DBLSTM-based D²LIAU approach enables efficient irrigation scheduling with a significantly higher degree of reliability in irrigation monitoring, since it saves between 21.96% and 63.05% of water consumption compared to other controlled irrigation practices.
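The sliding-window arrangement mentioned above can be illustrated with a minimal sketch. The function name `make_windows`, the window length, and the sample values are hypothetical and chosen only for illustration; they are not taken from the cited works or from this study.

```python
import numpy as np

def make_windows(series, window, step=1):
    """Arrange a 1-D series into (inputs, targets) pairs.

    Each input holds `window` consecutive values and the target is the value
    that immediately follows it. step=1 gives overlapping windows, while
    step=window gives non-overlapping ones.
    """
    series = np.asarray(series, dtype=float)
    inputs, targets = [], []
    for start in range(0, len(series) - window, step):
        inputs.append(series[start:start + window])
        targets.append(series[start + window])
    return np.array(inputs), np.array(targets)

# Hypothetical daily MC reductions in mm (illustrative values only).
daily_reduction = [1.0, 1.2, 1.1, 1.5, 2.0, 2.4, 2.2]
X_overlap, y_overlap = make_windows(daily_reduction, window=3, step=1)
X_disjoint, y_disjoint = make_windows(daily_reduction, window=3, step=3)
```

Each row of `X_overlap` is presented to the network independently, which is why a plain ANN fed this way carries no memory from one window to the next.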
The main aim of the current study is to demonstrate a hybrid LSTM approach for irrigation scheduling prediction. The irrigation needs have been predicted taking into account soil, weather, and vegetation data. An LSTM model has been employed to investigate how well the daily reduction in soil MC can be predicted from the acquired weather data, namely the average daily air temperature, the total daily solar radiation (SR), and the average daily relative humidity (RH), together with the Leaf Area Index (LAI), in two maize fields located in Bakırköy village in Karacabey, Turkey. Maize, a widely cultivated cereal crop, is particularly sensitive to variations in moisture levels [3]. The LSTM model integrates data from three distinct sources: soil sensors, a weather station, and satellite imagery. Measurements from all three sources were acquired on a daily basis for the entire investigated period. To compensate for the missing MC values in Field 1 and the limited number of soil MC measurements obtained from only four soil sensors in Field 2, the water-driven model Aquacrop 7.0 [15,16] has been used to compute the daily MC and to obtain data for two extra locations within Field 2 for the entire investigated period. The limited number of soil measurements during the investigated cultivation period has been attributed to difficulties in establishing a reliable and continuous data transmission channel between the sensors responsible for MC measurements and the data logger. These transmission issues significantly affected the quantity of MC data collected. Aquacrop 7.0 is regarded as a reliable tool due to its proven capability of successfully simulating parameters used as input data, such as soil MC, evapotranspiration (ET), and yield, when they cannot be measured in a continuous and accurate manner during experimental field measurements [17]. In the current study, the failure to acquire continuous and accurate records is attributed to sensor malfunctions and data transmission errors. Nevertheless, the performance of the LSTM model reveals its significant potential for efficiently predicting the future MC reduction in maize fields. The model's inherent sensitivity to the availability and quality of data is overcome through both its architecture and the simulated data provided by Aquacrop, introducing novel techniques towards more efficient and precise agricultural irrigation practices and patterns. Moreover, the current approach encourages the adoption of site-specific irrigation patterns that enable Variable Rate Irrigation (VRI), paving the way towards enhancing the annual production yields of grains.
3. Results and Discussion
In order to assess and predict the irrigation needs for each of the investigated fields, the reduction of MC within a defined period, from 18 June 2022 to 3 August 2022, needed to be assessed and predicted as well. The dataset was split into four distinct phases in response to fluctuations in crop irrigation needs and to facilitate the repeated training and testing loops of the LSTM model, so that the model is refined with the most pertinent data for every temporal interval and yields accurate predictions. The model predicts the decrease in MC in millimeters (mm), converted from the volumetric water content (% VWC).
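The conversion between % VWC and millimeters is not spelled out in this section. A common convention, assumed here, is that the equivalent water depth equals the volumetric fraction multiplied by the thickness of the monitored soil layer. The sketch below follows that convention; the function names and the 300 mm layer thickness are hypothetical and serve only as an illustration.

```python
def vwc_to_mm(vwc_percent, layer_depth_mm):
    """Equivalent water depth (mm) stored in a soil layer of given thickness."""
    return vwc_percent / 100.0 * layer_depth_mm

def daily_mc_reduction_mm(vwc_percent_series, layer_depth_mm):
    """Day-to-day decrease in stored water (mm); positive values mean drying."""
    depths = [vwc_to_mm(v, layer_depth_mm) for v in vwc_percent_series]
    return [prev - curr for prev, curr in zip(depths, depths[1:])]

# Hypothetical sensor readings (% VWC) and an assumed 300 mm sensing layer.
readings = [30.0, 29.0, 27.5]
reductions = daily_mc_reduction_mm(readings, layer_depth_mm=300.0)
```

Under these assumptions, a drop of one percentage point in VWC over a 300 mm layer corresponds to a reduction of 3 mm of stored water.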
Table 5 presents the results of the LSTM predictions for each of the four investigated phases regarding Field 1. These findings reveal the model’s effectiveness in estimating the reduction in MC within the four designated time periods.
The findings derived from the LSTM predictive model demonstrate promising performance in estimating the decrease in MC in maize in Field 1 across the four discrete phases. The results show an overall upward trend in the efficacy of the model as we move through the phases, which can be attributed to the increasing size of the training dataset. During the initial phase, the coefficient of determination (R²) was 0.8264, which suggests a significant degree of reliability in forecasting the decrease in moisture content. This phase corresponds to the time period from 16 May to 28 June. During this phase, the daily reduction of MC was low and exhibited a steady trend. As a result, the model performed very well, as the patterns observed in the testing dataset closely mirrored those present in the training data. In the second phase, the R² remains similarly high, at 0.8163. While this value is slightly lower than in the first phase, it remains quite impressive: during this period the reduction in MC was more variable, which can explain the minor reduction in performance. In the third phase, the model's performance improves further, with an R² equal to 0.8422. This increase may be attributed to the continued expansion of the training dataset, which gave the model the opportunity to capture a wider range of MC reduction patterns. The final phase demonstrates the highest R², 0.9181. This substantial increase in predictive accuracy indicates that the model has become exceptionally adept at forecasting MC reduction, and it suggests that the larger training dataset, which contained data from all three previous phases, contributed significantly to enhancing the model's predictive ability for the fourth phase.
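For reference, the coefficient of determination reported above can be computed as in the following sketch. It assumes the common definition R² = 1 − SS_res/SS_tot; this section does not state which variant was used (some studies report the squared Pearson correlation instead), and the sample values are hypothetical.

```python
import numpy as np

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((observed - predicted) ** 2)        # unexplained variation
    ss_tot = np.sum((observed - observed.mean()) ** 2)  # total variation
    return 1.0 - ss_res / ss_tot

# Hypothetical observed and predicted daily MC reductions (mm).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
score = r_squared(observed, predicted)
```

With this definition, a value close to 1 means that the predictions reproduce most of the day-to-day variation in the observed MC reduction.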
Figure 5 illustrates the comparison between the observed values and the predicted values generated by the LSTM model for each of the nine soil sensor locations throughout the first phase in Field 1.
As observed in Figure 5, the reduction in MC at each location is generally minimal, with the exception of the final (ninth) day. This occurs because of the relatively low ET rates during the first eight days, which lead to a subtle reduction of MC. Consequently, the model performs well across all locations throughout the initial eight-day period. On the ninth day of the first phase, by contrast, a slight deviation between the observed and predicted MC values is visible.
Figure 6 depicts the comparison between the observed values and the predicted values derived from the LSTM model for each of the nine soil sensor locations during the second phase in Field 1. Each panel of the illustration compares three sensors.
The three panels in Figure 6 exhibit a noticeable variation between the observed and forecasted values for all sensors over the whole duration. In comparison to the first phase (Figure 5), the graph of the MC reduction over the timeframe from 29 June 2022 to 12 July 2022 shows a notable decrease in MC. The considerable rate of MC reduction can be attributed to the significantly elevated levels of evapotranspiration that were prevalent during this period. One particularly noteworthy observation is that the most substantial decrease in MC across all nine locations in the maize field was recorded on day 7. This day appears to have experienced conditions that contribute directly to MC reduction, including high temperatures, increased solar radiation, and lower relative humidity levels.
Figure 7 depicts the comparison between the observed values and the predicted values of MC for the third phase.
The performance of the LSTM model in the third phase appears to be high for all sensors except sensor 13. This abnormal behavior can be explained by the clay content of the soil: a high clay content contributed to the significant daily reduction of MC, and it is comparatively lower at this particular location. Figure 7 illustrates a consistent and significant daily decrease in MC across all locations.
Figure 8 presents a visual representation of the comparison conducted between actual and predicted MC for different locations during the fourth phase.
As previously indicated, the estimated values closely align with the observed values in this phase. Moreover, it is noteworthy that the reduction in MC during this phase is of a higher magnitude than in the three previous phases. The performance of the LSTM model on the testing dataset for Field 2 is shown in Table 6.
In contrast to the results observed for Field 1, the LSTM model generally exhibits lower precision in estimating MC reduction in Field 2. This difference can be attributed to multiple factors, the main one being limited data availability. Field 2 has a comparatively constrained dataset, so it is reasonable to assume that this has an immediate negative impact on the model's ability to accurately capture and extrapolate the patterns of MC reduction. However, it is worth mentioning that the performance trends observed in Field 2 follow a similar pattern to those of Field 1. In the first phase, Field 2 exhibited an R² value of 0.8392, which indicates a good level of reliability. This phase corresponds to the period from 11 May to 28 June, which is characterized by a small and steady reduction in MC, making it easier for the model to make accurate predictions. During the second phase, however, the R² decreases to 0.7602. As in Field 1, this decrease can possibly be attributed to the increased variability or complexity of the MC reduction patterns during this period. In the third phase, the model's performance improves slightly, with an R² equal to 0.7992. Finally, in the fourth phase, Field 2 demonstrates an R² equal to 0.8417. The trend in this phase closely resembles that of Field 1, indicating an improvement of the model that is attributed to the larger training dataset, which incorporates data from the three previous phases.
Figure 9 illustrates the comparison between the observed values and the predicted values generated by the model for each of the six soil locations throughout the first phase in Field 2.
Figure 10 depicts the comparative outcomes between the actual and predicted MC after the second phase in Field 2.
In this phase, the model gives satisfactory results in predicting the reduction of MC at locations 11, 14, and 18. For the other locations, the model appeared less accurate in predicting the MC reduction values for day eight. The uniformity in soil MC reduction during the initial five days, followed by variations in the subsequent four days across all six sensor locations, can be explained by several interacting factors that contributed to the reduction of MC during the four phases. More specifically, weather conditions, including but not limited to air temperature, humidity, and solar radiation, play a pivotal role in the rate of evapotranspiration, which captures both plant transpiration and soil evaporation. The constancy in moisture reduction during the first five days may be attributed at least partially to the consistent meteorological conditions prevailing during that period. Finally, the progressive growth of the maize plants over time, together with the extension and deepening of their root systems, contributes to the inevitable decrease of MC, since the plants have a greater ability to capture water and nutrients from deeper soil layers. Consequently, maize crop development tends to significantly alter the soil moisture dynamics.
Figure 11 and Figure 12 illustrate the comparison of the observed values and the predicted values produced by the model for each soil location during the third and fourth phases in Field 2.
Similar to Field 1, the daily changes are substantial due to the heightened evapotranspiration during these stages and the greater frequency of irrigation by the farmer. Additionally, the model’s performance exhibits improvement during phases 3 and 4, similar to what was observed in Field 1.
In summary, the LSTM model exhibited encouraging potential for forecasting the reduction of MC in maize fields. The results confirmed that the performance of the LSTM model was greatly influenced by the availability and quality of data, in agreement with previously conducted research in the field [37], which utilized a Bi-LSTM model and demonstrated a notable R² improvement, benefiting from a comprehensive dataset spanning 14 years. A more extended period of investigation would ideally have offered a vast array of data for model training through extensive historical datasets [38] to ensure robust predictive capabilities. However, in the current study, irrigation needs have been effectively predicted despite the discontinuity of the soil data measurements, revealing that predictions for irrigation scheduling can be made without the extensive historical datasets that often require a span of several years to decades to yield high predictive performance. The high performance of the LSTM model is attributed to the feature that distinguishes it from traditional RNNs: its gates, which control the flow, the volume, and the type of previously attained and processed information that is maintained and forwarded from the memory cell to the next hidden state. The use of the Aquacrop 7.0 model to simulate soil MC data, made necessary by the data limitations in the investigated fields, equipped the LSTM model with a consistent and adequate volume of data to perform precise predictions for irrigation scheduling in maize crops. This confirms the effectiveness of Aquacrop in the accurate and reliable simulation of moisture content (MC) [17], which is also mirrored in our findings.
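The gating mechanism described above can be sketched as a single forward step of a standard LSTM cell. This is a minimal NumPy illustration of the textbook formulation, not the configuration used in this study: the layer sizes, the random weights, and the names `lstm_step`, `W`, `U`, and `b` are assumptions made for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One forward step of a standard LSTM cell.

    W has shape (4*H, D), U has shape (4*H, H), and b has shape (4*H,),
    with the gate blocks stacked as [input, forget, candidate, output].
    """
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:H])            # input gate: how much new information enters
    f = sigmoid(z[H:2 * H])        # forget gate: how much old memory is kept
    g = np.tanh(z[2 * H:3 * H])    # candidate values for the memory cell
    o = sigmoid(z[3 * H:4 * H])    # output gate: how much memory is exposed
    c = f * c_prev + i * g         # updated memory cell
    h = o * np.tanh(c)             # next hidden state
    return h, c

# Hypothetical sizes: 4 daily inputs (e.g. temperature, SR, RH, LAI), 3 hidden units.
rng = np.random.default_rng(0)
D, H = 4, 3
W = rng.normal(scale=0.1, size=(4 * H, D))
U = rng.normal(scale=0.1, size=(4 * H, H))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x in rng.normal(size=(5, D)):  # five illustrative time steps
    h, c = lstm_step(x, h, c, W, U, b)
```

The forget gate `f` and input gate `i` decide what the memory cell retains across time steps, which is the property that separates the LSTM from a plain RNN in the discussion above.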
For both investigated fields, the LSTM model becomes more efficient in predicting MC reduction as we move through the four investigated phases, since the dataset is enriched with data from previous phases. The high performance of the model has been assisted by the sufficient number of soil sensors, which provided a satisfactory volume of measurements. Comparing the performance of the LSTM model between the two investigated fields, the results appear slightly lower for Field 2 than for Field 1, with an R² ranging from 0.7602 to 0.8417, which can be explained by the limited data availability. However, Field 2 shows similar behavior in its predictive ability, close to that demonstrated for Field 1. It is worth noting that, despite the limited number of sensors in Field 2 (only six), the observed variations in MC reduction stem mostly from the same factors, which are related to the weather conditions and the expected crop growth over time.
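The phase-wise enrichment of the training data can be pictured as an expanding-window evaluation, in which each test block is predicted by a model trained on all earlier days. The sketch below only builds the index splits; the function name, the number of records, and the block boundaries are hypothetical and do not reproduce the phase dates of this study.

```python
def expanding_window_splits(n_days, test_starts, test_len):
    """Return (train_idx, test_idx) pairs in which each training set
    contains every day before the start of its test block."""
    splits = []
    for start in test_starts:
        end = min(start + test_len, n_days)
        splits.append((list(range(0, start)), list(range(start, end))))
    return splits

# Hypothetical layout: 80 daily records and four test blocks of 9 days each.
splits = expanding_window_splits(n_days=80, test_starts=[35, 48, 60, 71], test_len=9)
train_sizes = [len(train) for train, _ in splits]
```

In this layout the training set grows with every phase, which matches the explanation given above for the improving scores in the later phases.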
Incorporating these methods into a comprehensive system, such as the predictive irrigation scheduling system we employed, underscores the value of integrating real-time data with advanced modelling techniques. This integration allows for more effective irrigation strategies, aligning with the broader goals of sustainable agriculture and water resource management. The employed LSTM model has been proven capable of maintaining its predictive accuracy, highlighting the potential for real-time, season-specific irrigation management. This advancement is particularly significant in precision agriculture, where timely and accurate predictions can lead to more efficient water usage and improved crop management strategies.
Further enhancing the Aquacrop model with more comprehensive field data, including precise measurements of root distribution and Leaf Area Index (LAI), could substantially improve its accuracy in simulating crop water needs. This improvement would enable more precise irrigation scheduling, optimizing water usage while ensuring optimal crop growth and yield. Future studies will integrate detailed data on root systems and canopy characteristics, which are expected to help the model reflect the actual water uptake and evapotranspiration patterns of crops in greater detail, leading to more precise predictions of irrigation needs.