Field Data Forecasting Using LSTM and Bi-LSTM Approaches

Water, an essential resource for crop production, is becoming increasingly scarce, while cropland continues to expand due to the world’s population growth. Proper irrigation scheduling has been shown to help farmers improve crop yield and quality, resulting in more sustainable water consumption. Soil Moisture (SM), which indicates the amount of water in the soil, is one of the most important crop irrigation parameters. In terms of water usage optimization and crop yield, estimating future soil moisture (forecasting) is a highly valuable task for crop irrigation, since farmers can base irrigation decisions on this parameter. Sensors can measure soil moisture in real time, which may help farmers decide whether or not to irrigate. However, the soil moisture value provided by the sensors is instantaneous and cannot be used directly to compute irrigation parameters such as the best timing or the required quantity of water. Soil moisture can, in fact, vary greatly depending on factors such as humidity, weather, and time. Using machine learning methods, these factors can be used to predict soil moisture levels in the near future. As a potential solution, this paper proposes a new Long Short-Term Memory (LSTM)-based model to forecast future soil moisture values based on parameters collected from various sensors. To train and validate this model, a real-world dataset containing weather-related, soil moisture, and other related parameters was collected using smart sensors installed in a greenhouse in Chiang Mai province, Thailand. Preliminary results show that our LSTM-based model performs well in predicting soil moisture, with an RMSE of 0.72% and a cross-validation error of 0.52%, and our Bi-LSTM model with an RMSE of 0.76% and a cross-validation error of 0.57%. In the future, we aim to test and validate this model on other similar datasets.


Introduction
Water is one of the most important resources required for crop production. Crops require different amounts of water at different stages of their life cycles. Water influences, among other things, respiration, photosynthesis, mineral nutrient translocation, absorption, and utilization, and cell division. Water scarcity has a huge impact on crop quality and yield. In addition to its direct impact on crop production, water affects nutrient availability, operation timing, and other factors [1]. Crops therefore require watering in order to grow and develop. Crop watering, also known as irrigation, is a method used to help crops grow as an alternative to rain-fed farming. Irrigation is provided by canals, sprinklers, pipes, sprays, drips, pumps, and other man-made devices [2,3].
According to the AQUASTAT report [4], withdrawals of the Earth's freshwater are divided among the agricultural sector for crop irrigation (70%), municipal use (11%), and industry (19%), indicating that agriculture is by far the largest consumer of the Earth's available freshwater. Meanwhile, freshwater accounts for only 0.5% of the world's water, with seawater accounting for the majority (97%) and frozen water accounting for the remaining 2.5% [5]. Irrigation needs are expected to increase agriculture's global water demand by 15% by 2050 [6]. Currently, artificially irrigated areas produce approximately 40% of the world's food [7]. Agriculture's water needs, however, already compete with people's daily needs and those of the environment, particularly in areas where irrigation is required, threatening ecosystem survival. According to an OECD report, agricultural production is heavily reliant on water, and water risks are becoming more prevalent, with agricultural regions around the world facing water issues in recent years [7]. Furthermore, agriculture is both the primary user of water and the primary polluter of water due to the use of chemical pesticides and fertilizers. Moreover, in the coming years, climate change will have a significant and uncertain impact on water supply [7]. Agricultural water management must therefore be improved in order to make agriculture more sustainable, contributing to global food and water security.
Irrigation scheduling is the process by which irrigators determine and manage how often and for how long crops are watered. Irrigation scheduling benefits farmers by increasing crop yield and quality while reducing water loss due to deep percolation and runoff, lowering pumping costs, increasing water efficiency, and ensuring long-term sustainable water usage. Four parameters are required to schedule irrigation successfully: soil moisture content, soil water holding capacity, soil texture, and crop water use at various growing stages [8]. The capacity of the irrigation system must also be considered. During the growing season, different crop types consume varying amounts of water. For example, canola consumes water at a rate of seven mm/day during pod fill but only two mm/day during the rosette stage. Peas can consume water at a maximum of six mm/day and no more than two mm/day during pod development [9,10]. In this paper, we focus on soil-based methods because we aim to predict water requirements before drought stress occurs. Based on soil moisture measurements, the soil-based approach calculates the amount of water currently available to the crop. Smart irrigation technologies are now being used to assist irrigators with on-site field moisture measurement in order to predict soil moisture values for optimal water usage [10,11]. These predictions are used to estimate and schedule irrigation, improving irrigation control by tracking moisture-related conditions in the field and automatically watering at optimal levels [12]. The smart irrigation technology that this paper focuses on is soil moisture-based smart irrigation, which employs sensors to determine the actual moisture content of the soil and adjusts irrigation timing based on this information. However, one current limitation of soil moisture sensors is their inability to report on or represent the entire farm.
Farmers must therefore install a large number of soil moisture sensors across the farm to monitor soil moisture, which raises their costs. Soil moisture forecasting is thus a promising, low-cost, software-based alternative that requires fewer sensors and can produce accurate predictions when given the right set of input data.
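To make the link between the scheduling parameters above more concrete, the sketch below estimates how long the water held in the root zone would last at the canola water-use rates quoted earlier. It is a simplified water balance written for illustration, not the method used in this paper: the refill threshold, the 500 mm root depth, and the function names are hypothetical, and rainfall, drainage, and irrigation-system capacity are ignored.

```python
def available_water_mm(theta_now: float, theta_refill: float, root_depth_mm: float) -> float:
    """Water (mm) the crop can still extract before the soil reaches a refill point.

    theta_now and theta_refill are volumetric soil moisture fractions (m3/m3).
    The refill point is a hypothetical management threshold, not a value from this study.
    """
    return max(theta_now - theta_refill, 0.0) * root_depth_mm


def days_until_irrigation(available_mm: float, crop_use_mm_per_day: float) -> float:
    """Days until the available water is used up at a constant daily crop water use."""
    if crop_use_mm_per_day <= 0:
        raise ValueError("crop water use must be positive")
    return available_mm / crop_use_mm_per_day


if __name__ == "__main__":
    # Hypothetical root zone: 30% moisture now, refill at 20%, 500 mm deep (about 50 mm).
    avail = available_water_mm(0.30, 0.20, 500.0)
    # Canola water use from the text: 7 mm/day at pod fill vs 2 mm/day at rosette.
    print(f"pod fill: {days_until_irrigation(avail, 7.0):.1f} days")
    print(f"rosette:  {days_until_irrigation(avail, 2.0):.1f} days")
```

The same stored water lasts roughly 7 days at pod fill but about 25 days at the rosette stage, which is why crop water use at each growth stage appears among the required scheduling parameters.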
There are significant advantages to combining technological advances and farmer experience, such as improved crop quality and yield, as well as water savings through effective irrigation mechanisms. Our ultimate goal is to develop an automated water irrigation management system that uses a variety of technologies and tools to aid farmers' decision-making and automate the water management process. The Internet of Things (IoT) makes use of various types of sensors and wireless communication technologies to provide an efficient and effective information collection and management infrastructure. Furthermore, given the massive amounts of data that such IoT devices frequently generate, there must be an efficient way to analyze the collected data and use it for decision support via machine learning (ML) methods. ML methods are widely used in agriculture, for example, to predict or identify soil properties.
In this paper, we primarily focus on methods for forecasting soil moisture. Several machine learning methods, including Artificial Neural Networks (ANN), Random Forests (RF), Support Vector Machines (SVM), and elastic net regression (EN), have been used to predict soil moisture from satellite imagery. A method proposed in [13] used Landsat 8 satellite imagery together with geospatial data on land-use types under previously untested conditions in a semi-arid region of Iran; the authors used satellite optical and thermal sensors to calculate soil reflectance and estimate soil moisture. The authors of [14] proposed a soil moisture prediction model based on a deep learning regression network (DNNR) using meteorological and soil moisture data. Ref. [15] describes a novel soil moisture prediction method for vineyards based on digital images, with multilayer perceptron (MLP) and support vector regression (SVR) implementations. Both methods were successful in forecasting soil moisture, producing high correlation values between the predicted and measured soil moisture when tested on unseen data. A soil moisture prediction method using a Convolutional Neural Network (CNN) is presented in [16]. In [17], a relevance vector machine (RVM) model for soil moisture content estimation was presented. In [18], soil moisture content is predicted using a variety of machine learning models, including Support Vector Machines (SVM), Adaptive Neuro-Fuzzy Inference Systems (ANFIS), and Multiple Linear Regression (MLR); the authors conclude that the ANFIS and SVM models are more suitable for predicting soil water content under water stress conditions. A new ResBiLSTM model to predict soil water content was proposed in [19]. The authors of [20,21,22] investigated soil moisture estimation using satellite-based data, and a CNN-based method for predicting soil moisture content in fields was presented in [23].
Following our review of the literature, we concluded that, due to the lack of a real-world testbed, most of the methods do not leverage data acquired from IoT sensors and instead use imaging data as input.
Consequently, this paper proposes a new LSTM-based approach to predict soil moisture and efficiently manage crop irrigation, providing intelligent irrigation while leveraging smart technologies such as the IoT to collect and manage data from various types of sensors. The paper is structured as follows. Section 2 discusses data collection and the methodology used to design our soil moisture forecasting model. Section 3 presents the results of our model, which was tested and validated using a real-world dataset that we collected. Section 4 discusses the performance and usability of our approach, presents our conclusions, and highlights potential areas for improvement to our proposed model.

Materials and Methods
In this section, we present the methodology for our new approach to predicting future soil moisture, which is based on deep learning LSTM models and uses a low-cost setup.
The LSTM was invented in 1997 by Hochreiter and Schmidhuber; however, it has gained popularity as an RNN architecture for a variety of applications only in recent years [24]. The LSTM departed from traditional neuron-based neural network architectures by introducing the concept of a memory cell. Based on its inputs, the memory cell can remember an important value rather than just the most recently computed value. Recent applications combining CNNs and LSTMs have resulted in image and video captioning systems that describe an image or video in natural language: the CNN processes the images or videos, and the LSTM is trained to translate the CNN's output into natural language [24,25].
The memory cell of an LSTM has three gates (the input, forget, and output gates), which control the flow of information from the input to the output of the cell. The input gate controls when new information can enter the memory. The forget gate determines how much of the information already stored in the memory is retained. Finally, the output gate determines which information in the cell is used in the cell's output. Each cell contains weights that control each gate. These weights are optimized by a training algorithm based on the error of the network output [25,26]. However, the LSTM approach has not been used for crop irrigation systems or for soil moisture prediction using real-time datasets from smart sensors.
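The gate behaviour described above can be written out as a single LSTM time step. The NumPy sketch below follows the standard LSTM cell formulation; the function name, the stacking order of the gate weights, the hidden size, and the random example weights are illustrative choices, not the configuration of the model in this paper.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def lstm_cell_step(x, h_prev, c_prev, W, U, b):
    """One time step of a standard LSTM cell.

    x: input vector (n_in,); h_prev, c_prev: previous hidden and cell state (n_hidden,).
    W: (4*n_hidden, n_in), U: (4*n_hidden, n_hidden), b: (4*n_hidden,), with the gate
    blocks stacked in the (arbitrary) order input, forget, candidate, output.
    """
    n = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:n])          # input gate: how much new information enters the memory
    f = sigmoid(z[n:2 * n])      # forget gate: how much of the old cell state is kept
    g = np.tanh(z[2 * n:3 * n])  # candidate values that could be written to the memory
    o = sigmoid(z[3 * n:4 * n])  # output gate: how much of the memory is exposed as output
    c = f * c_prev + i * g       # updated memory cell
    h = o * np.tanh(c)           # new hidden state (the cell's output)
    return h, c


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_in, n_hidden = 14, 8                   # 14 sensor features, 8 hidden units (arbitrary)
    W = rng.normal(scale=0.1, size=(4 * n_hidden, n_in))
    U = rng.normal(scale=0.1, size=(4 * n_hidden, n_hidden))
    b = np.zeros(4 * n_hidden)
    h, c = np.zeros(n_hidden), np.zeros(n_hidden)
    for x in rng.normal(size=(72, n_in)):    # a window of 72 time steps
        h, c = lstm_cell_step(x, h, c, W, U, b)
    print(h.shape)  # (8,)
```

Because the cell state `c` is updated additively through the forget gate rather than overwritten at every step, the cell can carry a value across many time steps, which is the memory property the text attributes to the LSTM.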
Data are the most valuable asset in any machine learning approach. We collected a large amount of data from a testbed located in our university's Innovative Village (see details in Section 3.1). The data were then thoroughly preprocessed before we started the design lifecycle of the LSTM model, in which the model was tested and validated on our data. The plan shown in Figure 1 highlights the steps in our methodology; it depicts the five steps in creating our proposed soil moisture forecasting model.

1. Step 1-Data Collection: the relevant data are measured using sensors and collected in a cloud database;
2. Step 2-Data Preprocessing: missing and irrelevant data are processed in this step, and the most important outcome is a new, clean dataset;
3. Step 3-Modeling and Pattern Selection: both LSTM and Bi-LSTM forecasting models are created, and a set of hyperparameters is tuned to obtain the best performance from the model. In our case, the hyperparameters that affect the performance of the proposed model are the time steps, batch size, number of epochs, learning rate, and split ratio;
4. Step 4-Evaluation and Interpretation: the proposed model is trained, tested, and validated on the collected data.

Data Collection (Study Area)
Our data were collected using a testbed at Innovative Village in Pa Daet Sub-district, Mueang, Chiang Mai, Thailand (GPS coordinates: 18.7453356, 98.9801823). The sensors are installed in the greenhouse and include a soil sensor, an indoor air sensor, and an outdoor weather station (see Figure 2). The collected parameters and their purposes are explained in Table 1. Data are collected every five minutes and stored in a Google Cloud IoT database.
1. Soil sensor: monitors real-time soil moisture, soil temperature, soil pH, and soil electrical conductivity (EC), which affect crop growth and health;
2. Indoor air sensor: monitors real-time air temperature, relative humidity, UV index, and light intensity, which help to control the crop environment and keep it suitable for crop production inside the greenhouse;
3. Outdoor weather station: monitors the weather parameters outside the greenhouse, comprising air temperature, relative humidity, UV, light intensity, rainfall (precipitation), and wind speed, which also affect the environment inside the greenhouse.


1. Soil Moisture: The historically collected soil moisture values will be used to retrain the proposed forecasting model.
2. Soil Temperature: The historically collected soil temperature values will be used to train/retrain the proposed model, and the real-time soil temperature value will be used to predict the future value of soil moisture.
3. Indoor: Air Temperature: The indoor air temperature indicates the air temperature inside the greenhouse.
4. Indoor: Relative Humidity: The indoor relative humidity indicates the air moisture inside the greenhouse, which helps in making irrigation decisions.
5. Indoor: Light Intensity: The indoor light intensity impacts the temperature and relative humidity inside the greenhouse.
6. Indoor: UV index: The UV index value impacts the temperature and relative humidity inside the greenhouse.
7. Outdoor: Air Temperature: The outdoor air temperature indicates the air temperature outside the greenhouse.
8. Outdoor: Relative Humidity: The outdoor relative humidity indicates the air moisture outside the greenhouse.
9. Outdoor: Light Intensity: The outdoor light intensity impacts the temperature and relative humidity outside the greenhouse.
10. Outdoor: UV index: The UV index value also impacts the temperature and relative humidity outside the greenhouse.
11. Outdoor: Wind Speed: The wind speed value indicates the speed of the wind outside the greenhouse, which may affect the airflow inside the greenhouse.
12. Outdoor: Wind Direction: The wind direction indicates the direction of the wind outside the greenhouse.
13. Outdoor: Precipitation Rate: The precipitation rate indicates the current rate of rainfall.
14. Outdoor: Precipitation Total: The precipitation total indicates the total amount of rainfall in one day.

Data Preprocessing
After collecting the data from multiple sensors (see Table 2) and computing descriptive statistics for the dataset (see Table 3), we undertook an extensive preprocessing step to handle missing data. Several parameters were also scaled. Missing values were estimated from the dataset's other training samples using the mean imputation technique: the Imputer class from the scikit-learn Python library [27] was used to replace each missing value with the mean value of its entire feature column. Regarding time, we encoded this parameter using one-hot encoding, dividing the day into four periods (see Table 4).
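A minimal sketch of these two preprocessing steps is given below. The Imputer class cited above has been removed from recent scikit-learn releases; `SimpleImputer(strategy="mean")` is its replacement and performs the same column-mean substitution. The four period boundaries (six-hour blocks starting at midnight) and the helper names are hypothetical stand-ins for the periods actually defined in Table 4.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical start hours of the four daily periods (the real ones are in Table 4).
PERIOD_START_HOURS = (0, 6, 12, 18)


def impute_column_means(X: np.ndarray) -> np.ndarray:
    """Replace every NaN with the mean of its feature column (mean imputation)."""
    return SimpleImputer(strategy="mean").fit_transform(X)


def one_hot_period(hours) -> np.ndarray:
    """Encode hour of day (0-23) as a one-hot vector over the four periods."""
    hours = np.asarray(hours)
    # Index of the last period whose start hour is <= the given hour.
    idx = np.searchsorted(PERIOD_START_HOURS, hours, side="right") - 1
    return np.eye(len(PERIOD_START_HOURS), dtype=np.float32)[idx]
```

Following the text, the imputer here is fitted on the whole column; fitting it on the training portion only would keep test-set statistics out of training.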
Next, we describe the design methodology of our LSTM-based soil moisture forecasting model. Both LSTM and Bi-LSTM are used in our model. Our design (see Figure 3) was developed through a trial phase in which we tested various architecture settings, such as the number and size of layers. Our model has 14 inputs, which are the environmental parameters. The input layer is followed by a stack of four pairs of LSTM and Dropout layers. A dense layer of 12 units then encodes the feature pattern of the input data, and a single output unit predicts the soil moisture value.
It is worth noting that some of the outcomes of the SupplyLedger Project (The SupplyLedger Project www.supplyledger.qa, accessed on 6 December 2021) were used in the development of the LSTM model.
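The architecture described above can be sketched in Keras, the library used in this work. The layer structure (four LSTM/Dropout pairs, a 12-unit dense layer, one output unit) and the 14 inputs come from the text; the number of LSTM units, the dropout rate, the ReLU activation, the Adam optimizer, the mean-squared-error loss, and the function name are assumptions made for illustration. Setting `bidirectional=True` wraps each LSTM layer in `Bidirectional` to obtain a Bi-LSTM variant.

```python
import tensorflow as tf


def build_model(time_steps: int, n_features: int = 14, bidirectional: bool = False,
                units: int = 64, dropout: float = 0.2,
                learning_rate: float = 0.001) -> tf.keras.Model:
    """Four (LSTM -> Dropout) pairs, a 12-unit dense layer, and one output unit.

    units, dropout, activation, optimizer, and loss are illustrative assumptions.
    """
    layers = [tf.keras.Input(shape=(time_steps, n_features))]
    for k in range(4):
        # All recurrent layers except the last return the full sequence for the next one.
        lstm = tf.keras.layers.LSTM(units, return_sequences=(k < 3))
        layers.append(tf.keras.layers.Bidirectional(lstm) if bidirectional else lstm)
        layers.append(tf.keras.layers.Dropout(dropout))
    layers.append(tf.keras.layers.Dense(12, activation="relu"))  # feature-pattern encoding
    layers.append(tf.keras.layers.Dense(1))                      # predicted soil moisture
    model = tf.keras.Sequential(layers)
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
                  loss="mse",
                  metrics=[tf.keras.metrics.RootMeanSquaredError()])
    return model
```

For example, `build_model(time_steps=12)` would take windows of twelve 5-minute samples (1 h of history), while `build_model(time_steps=6, bidirectional=True)` gives a Bi-LSTM over 30 min of history.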

Hyperparameters Selection
Following our preliminary tests, we concluded that the choice of hyperparameters strongly affects the model's performance. To achieve the best results in terms of prediction accuracy and error, we carried out an extensive hyperparameter tuning step for our model. According to our tests, the most significant hyperparameters are the learning rate (LR) used during training, the split ratio between training and testing data, the batch size, the number of time steps, and the forecast time interval. Table 5 reports the best hyperparameter values found in our empirical study.
Table 5. The best hyperparameters based on our empirical study.
Our empirical study showed that the learning rate has a significant impact on the model's performance and results. We therefore conducted a more detailed analysis to determine the best values for this parameter under different training/testing split ratios.
During training, the learning rate scales the weight update computed from the model error. Selecting the learning rate is difficult: a value that is too low can make training very long or cause it to get stuck, whereas a value that is too high can make the model learn a suboptimal set of weights too quickly or make training unstable. The split ratio specifies how the dataset is divided into training and testing data. To select the most appropriate configuration of the forecasting model, we defined 12 cases with different values of learning rate and split ratio, shown in Table 6. Although a number of other hyperparameters are also critical to the performance of the proposed forecasting model, this paper focuses on optimizing the learning rate (LR) and split ratio (SR) to improve the proposed model's performance.
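One way to enumerate the 12 cases is as the product of six learning rates (0.1 down to 0.000001, the range given in this section) and the two split ratios. The numbering below, with the learning rate as the outer loop, is an assumption; it agrees with the results reported later (case 5: LR 0.001 with a 70/30 split; case 7: LR 0.0001 with a 70/30 split), but Table 6 remains the authoritative list.

```python
from itertools import product

LEARNING_RATES = [1e-1, 1e-2, 1e-3, 1e-4, 1e-5, 1e-6]  # 0.1 down to 0.000001
TRAIN_RATIOS = [0.7, 0.8]                             # 70/30 and 80/20 splits


def hyperparameter_cases() -> dict[int, tuple[float, float]]:
    """Number the 12 (learning rate, training ratio) combinations from 1 to 12."""
    return {k: case for k, case in enumerate(product(LEARNING_RATES, TRAIN_RATIOS), start=1)}


if __name__ == "__main__":
    for k, (lr, ratio) in hyperparameter_cases().items():
        print(f"case {k:2d}: LR={lr:g}, train/test = {ratio:.0%}/{1 - ratio:.0%}")
```

Each case would then be trained and scored in the same way, so that differences in the errors can be attributed to the learning rate and split ratio alone.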

1. The learning rate (LR) is the hyperparameter that controls how much the model changes in response to the estimated error each time its weights are updated;
2. The split ratio (SR) determines how the dataset is divided into training and testing data.
Table 6 divides the various learning rates and split ratios into 12 cases, and the model's performance is compared across these cases. A total of 17,749 dataset samples are used to train and test the forecasting model. Two split ratios are used. In the first, 70% of the data (12,424 samples) is used for training and 30% (5325 samples) for testing. In the second, 80% of the data (14,200 samples) is used for training and 20% (3549 samples) for testing. To determine the appropriate learning rate and split ratio for the proposed model, we tested and compared these cases. The learning rate ranges from 0.1 to 0.000001, and its value influences the training error of the proposed model.
The next hyperparameters that we tuned were the number of time steps and the time interval. The number of time steps is a critical hyperparameter for LSTM models: it is the number of observations the model requires as input to make a future prediction. The time interval is the amount of time that elapses between the last time step in the input and the predicted value. Table 7 shows the effect of the time steps and time interval on the soil moisture validation graph. Choosing the time interval also determines how far ahead the model forecasts the next soil moisture value. In our experiments, time intervals of 12 h, 8 h, 6 h, 4 h, 3 h, 2 h, 1 h, and 30 min were used.
We used 144, 96, 72, 48, 36, 24, 12, and 6 time steps. To limit the number of combinations, we first tested the various time intervals and, once the optimal model for a time interval was found, tested that model with the various time steps.
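Because the testbed records one sample every five minutes, each listed time-step count spans exactly one of the listed intervals (144 × 5 min = 12 h, 72 × 5 min = 6 h, 12 × 5 min = 1 h, 6 × 5 min = 30 min, and so on). The sketch below shows how the samples could be turned into supervised windows, treating the time interval as the forecast horizon as defined above. The function names are hypothetical, and the chronological (unshuffled) split is an assumption, since the text does not state whether the data were shuffled.

```python
import numpy as np

SAMPLING_MINUTES = 5  # the testbed stores one reading every five minutes


def steps_for(minutes: int) -> int:
    """Number of 5-minute samples spanned by a duration given in minutes."""
    if minutes % SAMPLING_MINUTES:
        raise ValueError("duration must be a multiple of the sampling period")
    return minutes // SAMPLING_MINUTES


def make_windows(features: np.ndarray, target: np.ndarray, time_steps: int, horizon: int):
    """Build supervised (X, y) pairs for sequence forecasting.

    X[k] holds `time_steps` consecutive feature rows; y[k] is the target value
    `horizon` samples after the last row of that window.
    """
    n = len(features) - time_steps - horizon + 1
    if n <= 0:
        raise ValueError("series too short for this window and horizon")
    X = np.stack([features[k:k + time_steps] for k in range(n)])
    y = np.array([target[k + time_steps - 1 + horizon] for k in range(n)])
    return X, y


def chronological_split(X, y, train_ratio: float):
    """Split without shuffling so that the test windows come after the training windows."""
    cut = int(len(X) * train_ratio)
    return X[:cut], X[cut:], y[:cut], y[cut:]
```

For a 1 h forecast from 1 h of history, one would call `make_windows(features, soil_moisture, steps_for(60), steps_for(60))`, where `features` and `soil_moisture` are the preprocessed arrays.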

Test Setup
To run our experiments, we used a machine with an Intel® Core™ i7-6700HQ CPU at 2.60 GHz, 16 GB of RAM, and an Intel® HD Graphics 530 GPU. Our model was implemented in Python (Jupyter Notebook 6.0.3, web-based) using the Keras deep learning library [27] with TensorFlow as the backend. Our model takes 30 min to train for 100 epochs with a learning rate of 0.001, a split ratio of 70% for training and 30% for testing, and a batch size of 72.
To increase confidence in the proposed model's results, a cross-validation step is required. This entails dividing the dataset into K subsets and rotating the validation and training subsets; the model's average performance is then calculated by averaging its performance over the K folds. In this paper, we use K-fold cross-validation to divide our data into five subsets, which means that the holdout method is repeated five times, each time with one of the five subsets serving as the test set and the other four serving as the training set.
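The five-fold procedure can be sketched as follows. `kfold_indices` produces contiguous folds in the same way as scikit-learn's `KFold(n_splits=5)` without shuffling, so with the 17,749 samples reported above four folds hold 3550 samples and one holds 3549. The helper names and the `fit_predict` callback are hypothetical; in practice the callback would train the LSTM model on each training split and predict on the held-out fold.

```python
import numpy as np


def kfold_indices(n_samples: int, k: int = 5):
    """Yield (train_idx, test_idx) pairs; contiguous folds, no shuffling."""
    folds = np.array_split(np.arange(n_samples), k)
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield train_idx, test_idx


def cross_val_rmse(X, y, fit_predict, k: int = 5) -> float:
    """Average RMSE over k folds; fit_predict(X_tr, y_tr, X_te) returns predictions."""
    scores = []
    for train_idx, test_idx in kfold_indices(len(X), k):
        pred = fit_predict(X[train_idx], y[train_idx], X[test_idx])
        scores.append(float(np.sqrt(np.mean((y[test_idx] - pred) ** 2))))
    return float(np.mean(scores))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X, y = rng.normal(size=(100, 3)), rng.normal(size=100)
    # Baseline "model": always predict the mean of the training targets.
    baseline = lambda X_tr, y_tr, X_te: np.full(len(X_te), y_tr.mean())
    print(f"5-fold RMSE of the mean baseline: {cross_val_rmse(X, y, baseline):.3f}")
```

Averaging the five fold scores gives a single cross-validation error that depends less on which particular block of data happens to be held out.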

Results and Discussion
The performance of our model was assessed with the Sklearn Python library [28] using the Root Mean Square Error (RMSE) and the K-fold cross-validation score after dividing the data into five subsets, as described in Section 2.1. We trained our model for 100 epochs under various settings and hyperparameters. In this section, we report the forecasting model's training and validation results based on the data we collected and preprocessed.
The different learning rates and split ratios are divided into 12 cases, as shown in Table 6. Figures 4 and 5 show the comparison results for the different cases for the LSTM and Bi-LSTM models, respectively. In Figure 4, the values of training error, test error, and RMSE validation error are quite low, indicating that the models perform well. When the LSTM model is compared across six cases (cases 5 to 10), case 5 has a lower training error, test error, and RMSE validation error than the others: the LSTM model achieves a training error of 0.03%, a testing error of 0.08%, and an RMSE validation error of 1.057%. A comparison of the Bi-LSTM model across the same six cases (cases 5 to 10) shows that case 7 (the yellow box in Figure 5) has a lower training error, test error, and RMSE validation error than the others, with a training error of 0.03%, a testing error of 0.04%, and a model validation error of 0.783%. The appropriate values for the LSTM model are therefore a learning rate of 0.001 with a split ratio of 70% for training and 30% for testing (case 5; see the yellow box in Figure 4). For the Bi-LSTM model, the appropriate values are a learning rate of 0.0001 with the same split of 70% for training and 30% for testing (case 7; see the yellow box in Figure 5).
After selecting the appropriate learning rate and split ratio values, we test the prediction model over different forecast horizons (the next 12 h, 8 …). Figures 6 and 7 show the error results of forecasting soil moisture values over different time intervals using the LSTM and Bi-LSTM models, respectively. The LSTM model performs best when forecasting the next hour, with a training error rate of 0.03%, a testing error rate of 0.06%, and a validation error rate of 0.024% (see the red box in Figure 6). In contrast, the Bi-LSTM model performs best when forecasting the next 30 min, with a training error of 0.01%, a testing error of 0.02%, and a validation error of 0.515% RMSE (see the red box in Figure 7).
As a result, the LSTM model is selected for forecasting soil moisture over the next 1 h and the Bi-LSTM model over the next 30 min. Figures 8 and 9 illustrate the corresponding forecasts for the next hour (LSTM model) and the next 30 min (Bi-LSTM model). When the measured and forecast soil moisture values in Figures 8 and 9 are compared, both models predict soil moisture well, with errors of approximately 0.06% and 0.15% for the LSTM and Bi-LSTM models, respectively.
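Forecasting the next 1 h or 30 min requires turning the sensor time series into supervised (input window, target) pairs for the LSTM and Bi-LSTM models. The paper does not give its windowing code, look-back length, or sensor sampling interval here, so the following Python sketch is illustrative only. `make_windows` and `horizon_steps` are hypothetical helpers, and the 10-minute sampling interval in the example is an assumed value.

```python
from typing import List, Sequence, Tuple


def horizon_steps(horizon_minutes: int, sample_minutes: int) -> int:
    """Convert a forecast horizon in minutes into a number of sensor samples."""
    if sample_minutes <= 0 or horizon_minutes <= 0 or horizon_minutes % sample_minutes != 0:
        raise ValueError("horizon must be a positive multiple of the sampling interval")
    return horizon_minutes // sample_minutes


def make_windows(values: Sequence[float], lookback: int, horizon: int) -> Tuple[List[List[float]], List[float]]:
    """Build (input window, target) pairs for direct multi-step forecasting.

    Each input holds `lookback` consecutive readings. Its target is the reading
    `horizon` steps after the last reading in the window.
    """
    if lookback < 1 or horizon < 1:
        raise ValueError("lookback and horizon must be >= 1")
    inputs, targets = [], []
    for end in range(lookback, len(values) - horizon + 1):
        inputs.append(list(values[end - lookback:end]))
        targets.append(values[end + horizon - 1])
    return inputs, targets


if __name__ == "__main__":
    # Assumed 10-minute sampling: 1 h -> 6 steps (LSTM), 30 min -> 3 steps (Bi-LSTM).
    print(horizon_steps(60, 10), horizon_steps(30, 10))  # 6 3
    X, y = make_windows([float(i) for i in range(10)], lookback=3, horizon=2)
    print(X[0], y[0])  # [0.0, 1.0, 2.0] 4.0
```

Building the training pairs separately for each horizon is one way to obtain distinct 1 h and 30 min forecasters, as the model selection above requires.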
To evaluate the validity of the models' performance, we use K-fold cross-validation and compute the overall effectiveness of our LSTM and Bi-LSTM models by averaging the results of all five folds, as shown in Table 8. Table 7 compares measured and predicted soil moisture values and reports the error estimation results of our LSTM and Bi-LSTM models using K-fold cross-validation. According to the results, the discrepancy between measured and predicted soil moisture values is quite small for both models. The LSTM model, on the other hand, has a larger error between predicted and measured soil moisture values than the Bi-LSTM model. In terms of cross-validation error, however, the LSTM results in all five rounds, as well as the averaged overall error estimate, are lower than those of Bi-LSTM. The LSTM model yields a 0.72% RMSE error and a 0.52% cross-validation error, whereas the Bi-LSTM model yields a 0.76% RMSE error and a 0.57% cross-validation error.

| Round | Metric | LSTM | Bi-LSTM |
|---|---|---|---|
| Round 4 | RMSE loss | 0.77% | 0.81% |
| Round 4 | CV loss | 0.60% | 0.66% |
| Round 5 | RMSE loss | 0.69% | 0.73% |
| Round 5 | CV loss | 0.48% | 0.54% |
| Averaged overall error estimation | RMSE loss | 0.72% (±0.06%) | 0.76% (±0.06%) |
| Averaged overall error estimation | CV loss | 0.52% (±0.08%) | 0.57% (±0.08%) |

According to the results, the proposed LSTM and Bi-LSTM soil moisture forecasting models accurately predict soil moisture values. However, while building the proposed models, we had to test the learning rate for each case individually using the Adam optimizer, which was time-consuming. Although the Adam optimizer worked well for constructing the proposed models, the best settings differ between datasets and may require drastically different learning rate schedules. Furthermore, both models are trained, tested, and validated on a small dataset collected at a single location, which may affect model performance.
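The averaged overall error estimate is the mean of the per-round errors across the five folds. The following Python sketch shows one way to produce such a summary, but it rests on assumptions the paper does not confirm. It assumes that the folds are contiguous and unshuffled and that the ± figure is the sample standard deviation across folds. `kfold_indices` and `summarize` are hypothetical helpers, and the example uses made-up fold errors rather than the paper's results.

```python
import statistics
from typing import List, Sequence, Tuple


def kfold_indices(n: int, k: int = 5) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n-1 into k contiguous folds (no shuffling).

    Returns (train_indices, test_indices) for each fold. When n is not
    divisible by k, the earlier folds get one extra sample each.
    """
    if not 1 < k <= n:
        raise ValueError("need 1 < k <= n")
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        folds.append((train, test))
        start += size
    return folds


def summarize(fold_errors: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, sample standard deviation) of the per-fold errors."""
    if not fold_errors:
        raise ValueError("need at least one fold error")
    mean = statistics.mean(fold_errors)
    spread = statistics.stdev(fold_errors) if len(fold_errors) > 1 else 0.0
    return mean, spread


if __name__ == "__main__":
    folds = kfold_indices(10, k=5)
    print([test for _, test in folds])  # [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]
    # Placeholder per-fold errors, not the paper's results.
    print(summarize([1.0, 2.0, 3.0]))  # (2.0, 1.0)
```

With contiguous folds, the earlier test folds are evaluated by models trained partly on later data. A forward-chaining split avoids this look-ahead, but the text refers to standard K-fold cross-validation.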

Conclusions and Future Works
Water management for crop production is a difficult subject with implications for water sustainability. Managing this resource is also costly, because effective crop irrigation requires numerous hardware tools, such as soil sensors. In this paper, we proposed a novel method for estimating soil moisture in the context of water management for crop production. We use machine learning to forecast future soil moisture from the output of low-cost IoT sensors. Specifically, we propose a soil moisture forecasting model based on a Long Short-Term Memory (LSTM) deep learning approach. The data used to train and validate our model were collected on a testbed in Chiang Mai province, Thailand, using an array of IoT sensors that includes a soil sensor, a water sensor, an air sensor, and a weather station. We clean the collected data in an extensive preprocessing step. The proposed LSTM model uses farm environmental indicators to predict future soil moisture. We tuned our model extensively and tested various setups and architectures. In future work, we will use more datasets to estimate the performance of our models and test our model in various locations to see how well it performs. The data we used are available at https://github.com/SFDataset/DataSet.git (accessed on 6 December 2021) (see Appendix A).

Data Availability Statement: The dataset presented in this paper is available at: https://github.com/SFDataset/DataSet.git.