1. Introduction
Soil moisture level is one of the most critical factors for crop health. The stress level of a crop is directly related to the amount of moisture held by the crop, which depends strongly on the soil moisture level where the crop is planted. Since crop yield is affected by the stress level of the crop, it is important to check the soil moisture level of the crop field frequently and to irrigate the proper part of the field at the proper time. However, it is difficult and expensive to set up static sensor systems that cover an entire vast crop field to measure soil moisture levels. Since few people currently work in a large crop field, it takes farmers a great amount of time to measure the soil moisture level of the whole field, which changes rapidly over time. In this paper, a soil moisture retrieval model that uses multispectral and infrared images from unmanned aerial vehicles (UAVs) is designed with a convolutional neural network (CNN) to resolve these issues in monitoring the soil moisture level of large crop fields.
Soil moisture retrieval from remotely acquired measurements is one approach to monitoring the moisture level of a crop field without static sensor systems or a great amount of manpower. Soil moisture retrieval has been performed using measurements from satellites in several studies [
1,
2,
3,
4]. One issue is that the area covered by one pixel of a satellite image is large due to the high altitude of the satellite. This implies that the soil moisture level of a large area is represented by a single estimate from one pixel of the satellite image. It is also difficult to conduct remote sensing whenever monitoring is required, since a satellite is available only when it passes over the crop field. Moreover, clouds above the crop field can affect the satellite images. Airplanes were suggested as another remote sensing platform for soil moisture retrieval in [
5]. Airborne images from airplanes can have higher spatial resolution than satellite images since airplanes operate at much lower altitudes than satellites. However, it is also difficult to collect data frequently with airplanes because a trained pilot is required to operate them. In this paper, a small quadrotor UAV is proposed as the remote sensing platform for soil moisture retrieval. Quadrotor UAVs can provide images with higher spatial resolution than satellites or airplanes because they are operated at significantly lower altitudes. The cost of maintaining quadrotor UAVs is also much lower, and non-experts can operate them easily. This enables convenient data collection for soil moisture retrieval whenever it is required.
The objective of soil moisture retrieval is to estimate the moisture level of the soil using remote sensing. However, it is difficult to take airborne images of bare soil when crops cover the field. This implies that it is important to select a proper sensor or a set of sensors which can provide data to estimate the moisture level of soil even when their measurements are affected by canopies. A multiple-channel radiometer called an advanced very high resolution radiometer (AVHRR) was introduced in [
1], and a microwave radiometer was applied in [
6,
7]. In [
4], a synthetic aperture radar (SAR), namely the Advanced Synthetic Aperture Radar (ASAR) on the ENVIronmental SATellite (ENVISAT), was utilized. Combinations of radar with another sensor have also been proposed as sensor systems for soil moisture retrieval in previous studies. Radar together with AVHRR was considered in [
2], and in [
5], a combination of radar and an optical camera was proposed. The issue is that these sensors or sets of sensors cannot be mounted on small quadrotor UAV systems because radiometers and radars are heavy and, to the best of the authors’ knowledge, no suitable commercial radiometer or radar for small UAVs is available at this point. The sensor system for soil moisture retrieval suggested in [
3] consists of an infrared (IR) sensor with Fourier transform infrared spectroscopy (FTIR) and a multispectral sensor. Commercial multispectral and IR image sensors for small UAVs are available, and it is easy to integrate these sensors into UAV systems. Thus, the combination of multispectral and IR image sensors is adopted as the sensor system for soil moisture retrieval in this paper.
An algorithm or model is required to estimate the moisture level of soil covered by crops from the airborne images of the crop field. One approach is to utilize the relationship between the value of a certain parameter for a crop field and the soil moisture level. The temperature–vegetation dryness index (TVDI), proposed in [
1], is a dryness index for a land surface. TVDI is designed by empirically parameterizing the relationship between the land surface temperature and the normalized difference vegetation index (NDVI). The relationship between TVDI and soil moisture measurements was investigated in [
3] via correlation and regression analysis. A Bayesian approach based on backscattering coefficients was suggested in [
5] and backscattering coefficients together with emissivities were utilized in [
8]. A regression-technique-based approach was introduced in [
2] by applying the support vector machine (SVM) to radar backscatter, incidence angle, and NDVI data. The above approaches require procedures to calculate parameters known to be related to the soil moisture level, such as backscattering coefficients or NDVI, by applying models or equations from previous studies to the sensor measurements. Thus, soil moisture retrieval models based on these approaches are confined to the known relationships between these parameters and the soil moisture level. Information in the measurements that is irrelevant to the parameter but may still be related to the soil moisture level can be discarded during these procedures. Neural-network-based soil moisture retrieval models have also been studied [
4,
6,
7,
8]. These models learn the relationship between input data and soil moisture level via training, without any prior knowledge. However, since the input data for those neural-network-based models are parameters such as backscattering coefficients, emissivities, or brightness temperature, procedures to calculate these parameters from sensor measurements are still required. This work is an extension of our conference work [
9] where the CNN [
10] is utilized for the design of the soil moisture retrieval model. The CNN has been utilized for various agricultural applications [
11,
12,
13,
14,
15,
16,
17,
18,
19,
20,
21,
22,
23,
24]. One advantage of a CNN-based model is that an image with multiple layers of different wavelengths is used directly as the model input. This implies that parameters related to the soil moisture level do not need to be calculated from sensor measurements, which avoids the information loss that can take place during such calculations. The spatial information of each pixel in the input images is also maintained since the structure of the input image is not changed before it is fed to the CNN-based model. Moreover, the CNN-based soil moisture retrieval model is trained to learn the relationship between a soil moisture measurement and the corresponding images without prior knowledge of that relationship. Thus, the CNN-based model is not restricted by previous understanding, and it can potentially discover and utilize unknown correlations between the soil moisture level and the input multispectral and IR images. The proposed CNN-based model is also advantageous for online operation since the trained model estimates the soil moisture level directly and rapidly from the acquired multispectral and IR images. The soil moisture level of a certain point on the ground is estimated with the CNN-based model from a set of multispectral and IR images in which the pixel corresponding to the point is at the center, so the size of the input image determines the ratio of this pixel to the overall input image. The effects of the input image size on the performance of the CNN-based model are investigated in this paper. An error in identifying the pixel corresponding to the soil moisture measurement point on the ground has a significant impact on the performance of the model since the input image is then not correctly matched with the soil moisture measurement data. In this paper, the performance degradation due to errors in identifying the corresponding pixel is studied via the training and testing results of the CNN-based model.
To sum up, a CNN-based model is designed in this paper to estimate the soil moisture level of a point covered by canopy from airborne multispectral and IR images of that point obtained from a UAV. Each component of this approach has the following advantages. By utilizing a UAV, remote sensing for soil moisture retrieval can be conducted easily by non-experts whenever it is needed. Since multispectral and IR image sensors compatible with commercial small UAVs are available, the remote sensing platform for soil moisture retrieval can be constructed easily. The CNN-based model takes multispectral and IR images directly as inputs for soil moisture level estimation. This implies that no information in the input images, such as the spatial information of the pixels, is lost, and that the trained model is well suited to online operation.
This paper is organized as follows. The data collection for soil moisture retrieval is described in
Section 2. In
Section 3, the design of the soil moisture retrieval model based on CNN is proposed. Training and testing of the proposed soil moisture retrieval model are conducted in
Section 4. Concluding remarks are given in
Section 5.
4. Training, Validation, and Testing of Soil Moisture Retrieval Model
4.1. Setup for Training, Validation, and Testing
Training, validation, and testing of the proposed CNN-based soil moisture retrieval model are performed in this section to verify and validate the model and to analyze the effects of input image construction on its performance. Training is the process of fitting the parameters of the CNN-based model, such as the weights of the filters in the convolution layers, using a data set composed of input images and soil moisture level measurements. Validation checks the performance of the model during training with data not used for training, and its results are considered for tuning the hyperparameters. Testing checks the performance of the trained model using data not used for either training or validation. In order to address the effect of input image size, four different cases of
p are defined as 1, 3, 5, and 7. Training, validation, and testing are conducted iteratively for each
p with a different training, validation, and testing data set for each iteration. Note that the data for all six bands gathered by the UAV system are utilized for constructing the training, validation, and testing data sets. Each iteration is conducted with different sets of soil moisture level measurement points for training, validation, and testing. The number of iterations is set to 50 in this paper. In each iteration, all the soil moisture level measurement points are randomly separated into three sets so that the sets differ between iterations. The total number of soil moisture level measurement points is 130 in this paper. Among them, 110 points are randomly selected as training data for each iteration. Another 10 points are assigned to the validation set, and the remaining 10 points are utilized for testing after training. Within the same iteration, the sets of measurement points for training, validation, and testing are the same for all the
p cases. The only difference between the
p cases is the size of the input image. The distribution of the soil moisture level measurements is shown in
Figure 7.
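The repeated random partition described above can be sketched as follows. This is a minimal illustration rather than the code used in this study: the function name `split_points`, the use of integer indices in place of measurement points, and the fixed random seed are assumptions introduced here. Only the sizes (130 points, 110/10/10 split, 50 iterations) come from the text.

```python
import random

N_POINTS = 130                        # total measurement points (from the text)
N_TRAIN, N_VAL, N_TEST = 110, 10, 10  # split sizes (from the text)
N_ITERATIONS = 50                     # number of iterations (from the text)


def split_points(n_points, n_train, n_val, n_test, rng):
    """Randomly partition point indices into training, validation, and test sets."""
    if n_train + n_val + n_test != n_points:
        raise ValueError("split sizes must add up to the number of points")
    indices = list(range(n_points))
    rng.shuffle(indices)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


# Assumption: a fixed seed, used here only to make the sketch reproducible.
rng = random.Random(0)

# One partition per iteration; the partition of a given iteration would be
# reused unchanged for every input image size p.
splits = [
    split_points(N_POINTS, N_TRAIN, N_VAL, N_TEST, rng)
    for _ in range(N_ITERATIONS)
]
```

Storing the partitions once and reusing them for each p keeps the measurement points identical across the p cases, so that only the input image size differs.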
The next study concerns the effects of errors in identifying the pixels that correspond to the measurement points. It is conducted in a similar way to the investigation of the effects of input image size described above. The difference in this study is that the pixels corresponding to measurement points on the multispectral image are assumed to have errors in the x coordinate within the image. Thus, the coordinate of the ith pixel is assumed to be for and . The layers for the ith input image from the multispectral image are obtained with as the pixel on the center. The number of iterations is 50 and the sizes of the training, validation, and testing data sets are 110, 10, and 10, respectively.
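The construction of an input image around a possibly mislocated center pixel can be sketched as below. This is an illustration under stated assumptions, not the implementation of this study: `extract_patch` is a hypothetical helper, the image is a toy 7 x 7 array with six bands, the x coordinate is taken to be the column index, and the one-pixel error is a placeholder value.

```python
def extract_patch(bands, row, col, p):
    """Return a p-by-p window centered on (row, col) for every band.

    bands: list of 2-D lists (one per spectral band), all of the same shape.
    p:     odd window size, so that (row, col) is the exact center pixel.
    """
    if p < 1 or p % 2 == 0:
        raise ValueError("p must be a positive odd integer")
    half = p // 2
    n_rows, n_cols = len(bands[0]), len(bands[0][0])
    if not (half <= row < n_rows - half and half <= col < n_cols - half):
        raise ValueError("window does not fit inside the image")
    return [
        [band[r][col - half:col + half + 1]
         for r in range(row - half, row + half + 1)]
        for band in bands
    ]


# Toy six-band image; each value encodes its band, row, and column.
bands = [
    [[100 * b + 10 * r + c for c in range(7)] for r in range(7)]
    for b in range(6)
]

true_row, true_col = 3, 3
x_error = 1  # hypothetical one-pixel error along x (assumed to be the column axis)
patch = extract_patch(bands, true_row, true_col + x_error, 3)
```

In this toy case the pixel that truly corresponds to the measurement point is no longer at the center of the window, but it is still contained in it because the window is three pixels wide.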
The parameters for the construction and training of the CNN-based soil moisture retrieval model are defined as follows. The filter size of both convolution layers is selected to be 1. The first convolution layer consists of 15 filters while the second convolution layer has 5 filters. ReLU is used as the activation function for both convolution layers. The Adam optimizer, which is a variant of the stochastic first-order gradient descent optimization algorithm, is applied for training the CNN-based model. The maximum number of epochs for training is set to 1000 and validation is performed every 50 epochs. The initial learning rate is selected to be and it decreases by per 100 epochs.
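To make the layer sizes above concrete, the forward pass through the two convolution layers can be sketched as follows. Several assumptions are made here and should be read as such: a filter size of 1 is interpreted as a 1 x 1 kernel, the weights are random placeholders rather than trained values, the helper names `conv1x1_relu` and `random_layer` are hypothetical, and the layers that follow the second convolution layer are omitted because they are specified outside this section.

```python
import random


def conv1x1_relu(feature_maps, weights, biases):
    """Apply a 1x1 convolution followed by ReLU.

    feature_maps: list of C_in 2-D lists of equal shape.
    weights:      list of C_out lists, each holding C_in coefficients.
    biases:       list of C_out offsets.
    """
    n_rows, n_cols = len(feature_maps[0]), len(feature_maps[0][0])
    output = []
    for w, b in zip(weights, biases):
        # Each output pixel mixes the channels of the same input pixel only.
        channel = [
            [max(0.0, b + sum(wk * fm[r][c] for wk, fm in zip(w, feature_maps)))
             for c in range(n_cols)]
            for r in range(n_rows)
        ]
        output.append(channel)
    return output


def random_layer(c_in, c_out, rng):
    """Placeholder (untrained) weights and zero biases for one layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(c_in)] for _ in range(c_out)]
    return weights, [0.0] * c_out


rng = random.Random(0)
p, n_bands = 3, 6  # p = 3 is an illustrative input size; six bands as in the text
image = [[[rng.random() for _ in range(p)] for _ in range(p)] for _ in range(n_bands)]

w1, b1 = random_layer(n_bands, 15, rng)  # first convolution layer: 15 filters
w2, b2 = random_layer(15, 5, rng)        # second convolution layer: 5 filters

h1 = conv1x1_relu(image, w1, b1)  # 15 feature maps of size p x p
h2 = conv1x1_relu(h1, w2, b2)     # 5 feature maps of size p x p
```

Under the 1 x 1 reading, the convolution layers preserve the p x p layout of the input, so the position of every pixel is still available to whatever layers follow.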
4.2. Results and Analysis
An illustrative training, validation, and testing result with
is presented in
Figure 8 to show the performance of the proposed soil moisture retrieval model.
The soil moisture levels of the points for training, validation, and testing are shown in
Figure 8a. Note that the root mean square error (RMSE) is calculated from the errors between the soil moisture level estimates of the proposed CNN-based model and the actual soil moisture level measurements. With this data set, RMSE drops from
%Vol to
%Vol during training and the RMSE of validation decreases from
%Vol to
%Vol as given in
Figure 8b.
Figure 8c indicates that the magnitude of the prediction error in testing is smaller than
%Vol, resulting in an RMSE of
%Vol. This means that the magnitude of the prediction error level, which is defined in this paper as the ratio of the soil moisture retrieval error to the actual soil moisture level, is less than
and the root mean square of the prediction error level is
. These results of the illustrative training and testing imply that the proposed model accurately estimates the soil moisture level with small errors for the data set collected in this paper.
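The two error measures used above, the RMSE and the prediction error level defined as the ratio of the retrieval error to the actual soil moisture level, can be computed as in the following sketch. The function names and the three sample values are made up for illustration and are not results of this paper.

```python
import math


def rmse(estimates, measurements):
    """Root mean square error between estimates and measurements."""
    errors = [e - m for e, m in zip(estimates, measurements)]
    return math.sqrt(sum(err * err for err in errors) / len(errors))


def prediction_error_levels(estimates, measurements):
    """Ratio of the retrieval error to the actual soil moisture level."""
    return [(e - m) / m for e, m in zip(estimates, measurements)]


def root_mean_square(values):
    """Root mean square of a list of values."""
    return math.sqrt(sum(v * v for v in values) / len(values))


# Made-up soil moisture levels in %Vol, for illustration only.
measured = [20.0, 25.0, 40.0]
estimated = [21.0, 24.0, 42.0]

test_rmse = rmse(estimated, measured)
levels = prediction_error_levels(estimated, measured)
rms_level = root_mean_square(levels)
max_level = max(abs(v) for v in levels)
```

The RMSE keeps the unit of the measurements (%Vol), whereas the prediction error level is dimensionless, which makes it comparable across points with different moisture levels.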
The training, validation, and testing results with different
p under no measurement errors are provided in
Figure 9, and the results with errors in the coordinates of the pixels for measurement points on the multispectral image are presented in
Figure 10.
It is shown in
Figure 9 that the soil moisture estimation error with the proposed model increases as the size of input image
p becomes larger when there is no error in the input image. As discussed before in
Section 3.1, only the pixel on the center of the
ith input image directly indicates the
ith measurement point. This means that the increase of
p can cause the input image to include pixels irrelevant to the soil moisture level at the
ith point. Thus, the soil moisture level estimation becomes more accurate with smaller
p.
However,
Figure 10 indicates that the soil moisture level estimation error is the smallest with p = 3. In this training and testing study, the pixel defined as the center of the ith input image is set to be one pixel next to the pixel which actually corresponds to the ith measurement point. As a result, the input images do not include the pixels corresponding to the soil moisture measurement points when p = 1. These pixels start to be contained in the input images once p is greater than or equal to 3. Thus, the p = 3 case shows a significantly smaller estimation error than the p = 1 case. The estimation error becomes larger as p increases further since more pixels that do not correspond to the measurement points are included in the input image.
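The geometric argument above can be checked with a small calculation. The sketch below assumes a square window of odd size p and an error along a single axis; the function names and the list of window sizes are illustrative choices, not part of the original study.

```python
def contains_true_pixel(p, offset):
    """True if a p x p window whose center is `offset` pixels away from the
    true pixel (along one axis) still covers that pixel."""
    return abs(offset) <= p // 2


def true_pixel_share(p, offset):
    """Fraction of the window occupied by the true pixel (0 if not covered)."""
    return 1.0 / (p * p) if contains_true_pixel(p, offset) else 0.0


sizes = (1, 3, 5, 7)  # illustrative odd window sizes

# No error in the center pixel: the share shrinks as p grows.
share_without_error = {p: true_pixel_share(p, 0) for p in sizes}

# One-pixel error: the smallest window misses the true pixel entirely.
share_with_error = {p: true_pixel_share(p, 1) for p in sizes}

best_without_error = max(sizes, key=lambda p: share_without_error[p])
best_with_error = max(sizes, key=lambda p: share_with_error[p])
```

In this simplified view, the window in which the true pixel has the largest share is the smallest one that still contains it, which matches the trend described for both studies.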
To sum up, the soil moisture level estimation with the proposed CNN-based model is expected to be most accurate when the input image is made as small as possible while still guaranteeing that the pixels corresponding to the points whose soil moisture level is to be estimated are contained in the input images.
4.3. Comparative Study
A comparative study is performed between the proposed CNN-based model and a deep-neural-network (DNN)-based model. The proposed CNN-based model utilizes the multispectral and IR images directly as input. On the other hand, since the DNN-based model takes a vector as input, the multispectral and IR images must be restructured into a vector. When the size of the images at each soil moisture measurement point is p × p, the image for each wavelength is restructured into a vector of length p² by linking its columns into one vector. The input for the DNN-based model is constructed as a vector of length 6p² by linking all of the vectors obtained from the images of all six wavelengths. This procedure does not change the pixel values, but it discards the spatial information of the pixels. Thus, the comparative study performed in this paper can show the effect of losing the spatial information of pixels in the input images on the estimation performance. The DNN-based model for comparison has 2 hidden layers. The first layer consists of 45 neurons and the second layer has 15 neurons.
p is set to 3 for both the proposed CNN-based and the DNN-based models. Training, validation, and testing are conducted 50 times with different sets of soil moisture level measurement points for training, validation, and testing in each iteration. Note that the sets of soil moisture level measurement points for training, validation, and testing in each iteration are the same for both models. The testing results for both the proposed CNN-based and the DNN-based models are shown in
Figure 11.
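The restructuring of the input images into a single vector for the DNN-based model can be sketched as follows. The helper names and the toy pixel values are assumptions made for illustration. The column-wise linking and the concatenation over six wavelengths follow the description above, while the order in which the bands are concatenated is assumed.

```python
def flatten_column_major(image):
    """Link the columns of a 2-D list into one vector (column after column)."""
    n_rows, n_cols = len(image), len(image[0])
    return [image[r][c] for c in range(n_cols) for r in range(n_rows)]


def build_dnn_input(bands):
    """Concatenate the column-wise vectors of all bands into one input vector."""
    vector = []
    for band in bands:  # assumed order: bands are appended one after another
        vector.extend(flatten_column_major(band))
    return vector


p, n_bands = 3, 6
# Toy images; each value encodes its band, row, and column.
bands = [
    [[100 * b + 10 * r + c for c in range(p)] for r in range(p)]
    for b in range(n_bands)
]

x = build_dnn_input(bands)  # 6 * 3 * 3 = 54 entries for p = 3
```

Every pixel value survives the restructuring unchanged, but neighbouring pixels in the image are no longer neighbours in the vector, which is the loss of spatial information examined in the comparison.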
It is observed from
Figure 11 that the proposed CNN-based model, which utilizes the input images directly as input, shows a smaller estimation error than the DNN-based model, which loses the spatial information of pixels when the images are restructured into input data.
4.4. Discussion
The proposed CNN-based soil moisture retrieval model shows small soil moisture estimation errors for the data collected from the UAV flight. The study on the effects of the input image size and of errors in the input image shows that the input image size p should be as small as possible while guaranteeing that the pixel corresponding to the point whose soil moisture level is to be estimated is included in the input image. The comparative study results imply that the proposed CNN-based model, which maintains the information in the input images, such as the spatial information of the pixels, can be advantageous in estimation performance compared to a model that requires restructuring of the input images.
The results of this paper have the following advantages compared to the previous research on soil moisture retrieval using the multispectral and IR images in [
3]. Since the images from satellites are utilized in [
3], the spatial resolution of the images is low and the time of data acquisition cannot be decided by the operator. This implies that it is difficult to acquire a precise soil moisture level for a certain point in a crop field, and the time at which soil moisture retrieval can be conducted is highly restricted. With the method proposed in this paper, soil moisture retrieval for each point in a crop field can be conducted more precisely using images with higher spatial resolution acquired by a UAV platform flying at much lower altitudes, and data acquisition can be conducted by a non-expert operator whenever it is required. The methodology in [
3] requires calculation of the temperature–vegetation dryness index (TVDI) from the remotely sensed images, and the relationship between TVDI and soil moisture level is modelled by linear regression. However, it is shown in [
3] that the relationship between TVDI and the soil moisture level is highly nonlinear. As a result, the linear model in [
3] shows a high RMSE. On the other hand, the CNN-based model proposed in this paper utilizes the remotely sensed images directly as inputs, and nonlinearities in the relationship between the input images and the output soil moisture level are handled by the CNN structure, which yields a low RMSE in training, validation, and testing for the data set utilized in this paper.
It is difficult to guarantee that the proposed CNN-based soil moisture retrieval model trained in this section will show high estimation performance with multispectral and IR images from other sites. However, since the structure of the proposed CNN-based model is not designed specifically for the data collection site, the UAS, or the soil moisture levels of the data utilized in this paper, it can be retrained with data collected at different sites, under different soil moisture levels, and with different UAS equipped with multispectral and IR image sensors. This will enable the retrained CNN-based model to be utilized for soil moisture retrieval at a different site, under different soil moisture levels, and with a different UAS, and to show high estimation performance. Thus, the procedure for constructing the CNN-based soil moisture retrieval model proposed in this paper can be applied to develop soil moisture retrieval models at different sites, under different conditions, and with different UAS, provided that multispectral and IR image sensors are utilized.