Soil Moisture Retrieval Model Design with Multispectral and Infrared Images from Unmanned Aerial Vehicles Using Convolutional Neural Network

This paper deals with the design of a soil moisture retrieval model that uses airborne measurements for remote monitoring of the soil moisture level in large crop fields. A small quadrotor unmanned aerial vehicle (UAV) is considered as a remote sensing platform for high spatial resolutions of airborne images and easy operations. A combination of multispectral and infrared (IR) sensors is applied to overcome the effects of canopies covering the field on the sensor measurements. A convolutional neural network (CNN) is utilized to take the measurement images directly as inputs for the soil moisture retrieval model without loss of information. The procedures to obtain an input image corresponding to a certain soil moisture level measurement point are addressed, and the overall structure of the proposed CNN-based model is described. Training and testing of the proposed soil moisture retrieval model are conducted to verify and validate its performance and to address the effects of input image sizes and errors on input images. The soil moisture level estimation performance decreases as the input image size increases, because the ratio of the pixel corresponding to the point at which the soil moisture level is estimated to the total number of pixels in the input image decreases, whereas the input image size should be large enough to include this pixel under the errors in input images. The comparative study shows that the proposed CNN-based algorithm is advantageous in estimation performance because it maintains the spatial information of pixels in the input images.


Introduction
The soil moisture level is one of the most critical factors related to the health of crops. The stress level of a crop is directly related to the amount of moisture held by the crop, which is highly dependent on the soil moisture level where the crop is planted. Since the crop yield is affected by the stress level of the crop, it is important to check the soil moisture level of the crop field frequently and to irrigate the proper part of the field at the proper time. However, it is difficult and expensive to set up static sensor systems covering an entire vast crop field to measure the soil moisture levels. Since few people currently work in a huge crop field, it takes a great amount of time for farmers to measure the soil moisture level of the whole field, which changes rapidly over time. In this paper, a soil moisture retrieval model with multispectral and infrared images from unmanned aerial vehicles (UAVs) is designed with a convolutional neural network (CNN) to resolve the issues in monitoring the soil moisture level of large crop fields.
The soil moisture retrieval from remotely acquired measurements is one approach to monitor the moisture level of the crop field without static sensor systems or a great amount of manpower. There have been studies in which soil moisture retrieval has been performed using measurements from satellites [1][2][3][4]. One issue is that the area covered by one pixel on the satellite images is large due to the high altitude of the satellite. This implies that the soil moisture level estimate of a large area is represented by one moisture level estimate from a single pixel of a satellite image. It is difficult to conduct remote sensing whenever monitoring is required since a satellite is available only when it passes over the crop field. Moreover, clouds above the crop field can affect the satellite images. Airplanes were suggested as another remote sensing platform for soil moisture retrieval in [5]. Airborne images from airplanes can have higher spatial resolution than satellite images since airplanes operate at much lower altitudes than satellites. However, it is also difficult to collect data frequently with airplanes because a trained pilot is mandatory for operating them. In this paper, a small quadrotor UAV is proposed as a remote sensing platform for the soil moisture retrieval. The quadrotor UAVs can provide images with higher spatial resolution than satellites or airplanes because they are operated at significantly lower altitudes. The cost to maintain quadrotor UAVs is much smaller, and non-experts can easily operate quadrotor UAVs. This enables convenient collection of data for soil moisture retrieval whenever it is required.
The objective of soil moisture retrieval is to estimate the moisture level of the soil using remote sensing. However, it is difficult to take airborne images of bare soil when crops cover the field. This implies that it is important to select a proper sensor or a set of sensors which can provide data to estimate the moisture level of soil even when their measurements are affected by canopies. A multiple-channel radiometer called an advanced very high resolution radiometer (AVHRR) was introduced in [1], and a microwave radiometer was applied in [6,7]. In [4], a synthetic aperture radar (SAR) (ENVIronmental SATellite, ENVISAT/Advanced Synthetic Aperture Radar, ASAR) was utilized. A set of radar and another sensor has been proposed as a sensor system for soil moisture retrieval in previous studies. Radar together with AVHRR was considered in [2], and in [5], a combination of radar and an optical camera was proposed. The issue is that these sensors or sets of sensors cannot be mounted on small quadrotor UAV systems because radiometers and radars are heavy and, to the best of the authors' knowledge, suitable commercial radiometers or radars for small UAVs are not available at this point. The sensor system for the soil moisture retrieval suggested in [3] consists of an infrared (IR) sensor with Fourier transform infrared spectroscopy (FTIR) and a multispectral sensor. Commercial multispectral and IR image sensors for small UAVs are available and it is easy to incorporate these sensors with UAV systems. Thus, the combination of multispectral and IR image sensors is considered as a sensor system for the soil moisture retrieval in this paper. An algorithm or model is required to estimate the moisture level of soil covered by crops from the airborne images of the crop field. One approach is to utilize the relationship between the value of a certain parameter for a crop field and the soil moisture level.
The temperature-vegetation dryness index (TVDI), proposed in [1], is a dryness index for a land surface. TVDI is designed by empirically parameterizing the relationship between the temperature of the land surface and the normalized difference vegetation index (NDVI). The relationship between TVDI and soil moisture measurement was investigated in [3] via correlation and regression analysis. A Bayesian approach based on backscattering coefficients was suggested in [5], and backscattering coefficients together with emissivities were utilized in [8]. A regression-technique-based approach was introduced in [2] by applying the support vector machine (SVM) to radar backscatter, incidence angle, and NDVI data. The above approaches require procedures to calculate parameters known to have relationships with the soil moisture level, such as backscattering coefficients or NDVI, by applying models or equations from the previous studies to the sensor measurements. Thus, the soil moisture retrieval models based on these approaches are confined by the known relationships between the parameters and the soil moisture level. The information contained in the measurements that is irrelevant to the parameter, but may still be related to the soil moisture level, can be discarded during these procedures. There have been research studies on neural-network-based soil moisture retrieval models [4,6,7,8]. These models figure out the relationship between input data and soil moisture level via training without any previous knowledge. However, since the input data for those neural-network-based models are parameters such as backscattering coefficients, emissivities, or brightness temperature, the procedures to calculate these parameters from sensor measurements are still required. This work is an extension of our conference work [9], where the CNN [10] is utilized for the design of the soil moisture retrieval model.
The CNN has been utilized for various agricultural applications [11][12][13][14][15][16][17][18][19][20][21][22][23][24]. One advantage of a CNN-based model is that an image with multiple layers of different wavelengths is directly utilized as an input for the model. This implies that calculations of parameters related to soil moisture level from sensor measurements are not required, which avoids the information loss that can take place during parameter calculations. The spatial information of each pixel on the input images is also maintained since the structure of the input image is not changed before it is fed to the CNN-based model. Moreover, the CNN-based soil moisture retrieval model is trained to figure out the relationship between a soil moisture measurement and corresponding images without preliminary knowledge of their relationship. Thus, the CNN-based model is not restricted by previous understanding, and it may discover and utilize unknown correlations between the soil moisture level and the input multispectral and IR images. The proposed CNN-based model has an advantage in online operations since the trained CNN-based model estimates the soil moisture level directly and rapidly from the acquired multispectral and IR images. Since the soil moisture level of a certain point on the ground is estimated with the CNN-based model from a set of multispectral and IR images including the pixel corresponding to the point at the center, the size of the input image for the point decides the ratio of this pixel to the overall input image. The effects of the input image size on the performance of the CNN-based model are investigated in this paper. The error in identifying the pixel corresponding to the soil moisture measurement point on the ground has a significant impact on the performance of the model since the input image is then not correctly matched with the soil moisture measurement data.
In this paper, the performance degradation due to the input image error in corresponding pixel identification is studied via training and testing results of the CNN-based model.
To sum up, a CNN-based model is designed in this paper to estimate a soil moisture level of a point covered by canopy from airborne multispectral and IR images including this point obtained from UAV. Each component of this approach has the following advantages. By utilizing UAV, remote sensing for soil moisture retrieval can be conducted easily by non-experts whenever it is needed. Since multispectral and IR image sensors compatible with commercial small UAVs are available, the remote sensing platform for soil moisture retrieval can be constructed easily. The CNN-based model takes multispectral and IR images directly as inputs for soil moisture level estimation. This implies that there is no loss of data in input images, such as spatial information of the pixels, and the implementation of trained model is advantageous for online operation.
This paper is organized as follows. The data collection for soil moisture retrieval is described in Section 2. In Section 3, the design of the soil moisture retrieval model based on CNN is proposed. Training and testing of the proposed soil moisture retrieval model are conducted in Section 4. The overall concluding remarks of the paper are addressed in Section 5.

Platform and Sensors for Data Collection
The airborne multispectral and infrared images of an agricultural field are obtained with a UAV system. So that the multispectral and infrared sensors obtain images of the same location at the same time, both sensors are mounted on the UAV simultaneously. The UAV system for the data collection consists of commercial products to show that the proposed soil moisture retrieval model can operate with a remote sensing system that can be easily constructed. The UAV system for the data collection is shown in Figure 1. The quadrotor UAV platform in Figure 1 is Inspire 1 of DJI [25]. The maximum takeoff altitude of this UAV is 2500 m above sea level and its maximum speed is 21.9 m/s. The weight of this UAV is 2845 g and its maximum takeoff weight is 3500 g. The maximum flight time is about 1080 s [25]. The IR sensor, which is marked with a blue circle in Figure 1, is ZENMUSE XT of FLIR [25], and the spectral band of this sensor is 7.5 to 13.5 µm [25]. The multispectral image sensor utilized in this paper is RedEdge-M by MicaSense [26], which is marked with a red circle in Figure 1. This sensor simultaneously captures images in five different bands: near-IR, red edge, red, green, and blue. The center wavelength and bandwidth of each image are provided in Table 1.

Soil Moisture Sensor
In order to train and validate the soil moisture retrieval model proposed in this paper, the soil moisture level data corresponding to the set of multispectral and infrared images are required as ground truth. The soil moisture sensor system utilized in this paper is constructed by connecting the handheld soil moisture sensor, SM200 from Delta-T [27], and the handheld moisture meter, HH2 of Delta-T [27], shown in Figure 2 (reprinted with permission from ref. [27]; copyright 2019 Delta-T).
This handheld soil moisture sensor system measures the moisture content level at a single point by driving the two 51 mm stainless steel rods of SM200 into the point as deep as possible and connecting power to the sensor. The soil moisture level is displayed on the screen of HH2 in one of two units, mV and %Vol. The ground truth soil moisture data for this paper are acquired in %Vol unit.

Overall Flow of the CNN-Based Soil Moisture Retrieval Model Development
The overall flowchart of the CNN-based soil moisture retrieval model development in this paper, which includes data acquisition, training of the model, and testing of the trained model, is shown in Figure 3. The first step of the CNN-based soil moisture retrieval model development is to collect data from an agricultural field. The multispectral and IR images of the field are acquired with the quadrotor UAV platform in Section 2.1, and the soil moisture level data at multiple points in the field are measured using the soil moisture sensor introduced in Section 2.2. In the next step, the training and validation data sets are constructed from a part of the collected data, and the CNN-based soil moisture retrieval model proposed in this paper is trained and validated with these data sets by utilizing the multispectral and IR images as inputs and the soil moisture level measurements as ground truth. As a result, a trained soil moisture retrieval model is obtained, and it is tested to check the performance in the last step by utilizing the test data set constructed from multispectral and IR images and the soil moisture level measurements, which are not included in the training and validation data sets.

Data Collection Date and Site
The airborne multispectral and IR images, together with corresponding soil moisture level measurements, are collected at an agricultural field by conducting flights with the UAV system in Figure 1. The data are collected in a single day, 3 September 2018. The field for data collection is located at Claxton, York, United Kingdom. The crop grown in this field at the time of data collection is potato. The picture of the field on the date of data collection is shown in Figure 4. It is clearly shown that most of the field is covered by green canopy, which makes it difficult to obtain airborne images of bare soil. This implies that the soil moisture level of each point on this field can be monitored with a quadrotor UAV only if the soil moisture retrieval model can estimate the soil moisture level at any point covered by canopy from its airborne images.

Data Collection Procedures
The objective of the data collection is to obtain sets of multispectral and IR images together with the soil moisture level at selected points. The data collection starts with the definition of two reference points on the field. Those points are marked with white plates so that they are clearly visible on the airborne image of any wavelength. One of the reference points is defined as the origin and the straight line connecting those two points is assumed to be the $x_{GR}$-axis. A straight line that passes through the origin and is perpendicular to the $x_{GR}$-axis is defined as the $y_{GR}$-axis. The soil moisture levels are measured at the data collection points, whose positions are defined in this new coordinate system. The number of data collection points is 130. The airborne multispectral and IR images of the entire field are remotely obtained with the quadrotor UAV platform. Since the distance between the reference points is known and the data collection points are defined with respect to one of the reference points, the pixels on the images which coincide with the data collection points can be recognized as described in Section 3.1.
Note that the altitude of the quadrotor UAV platform is kept constant during the image acquisition so that the spatial resolution is the same at all points. If the altitude is high, the spatial resolutions of the images are low, but the overall flight time is short. The spatial resolutions can be improved with a low altitude of the quadrotor UAV platform, but the acquired images can have severe errors since the crops can move rapidly due to the vortex from the propellers of the quadrotor UAV platform. Thus, the altitude for the UAV operation should be selected with those factors taken into consideration. The altitude of the quadrotor UAV platform is set to 50 m for the data acquisition in this paper.

Input Image Data for Soil Moisture Retrieval Model
The soil moisture retrieval model based on CNN estimates the soil moisture level at a certain point on the field from the input image of the point. The input image for the CNN-based model is a single image with multiple channels. Thus, combinations of a soil moisture level measurement and the corresponding input image consisting of the multispectral and IR images are required for the training and validation of the model. Since the multispectral and IR images are obtained for the whole data collection site, each of the combinations is constructed by extracting the multispectral and IR images for each soil moisture level measurement point from the images of the whole site.
In order to define the input image for each measurement point, the first step is to figure out the pixel on each image corresponding to the point. One of the reference points on the ground is defined as the origin of the coordinate system on the ground, $(x^{R1}_{GR}, y^{R1}_{GR}) = (0, 0)$. The other reference point is assumed to be on the $x_{GR}$-axis of the coordinate system on the ground, $(x^{R2}_{GR}, y^{R2}_{GR}) = (0, y^{R2}_{GR})$. The position of the $i$th soil moisture level measurement point is defined in this coordinate system as $(x^{i}_{GR}, y^{i}_{GR})$. The images with six different wavelengths are labeled with numbers as follows for the coordinate system definitions: $WL_1$ for IR, $WL_2$ for near-IR, $WL_3$ for red edge, $WL_4$ for red, $WL_5$ for green, and $WL_6$ for blue. On the image of the $j$th wavelength, the coordinates of the two reference points and the $i$th soil moisture level measurement point are defined as $(x^{R1}_{WL_j}, y^{R1}_{WL_j})$, $(x^{R2}_{WL_j}, y^{R2}_{WL_j})$, and $(x^{i}_{WL_j}, y^{i}_{WL_j})$, respectively. $(0, y^{R2}_{GR})$ and $(x^{i}_{GR}, y^{i}_{GR})$ are measured on the ground, and $(x^{R1}_{WL_j}, y^{R1}_{WL_j})$ and $(x^{R2}_{WL_j}, y^{R2}_{WL_j})$ can be recognized easily due to the white plates. This implies that the pixel on the image of the $j$th wavelength corresponding to the $i$th point, $(x^{i}_{WL_j}, y^{i}_{WL_j})$, can be found with Equation (1).
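As a rough illustration of this pixel identification step, and not a reproduction of Equation (1), the sketch below maps a ground point to image coordinates with a two-point similarity transform (rotation, uniform scale, and translation). It assumes that the image is free of lens distortion and that the ground and image axes have the same handedness. The function name `ground_to_pixel` and all numerical values are hypothetical.

```python
def ground_to_pixel(ground_pt, ref1_ground, ref2_ground, ref1_pixel, ref2_pixel):
    """Map a ground point to image coordinates via a two-point similarity transform.

    Assumption (not from the paper): the image is an undistorted, same-handed
    view of the ground, so rotation + uniform scale + translation is sufficient.
    Points are treated as complex numbers, where z' = a * z + b is a similarity.
    """
    g = complex(*ground_pt)
    g1, g2 = complex(*ref1_ground), complex(*ref2_ground)
    p1, p2 = complex(*ref1_pixel), complex(*ref2_pixel)
    if g1 == g2:
        raise ValueError("reference points must be distinct")
    a = (p2 - p1) / (g2 - g1)  # rotation and scale (pixels per ground unit)
    b = p1 - a * g1            # translation
    p = a * g + b
    return round(p.real), round(p.imag)


# Hypothetical values: reference plates 40 m apart, seen 800 pixels apart.
pixel = ground_to_pixel((5.0, 10.0), (0.0, 0.0), (0.0, 40.0), (100, 200), (100, 1000))
print(pixel)
```

Both reference points are reproduced exactly by construction, so any residual error at the measurement points comes from image distortion or from errors in the reference coordinates, which is the situation studied later with shifted pixels.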
The next step is to obtain the image of the $j$th wavelength for the $i$th measurement point by defining the area on the image of the $j$th wavelength for the whole field which corresponds to the $i$th point. The area on the image of the $j$th wavelength for the $i$th measurement point is cropped as a $(p \times p)$ square image with $(x^{i}_{WL_j}, y^{i}_{WL_j})$ as the center point. The length of the side, $p$, is a design parameter, which is an odd number. When $p$ is small, the portion of the pixel for the $i$th point within the input image is large. The ratio of the pixel for the $i$th point to the input image becomes smaller as $p$ increases. However, the pixel coordinates obtained for the measurement points can contain errors due to the distortion of the images or errors in the coordinates of the reference points. In this case, the probability that the input image includes the actual pixels corresponding to the measurement points decreases with smaller $p$ and becomes larger as $p$ increases. The effect of $p$ on the performance of the proposed soil moisture retrieval model is investigated in Section 4 for the cases with and without errors in the pixel coordinates of the soil moisture level measurement points.
The last step of the input image construction for the soil moisture retrieval model is to stack up the images of all six wavelengths for the ith measurement point. As a result, the pixels of all six images corresponding to the same point on the ground are stacked at the same position. This allows the stacked input image for the ith measurement point to retain the spatial information of each pixel, which can be recognized and utilized by a convolutional neural network. The overall procedures of cropping and stacking up the input images for the ith measurement point are illustrated in Figure 5. The soil moisture retrieval algorithms proposed in the previous studies require procedures to calculate parameters related to the soil moisture level from the measurement images, and these parameters are utilized as inputs for the algorithms. This implies that only the information required for the calculation of the parameters is utilized, while other data such as spatial information are ignored by the soil moisture retrieval algorithms. Also, this calculation requires additional effort and time. In contrast, the input image construction procedures described above do not require any steps to calculate parameters from the measurement images. They only identify the pixels for the measurement points on the images and crop the images around those pixels. Thus, the input images for the proposed soil moisture retrieval model have no data loss and do not require additional effort and time for parameter calculations.
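The cropping and stacking steps can be sketched as follows. This is a minimal illustration with toy data, assuming each band is stored as a list of rows and that the patch lies fully inside the image; the helper names `crop_patch` and `build_input` are hypothetical.

```python
def crop_patch(band, center_row, center_col, p):
    """Return the p x p patch of a 2-D band (list of rows) centred on one pixel."""
    if p % 2 == 0:
        raise ValueError("p must be odd so that the patch has a centre pixel")
    half = p // 2
    rows, cols = len(band), len(band[0])
    if not (half <= center_row < rows - half and half <= center_col < cols - half):
        raise ValueError("patch would extend beyond the image border")
    return [row[center_col - half:center_col + half + 1]
            for row in band[center_row - half:center_row + half + 1]]


def build_input(bands, centers, p):
    """Stack one patch per wavelength into a [channels][p][p] nested list.

    `centers` holds one (row, col) pair per band, because each sensor image
    has its own pixel coordinates for the same ground point.
    """
    return [crop_patch(band, r, c, p) for band, (r, c) in zip(bands, centers)]


# Toy data: six 5 x 5 bands whose pixel value encodes (band, row, column).
bands = [[[100 * k + 10 * r + c for c in range(5)] for r in range(5)]
         for k in range(6)]
centers = [(2, 2)] * 6
x = build_input(bands, centers, p=3)
print(len(x), len(x[0]), len(x[0][0]))  # 6 3 3
print(x[1][1][1])                       # centre pixel of the second band: 122
```

Because every band is cropped around its own center pixel, the six center pixels end up at the same position of the stacked input, which is what preserves the spatial relationship between bands.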

Soil Moisture Retrieval Model Design
The soil moisture retrieval model proposed in this paper is designed by applying CNN. CNN has been utilized in previous studies for classification or regression from images with multiple layers of different channels. Since the soil moisture level calculation from input images can be regarded as a regression, the soil moisture retrieval model is constructed to estimate the soil moisture level from input images using CNN. The proposed CNN-based soil moisture retrieval model consists of two sequentially connected convolution layers with activation functions and one fully connected layer, as shown in Figure 6. The CNN-based model estimates the soil moisture level from input images as follows. The first step of regression for the ith measurement point is applying the first convolution layer to the input image of the ith point consisting of six layers with different wavelengths. Each convolution layer has filters that are arrays of constant weighting parameters, and each filter consists of a two-dimensional array for each layer. The weights of all the filters of all the layers are optimized to minimize the estimation error of the network by training the CNN-based regression model. One or more filters can be defined for one convolution layer, and the size of the filters is a design parameter of a CNN model. When the first convolution layer is applied to the input image, each filter of the first convolution layer is moved across each layer of the input image and convolution operations are conducted for all the possible partial images of the input image that have the size of the filter. The convolution operation sums the element-wise products between the weighting parameters in the filter and the pixels of the input image overlapped by the filter. The result of the convolution operation is a single number representing the area covered by the filter.
For each filter, a two-dimensional array is obtained by collecting the results from all the partial images while maintaining the relative positions of the original partial images. A feature map is constructed as the output of the first convolution layer by stacking up these two-dimensional arrays from all the filters. This feature map is utilized as input data for the second convolution layer. The output of the second convolution layer is another feature map constructed by operations similar to those of the first convolution layer. The filters of the second convolution layer are defined separately from those of the first convolution layer.
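The convolution layer described above can be sketched in plain Python as follows. This is an illustrative sketch only: stride 1, no padding, and an optional bias per filter are assumptions, since those details are not stated here, and the function name `conv_layer` is hypothetical.

```python
def conv_layer(image, filters, biases=None):
    """Valid (no padding, stride 1) convolution of a multi-channel image.

    image:   [channels][height][width]
    filters: [n_filters][channels][k][k]
    returns: [n_filters][height - k + 1][width - k + 1]
    """
    channels, height, width = len(image), len(image[0]), len(image[0][0])
    k = len(filters[0][0])
    biases = biases or [0.0] * len(filters)
    feature_map = []
    for f, b in zip(filters, biases):
        out = []
        for r in range(height - k + 1):
            row = []
            for c in range(width - k + 1):
                # Sum of element-wise products over all channels and filter cells.
                s = b
                for ch in range(channels):
                    for dr in range(k):
                        for dc in range(k):
                            s += f[ch][dr][dc] * image[ch][r + dr][c + dc]
                row.append(s)
            out.append(row)
        feature_map.append(out)  # one two-dimensional array per filter
    return feature_map


# Toy example: a 2-channel 3 x 3 image and one 2 x 2 filter of ones per channel.
image = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
         [[1, 1, 1], [1, 1, 1], [1, 1, 1]]]
ones = [[[[1, 1], [1, 1]], [[1, 1], [1, 1]]]]
fm = conv_layer(image, ones)
print(fm)  # [[[16.0, 20.0], [28.0, 32.0]]]
```

With a filter size of 1, each output value reduces to a weighted sum of the channel values at a single pixel, so the feature map keeps the same height and width as its input.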
The activation functions attached after each convolution layer introduce nonlinearities into the CNN-based regression model. The training of the model with backpropagation methods is applicable with the introduction of the activation functions.
Flattening is conducted to convert the feature map from the second convolution layer into a one-dimensional vector. This vector is an input for the fully connected layer, which is constructed as a conventional multilayered neural network for regression. The training of the CNN also optimizes the weights and biases of the perceptrons in the fully connected layer to reduce the regression errors. The output of this fully connected layer is the prediction of the soil moisture level at the ith measurement point, which is given as a single number. The training and testing of the proposed soil moisture retrieval model are performed in Section 4 with the data obtained as described in Section 2. Note that the design parameters of the proposed CNN-based model, such as the number of filters and their sizes in each layer, are presented in Section 4.
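A forward pass through such a model can be sketched as below. The sketch uses the filter size of 1 and the 15 and 5 filters reported in Section 4, but several choices are assumptions made only for illustration: the fully connected part is reduced to a single linear output neuron, the weights are random rather than trained, and the names `pointwise_conv`, `forward`, and `init_params` are hypothetical.

```python
import random


def relu(v):
    return v if v > 0 else 0.0


def pointwise_conv(image, weights, biases):
    """1 x 1 convolution followed by ReLU. image: [C][H][W]; weights: [F][C]."""
    height, width = len(image[0]), len(image[0][0])
    return [[[relu(b + sum(w[ch] * image[ch][r][c] for ch in range(len(image))))
              for c in range(width)] for r in range(height)]
            for w, b in zip(weights, biases)]


def forward(image, params):
    """Two convolution layers, flattening, and one linear output (assumed)."""
    h = pointwise_conv(image, params["w1"], params["b1"])
    h = pointwise_conv(h, params["w2"], params["b2"])
    flat = [v for channel in h for row in channel for v in row]  # flattening
    return params["b_out"] + sum(w * v for w, v in zip(params["w_out"], flat))


def init_params(p, channels=6, f1=15, f2=5, seed=0):
    """Random, untrained parameters; real values would come from training."""
    rng = random.Random(seed)

    def u():
        return rng.uniform(-0.1, 0.1)

    return {
        "w1": [[u() for _ in range(channels)] for _ in range(f1)],
        "b1": [0.0] * f1,
        "w2": [[u() for _ in range(f1)] for _ in range(f2)],
        "b2": [0.0] * f2,
        "w_out": [u() for _ in range(f2 * p * p)],
        "b_out": 0.0,
    }


p = 3
params = init_params(p)
image = [[[1.0] * p for _ in range(p)] for _ in range(6)]  # dummy 6-channel input
print(forward(image, params))  # a single number: the soil moisture estimate
```

Under these assumptions the flattened vector has 5p² elements (45 for p = 3), and the model output is a single number for each input image.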

Setup for Training, Validation, and Testing
Training, validation, and testing of the proposed CNN-based soil moisture retrieval model are performed in this section to verify and validate the model and to analyze the effects of input image construction on the performance of the model. Training is a process to fit the parameters of the CNN-based model, such as the weights of the filters of the convolution layers, using a data set which is composed of input images and the soil moisture level measurements. Validation is conducted to check the performance of the model during training with data sets that do not include data utilized for training, and its results are considered for tuning the hyperparameters. Testing is a process to check the performance of the trained model using data sets that do not include data utilized for either training or validation. In order to address the effect of input image size, four different cases of $p$ are defined as $p$ = 1, 3, 5, and 7. Training, validation, and testing are conducted iteratively for each $p$ with different training, validation, and testing data sets for each iteration. Note that all the data for all six bands gathered by the UAV system are utilized for constructing the training, validation, and testing data sets. Each iteration is conducted with different sets of soil moisture level measurement points for training, validation, and testing. The number of iterations is set to 50 in this paper. In each iteration, all the soil moisture level measurement points are separated into three sets randomly so that the sets differ between iterations. The total number of soil moisture level measurement points is 130 in this paper. Among them, 110 points are randomly selected as training data for each iteration. Another 10 points are classified as a validation set and the remaining 10 points are utilized for testing after training. For the same iteration, the sets of measurement points for training, validation, and testing are the same for all the $p$ cases. The only difference between the $p$ cases is the size of the input image. The distribution of the soil moisture level measurements is shown in Figure 7.
The next study is about the effects of errors in obtaining the pixels for the corresponding measurement points. It is conducted in a similar way to the investigation of the effects of input image size described above. The difference in this study is that the pixels corresponding to measurement points on the multispectral images are assumed to have errors in the x coordinate within the image. Thus, the coordinate of the $i$th pixel is assumed to be $(x^{i}_{WL_j} + 1, y^{i}_{WL_j})$ for $i = 1, \cdots, 130$ and $j = 2, \cdots, 6$. The layers of the $i$th input image from the multispectral images are obtained with $(x^{i}_{WL_j} + 1, y^{i}_{WL_j})$ as the center pixel. The number of iterations is 50 and the sizes of the training, validation, and testing data sets are 110, 10, and 10, respectively.
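The data partitioning and the pixel-offset study can be sketched as follows. This is a minimal illustration, assuming that one random seed per iteration is an acceptable way to reuse the same split for every p; the seeding scheme and the helper names `split_points` and `shifted_center` are hypothetical.

```python
import random


def split_points(n_points=130, n_train=110, n_val=10, seed=None):
    """Randomly partition point indices into training, validation, and test sets."""
    idx = list(range(n_points))
    random.Random(seed).shuffle(idx)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def shifted_center(x, y, band_index, dx=1):
    """Apply the assumed x-offset to the multispectral bands (j = 2..6) only."""
    return (x + dx, y) if band_index >= 2 else (x, y)


# One split per iteration, generated once so that every p sees identical sets.
splits = [split_points(seed=iteration) for iteration in range(50)]
for p in (1, 3, 5, 7):
    for train, val, test in splits:
        pass  # build the p x p inputs, then train, validate, and test here

print(len(splits[0][0]), len(splits[0][1]), len(splits[0][2]))  # 110 10 10
```

Generating the splits before looping over p keeps the measurement points identical across the p cases, so that differences in the results can be attributed to the input image size alone.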
The parameters for the construction and training of the CNN-based soil moisture retrieval model are defined as follows. The filter size for both convolution layers is selected to be 1. The first convolution layer consists of 15 filters while the second convolution layer has 5 filters. ReLU is introduced as the activation function for both convolution layers. The Adam optimizer, which is a variation of the stochastic first-order gradient descent optimization algorithm, is applied for the training of the CNN-based model. The maximum number of epochs for training is set to 1000 and validation is performed every 50 epochs. The initial learning rate is selected to be 0.1 and it decreases by 10% per 100 epochs.
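The learning rate schedule can be illustrated with the short sketch below. It interprets "decreases by 10% per 100 epochs" as a multiplicative step decay, which is an assumption; a linear decrease of 0.01 per 100 epochs would be another possible reading. The function names are hypothetical.

```python
def learning_rate(epoch, initial=0.1, drop=0.10, period=100):
    """Step decay: multiply the rate by (1 - drop) after every `period` epochs."""
    return initial * (1.0 - drop) ** (epoch // period)


def is_validation_epoch(epoch, every=50):
    """Validation is run on every 50th epoch (epochs counted from 1)."""
    return epoch % every == 0


for epoch in (1, 100, 500, 1000):
    print(epoch, round(learning_rate(epoch), 6), is_validation_epoch(epoch))
```

Under this reading the learning rate falls from 0.1 to roughly 0.035 by the last epoch, and validation is run 20 times over the 1000 epochs.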

Results and Analysis
An illustrative training, validation, and testing result with p = 1 is presented in Figure 8 to show the performance of the proposed soil moisture retrieval model.
The soil moisture levels of the points for training, validation, and testing are shown in Figure 8a. Note that the root mean square error (RMSE) is calculated from the errors between the soil moisture level estimates of the proposed CNN-based model and the actual soil moisture level measurements. With this data set, the RMSE drops from 2.4735 %Vol to $2.5973 \times 10^{-2}$ %Vol during training and the RMSE of validation decreases from 1.6400 %Vol to $1.7478 \times 10^{-2}$ %Vol, as given in Figure 8b. Figure 8c indicates that the magnitude of the prediction error from the testing is smaller than $6 \times 10^{-2}$ %Vol, resulting in an RMSE of $1.7478 \times 10^{-2}$ %Vol. This means that the magnitude of the prediction error level, which is defined in this paper as the ratio of the soil moisture retrieval error to the actual soil moisture level, is less than 4% and the root mean square of the prediction error level is 1.1611%. These results of the illustrative training and testing imply that the proposed model accurately estimates the soil moisture level with small errors for the data set collected in this paper. The training, validation, and testing results with different p under no measurement errors are provided in Figure 9, and the results with errors in the coordinates of pixels for measurement points on the multispectral image are presented in Figure 10.
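The two error measures used above can be computed as in the following sketch. The sample values are made up for illustration and are not taken from the experiments; the function names are hypothetical.

```python
import math


def rmse(estimates, measurements):
    """Root mean square of (estimate - measurement), in the unit of the inputs."""
    errors = [e - m for e, m in zip(estimates, measurements)]
    return math.sqrt(sum(err * err for err in errors) / len(errors))


def error_levels(estimates, measurements):
    """Prediction error level: retrieval error over the measured value, in %."""
    return [100.0 * (e - m) / m for e, m in zip(estimates, measurements)]


def rms(values):
    """Root mean square of a list of values."""
    return math.sqrt(sum(v * v for v in values) / len(values))


# Made-up example values in %Vol (not from the paper's data set).
measured = [2.0, 4.0]
estimated = [2.02, 3.96]
print(rmse(estimated, measured))               # RMSE in %Vol
print(rms(error_levels(estimated, measured)))  # RMS prediction error level in %
```

Because the error level divides by the measured value, the same absolute error in %Vol weighs more heavily at points where the soil is dry.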
It is shown in Figure 9 that the soil moisture estimation error with the proposed model increases as the size of the input image p becomes larger when there is no error in the input image. As discussed in Section 3.1, only the pixel at the center of the ith input image directly indicates the ith measurement point. This means that increasing p can cause the input image to include pixels irrelevant to the soil moisture level at the ith point. Thus, the soil moisture level estimation becomes more accurate with smaller p.
However, Figure 10 indicates that the soil moisture level estimation error is the smallest with p = 3. In this training and testing study, the pixel defined as the center of the ith input image is set to be one pixel next to the pixel that actually corresponds to the ith measurement point. As a result, the input images do not include the pixels corresponding to the soil moisture measurement points when p = 1. These pixels start to be contained in the input images once p is greater than or equal to 3. Thus, the p = 3 case shows a significantly smaller estimation error than the p = 1 case. The estimation error becomes larger as p increases further, since more pixels that do not correspond to the measurement points are included in the input image.
To sum up, the soil moisture level estimation with the proposed CNN-based model is expected to perform best when the input image is made as small as possible while guaranteeing that the pixels corresponding to the points at which the soil moisture level is estimated are contained in the input images.
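The trade-off described above can be illustrated with a minimal sketch of the one-pixel offset case. The helper names (`crop_window`, `window_contains`, `relevant_pixel_ratio`) and the pixel coordinates are hypothetical; the sketch assumes an odd p and a window centered on the assumed pixel, and it is not the cropping code used in this paper.

```python
def crop_window(image, center_row, center_col, p):
    """Return the p x p window (p odd) centered at the given pixel."""
    half = p // 2
    top, left = center_row - half, center_col - half
    if top < 0 or left < 0 or top + p > len(image) or left + p > len(image[0]):
        raise ValueError("window exceeds the image boundary")
    return [row[left:left + p] for row in image[top:top + p]]


def window_contains(true_pixel, assumed_center, p):
    """Check whether the true pixel lies inside the p x p window."""
    half = p // 2
    return (abs(true_pixel[0] - assumed_center[0]) <= half
            and abs(true_pixel[1] - assumed_center[1]) <= half)


def relevant_pixel_ratio(true_pixel, assumed_center, p):
    """Share of the window occupied by the true measurement pixel."""
    if not window_contains(true_pixel, assumed_center, p):
        return 0.0
    return 1.0 / (p * p)


# Hypothetical coordinates: the assumed center is one pixel off the true one.
true_pixel = (10, 10)
assumed_center = (10, 11)
for p in (1, 3, 5, 7):
    ratio = relevant_pixel_ratio(true_pixel, assumed_center, p)
    print(f"p = {p}: contains true pixel = "
          f"{window_contains(true_pixel, assumed_center, p)}, ratio = {ratio:.4f}")
```

With a one-pixel offset, the window for p = 1 misses the true pixel entirely, p = 3 is the smallest window that contains it, and larger windows only dilute its share of the input, which is consistent with the trend reported for Figure 10.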

Comparative Study
A comparative study is performed between the proposed CNN-based model and a deep-neural-network (DNN)-based model. The proposed CNN-based model utilizes the multispectral and IR images directly as input. On the other hand, since the DNN-based model takes a vector as input, the multispectral and IR images are required to be restructured into a vector. When the size of the images at each soil moisture measurement point is (p × p), the image for each wavelength is restructured into a p² × 1 vector by linking its columns into one vector. The input for the DNN-based model is constructed as a 6p² × 1 vector by linking all of the p² × 1 vectors obtained from the images of all six wavelengths. This restructuring does not change the pixel values, but it abandons the spatial information of the pixels. Thus, the comparative study performed in this paper can show the effect of losing the spatial information of pixels in the input images on the estimation performance. The DNN-based model for comparison has 2 hidden layers. The first layer consists of 45 neurons and the second layer has 15 neurons. p is set to 3 for both the proposed CNN-based and the DNN-based models. Training, validation, and testing are repeated 50 times, with a different set of soil moisture level measurement points for training, validation, and testing in each iteration. Note that the set of soil moisture level measurement points for training, validation, and testing in each iteration is the same for both models. The testing results for both the proposed CNN-based and the DNN-based models are shown in Figure 11. It is observed from Figure 11 that the proposed CNN-based model, which utilizes the input images directly as input, shows a smaller estimation error than the DNN-based model, which loses the spatial information of pixels when the images are restructured into input data.
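The restructuring of the input images for the DNN-based model can be sketched as follows. This is a minimal illustration under the assumption that each band image is stored as a nested list of p rows; the function names and the pixel values are hypothetical, and the sketch is not the implementation used in this paper.

```python
def flatten_columns(band):
    """Link the columns of a p x p band image into a list of length p * p."""
    rows, cols = len(band), len(band[0])
    return [band[r][c] for c in range(cols) for r in range(rows)]


def build_dnn_input(bands):
    """Link the column-linked vectors of all bands into one input vector."""
    vector = []
    for band in bands:
        vector.extend(flatten_columns(band))
    return vector


# Hypothetical example: six 3 x 3 band images (p = 3) with made-up values.
p = 3
bands = [[[100 * b + 10 * r + c for c in range(p)] for r in range(p)]
         for b in range(6)]

dnn_input = build_dnn_input(bands)
print(len(dnn_input))   # number of entries in the 6p^2 x 1 input vector
print(dnn_input[:p])    # first column of the first band
```

The pixel values are unchanged, but pixels that are neighbors within a row of the image end up p entries apart in the vector, and a fully connected layer treats every entry independently of its original position. If NumPy arrays were used instead of nested lists, the same column linking would correspond to flattening each band in column-major order.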

Discussion
The proposed CNN-based soil moisture retrieval model shows small soil moisture estimation errors for the data collected from the UAV flight. The study on the effects of the input image size and of errors on the input image shows that the input image size p should be as small as possible while guaranteeing that the pixel corresponding to the point at which the soil moisture level is estimated is included in the input image. The comparative study results imply that the proposed CNN-based model, which maintains the information of the input images, such as the spatial information of the pixels, can be advantageous in estimation performance compared to a model that requires restructuring of the input images.
The results of this paper have the following advantages compared to the previous research on soil moisture retrieval using multispectral and IR images in [3]. Since images from satellites are utilized in [3], the spatial resolution of the images is low and the time of data acquisition cannot be decided by the operator. This implies that it is difficult to acquire a precise soil moisture level at a certain point in a crop field, and the time at which soil moisture retrieval can be conducted is highly restricted. With the method proposed in this paper, soil moisture retrieval at each point in a crop field can be conducted more precisely using images with higher spatial resolution acquired by a UAV platform flying at much lower altitudes, and data acquisition can be conducted by a non-expert operator whenever it is required. The methodology in [3] requires calculation of the temperature vegetation dryness index (TVDI) from the remotely sensed images, and the relationship between TVDI and the soil moisture level is modelled by linear regression. However, it is shown in [3] that the relationship between TVDI and the soil moisture level is highly nonlinear. As a result, the linear model in [3] shows a high RMSE. On the other hand, the CNN-based model proposed in this paper utilizes the remotely sensed images directly as inputs, and the nonlinearities in the relationship between the input images and the output soil moisture level are handled by the CNN structure, which shows a low RMSE for training, validation, and testing on the data set utilized in this paper.
It is difficult to guarantee that the proposed CNN-based soil moisture retrieval model trained in this section will show high estimation performance with multispectral and IR images from other sites. However, as the structure of the proposed CNN-based model is not designed specifically for the data collection site, the UAS system, or the soil moisture level of the data utilized in this paper, it can be retrained with data collected at different sites under different soil moisture levels with different UAS systems equipped with multispectral and IR image sensors. The retrained CNN-based model is then expected to be usable for soil moisture level retrieval at those sites, under those soil moisture levels, and with those UAS systems. Thus, the procedures to construct the CNN-based soil moisture retrieval model proposed in this paper can be applied to develop a soil moisture retrieval model at different sites under different conditions with different UAS systems, provided that multispectral and IR image sensors are utilized.

Conclusions
A soil moisture retrieval model is designed in this paper with airborne multispectral and IR images from quadrotor UAV systems. The proposed model is constructed by applying a CNN so that images are utilized as inputs for the model without the information loss that other models incur when input parameters are calculated from the measurements. The platform and procedures for the data collection are introduced. The input images for the soil moisture level measurements are obtained by properly cropping and stacking the areas in the multispectral and IR images corresponding to the measurement points. The structure of the proposed soil moisture retrieval model is constructed with a CNN, and each element of the model is described. Training and testing of the proposed model with data collected by the quadrotor UAV system are conducted to show the performance of the proposed model and to investigate the effects of the input image size and pixel coordinate errors on the estimation performance of the model. The results of this training and testing imply that the size of the input images for the proposed soil moisture retrieval model should be selected to be as small as possible while guaranteeing that the pixels corresponding to the points at which the soil moisture level is estimated are included in the input images. A comparative study is conducted between the proposed CNN-based soil moisture model and a DNN-based model, and the results show that the proposed CNN-based model, which utilizes the images directly as inputs, shows enhanced performance compared to the DNN-based model.