Assessment of Surface Soil Moisture Using High-Resolution Multi-Spectral Imagery and Artificial Neural Networks

Many crop production management decisions can be informed using data from high-resolution aerial images that provide information about crop health as influenced by soil fertility and moisture. Surface soil moisture is a key component of soil water balance, which addresses water and energy exchanges at the surface/atmosphere interface; however, high-resolution remotely sensed data is rarely used to acquire soil moisture values. In this study, an artificial neural network (ANN) model was developed to quantify the effectiveness of using spectral images to estimate surface soil moisture. The model produces acceptable estimations of surface soil moisture (root mean square error (RMSE) = 2.0, mean absolute error (MAE) = 1.8, coefficient of correlation (r) = 0.88, coefficient of performance (e) = 0.75 and coefficient of determination (R²) = 0.77) by combining field measurements with inexpensive and readily available remotely sensed inputs. The spatial data (visual spectrum, near infrared, infrared/thermal) are produced by the AggieAir™ platform, which includes an unmanned aerial vehicle (UAV) that enables users to gather aerial imagery at a low price and high spatial and temporal resolutions. This study reports the development of an ANN model that translates AggieAir™ imagery into estimates of surface soil moisture for a large field irrigated by a center pivot sprinkler system.


Introduction
Soil moisture content (SMC) is an important factor in managing irrigated farms. SMC includes two main components: surface soil moisture (SSM) (held in the upper 10 cm of soil) and root zone soil moisture (held in the upper 200 cm of soil). Surface soil moisture is a key component for addressing energy and water exchanges at the land surface/atmosphere interface and can be estimated using different techniques, such as in situ measurements, physically based models, and remote sensing. Grayson and Western addressed the estimation of soil moisture by applying: (1) field (or in situ) measurements; (2) remote sensing techniques; and (3) soil water balance simulation models [1,2]. Soil moisture constitutes a very small volume in terms of the total global water balance, but it plays a significant role in water resources planning and management [2]. Many current crop production management decisions that are made by growers, production managers, and crop advisors in precision agriculture are already based on observations from remotely sensed data such as satellite imagery. The objective of this research is to generate surface soil moisture (SSM) estimates using high-resolution, remotely sensed data, collected at 15 cm pixel resolution, as inputs to a machine learning algorithm (artificial neural networks (ANNs)) developed under supervised learning procedures. ANNs are used to build the SSM estimation model. To our knowledge, this is the first study to document estimation of surface soil moisture using remotely sensed data that are both at such a fine spatial resolution and readily available in terms of temporal resolution. The results will contribute not only to efficient and reliable high-resolution multi-spectral remote sensing validation, but also to better utilization of remotely sensed soil moisture products for enhanced irrigation modeling and scheduling.
Various techniques for retrieving soil moisture content have been the subject of research for almost four decades. Gravimetric measurements of soil moisture are very reliable but are laborious. Measuring SMC with embedded sensors, such as time and frequency domain reflectometers (TDRs and FDRs), does not require a huge investment of time or facilities; however, most of these methods suffer from some of these same disadvantages. In situ measurements can be exhaustive and expensive if large areas are involved, as these measurements are mainly "local," with a particular footprint representing moisture conditions in only a fraction of a cubic meter of soil [3]. Because of the spatial heterogeneity of soil moisture due to different soil conditions, vegetation, topography, or impact of human activities, local measurements, when carried out on a larger scale such as fields or watersheds, might result in inaccuracies [4]. Remote-sensing techniques might provide a useful tool to address these data acquisition difficulties.
Some of the early works in estimating SMC using remote sensing [5][6][7][8][9][10] established that thermal remote sensing, in concert with in situ measurements, can be used to measure, or at least quantitatively infer, soil moisture content. The possibility of estimating SSM (0-7.6 cm) from visible and near-infrared (NIR) reflectance data has also been demonstrated [11]. Optical and thermal remote-sensing techniques or passive and active microwave sensors offer large-scale monitoring of SSM [11][12][13]. Some meteorological satellites, such as the Advanced Microwave Scanning Radiometer (AMSR-E), the European Remote Sensing (ERS) satellite scatterometer or the Meteorological Satellite (METEOSAT), offer the possibility of monitoring operational SSM [3]. However, the coarse spatial resolutions (ERS-Scat: 50 km, AMSR-E: 56 km and METEOSAT: visible and infrared (IR) 5 km) of the instruments are often not consistent with the scale of hydrologic processes of interest [14,15]. A number of studies on soil moisture estimations introduced the error sources that have degraded the accuracy of satellite remotely sensed soil moisture content such that it is critical to calibrate soil moisture estimation algorithms and to validate derived products using ground-truth data. The error sources comprise radio-frequency interference (RFI) [16], vegetation water content [13,17], surface roughness [16], and land surface heterogeneity [18]. It has been stated in the literature that a space-borne sensor designed to interpret SMC on the basis of soil microwave emission, and therefore the relationship between soil dielectric constant and water content, will show considerable systematic uncertainty of around 4% with maximum figures at relatively low water content in SMC retrieval [19].
Remote-sensing measurements in the thermal IR band have given rise to the thermal inertia (TI) approach for SMC retrieval. The TI approach relates SMC to the magnitudes of the differences between daily maximum and minimum soil and crop canopy temperatures [6]. This approach retrieves SMC from models that describe TI as a function of water content [20,21]. The implementation of the TI approach is simple because knowledge of soil physical properties and climate can produce representative SMC profiles up to a depth of 1 m. The limitation of the approach, however, is its sensitivity to the uncertainty of soil physical properties, which are complex to determine spatially and are typically obtained with point measurements [22]. The TI method provides large-scale spatial coverage, but the functions are empirical and have the drawback of being site- and time-specific, such that none of them is general enough to be applied extensively [21]. Monitoring soil moisture by remote sensing includes another set of approaches that permits SSM retrieval from the information contained in satellite-derived surface temperature (Ts) and vegetation index (VI). However, one of the major drawbacks of the Ts-VI method is that, in order to have enough points in a remote-sensing image to use in the determination of the boundaries of extreme conditions, a sufficiently large number of pixels must be sampled. This limitation is a handicap when dealing with smaller scale imagery on the order of the size of a typical farm field [11].
The difficulties associated with the above introduced approaches have led researchers to look for data-driven modeling tools, such as artificial neural networks (ANNs), support vector machines (SVMs), and relevance vector machines (RVMs), to estimate soil moisture [2,3,23-27]. For example, Landsat data has been used for soil moisture estimation using relevance vector and support vector machines [3]. One of the major advantages of the machine learning approach to SMC estimation is that it can provide estimates having resolutions commensurate with remotely sensed data [3].

Artificial Neural Networks (ANNs)
This section presents a brief description of ANNs relevant to this study. A three-layered feed-forward neural network (FFNN) model was developed that includes "i" input neurons, "h" hidden neurons, and "o" output neurons, which can be shown symbolically as ANN (i,h,o) [28]. Connection weights and biases connect these neurons. Each input is multiplied by its connection weight; these products are summed, fed through a transfer function to generate a result, and then output. The hidden layer neurons usually use a sigmoidal activation function, while the output layer neurons utilize a linear activation function. The activation functions are used to transform inputs to targeted outputs with a nonlinear regression procedure. Each ANN model requires training and testing operations. In the training operation, the connection weights and bias values are optimized by minimizing the cost function (Mean Squared Error (MSE) in this study). Once trained, an independent set of data that was not used for training is applied to test the neural network model [26]. An issue that threatens the application of ANN-based models is the randomness of the predicted output, which was controlled in this study [29] by fixing the seed of the random number generator. Since weights are initialized randomly, resetting the seed fixes the weight initialization and makes the results reproducible. The models were also run for a wide range of seed values. The training operation of ANNs was performed by a back-propagation algorithm, which is the most commonly used supervised training algorithm in multilayer feed-forward networks. The network weights are simultaneously modified by the back-propagation algorithm, which seeks to minimize the difference between the targets and the computed outputs.
In this kind of algorithm the processing operation is performed in a forward direction, from inputs to hidden layers and eventually to an output layer [30]. A back-propagation method uses a least-mean-square-error method and generalized-delta rule to optimize the network weights. The derivative chain rule and the gradient-descent method are utilized to adjust the network weights [31]. Forward pass and reverse pass are two main phases of the training operation. In the first phase, the input data are multiplied by the initial weights, forming weighted inputs that then are added to yield the net to each neuron. This net generates the output of the neuron after passing through an activation or transfer function.
In back-propagation networks, a derivative of the activation function modifies the network weights. Therefore, continuous-transfer functions are targeted. The Log-sigmoid transfer function and Hyperbolic tangent sigmoid transfer function are the most common continuous-transfer functions in back-propagation networks [32]. The Log-sigmoid transfer function was used in this study. The output of the neuron is transmitted to the next layer as an input, and this procedure is repeated until it reaches the output layer. The error between the network outputs and the target outputs is computed at the end of each forward pass and compared with a specified threshold. If the error exceeds this threshold, the procedure continues with a reverse pass; otherwise, training is stopped [33]. In the reverse pass, the weights in the network are modified using the error value. The modification of weights in the output layer is different from the modification of weights in the hidden layers. In the output layer, the target outputs are provided, whereas in the intermediate layers, target values do not exist [31]. Therefore, back propagation uses the derivatives of the objective function with respect to the weights in the entire network to distribute the error to the neurons in each layer [33].
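The forward and reverse passes described above can be illustrated with a minimal sketch. It assumes a single output neuron, a log-sigmoid hidden layer, a linear output layer, and a fixed seed for weight initialization. The class name TinyFFNN, the learning rate, and the weight range are hypothetical, and the sketch uses plain per-sample gradient descent rather than the Levenberg-Marquardt training function selected later in this study.

```python
import math
import random


def logsig(x):
    """Log-sigmoid transfer function."""
    return 1.0 / (1.0 + math.exp(-x))


class TinyFFNN:
    """Illustrative ANN (i, h, 1): log-sigmoid hidden layer, linear output."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)  # fixed seed -> reproducible initial weights
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = rng.uniform(-0.5, 0.5)

    def forward(self, x):
        """Forward pass: weighted sums, transfer function, linear output."""
        hidden = [logsig(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        output = sum(w * h for w, h in zip(self.w2, hidden)) + self.b2
        return hidden, output

    def mse(self, X, y):
        """Mean squared error over a data set (the cost function)."""
        return sum((self.forward(x)[1] - t) ** 2 for x, t in zip(X, y)) / len(X)

    def train_epoch(self, X, y, lr=0.2):
        """Reverse pass: distribute the output error with the chain rule."""
        for x, t in zip(X, y):
            hidden, out = self.forward(x)
            err = out - t  # derivative of 0.5 * err**2 w.r.t. the output
            for j, h in enumerate(hidden):
                # The log-sigmoid derivative is h * (1 - h).
                delta_h = err * self.w2[j] * h * (1.0 - h)
                self.w2[j] -= lr * err * h
                self.b1[j] -= lr * delta_h
                for k, xk in enumerate(x):
                    self.w1[j][k] -= lr * delta_h * xk
            self.b2 -= lr * err


# Toy data only; these are not measurements from the study.
X = [[0.0], [0.25], [0.5], [0.75], [1.0]]
y = [0.0, 0.25, 0.5, 0.75, 1.0]
net = TinyFFNN(n_in=1, n_hidden=3, seed=0)
mse_before = net.mse(X, y)
for _ in range(1000):
    net.train_epoch(X, y)
mse_after = net.mse(X, y)
```

Re-creating the network with the same seed reproduces the same initial weights, which is the reproducibility mechanism the text describes.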

Selection of Possible Input Variables
One of the critical issues in training machine learning algorithms such as ANNs is to select the appropriate input variables. The idea is to choose the combination of variables that are highly correlated with soil moisture. Previous studies have shown good correlation between soil water content and infrared (IR) skin temperature and normalized difference vegetation index (NDVI), and between IR heating rate and thermal images [34,35]. Optical and microwave remotely sensed data have been used for surface soil energy balance modeling [6,11,12,36]. After collecting these variables from independent datasets, the correlation and dependency among these variables were evaluated in the study reported here. Some Vegetation Indices (VIs) that contribute to soil moisture estimation were also considered as input variables [4,37-39].

Study Area
The study area is a farm in Scipio, Utah (39°14ʹN, 112°6ʹW), equipped with a center pivot irrigation system covering an area of approximately 84 acres. The main crops are alfalfa and oats, grown from April to October. Figure 1 shows the location of the farm in Utah, and provides information about the heterogeneity within the farm due to different crop types and the presence of an access road. Generally, the center pivot lateral rotates clockwise and supplies irrigation water to the field at a constant rate from an upstream reservoir. In the current study, a full rotation of the center pivot takes three days and six hours to irrigate the field fully to field capacity. This study was carried out for the crop growing cycle starting 16 May 2013 and ending 17 June 2013 (four flight days).

Instrumentation: AggieAir Minion (Remote Sensing Platform)
AggieAir is the remote sensing platform applied in the current study. This platform is comprised of an autonomous unmanned aerial vehicle (UAV) that carries a multispectral sensor payload. The UAV navigates over the area of interest based on a pre-programmed flight plan and captures images using the on-board sensor payload system. The UAV is a small aircraft (8 feet wing span, 14-pound take-off weight) that can fly for an hour at a speed of 30 miles per hour. In this study, the UAV was equipped with visual, near-infrared, and thermal cameras and flew over the study area on four dates in 2013 (16 May, 1 June, 9 June, and 17 June), acquiring imagery with the optical cameras at 0.15 m resolution and with the thermal camera at about 60 cm. The wavelength range peaks around 420, 500, 600 and 800 nm, respectively, for blue, green, red and NIR sensors. Detailed information about the operation of the AggieAir system has been previously published by Jensen [40]. After the AggieAir UAV completes a flight mission, the aircraft may have acquired 300-400 images from each camera: visual, near-infrared, and thermal (Figure 2a). The images can be georeferenced directly using the position and orientation of the UAV when the image was exposed (Figure 2b) [40]. EnsoMOSAIC is used to orthorectify the AggieAir imagery with high accuracy [40,41]. EnsoMOSAIC generates hundreds of tie-points between overlapping images and uses photogrammetry and block adjustment to refine the position and orientation information for each image, thereby accurately georeferencing each image (horizontal accuracy: 1-2 pixels; vertical accuracy: 1.5-2 pixels, when all error sources are controlled). EnsoMOSAIC also generates an internal digital elevation model (DEM) to compensate for distortions in the imagery caused by changing elevations. The resulting product is an orthorectified mosaic (Figure 2c) that is in 8-bit digital format.
AggieAir uses a modified "reflectance mode" method to convert the digital numbers of the mosaic to reflectance values [40]. This radiometric normalization is the ratio of the digital number from the mosaic to the digital number from a Spectralon white reflectance panel with known reflectance coefficients, multiplied by the reflectance factor, which accounts for the zenith angle of the sun at the time, date, and location of the photos. The product of this method is an orthorectified mosaic in reflectance values. The reflectance values (for all four flights) range from 0.11 to 0.36, 0.20 to 0.49, 0.15 to 0.51 and 0.51 to 0.61 for blue, green, red and NIR, respectively. Thermal values range from 10.2 to 43.3 degrees Celsius.
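As a minimal sketch of the normalization described above, the conversion of one pixel can be written as a ratio scaled by the reflectance factor. The function and parameter names are hypothetical, and the reflectance factor is taken as a given input because its derivation from the sun zenith angle is not detailed here.

```python
def to_reflectance(dn_mosaic, dn_panel, reflectance_factor):
    """Convert a mosaic digital number (DN) to a reflectance value.

    dn_mosaic: DN of the pixel in the orthorectified mosaic.
    dn_panel: DN recorded over the white reference panel (must be > 0).
    reflectance_factor: assumed to be supplied by the user; it stands for the
        panel reflectance factor at the sun zenith angle of the flight.
    """
    if dn_panel <= 0:
        raise ValueError("panel digital number must be positive")
    return dn_mosaic / dn_panel * reflectance_factor


# Illustrative numbers only, not values from the study.
example_reflectance = to_reflectance(dn_mosaic=100, dn_panel=200,
                                     reflectance_factor=0.98)
```

Applying the function to every pixel of a band would yield a mosaic in reflectance values.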

Ground-Based Data Collection
In order to perform ground truthing, at the same time the AggieAir UAV flew over the study area, intensive ground sampling was conducted at precisely determined locations in the field [42]. Soil samples were collected based on a pre-defined spatial distribution map that was developed in light of the crop types and soil characteristics in the field. The data collection included almost 50 samples per AggieAir flight scattered all over the field (minimum of 12 in each quarter) to cover the soil condition properties. The unusable samples were discarded, and the data collected from the four days were pooled (making a data set of 184 points) and utilized in the modeling procedure. The research crew collected soil samples from the surface soil and determined gravimetric soil moisture values after the samples were oven dried and weighed. The crew also used a hand-held measuring device to make in-field measurements and double-check the laboratory soil moisture results. The device, manufactured by Decagon Inc. (Pullman, WA, USA), includes a sensor read-out and storage system for real-time readings. Called "Procheck," it was connected to a GS3 soil moisture, temperature, and EC sensor, also from Decagon Inc. [43]. Figure 3 illustrates the location of soil moisture samples in the study area.

Soil Texture Analysis
The upper and lower limits of soil moisture storage in the root zone are a function of soil texture. After the soil has been saturated and drained by gravity, the soil is said to be at "field capacity," and the amount of water that remains in the root zone but which the crop can no longer extract is called the "wilting point" [44]. In order to take these two parameters into account, 14 different points from around the field were selected for soil texture sampling. After soil type determination, the corresponding field capacity values were acquired from previously published values and considered as model inputs [45]. Figure 4 illustrates the soil field capacity map developed by applying a spherical kriging interpolation method to the information from the 14 available sampling locations.

Relevant Vegetation Indices (VIs) from AggieAir Imagery
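Visual spectrum (red, green, and blue, or RGB), near-infrared (NIR), and infrared/thermal remotely sensed data and some vegetation indices (VIs) are used as input variables for the soil moisture model. All AggieAir data (RGB, NIR, and thermal imagery), normalized difference vegetation index (NDVI), vegetation condition index (VCI), enhanced vegetation index (EVI), vegetation health index (VHI), and field capacity were chosen as model inputs with surface soil moisture as the target or output. The VHI, proposed by Kogan (1995), is an additive combination of VCI and the Temperature Condition Index (TCI) [36]. Equations (1)-(5) represent the vegetation indices included in this study:
$$\mathrm{NDVI} = \frac{\rho_{NIR} - \rho_{red}}{\rho_{NIR} + \rho_{red}} \qquad (1)$$
$$\mathrm{EVI} = 2.5 \times \frac{\rho_{NIR} - \rho_{red}}{\rho_{NIR} + C_1 \rho_{red} - C_2 \rho_{blue} + L} \qquad (2)$$
$$\mathrm{VCI} = 100 \times \frac{\mathrm{NDVI} - \mathrm{NDVI}_{min}}{\mathrm{NDVI}_{max} - \mathrm{NDVI}_{min}} \qquad (3)$$
$$\mathrm{TCI} = 100 \times \frac{BT_{max} - BT}{BT_{max} - BT_{min}} \qquad (4)$$
$$\mathrm{VHI} = 0.5\,\mathrm{VCI} + 0.5\,\mathrm{TCI} \qquad (5)$$
where $\rho_{NIR}$, $\rho_{red}$ and $\rho_{blue}$ are NIR, red, and blue reflectance bands; $C_1$, $C_2$ and L are the coefficients of the aerosol resistance term, which uses the blue band to correct for aerosol influences in the red band; BT is the thermal brightness, which is the thermal band reflectance; and the subscripts min and max denote the minimum and maximum values of the corresponding quantity.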
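A short sketch of how these indices could be computed for a single pixel follows, assuming the commonly used forms of NDVI, EVI, VCI, TCI and VHI. The function names are hypothetical. The EVI gain (2.5), the default coefficients (C1 = 6, C2 = 7.5, L = 1) and the equal VHI weights are widely used values assumed here for illustration; they are not confirmed as the values used in this study.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index from NIR and red reflectance."""
    return (nir - red) / (nir + red)


def evi(nir, red, blue, c1=6.0, c2=7.5, l=1.0, gain=2.5):
    """Enhanced vegetation index; default coefficients are assumed values."""
    return gain * (nir - red) / (nir + c1 * red - c2 * blue + l)


def vci(ndvi_value, ndvi_min, ndvi_max):
    """Vegetation condition index: NDVI scaled to 0-100 between its extremes."""
    return 100.0 * (ndvi_value - ndvi_min) / (ndvi_max - ndvi_min)


def tci(bt, bt_min, bt_max):
    """Temperature condition index: high for cool pixels, low for hot pixels."""
    return 100.0 * (bt_max - bt) / (bt_max - bt_min)


def vhi(vci_value, tci_value, weight=0.5):
    """Vegetation health index as a weighted sum of VCI and TCI (assumed 0.5)."""
    return weight * vci_value + (1.0 - weight) * tci_value


# Illustrative pixel values only, not measurements from the study.
pixel_ndvi = ndvi(nir=0.5, red=0.1)
pixel_vci = vci(pixel_ndvi, ndvi_min=0.2, ndvi_max=0.8)
pixel_tci = tci(bt=30.0, bt_min=10.0, bt_max=40.0)
pixel_vhi = vhi(pixel_vci, pixel_tci)
```

Each function operates on scalars; applying it band-wise over the mosaic would give one index layer per flight.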

Model Validation
A K-fold cross validation was used as the model validation technique in order to assess how well the model generalizes to an independent dataset. In general, in K-fold cross validation the original dataset (including all samples) is partitioned into K sub-data sets. Each time, a single sub-data set is retained for evaluation and the remaining (K-1) sub-data sets are used for training. This process repeats K times, and the error for each repetition is estimated. The K model errors are then averaged to score the model [46,47]. Since the authors were not confident about the optimal percentage of data to be considered for training, testing and validation to avoid over-fitting, a 5-fold cross validation technique was applied to the original data set, with Mean Squared Error (MSE) as the evaluation criterion. The 5-fold cross-validation was done repeatedly, and during the training phase different values for the training technique's parameters were used in concert with different network architectures. This yielded the best values for the number of hidden nodes and the training parameters. Finally, with these values in hand, the network was trained using all the data.
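The partitioning and averaging steps can be sketched as follows. This is an illustrative outline only: the function names and the fit_and_score callback are hypothetical, and the shuffling seed is an assumption made to keep the split reproducible.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, test) index lists for each of the K folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # reproducible shuffle
    folds = [indices[i::k] for i in range(k)]  # K nearly equal sub-data sets
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def cross_validated_error(n_samples, fit_and_score, k=5, seed=0):
    """Average the K fold errors returned by a user-supplied callback.

    fit_and_score(train, test) is a hypothetical function that trains a
    model on the train indices and returns its error (e.g., MSE) on test.
    """
    errors = [fit_and_score(train, test)
              for train, test in k_fold_indices(n_samples, k, seed)]
    return sum(errors) / len(errors)


# 184 pooled samples, as in this study, split into 5 folds.
splits = list(k_fold_indices(184, k=5))
fold_sizes = [len(test) for _, test in splits]
```

With 184 samples and K = 5, four folds hold 37 samples and one holds 36, so every sample is used for evaluation exactly once.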

Wrapper Selection
For model construction, it is necessary to identify the best combination of input variables from the available data. A wrapper selection method was used to accomplish this. Guyon (2003) introduced the advantages of applying this method with reference to three main aspects: (1) improving the performance of predictors; (2) obtaining faster and more cost-effective predictors; and (3) providing a better understanding of the underlying process that generated the data [48]. This method is recommended over the backward selection method and is applicable to cases with a small number of inputs. Wrapper selection considers all possible combinations of input variables and develops a separate model for each combination. The models are then scored based on their predictive power, and the best model can be selected based on the corresponding score [48,49]. In order to check the goodness of fit, root mean square error (RMSE), mean absolute error (MAE), coefficient of correlation (r), coefficient of performance (e), and coefficient of determination (R²) are the statistical parameters that were calculated to evaluate the performance of the many alternative models and score their predictive power [50].
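The goodness-of-fit statistics named above could be computed as in the sketch below. Two assumptions are made and should be checked against the study's own definitions: the coefficient of performance (e) is taken to be the Nash-Sutcliffe efficiency form, and R² is taken as the square of r. The function name is hypothetical.

```python
import math


def goodness_of_fit(observed, estimated):
    """Return RMSE, MAE, r, e (assumed Nash-Sutcliffe form) and R^2 (r**2)."""
    n = len(observed)
    errors = [est - obs for obs, est in zip(observed, estimated)]
    sq_err = sum(d * d for d in errors)
    mean_obs = sum(observed) / n
    mean_est = sum(estimated) / n
    ss_obs = sum((o - mean_obs) ** 2 for o in observed)
    ss_est = sum((e - mean_est) ** 2 for e in estimated)
    cov = sum((o - mean_obs) * (e - mean_est)
              for o, e in zip(observed, estimated))
    r = cov / math.sqrt(ss_obs * ss_est)
    return {
        "rmse": math.sqrt(sq_err / n),
        "mae": sum(abs(d) for d in errors) / n,
        "r": r,
        "e": 1.0 - sq_err / ss_obs,  # assumption: Nash-Sutcliffe efficiency
        "r2": r * r,                 # assumption: R^2 as squared correlation
    }


# Illustrative values only, not data from the study.
stats = goodness_of_fit([10.0, 20.0, 30.0], [12.0, 18.0, 33.0])
```

Lower RMSE and MAE and higher r, e and R² indicate a better fit, which is the direction of scoring the wrapper relies on.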

Division Set Up in ANN Model Architecture
The input data division set up can have a significant influence on the performance of an ANN model. Bowden (2002) presented two methodologies for dividing data into representative subsets (training, testing and validation) with similar statistical properties. These methods were proven to develop more robust results compared to conventional approaches in which the dataset was simply divided into arbitrary subsets [51]. The methods were applied by using a 5-fold cross validation method for data generalization. Other water resources related studies have utilized Bowden's approach and concluded that it ensures that the training, testing, and validation sets are representative of the same population [52][53][54][55][56][57].
It is difficult to assess beforehand how large an artificial neural network model should be for a specific application to avoid over-fitting. Model size strongly relates to sample size, and collecting more data and increasing the size of the training set or reducing the size of a network are recommended as solutions [28]. In this study, collecting more data was impossible; therefore, the error of the validation data set was checked as an alternative method of investigation [28]. As training initiates, the error for all three data sets (training, testing, and validation) decreases, and in the case of over-fitting, the error for the validation set increases while the error in the training set maintains a decreasing trend. If the error in the validation set continues in a reducing trend, there is no danger of over-fitting.
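The validation-error check described above can be expressed as a simple rule on the two error curves. The sketch below is one possible formulation; the function name and the patience parameter (how many epochs without improvement are tolerated) are assumptions made for illustration.

```python
def overfit_onset_epoch(train_errors, val_errors, patience=3):
    """Return the epoch of the best validation error if over-fitting is seen.

    Over-fitting is flagged when the validation error has not improved for
    `patience` epochs while the training error is still lower than it was
    at the best validation epoch. Returns None if no over-fitting is seen.
    """
    best_val = float("inf")
    best_epoch = 0
    for epoch, val in enumerate(val_errors):
        if val < best_val:
            best_val, best_epoch = val, epoch
        elif (epoch - best_epoch >= patience
              and train_errors[epoch] < train_errors[best_epoch]):
            return best_epoch
    return None


# Illustrative error curves only, not results from the study.
train_curve = [5.0, 4.0, 3.0, 2.5, 2.0, 1.5, 1.0]
val_curve = [6.0, 5.0, 4.0, 4.2, 4.5, 4.8, 5.0]
onset = overfit_onset_epoch(train_curve, val_curve)
```

In this toy example the validation error is lowest at epoch 2 and rises afterwards while the training error keeps falling, so epoch 2 is returned.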

Soil Moisture Data Calculation Results
In order to ground truth data and relate soil moisture values to remotely sensed data, gravimetric soil moisture measurements were checked with the corresponding in-field measurements of volumetric soil moisture using soil bulk density values that were extracted from soil texture data. A t-test comparing the gravimetric soil moisture measurements against the volumetric soil moisture measurements showed these two data sets are not statistically different at a 95 percent confidence level, with a P-value of 0.3. The results from the t-test indicate that either of these data sets can be used for further calculations. Finally, the gravimetric soil moisture values from the four flight dates were pooled, with maximum, minimum and mean values of 30.6, 10.1 and 19.7, respectively, and used as model targets. Also, the spatial distribution of soil moisture from high to low values is in accordance with time after irrigation. The highest values occur immediately behind the center pivot lateral, and the driest spots were concentrated in front of the lateral.
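The comparison described above relies on expressing gravimetric and volumetric moisture in the same terms, which is commonly done by multiplying the gravimetric value by the ratio of soil bulk density to water density. The sketch below shows that conversion together with a two-sample t statistic. The text does not state which t-test variant was applied, so the Welch form used here is an assumption, and the function names are hypothetical.

```python
import math
import statistics

WATER_DENSITY = 1.0  # g/cm^3


def gravimetric_to_volumetric(theta_g_percent, bulk_density):
    """Convert gravimetric moisture (%) to volumetric moisture (%).

    bulk_density is the dry soil bulk density in g/cm^3.
    """
    return theta_g_percent * bulk_density / WATER_DENSITY


def welch_t_statistic(sample_a, sample_b):
    """Two-sample t statistic with unequal variances (assumed variant)."""
    var_a = statistics.variance(sample_a)
    var_b = statistics.variance(sample_b)
    std_err = math.sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    return (statistics.mean(sample_a) - statistics.mean(sample_b)) / std_err


# Illustrative values only, not measurements from the study.
theta_v = gravimetric_to_volumetric(20.0, bulk_density=1.3)
t_value = welch_t_statistic([2.0, 4.0, 6.0], [1.0, 2.0, 3.0])
```

A t statistic close to zero is consistent with the two measurement sets not being statistically different; converting it to a P-value requires the t distribution, which is omitted from this sketch.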

Spatial Information of Vegetation Indices
Due to heterogeneity within the field because of the different crop types, an access road, wheel tracks, the center pivot station, and historic locations of fence lines and ditch banks that once occupied the modern field, spatial analysis was required. The significance of spatial information comes from the ability of the human brain to detect spatial patterns in a map or an image. Table 1 presents the temporal and spatial changes of NDVI during the study period. The same information is provided for the other three VIs in the supplementary material.

Wrapper Selection Outcome
Goodness-of-fit statistics were used to test the degree of association between the observed and estimated data. As noted previously, root mean square error (RMSE), mean absolute error (MAE), coefficient of correlation (r), coefficient of performance (e) and coefficient of determination (R²) were calculated for the models to score their predictive power. In the scoring phase, the authors judged the models' predictive power first on RMSE, then on MAE and R², and finally on e and r. The models with high but similar predictive power were compared spatially against thermal, NIR and false color images. The research crew also collected a set of notes about their observations during the data collection procedure. The notes covered crop types, crop growth stage, location of the lateral, irrigation uniformity, wet and dry spots (created due to deficiencies in the irrigation sprinkler system), wind (and its direction, if it scattered the water) and weather conditions. After the models with high but similar predictive power were developed, the best model was selected visually to accommodate the spatial distribution of the above information. Figure 5 illustrates schematically how wrapper selection would evaluate the models for the inputs from AggieAir (RGB, NIR, and Thermal) as an example of wrapper scoring. For this study, 1023 models in 10 sets for all possible combinations of 10 inputs were developed (10 combinations of 1, 45 combinations of 2, 120 combinations of 3, 210 combinations of 4, 252 combinations of 5, 210 combinations of 6, 120 combinations of 7, 45 combinations of 8, 10 combinations of 9 and 1 combination of 10 inputs), and the model results were compared. A trial-and-error approach was utilized to select those models that worked on different numbers of neurons (up to 2 × (number of inputs) + 1 to avoid over-fitting issues), hidden layers, training functions, and division setups [32].
Finally, the model with 8 inputs (red, blue, NIR, thermal, NDVI, VCI, EVI, and field capacity) was selected because it had the best predictive power and best spatial pattern, which was checked visually. Table 2 shows the best model results for all 10 sets of combinations along with their highest predictive power statistics.
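The size of the wrapper search can be checked with a short enumeration. The sketch below lists every non-empty subset of the 10 candidate inputs and applies the 2 × (number of inputs) + 1 limit on hidden neurons. The input labels, function names and the score_fn callback are hypothetical placeholders; in practice score_fn would train a model on the subset and return its error.

```python
from itertools import combinations
from math import comb

# Labels are placeholders for the 10 candidate inputs named in the text.
INPUTS = ["red", "green", "blue", "NIR", "thermal",
          "NDVI", "VCI", "EVI", "VHI", "field_capacity"]


def candidate_subsets(inputs):
    """Yield every non-empty combination of the candidate inputs."""
    for size in range(1, len(inputs) + 1):
        for subset in combinations(inputs, size):
            yield subset


def max_hidden_neurons(n_inputs):
    """Upper limit on hidden neurons used to limit over-fitting."""
    return 2 * n_inputs + 1


def select_best(inputs, score_fn):
    """Return the subset with the lowest score (e.g., cross-validated RMSE)."""
    return min(candidate_subsets(inputs), key=score_fn)


counts_per_size = [comb(len(INPUTS), size) for size in range(1, len(INPUTS) + 1)]
total_models = sum(1 for _ in candidate_subsets(INPUTS))
```

The enumeration gives 1023 subsets in total, and an 8-input model has a limit of 17 hidden neurons, which matches the 17-node hidden layer reported for the selected network.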

Results Extracted from Artificial Neural Networks (ANNs)
After the intensive trial-and-error selection procedure using cross validation, a network architecture with one hidden layer and 17 nodes and a division set up of 80:10:10 with trainlm (Levenberg-Marquardt backpropagation) as a training function was selected. Figure 6 . The false color map (NIR-Green-Blue) is related to the relative density of vegetation in the image. Exposed (bare) soil is expected to have lower soil moisture content, while areas with high vegetation density are expected to show the opposite. The concept of using false color composite images was taken from previously published studies [56,58]. As shown in Figure 8, the soil moisture maps have a direct association with the false color composite maps. The field exterior area was not irrigated during the growing cycle and was expected to be less moist. Although the wheel tracks and the access road are located within the irrigation zone, they are expected to be drier since they are covered by bare soil, become more compacted due to traffic over them, and lose moisture rapidly. This assumption also applies to the zones where the crops have been cut.
Different crop types have different water demands and water up-take rates that cause surface soil moisture heterogeneity even after a uniform irrigation event. This heterogeneity appeared in the form of cropping patterns in the soil moisture maps. Given the clockwise rotation of the lateral, the spots with the maximum soil moisture values are expected to fall near the lateral and in a counterclockwise direction from it. This is clearest in Figure 7a, where the field was under a heavy irrigation event at the time the aerial imagery was captured. Table 3 shows the comparison between measured and estimated soil moisture values for different crop type zones. Soil moisture numerical values are presented in the supplementary materials. The main problem with such modelling procedures is their dependence on site and time. This implies that ground sampling and modeling will be required for every flight to ensure accurate and quality data. So far this is a handicap that should be addressed with more studies over different types of crops, in different areas and at different stages of growth. Having such a model (or a collection of models) would make this approach practical for routine use (independent of site and time). In addition, the current study was targeted toward showing the detailed information that can be interpreted from high resolution data. Even though such a high resolution might not be required for monitoring farms that are cropped with inexpensive crops such as alfalfa and oats, this resolution shows its value for other crops that require high resolution data (e.g., vineyards, orchards). These results help to justify future work on the value of high resolution data for precision farming activities.
One step forward in generalizing the presented modeling methodology in temporal scale could be the idea of pooling the soil moisture data collected from different dates. In the case of the current study, every single sampling location experiences four different conditions of soil moisture level, which provides a wide range of information about soil moisture status through time. This type of information makes the model more robust in its ability to simulate previously unseen soil moisture conditions through time.

Conclusions
This paper demonstrates the application of a high resolution remote sensing technology (AggieAir) for estimating surface soil moisture as a key piece of information in irrigation water management. High-resolution multi-spectral imagery, in combination with ground sampling, provided enough information for the modelling approaches to accurately estimate spatially distributed surface soil moisture.
This paper presents the results of a modelling approach utilizing ANN in concert with time and site specific information. Parallel to other modeling approaches, such as data mining algorithms or linear regression, the ANN model is calibrated for this study within the conditions of the information collected including soil moisture measurements, soil texture, crop type information, and high resolution multi-spectral imagery. While the site-specific calibrated ANN in this study cannot be used immediately in another location, the modeling procedure (identifying spatial information with the most significant contribution to soil moisture estimation (Table 2)) along with similar field measurements and high resolution multi-spectral imagery and the data mining algorithm, are transferable from this study.
Surface soil moisture estimation was accomplished with an ANN model (RMSE: 2.0, MAE: 1.3, r: 0.87, e: 0.75, R²: 0.77) for four dates in 2013 (16 May, 1 June, 9 June, and 17 June). These results show the capability of the model to accurately estimate surface soil moisture. Compared to the traditional soil moisture estimations that are based on a farmer's visual perceptions or a few soil moisture samples averaged across the farm, the modelling approach presented enables greater precision in the application of water and identifies dry/wet spots and water stressed crops.
AggieAir imagery, combined with appropriate analytic tools, allows spatial estimation of surface soil moisture. These estimates were made at much finer resolutions in space and time than those available from conventional remote sensing technologies (e.g., satellite or commercial aerial photography services). Also, the application of data mining algorithms to AggieAir aerial imagery allows for quantification of actionable information for precision agriculture (soil moisture values across the field). The soil moisture maps that are produced can then be related to irrigation water management for scheduling and application rates.
The results from the wrapper selection (Table 2) indicate that thermal imagery is the most relevant information for surface soil moisture estimation. In the case of one input, a model with thermal imagery can estimate the soil moisture values with an RMSE of approximately 3% (thermal images are provided in the supplementary material section).
Soil water holding capacity as a function of soil texture plays an important role in soil moisture values. This parameter was observed by utilizing field capacity as an input to the models. Table 2 shows that field capacity is a component of most of the models. The effect of this parameter is confounded by other important inputs in the spatial distribution of soil moisture in Figure 8. Based on the information presented in Table 2, among the available vegetation indices, NDVI and VCI have a greater explanatory contribution in surface soil moisture estimates. The soil moisture maps have a good association with false color composite maps that allows for distinction of agricultural features in the field.

Future Work
Further studies will involve the estimation of surface soil moisture using other data mining algorithms and its application as a boundary condition to produce remotely sensed estimates of root zone soil moisture. In addition, pixel-wise estimation of soil moisture could also be applied in water balance models.

Author Contributions
Austin Jensen guided the AggieAir team in preparing the high-resolution imagery as the required data. Mac McKee and Alfonso Torres-Rua developed the research plan and experiment design and supervised the work required for this paper. Leila Hassan-Esfahani completed the literature review, data acquisition, method selection, modelling, manuscript preparation, and discussions. All authors shared equally in the editing of the manuscript.