Soil Heat Flux Modeling Using Artificial Neural Networks and Multispectral Airborne Remote Sensing Imagery

The estimation of spatially distributed crop water use or evapotranspiration (ET) can be achieved using the energy balance for land surface algorithm and multispectral imagery obtained from remote sensing sensors mounted on airor space-borne platforms. In the energy balance model, net radiation (Rn) is well estimated using remote sensing; however, the estimation of soil heat flux (G) has had mixed results. Therefore, there is the need to improve the model to estimate soil heat flux and thus improve the efficiency of the energy balance method based on remote sensing inputs. In this study, modeling of airborne remote sensing-based soil heat flux was performed using Artificial Neural Networks (ANN). Soil heat flux was modeled using selected measured data from soybean and corn crop covers in Central Iowa, U.S.A. where measured values were obtained with soil heat flux plate sensors. Results in the modeling of G indicated that the combination Rn with air temperature (Tair) and crop height (hc) better reproduced measured values when three independent variables were considered. The combination of Rn with leaf area index (LAI) from remote sensing, and Rn with surface aerodynamic resistance (rah) yielded relative larger overall correlation coefficient values when two independent variables were included using ANN. In addition, air temperature (Tair) may be a key variable in the modeling of G as suggested by the ANN application (r of 0.83). Therefore, it is suggested that Rn, LAI, rah and hc and potentially Tair be considered in future modeling studies of G. OPEN ACCESS Remote Sens. 2011, 3 1628


Introduction
There are several available models to spatially estimate the surface energy balance components from remote sensing [1].In particular, the soil heat flux (G) models have only been applied, with different degrees of success, in conditions similar to those in which they were developed (e.g., a given vegetation type, soil background, and environmental conditions) [2,3].There is a need to further understand the variables that explain G for a range of surface crops and for different environmental and climatic conditions.Net radiation (R n ) estimates from remote sensing are fairly accurate according to Neale et al. [4], but soil and sensible heat flux (H) estimates need more research [5,6].In the case of G, Chávez [5] evaluated several G models for remote sensing applications, found in the literature, over corn and soybean fields.In that study, the models were a function of vegetation indices and a budget of short and long wave radiation.Results of that study indicated that the tested G models under predicted measured G with mean bias errors and root mean square errors (MBE ± RMSE) varying from −5.4 ± 25 to −47 ± 44.3 W m −2 (i.e., an error of approximately 30%) and therefore showing that there was a need to further infer on other surface or climatic variables that could help improve the modeling of soil heat flux for remote sensing purposes.In another study over cotton fields [6], it was found that a G model that incorporated surface thermal temperature, retrieved using a remote sensing sensor, performed better −9.9 ± 20 W ma −2 (MBE ± RMSE) than simply using vegetation indices and a budget of short and long wave radiation.Thus, there is a need to identify appropriate variables that would improve the prediction of G. Improvement in the estimation of the energy balance variables is needed as latent heat flux (LE) is usually obtained as a residual from the energy balance equation (LE = R n − G − H).Estimation errors in terms of the energy balance equation (i.e., R n , G, and H) would propagate and affect the accurate estimation of LE or distributed vegetation water use.
The soil heat flux is the energy that passes through the surface of the soil (in or out).During the day G flows into the soil (positive flux) adding energy to it and thus increasing its temperature.However, during the night time the heat stored during the day in the soil is released to the atmosphere (negative flux) cooling the soil down.Since the incident solar and atmospheric radiation reaching the soil surface is reduced by the amount of the vegetation biomass presence then several authors [7][8][9] have suggested relating G to R n and to a vegetation index (i.e., an index like NIR/RED, NDVI, SAVI, OSAVI, etc.).NIR is surface reflectance in the Near Infra-red band of the electro-magnetic spectrum, RED is surface reflectance in the red band, NDVI is the normalized difference vegetation index that uses reflectance values in the RED and NIR bands, SAVI is the soil adjusted vegetation index, and OSAVI is the optimized soil adjusted vegetation index.OSAVI, SAVI and NDVI use surface reflectance in the RED and NIR bands and are used to quantify the amount of vegetation in the land surface.Some examples of the use of vegetation indices (VI), in an effort to relate linearly G to R n , include Clothier et al. [7] who used the NIR/RED reflectance ratio for an alfalfa surface cover and Kustas and Daughtry [10] who used the NDVI index.Other researchers [2,3,11] used an exponential relationship to relate G to NDVI and R n .Additional methodologies to obtain G from remote sensing inputs suggested just a constant function of R n [8,9,12].However, G models can estimate soil heat flux with a wide degree of uncertainty depending on the vegetation and environmental conditions considered and the method used in their development [2,3,5].As shown above, developers of G models have been limited to a few numbers of variables (i.e., net radiation and VI) and statistical methods (simple linear regressions and exponential fits of G to R n and VI) thus the resulting G models, of Clothier et al. [7] and Kustas and Daughtry [10] did not present a very large coefficient of determination (R 2 ).Their coefficients of determination were 0.74-0.76for studies involving VI (NIR/RED) over alfalfa, cotton, and bare soil.Perhaps some variables other than R n and VI (e.g., plant parameters, surface and atmospheric conditions) could improve G modeling and increase the coefficient of determination.
As an alternative, Artificial Neural Networks (ANNs) represent one of several techniques derived from the artificial intelligence (AI) field of knowledge.They are computational models based on the behavior of biological neural networks, and can be adjusted (trained) so that a particular input leads to a specified target output.ANN can be employed as a data analysis tool, instead of multiple regression analysis (MRA), for forecasting and prediction based on historical data.ANN is most likely to be superior to other techniques under certain conditions, such as fuzzy inputs, subtle patterns, unpredictable non-linear data, and chaotic data (in the mathematical sense) [13].ANN techniques have been applied to different hydrological processes and were able to model them more accurately by integrating a large number of variables and boundary conditions.For instance, ANN have been successfully used to predict long-range precipitation [14], groundwater levels [15], multi-ecosystems carbon flux [16], nitrogen and ammonia removal [17], and evapotranspiration [18][19][20].
In a particular study conducted in semi-arid regions [21], reference evapotranspiration values predicted using ANN showed better agreement with observed values than those obtained with the well-known Hargreaves' method.
Therefore, in this study, ANNs were used to identify the variables that contributed to the flux of heat into the soil and model the process considering multispectral remote sensing inputs.

Materials and Methods
Data were acquired near the city of Ames, Iowa at the Walnut Creek watershed over an area of approximately 12 × 22 km.The data acquisition in Iowa was part of the 2002 soil moisture and water cycle field experiments conducted in support to the Aqua Advanced Microwave Scanning Radiometer (AMSR), NASA's Global Water and Energy Cycle Program, and future satellite missions for Terrestrial Hydrology.Main elements of the experiment were validation of AMSR brightness temperature and soil moisture retrievals, extension of instrument observations and algorithms to more challenging vegetation conditions, integration of land surface and boundary layer measurements, and the evaluation of new instrument technologies for soil moisture remote sensing.The Soil Moisture Experiments in 2002 (SMEX02) and the Soil Moisture-Atmosphere Coupling Experiment (SMACEX02) were conducted in Iowa over a one-month period between mid-June and mid-July.An overview article provides background and including rationale for study, site description, experimental design, hydro-meteorological conditions and summary of results by Kustas et al. [22].

Remote Sensing Data
Multispectral digital imagery were acquired over soybean and corn fields in the study area using the USU airborne digital remote sensing system [23,24].The system comprised four cameras: (a) three Kodak Megaplus digital frame cameras, for short wave bands, with interference filters centered in the green (0.545-0.560 μm), red (0.665-0.680 μm) and NIR (0.795-0.809 μm) portions of the electro-magnetic spectrum; (b) one Inframetrics 760 thermal infrared scanner (8-12 µm) camera that provided surface radiometric temperature images.High-resolution multispectral imagery was acquired on several different days during the intensive sampling period of the SMACEX experiment during the months of June and July 2002.Three major flight dates were 16 June (DOY 167), 1 July (DOY 182) and July 8 (DOY 189), planned to coincide with Landsat 5 Thematic Mapper overpasses.These regional flights covered the entire study area (12 × 22 km) at a nominal pixel resolution of 1.5 m from an altitude of approximately 3,350 m (11,000 feet above ground level, a.g.l.).Imagery was also acquired on the same days from a lower altitude of 2,100 m (7,000 ft a.g.l.) over selected fields containing energy balance stations (flux flights), resulting in a short wave bands pixel resolution of 1 m.The short wave camera lens F-stop settings were f5.6 for the green and red bands and f8.0 for the NIR.The shutter speed settings were 10 ms for the first flights (DOY 167 and 182) and 15 ms for the DOY 189 flight.The sky conditions for all flight time periods were mostly free of clouds.Considerable atmospheric interference due to smoke was present during the acquisition period on DOY 189.The prevailing wind direction during all the flights was from the south to southwest.
The short-wave images were corrected for lens vignetting effects and geometric radial distortions in procedures similar to those described in the literature [23,25].The individual spectral images were registered into 3-band images and rectified to a 1:24,000 digital orthophotoquad base map.The individual rectified images were then mosaicked along the flight lines to form image strips that in turn were stitched together to form a mosaic of the entire study area, for each regional flight.
The digital numbers of the rectified multispectral image strips were converted to radiance using a system calibration method [23].These radiances were divided by the incoming solar irradiance obtained from radiance measured, with an Exotech radiometer, concurrently over a barium sulfate standard reflectance panel with known bi-directional properties [26] to obtain surface spectral reflectance.The reflectance imagery calibration was verified using ground-truth reflectance values obtained with a similar Exotech ground 'roaming' radiometer.Surface reflectance readings performed with the roaming Exotech radiometer were carefully related to geographic coordinate readings obtained with a global positioning system (GPS) device in order to properly compare surface reflectance measured on the ground (roaming radiometer) and by the airborne system (digital cameras) at the exact same areas or locations.
The thermal infrared imagery was mosaicked along the flight lines and rectified to the high-resolution 3-band image mosaic described above.The digital numbers were transformed into apparent (at sensor) temperature values using the system calibration bar at the bottom of each image.The images were corrected for atmospheric and surface emissivity effects using MODTRAN v3.0 [27], an Atmospheric Radiative Transfer Model (software) following published recommendations and procedures [28,29].This correction resulted in at-surface radiometric temperatures.Radiosonde observations acquired at the Lidar site during the over flights were used to obtain the necessary input data to the MODTRAN model.
All image processing was conducted using the ERDAS Imagine software.Spatial distribution of the fluxes was obtained using the high resolution calibrated and geo-referenced imagery as inputs along with ground measured meteorological data at the towers, by programming the appropriate equations within ERDAS Imagine model maker.
Surface reflectance images were used to develop vegetation indices, percent cover, surface albedo, leaf area index, and crop height.Remote sensed surface radiometric temperature and albedo images along with weather data (air temperature, relative humidity, and barometric pressure) were used to obtain net radiation; as described in [30].

Soil Heat Flux Plate Sensor Data
Soil heat flux plates (model HFT3, REBS, Seattle, WA, USA), four at each Eddy Covariance (EC) site of monitoring stations, were used to measure soil heat flux (G).In addition, the stations were equipped with two thermal infrared thermometers (IRT) from Apogee (Logan, UT, USA).One of these instruments was located below the canopy measuring an "effective" soil brightness temperature and the other was located on the tower viewing the canopy from nadir, nominally at 5 m a.g.l. for the corn sites and 3 m a.g.l. for the soybean sites.Soil heat flux was measured at each station with four soil heat flux plates distributed across the corn and soybean rows and installed at an 8 cm depth along with soil temperature thermocouples placed within the topsoil layer and soil moisture sensors to account for heat storage above the plates.
There were a total of 11 different EC flux stations in nine different corn and soybean fields and from DOY 167, 169, 174, 182, 184, and 189.A detailed description of the micro-meteorological data, net radiation, crop biophysical characteristics, and instrumentation setup can be found in [30].
Table 1 below, and Table A1 in Appendix A summarize the available data used in this study.Nine (9) independent variables were used to predict one (1) dependent variable.The independent variables were: net radiation (R n ), horizontal wind speed (U), air temperature (T air ), relative humidity (RH), aerodynamic resistance (r ah ), surface temperature from airborne remote sensing (Ts RS), leaf area index (LAI), crop height (h c ), and the height of wind speed measurement (Z m ).The dependent variable was the soil heat flux (G).There were 50 records for each variable.As shown in [7] and [10], G has been related to R n and VI with good success.In this study, G was also thought to be related to U because of the ability of wind to remove heat from the surface [5,30].In this regard, also the height of the wind speed readings was included because this height along with the roughness of the surface will dictate the magnitude of wind recorded.Similarly, r ah was used in this study because it encapsulates surface conditions as vegetation roughness, U, Z m , and h c .In the case of T air and RH, they were included in the analysis considering that they could be an indication of the capacity of the atmosphere to retain and/or transmit heat to the surface.Ts RS was also included since surface temperature, along with VI, captured using remote sensing techniques can be instrumental in distinguishing vegetated/bare and dry/wet surfaces [30].Finally, LAI was included because it is known [5] that the larger the value of LAI the smaller the fraction G/R n .

ANN Models
The development of a model for a particular type of artificial neural network includes the selection of the number of layers, activation functions, and training algorithms for the network.Figure 1 shows a schematic of the initial neural network model used in this study, consisting of the input layer with nine (9) nodes or units (independent variables), one hidden layer with four (4) nodes, and the output layer with one (1) node (G, the dependent variable).The nodes of one layer are connected to the nodes of the other layers by synapses (weights), and combined using specific transfer functions.Predicted values at the output layer are compared with observed values and, if necessary, weights are adjusted and the process is repeated.The models referred to in this study correspond to regression models.Once the initial model, using all 9 independent variables, was analyzed, additional models were constructed using 8, 7, 6, 5, 4, 3 and 2 independent variables.The goal was to find a high performance model developed with the fewest possible independent variables.The independent variables were combined using the Multinomial Coefficient (C n,k ), which is a generalization of the Binomial Coefficient [31,32], and which allows to select subsets of k elements from a set of n elements without repetitions: where C n,k Number of possible subsets of k from the set of n; n Total number of independent variables (size of the set); k Number of independent variables to be grouped from the set n > k.

ANN Software
Once all models were constructed, an artificial neural network software was used to train and test them.The ANN software used in this project was the NeuroIntelligence V.2.2 from Alyuda Research, Inc. [33], which includes a friendly user interface and a number of options for the different stages of the analysis.

Data Preprocessing
The available data were randomly divided into three sets: 70% for training the algorithms, 15% for validation, and 15% for testing the results obtained from the adopted ANN.This division can also be performed following a specific pattern, and the percentages used for training, validation, and testing can also be different than the ones used in this study.Due to the small size of the data set available, the existing anomalies in the data, such as missing values and outliers, were replaced by the average value rather than discarding them.This approach, however, could introduce distortions in the model.Additionally, all variables involved were treated as numerical values, and were scaled into an appropriate range [−1,1] before their entry in the network.

Type of Network
There are many types of ANN, which can be classified according to several criteria: need of supervision, architecture, training algorithm, task performed, etc. Neuro-intelligence utilizes a supervised type of network called Multiple Layer Perceptron (MLP), very suited for prediction tasks [33].This latter ANN type was the one selected for our study.

Network Architecture
To design a network, it is necessary to specify the network architecture (number of hidden layers and nodes, or neurons, in each layer) and network properties (error and activation functions).For our case, an architecture consisting one (1) hidden layer was selected, with the number of neurons for this layer equal to half neurons of the input layer.The number of neurons in the input layer was in itself a variable which changed as the number of independent variables was changing (from 9 to 2).The output layer consisted of only one neuron for all cases, G (the dependent variable).The activation functions used for both the hidden and the output layers were logistic and hyperbolic tangent [34,35].

Training Algorithms
All training algorithms supported by NeuroIntelligence were applied to the analysis of the data.Those training algorithms were: Quick Propagation (QP), Conjugate Gradient Descent (CGD), Quasi-Newton (Q-N), Limited Memory Quasi-Newton (LMQ-N), Levenberg-Marquardt (L-M), Online Back Propagation (OBP), and Batch Back Propagation (BBP).

Termination Criterion
The termination criterion (when to stop training) used was the number of iterations.A number of 1,000 iterations were used each time a new configuration was trained.

Testing
The performance of each network model was assessed numerically using both the Correlation Coefficient, r [−1,1] and the Coefficient of Determination, R 2 [−1,1].It is convenient to note that R 2 is equal to r 2 in linear regression analyses, but that is not necessarily the case in ANN.The testing was conducted for all data sets: training, validation, and testing.If the coefficients did not reach the minimum values accepted (r and R 2 ≥ 0.90) then the network was re-trained by adjusting the weights (jogging), choosing a different training algorithm or changing the activation functions.When the models included only 2-3 independent variables in the input layer, the performance requirements were lowered (r and R 2 ≥ 0.80) due to the increasing difficulty of correctly predicting the dependent variable, G.

Results and Discussion
Table 2 shows the performance of one (1) ANN model created using all nine independent variables.All training algorithms were used to estimate G, the dependent variable.Results showed that, for most cases, values of both the Correlation Coefficient (r) and the Coefficient of Determination (R 2 ) were very high (r and R 2 ≥ 0.90), particularly for the Training step which uses 70% of the available data.The Testing component is the only one that uses unseen data and, for most cases, the performance of the model was not very good (r and R 2 << 0.90) probably due to the small size of the data set available compared to the number of independent variables used initially.It was not necessary to switch from Logistic to Hyperbolic Tangent activation function to obtain those values.Overall, the performance of the model was very good for the training and validation phases, reaching values of both r and R 2 ≥ 0.90.However, the analysis included all independent variables and therefore we cannot determine with precision at this point the variables that affect the dependent variable (G) the most.In view of this, additional models with fewer independent variables were needed.The process of reducing the number of independent variables, in this study, was based on the performance of the models.Additional techniques that can be applied include the Proper Orthogonal Decomposition (POD), the Principal Component Analysis (PCA), and others.Table 3 presents the performance of nine ( 9) ANN models created with combinations of eight (8) independent variables (9 = 9!/((8!*(9-8)!)).Once more, high values of r and R 2 (≥90) were obtained for all Training cases.For most cases, the Quick Propagation training algorithm (the first one tried) was enough to reach those values.However, in two cases it was necessary to switch from Logistic to Hyperbolic Tangent activation functions.At this stage, low values for R 2 (<<0.90)started to show during the Validation and Testing steps.It seems that the large number of independent variables included in each model resulted in high values of r and R 2 , and that the removal of one variable did not affect significantly the model performance (values of r and R 2 still above 0.90 for all models during the Training step).However, additional models with fewer independent variables still seem to be needed to narrow down the true or appropriate variables that properly model G.   (7) independent variables (36 = 9!/((7!*(9-7)!)).Again, high values of r and R 2 (≥90) were obtained for all Training cases, and the Logistic activation function was used for most cases; however, it was necessary to try more training algorithms, mainly the Conjugate Gradient Descent and the Quasi-Newton to obtain high performance values.As seen in the models of eight independent variables, relatively low values of R 2 occur in some cases during the Validation and Testing steps.Due to the fact that, at this point, all models meet the performance criteria, additional independent variables need to be removed from subsequent models in order to determine driving forces in the estimation of the dependent variable (G).
When the number of independent variables was reduced to six (6), five (5), and four (4), respectively, the resulting ANN models started to fail to reach the minimum threshold values set as acceptable (r and R 2 ≥ 0.90 for Training).This failure indicated that some independent variables, which were dropped from the analysis, were driving forces in the estimation of G and, therefore, their inclusion in the process was necessary in models with fewer independent variables to produce a successful ANN model for soil heat flux.Examples of those variables were R n (mainly), LAI, T air , U, and/or RH.
Table 4 presents the performance of ANN models constructed with combinations of three independent variables only.As it is shown, in some cases the minimum values of r and R 2 (initially set as ≥0.90) were not totally reached, even when using the two activation functions and the seven training algorithms.Good independent variables in the modeling of G, with r and R 2 values very close to target values, appear to be the combination of R n with T air and h c , and R n with T air and RH.The combination of R n with T air and h c is preferred over R n with T air and RH because the former combination includes the variable "crop/vegetation height" which is a measure of the amount of biomass (that can be modeled with remote sensing algorithms) that plays a role in the regulation of the energy transferred to the ground.In contrast, the combination R n with T air and RH does not include the vegetation on the surface in any direct way.In addition, RH already includes T air in the form of saturation vapor pressure.
Values of r and R 2 ≥ 0.80 are still indicators of good performance for the models.Therefore, it was decided to lower the set threshold of r and R 2 to ≥0.80 and further try combinations of only two independent variables.Finally, Table 5 shows the performance of ANN models constructed with combinations of only two independent variables.In those cases, it was even more difficult to obtain high values of r and R 2 .However, the combinations R n with LAI, and R n with RH seem to consistently perform better, particularly for Training, showing the largest values of r.Other combinations failed, even using the two (2) transfer functions and the seven (7) training algorithms available in Neurointelligence.considered.Nevertheless, net radiation (R n ) resulted being the main driving force behind G, as it was present in all successful combination models including combination of three and two independent variables.Furthermore, the combination of R n with T air and h c seems to better reproduce measured values of G, when three independent variables were considered; while the combination of R n with r ah and R n with LAI followed as second and third choices, respectively, when two independent variables were considered.In addition, it is suggested that T air be included as an important variable in future estimations using larger datasets because it was present in all successful models with three variables.
Therefore, this study showed that it was possible to model soil heat flux using ANN and remote sensing derived independent variables.
The software used in this study, Neurointelligence, proved to be powerful yet easy to use; however, it included only one type on neural networks (multiple layer perceptron).It is recommended to try different types of neural networks in future studies, as well as other approaches for data division and independent variables reduction.

Figure 1 .
Figure 1.Schematic of the initial artificial neural network model.

Table 1 .
Summary of available data.

Table 2 .
Performance of the ANN model created using all independent variables and training algorithms.

Table 3 .
Performance of ANN using combinations of eight independent variables.

Table A2 in
Appendix A presents the performance of thirty six (36) ANN models created with combinations of seven

Table 4 .
Performance of ANN using combinations of three independent variables.

Table 5 .
Performance of ANN using combinations of two independent variables.

Table A2 .
Performance of ANN using combinations of seven independent variables.