1. Introduction
Soil heat flux (G) is one of the main components of the surface energy balance (SEB) and accounts for the energy transferred to and from the land surface and deeper layers of the ground [
1]. The SEB determines the energy and mass exchanges of the soil-plant-atmosphere continuum, surface temperature (T
s), the amount of energy available for evaporation and, ultimately, SEB defines local climate conditions and water availability [
2]. Most energy that is transferred into the soil during sunlight hours is transferred back at night. Thus, G in vegetated surfaces is generally the smallest of the SEB fluxes and has been neglected in daily considerations of the SEB with some success [
3,
4]. However, significant energy transfers occur both during the day and night, and G’s exclusion from instantaneous SEB estimations can lead to substantial errors, e.g., mid-morning G (usually between 9 am and 12 pm) can be as large as sensible heat flux (H) for some land covers [
5,
6].
Several remote sensing products provide valuable knowledge at different spatiotemporal scales to assess land surface and meteorological conditions. Motivated by this availability, many models were developed to calculate SEB fluxes from the integration of remote sensing data, mainly focused on actual evapotranspiration (ET) estimation [
7,
8,
9,
10,
11,
12,
13,
14,
15,
16,
17,
18]. Among them, the SEBAL (surface energy balance algorithm for land) is one of the most used models around the world [
11,
12,
19,
20,
21,
22,
23], with relatively accurate predictions of ET [
24,
25,
26].
However, the empirical nature of the developed G prediction models usually implicates that their application to conditions other than the ones for which they were developed (e.g., land cover, climate) may result in large uncertainties [
27], suggesting a need for local calibration for accurate estimations [
28,
29]. Moreover, errors in G can profoundly decrease the accuracy of SEB models [
1,
23,
29,
30,
31,
32]. To address this limitation, a local calibration can be applied [
21,
33], but overall this may be impractical for operational purposes, such as SEB mapping over large areas. Comparative studies revealed a wide range in models’ performance, especially across different land covers [
27,
33], and seasonal and regional biases have been identified, indicating potential issues with model comparisons at the local scale [
27].
To overcome the issue of regional biases when performing a local scale model calibration/validation, we built a network of flux towers from regional initiatives to represent the wide-ranging climate and ecosystem diversity of South America [
34]. In addition to the most used flux tower network data compilation [
35] from the large-scale biosphere-atmosphere experiment in the Amazon (LBA-ECO) [
36], we also included data from the SULFLUX (South Brazilian network of surface fluxes and climate change), ONDACBC (National Observatory of Water and Carbon Dynamics in the Caatinga Biome) [
37], and the PELD Pantanal (long-term ecological research in Pantanal) [
38], in conjunction with flux measurements supported by Brazilian universities, including the UFMT (Federal University of Mato Grosso) [
39] and the USP (University of Sao Paulo) [
40], funded by national and regional research agencies. This network encompasses most land covers in South America, corresponding to over 80% of its area (except for urban, barren, and high altitude forest land covers) [
41], and has been used as validation data for SEB flux estimations via remote sensing [
23,
32,
33].
An alternative to the existing methods of modeling G is the use of artificial neural networks (ANNs), which are universal approximators [
42] and have been successfully tested (used or applied) in several hydrological processes and modeling attempts and have shown better performance than existing conceptual and empirical models [
42,
43,
44,
45,
46,
47,
48]. ANNs are computational models analogous to the brain’s biological behavior, simulating its capabilities of learning and memorizing. The ANN relates input data to specified outputs through a series of intertwined layers of neurons. The neurons connect and transform input variables via activation functions. An ANN can be trained by pairing historical data and calibrating the connections’ synaptic weights to obtain the best relationship between inputs and outputs [
42].
One of the main disadvantages of ANNs is that their structure does not describe the underlying processes that explain the target variable, thus being labeled as black-box models [
49]. On the other hand, the empirical origin of the existing models for instantaneous G prediction also presents the same limitation. With this in mind, ANNs become an attractive methodology for G prediction. In fact, previous applications of ANNs for G prediction at the local scale [
46,
47] have yielded promising results. Considering the availability of the long time series of measured G, representative of the South American landscape [
34,
41], and the readily available remote sensing data, the usage of the ANN approach for large-scale applications is encouraged as a next step to assess its efficiency in predicting G.
Therefore, given the diversity of G estimation methods and their limited application conditions, this paper focuses on assessing the ability of ANNs to predict G over a wide range of ecosystems in South America, using long-term remote sensing and meteorological time series, and comparing the ANN’s performance to that of commonly used G models. With this study, we hope to assist future SEB closure studies and ET estimates with more accurate estimates of G and consequently H and LE.
2. Materials and Methods
2.1. Study Sites
In this study, data from 23 flux towers were used to calibrate and validate the ANNs, located in South America. The area surrounding each flux tower was considered one study site.
Table 1 lists the flux towers surrounding land covers and the data acquisition periods, while
Figure 1 displays the location and land cover of the study sites. For simplification and repeatability, the land covers were grouped into one of five types: forest, irrigated cropland, cropland, grassland, and savanna.
2.2. Input Data
Table 2 shows the input data used in the development of the ANN. Surface temperature (T
s), surface albedo (α), and enhanced vegetation index (EVI) data were obtained from a moderate resolution imaging spectroradiometer (MODIS) product at daily time intervals. Given that the training process of ANN requires a large amount of data, the MODIS products were selected due to their high temporal resolution. MODIS’s image acquisition (on-board of the Terra satellite) occurs at around 10:30 AM (local time). The net radiation (R
n) dataset was acquired from the global land data assimilation system version 2 (GLDAS 2.1) reanalysis product at a 3 h time intervals, averaged for the 9 AM to 12 PM window. Both satellite and reanalysis products were available from the Google Earth Engine platform (
https://earthengine.google.com/ [Accessed date: 30 June 2020]).
Flux tower data were obtained at hourly and sub-hourly time intervals and were also resampled and averaged to the 9 AM to 12 PM local time window. A search was executed on each flux tower dataset to remove incomplete (gap-filling) data. R
n values 10% lower than the five days moving average were discarded to remove cloudy sky data. Finally, G values beyond the range of −50% to 50% of R
n were also removed from the analysis [
74,
75,
76]. After this filtering procedure, a total of 3778 samples remained available.
Figure 2 displays the correlation plot between the observed G from the flux towers and the R
n, T
s, EVI, and α datasets. The correlation plots and r values are distinguished by land cover. Correlation among the input datasets is generally weak, with |r| < 0.50 for most land covers, where |r| is the absolute value of r. Exceptions (|r| > 0.50) are mainly found for T
s and R
n on most land covers. The correlation of observed G to input datasets is also low in general. The correlation plots are useful to identify potentially redundant inputs of the ANN. Due the low correlation among the variables, it can be assumed that they may contribute independently to the ANN predictions.
2.3. ANN’s Structure
Figure 3 displays a generic scheme of the ANNs that were built. The input data are combined into a series of processing units in the hidden layer, called neurons. These “hidden” neurons are then connected to an output neuron that converts the hidden layer’s information into the desired output. In order to transform the data, each neuron passes the received information through an activation function, such as the bipolar and unipolar sigmoid functions. The connections between neurons are established by synaptic weights (or simply weights), and these, together with the activation functions, determine the relationship between the input data and the corresponding outputs. The weight values are defined during the training process. The number of neurons in the hidden layer (m) determines the ANN complexity and is assessed based on a search process for a network with the least complexity that still obtains the same performance, with a validation sample, as an initial network, that is purposefully over-sized [
77]. The ANNs were developed with the most common inputs used in models for G computation [
43,
44,
45,
46,
47]. This allows for the assessment of the benefits of using ANNs compared to preexisting models.
The study sites were grouped into different land cover types to consider characteristics that are difficult to capture from remote sensing data, such as subcanopy vegetation density, vegetation height, and soil skin temperature. Land cover data were inputted as numerical data in the ANNs, as follows: 1 for cropland; 2 for irrigated cropland; 3 for forest; 4 for grassland; and 5 for savanna. The impact of the inclusion of land cover data as an input was assessed using a comparison of the ANNs with and without the land cover information. Thus, two networks were generated. From this point on, the two networks will be referred to as: (i) Gc: an ANN that considers land cover as an input; and (ii) Gnc: an ANN that does not consider land cover as an input.
In order to avoid preferential treatment of inputs with different magnitudes, input and output data were scaled using a normalization procedure, shown by Equation (1):
where X′ is the scaled variable; X is the raw variable; and X
bottom and X
top are the data bottom and top limits, respectively. The top and bottom limits were chosen based on observations of the 23 study sites’ available series and are shown in
Table 3. These values also represent the valid range for applications of the trained ANNs.
The developed ANNs are generically represented by Equation (2):
where O′ is the ANN’s output, equivalent to G in its scaled form; n is the number of input datasets; m is the number of neurons in the hidden layer; w
h, b
h, f
h, w
o, b
o, and f
o are the synaptic weights (w), biases (b), and activation functions (f) of the hidden (h) and output (o) layers, respectively.
The unipolar sigmoid function was used as the activation function for both the hidden and output layers.
2.4. ANN’s Training Process
The ANN’s training consisted of adjusting of the synaptic weights and biases by minimization of the output error using the backpropagation algorithm [
78]. w and b values were updated using the method presented in [
79] and the learning rate was calculated according to the training acceleration techniques proposed by [
80].
The cross-validation method was used to control overfitting. This method consists of dividing the datasets into three series of similar sizes: a training series, a validation series, and a verification series. Weight and bias updates occur to reduce the total quadratic error in the training dataset (St) with each iteration, but the quadratic error in the validation series (Sv) is also observed. The final values of w and b are the ones that provide the minimal Sv value. The ANN performance is then assessed for the verification series.
The number of iterations (or cycles) must be sufficient to identify the best iteration (minimal error). If the best cycle is too close to the end of the network training, it is possible, and probable, that a better solution could be found if there were more iterations. Thus, a method for increasing the total number of cycles was applied to the ANN training, as follows:
A starting minimum of 200,000 cycles was defined;
If the best iteration occurred before the 100,000th cycle, the training was considered complete. If it happened after, the total number of cycles was doubled, up to 400,000 cycles.
This second step was repeated either until the best cycle was found in the first half of the training process or until a limit of five duplications was reached, reaching up to 6.4 million iterations. This limit was applied to reduce computational running time and to avoid infinite loops.
In order to assess the ANN’s complexity, the training process was repeated with the progressive reduction in the number of neurons in the hidden layer, from a maximum of 12 neurons, until a decrease in its generalization capacity was observed, measured by the quadratic error in the validation series. The smallest number of neurons that did not yield a decrease in the network’s generalization capacity was then chosen.
After assessing the ANN’s complexity, the data series division was evaluated, as recommended by [
81]. One-third of the total data was reserved for the verification series. The remaining data were split into calibration and validation series. The tested split ratio range was larger than that recommended by [
81], because it was noticed that better predictions of G happened for smaller training datasets. Therefore, a broader range of split ratios needed to be tested. A sequence of tests between 10/90 and 90/10 was executed to determine the optimal calibration/validation split ratio. The quadratic error present in the validation series was used to determine the optimal data division.
Due to the ANN’s empirical nature and the randomness of the initial conditions, training runs could prematurely stop at a local minimum on the objective function surface. To avoid this, after each randomization and partitioning of the data series, the cross-validation process was repeated ten times. Since data randomization was performed 40 times, ANN training was repeated in parallel 400 times for each of the two networks created. The trained ANNs with the best performance in the validation series were then used to predict G in the verification data series.
2.5. Performance Assessment and Input Data Contribution Analysis
The ANN’s predictions were assessed against data measured in the flux towers via a set of statistical indexes, calculated for the validation data series of each of the 40 data randomizations. The Nash–Sutcliffe coefficient (NS) [
82] value in the validation dataset was used as the criterion for selecting the optimal ANN.
For comparison purposes, G was also estimated by the original and adjusted Jackson et al. (1987) [
83] and Bastiaanssen (1995) [
84] equations. Reparameterization was performed using the minimum quadratic error as the goal function.
Table 4 presents the originals and adjusted equations (for results, see
Section 3.2. Models’ performance evaluation), where T
s is the surface temperature (°C); α is the surface albedo; and NDVI is the normalized difference vegetation index.
The performances of the Gc and Gnc ANNs, and of the Jackson et al. (1987) and Bastiaanssen (1995) models, were evaluated via the comparison of NS, mean absolute error (MAE), root-mean-square error (RMSE) [
85], bias [
86], the slope coefficient (A
0) [
87], and the linear correlation coefficient (r) [
88], calculated for the verification series. The MAE, RMSE, and bias metrics were calculated as absolute values and as a percentage of the mean of the observed verification data series.
The input data sets contribution to the ANNs was calculated by the connection weights method proposed by [
89] and modified by [
90]. In this method the relative importance of the input variable is entirely derived from the synaptic weights of the ANN post-training.
4. Discussion
Uncertainties in the estimation of SEB fluxes from remote sensing data have motivated the use of ANNs as an alternative to existing models [
91,
92,
93], but reference material regarding the validation of G estimates is still scarce, compared to other SEB fluxes.
Table 6 presents the performance metrics of the G predictions via remote sensing presented by some recent studies. These works used either airborne [
14,
46,
94] or orbiting [
23,
28,
32,
33,
48,
95] platforms associated with meteorological data to estimate G. Validation occurred via comparison to flux tower data in one or more land covers. In most studies, the number of samples was low, ranging between 12 and 698. An exception is the work performed by [
27], with over 100,000 samples. The lowest MAE and RMSE values were found by [
14] in cotton crops, while the highest values were obtained by [
32] over forest. Generally, the highest correlation and lowest error values were obtained with airborne remote sensing data, which have higher spatial resolutions and are used over more homogenous land surfaces. This indicates that using remote sensing data with a finer spatial resolution may better correlate to G measurements.
Compared to the metrics presented in
Table 6, the Gc and Gnc ANNs and the Jackson et al. (1987) and Bastiaanssen (1995) models performed within expectations. However, unlike other studies, the models’ adjustments and the ANN’s training were performed for various land covers and a larger data series. This way, correlation values may be lower, however the resulting models and ANNs provide a higher generalization potential. Overall, the MAE and RMSE values of the ANNs and adjusted models in
Table 5 reside in the lower portion of the ranges presented in
Table 6.
As in our study, G predictions of the studies from
Table 6 showed a lower correlation in densely forested areas. One possible reason for this is the low variance of observed G in forests. Another possible reason is that G is dependent on the soil skin temperature, while remote sensing products provide canopy surface temperature, which can be uncorrelated in dense vegetation. In fact, [
96] verified that, despite the generally high correlation between soil and canopy temperatures, this relationship is weaker in cloud-free images, when radiative heating leads to fast changes in canopy temperature while soil temperature changes slowly. This limitation hinders the improvement of G prediction with orbiting remote sensors on forests. However, the Gc ANN yielded visibly lower errors (MAE, RMSE, and bias), indicating the superiority of this method.
The Bastiaanssen (1995) equation provided better G predictions than the Jackson et al. (1987) model, following observations of [
33] and the assumption by [
27] that models that consider T
s outperform those that do not. When compared to the existing models, both ANNs yielded better performance metrics, with Gc yielding the best predictions and closest seasonal patterns to those observed in the flux towers. Therefore, the introduction of land cover as a numerical input has also proved to be effective.
In this study, the optimal training/validation split ratio contrasts with that recommended by [
81] (30/70 against 50/50 to 70/30). It is supposed that this difference is a result of the data set magnitude. As observed by [
81], ANN training with large data sets is less sensitive to partitioning, since there is enough data to support pattern recognition. Thus, a smaller part of the data can be reserved for the training set.
The Gnc ANN yielded better performance than that of the Bastiaanssen (1995) model, despite both models using the same input data. This indicates that this ANN’s structure is more suited for large-scale G estimation. In addition, ANNs can incorporate many more inputs, potentially achieving better performance. However, including too many inputs requires expanding the complexity of the network, which causes longer computational times. The low correlation among the considered input variables (R
n, T
s, EVI, and α) provides a low redundancy level to the developed ANNs, and each one is essential to the accurate prediction of G, as also indicated by [
46]. The addition of land cover information complemented the input data series and improved performance without significantly impacting computational times. In
Figure 4, systematic biases were identified for Gnc, Jackson et al. (1987), and Bastiaanssen (1995). This indicates that existing models’ datasets do not fully comprehend the dynamics of G, further demonstrating the importance of land cover as a key factor in G prediction. Thus, additional datasets should be considered in the future development of models for G prediction.
Figure 4 shows that the average value and seasonality ranges of G vary significantly between the different land covers. It also shows that, even though this flux is of small magnitude compared to others of the SEB, it is not negligible. To the best of our knowledge, this work is one of the first large-scale model developments for G prediction in South America, comprehending multiple land covers and climates representative of most South American ecosystems. This work can potentially improve ET estimates in South America and similar regions worldwide.
5. Conclusions
In this study, ANNs were developed to calculate G via the integration of satellite remote sensing and meteorological reanalysis data. Compared to that of existing models, the performance of the generated ANNs was better overall, yielding lower errors and higher correlation values. This indicates that the ANN’s structure can better approximate the behavior of G over various land covers and climate conditions.
The inclusion of land cover information into the ANNs as an input improved the accuracy of the G predictions. The superior performance of the ANNs with land cover as an input indicates that the commonly used remote sensing data may be insufficient to fully capture the differences among the surfaces and appropriately predict G. However, it is recommended to include additional remote sensing datasets in such models, especially those used in image land cover classification, instead of land cover data. This would remedy possible issues with the lack of standardization of land cover classification systems. On the other hand, the addition of input data sets increases the complexity of the ANNs and may even reduce accuracy. Thus, parsimony is recommended when selecting predictor data sets.
These findings demonstrate that the developed ANNs can predict G spatiotemporal variability more accurately than existing models. Despite the limitation of the distribution of the available flux towers, the wide variety of land covers considered, encompassing most of South America, and the length of the time series used in the ANN’s training mean that the developed ANNs also yielded a higher generalization ability than the existing models. However, the ANN’s accuracy over high altitude and meridional land covers should also be assessed for greater reliability.
For future studies, we suggest the mapping of G over the whole South America using the ANNs and a comparison of this to existing global products. The investigation of the effects of ANN-based G estimates on error reduction of surface energy balance fluxes and evapotranspiration modelling is also recommended.