MODIS-Based Estimation of Terrestrial Latent Heat Flux over North America Using Three Machine Learning Algorithms

Terrestrial latent heat flux (LE) is a key component of the global terrestrial water, energy, and carbon exchanges. Accurate estimation of LE from moderate resolution imaging spectroradiometer (MODIS) data remains a major challenge. In this study, we estimated the daily LE for different plant functional types (PFTs) across North America using three machine learning algorithms: artificial neural network (ANN); support vector machines (SVM); and, multivariate adaptive regression spline (MARS) driven by MODIS and Modern Era Retrospective Analysis for Research and Applications (MERRA) meteorology data. These three predictive algorithms, which were trained and validated using observed LE over the period 2000–2007, all proved to be accurate. However, ANN outperformed the other two algorithms for the majority of the tested configurations for most PFTs and was the only method that arrived at 80% precision for LE estimation. We also applied three machine learning algorithms for MODIS data and MERRA meteorology to map the average annual terrestrial LE of North America during 2002–2004 using a spatial resolution of 0.05◦, which proved to be useful for estimating the long-term LE over North America.


Introduction
Terrestrial latent heat flux (LE) plays an important role in the global water cycle, carbon, and surface energy exchanges [1][2][3].LE can be measured directly by using lysimeter and eddy covariance (EC) flux towers, but the former method is time-consuming, expensive, and needs carefully planned experiments [4].Although the FLUXNET project has provided EC flux towers to measure LE, it is inherently difficult to accurately estimate LE, especially at large spatial scales due to the heterogeneity in the terrestrial landscape.Moreover, the sparse observations from EC flux towers limit the accurate characterization of spatiotemporal LE patterns over large spatial scales.
Remote sensing, especially from the Moderate Resolution Imaging Spectroradiometer (MODIS) satellite, can provide temporally and spatially continuous information for estimating LE, such as: land surface temperature (LST); the fraction of absorbed photosynthetically active radiation (FPAR); the normalized difference vegetation index (NDVI); the enhanced vegetation index (EVI); leaf area index (LAI); and, albedo [5][6][7].Currently, there are numerous satellite-based algorithms available to estimate LE, e.g., surface energy balance (SEB) models [8,9] use remotely sensed products and meteorological data driving a residual SEB equation to estimate LE.Penman-Monteith (PM) and Priestley-Taylor (PT) approaches used PM and PT equations to calculate LE, respectively [10][11][12][13].Statistical and empirical methods calculated LE using empirical equations, which include vegetation parameters and LE observations [6,[14][15][16][17].Data assimilate methods improved LE estimation by assimilating satellite land-surface variables into land surface models [18,19].However, these algorithms show substantial differences in LE estimation [20].Additionally, the existing global LE products, such as MOD16 and the EUMETSAT Satellite Application Facility on Land Surface Analysis (LSA-SAF) product (LSA-SAF MSG), have a 5 km spatial resolution and daily temporal resolution [7], but validation results have revealed that these products often overestimate LE at most AsiaFlux sites [21].
Satellite-based empirical and semi-empirical LE algorithms have been scaled up from site to regional scales by relating observed LE to satellite-based vegetation parameters and other key meteorological variables [14].As the most viable empirical algorithms, machine learning algorithms have the potential to generate robust relationships, and once trained, they are very fast to apply [22].These algorithms can provide accurate LE estimation as long as enough training datasets are representative of all the behaviors in the systems.However, there is still some controversy about which machine learning algorithm is best in LE estimates.Different input variables with different plant functional types (PFTs) over different study regions may lead to different result.For example, Bruton et al. [23] employed precipitation (P), air temperature (Ta), solar radiation (Rs), wind speed (Ws), and relative humidity (RH) as input data for the artificial neural network (ANN) algorithms to estimate LE.The validation results showed that ANN algorithm was able to perform reliable LE estimation with an R 2 of 0.717 and a root mean square error of 1.11 mm.Compared with multiple linear regression and the Priestley-Taylor method, the ANN method estimated the most accurate LE with all of the available variables.El-Shafie et al. [24] used ANN models for prediction evapotranspiration in Iran.The results demonstrated that an ANN model predicted daily E with a significant level of accuracy using only the maximum and minimum temperatures successfully.Shrestha and Shukla [25] used SVM for predicting generic crop coefficient (Kc) and crop evapotranspiration (ETc), using a uniquely large dataset from lysimeters for multiple crop-season combinations under plastic mulch conditions.The results showed that the SVM model was superior to the Artificial Neural Network and Relevance Vector Machine models, two data-driven models used in hydrology.Deo et al. [26] used three machine learning algorithms, including Relevance Vector Machine (RVM), Extreme Learning Machine (ELM), and MARS for the prediction of monthly evaporative loss using limited meteorological data from the Amberley weather station in Australia.All three machine learning models performed well, with RVM proving to be the best.They also found that incident solar radiation and air temperatures are the two most influential factors in determining the performance of machine learning algorithms.
The observation data from EC flux towers are more adequate and accurate over North America, so there already exist some studies of machine learning algorithms for LE estimation over North America.For example, Yang et al. [27] used SVM combined with ground-measured ET from 25 AmeriFlux sites and remotely sensed inputs (land surface temperature, enhanced vegetation index, and land cover) to predict ET over the United States (U.S.) with an average test error of 0.62 mm/day and an R 2 of 0.75.When compared to neural network and multiple regressions, SVM performed better.Adnan et al. [28] developed a model to estimate evapotranspiration with meteorological parameters (temperature, relative humidity, wind speed, and precipitation) using different machine learning techniques.They found that ANN performed better than LLSVM, MARS, and M5Tree models and gave the nearest values as compared with the actual value.Therefore, substantial differences still exist in simulating LE for different machine learning algorithms.Moreover, there is a lack of validating and evaluating different machine learning algorithms for LE estimation at different PFTs over North America.
In this study, we evaluated and applied three machine learning algorithms to estimate terrestrial LE over North America.These machine learning algorithms included artificial neural network (ANN), support vector machines (SVM), and multivariate adaptive regression spline (MARS), which were using air temperature (T), relative humidity (RH), wind speed (Ws), and MODIS-based NDVI as input data.The objectives of this study were as follows: (1) to assess the performance of the three machine learning algorithms for different PFTs based on a series of cross-validations using long-term Ameriflux EC observations from 2000 to 2007, and (2) to generate a daily LE product during 2002-2004 over North America using three machine learning algorithms with well-quantified accuracy based on MODIS data and Modern Era Retrospective Analysis for Research and Applications (MERRA) meteorological data.

Artificial Neural Network
An Artificial neural network (ANN) consists of a set of processing elements, including input, hidden, and output layers.Each layer includes an array of processing elements.There is a connection between each of the neurons in any given layer with each of the neurons in the next layer [4].This connection between neurons can be described as following function: where y i is the output of the neuron i, f i is nonlinear, such as Gaussian function, etc. w ij is the connection weight between neuron i and j. x j is the jth input to the neuron and θ i is the bias of the neuron.The processing steps require aggregating the weighted inputs and the activation function, such as the Sigmoid function and Tanh function.The activation function limits the output values between two asymptotic values, typically 0 and 1 or −1 and +1.In feed-forward artificial neural networks (FFNN), each neuron at input and inner layers receives input data then processes and passes to the next layer.The output result is carried out layer by layer in the forward direction.The weights are adjusted through reverse pass, which calculated error signals propagate backward through the network.The weights of the output neuron layer are adjusted first because the target value of output neuron layer can guide the adjustment of the associated weights.The numbers of input and output parameters determine the numbers of neurons in the input layer and the output layer, respectively [29].The model of FFNN is shown in Figure 1.2.1.2.Support Vector Machine Support vector machine (SVM) were developed by Cortes and Vapnik (1995) to separate the sample data through some nonlinear mapping from the input vectors into the high dimensional feature space, and can find an optimal separating hyper plane between data points of different classes in a high dimension space [27,30,31].When using SVM classification, we seek the optimal classification plane to maximize the distance between the two classes, with the points falling within

Support Vector Machine
Support vector machine (SVM) were developed by Cortes and Vapnik (1995) to separate the sample data through some nonlinear mapping from the input vectors into the high dimensional feature space, and can find an optimal separating hyper plane between data points of different classes in a high dimension space [27,30,31].When using SVM classification, we seek the optimal classification plane to maximize the distance between the two classes, with the points falling within the boundary, called the support vector [30].Mapping technology is implemented through the kernel function.We used the radial basis function (RBF) kernel function type in this study because former studies shows that the RBF kernel performs better than any other kernel functions [32], it can be expressed as: Here, σ is variance.The complexity of SVM does not depend on the dimensions of the feature space, but rather on the number of support vectors (Figure 2).

Support Vector Machine
Support vector machine (SVM) were developed by Cortes and Vapnik (1995) to separate the sample data through some nonlinear mapping from the input vectors into the high dimensional feature space, and can find an optimal separating hyper plane between data points of different classes in a high dimension space [27,30,31].When using SVM classification, we seek the optimal classification plane to maximize the distance between the two classes, with the points falling within the boundary, called the support vector [30].Mapping technology is implemented through the kernel function.We used the radial basis function (RBF) kernel function type in this study because former studies shows that the RBF kernel performs better than any other kernel functions [32], it can be expressed as: Here, σ is variance.The complexity of SVM does not depend on the dimensions of the feature space, but rather on the number of support vectors (Figure 2).The regression estimation with support vector is to estimate a function according to a given data set {(x i , y i )} n .Herein, the input vectors (x i ) refer to T, RH, Ws, and NDVI, whereas the target values (y i ) refer to the ratio of LE values as determined by the ground-measured data and surface net radiation (R n ), and n is the total number of data sets.
The linear regression function uses the following function: Here, f (x) is a nonlinear function, by which x is mapped into a feature space, b is a weight vector, and w denotes a coefficient that has to be estimated from the data.A nonlinear regression function is given by minimizing the sum of the empirical risk and a complexity term, as in the following expression: With a i a * i ≥ 0, (a i and a * i are Lagrange multipliers).i = 1, . . ., N, K(x i , x j ) is the kernel function and b can be calculated as following function: where N SV is the number of support vectors.

Multivariate Adaptive Regression Spline
Multivariate adaptive regression spline (MARS) is a flexible nonparametric modeling tool that was popularized by Friedman (1991) for solving regression-type problems [33].It does not require an assumption about the particular type of relationship between outcome variable and predictor variables [34,35].Instead, MARS use coefficients and basic functions that are decided entirely by regression data to construct this relation.The MARS model can be described as follows: Here, the summation f (x) is over the k non-constant terms in the model.β 0 is the intercept parameter.Each c i is a constant coefficient, B i (x) is a basic function, which takes one of following forms: a constant 1, a hinge function or two or more hinge functions.A hinge function (Figure 3) has the form max(0, x − c) or max(0, c − x), c is a constant, called the knot.
The first step to build a MARS model is partitioning the training data set into several splines in equivalent interval basis [33].Most importantly, the model ensures that there is adequate data in a subgroup in order to avoid over-fitting using the shortest distance between neighboring knots [36,37].The forward pass and the backward pass are two important phases in building a MARS model.To find the best subset, we use generalized cross validation (GCV) to compare the performance of sub-models.Better sub-model has lower GCV values.GCV can be calculated as follows: Here, the n was the number of observations, y i was the response value of object i, ∧ y i was the predicted response value of object i, and C(M) was the penalty factor.
Remote Sens. 2017, 9, 1326 6 of 20 Here, the n was the number of observations, i y was the response value of object i , ∧ i y was the predicted response value of object i , and was the penalty factor.

Eddy Covariance Observations
The machine learning algorithms were validated and evaluated using ground-measured flux data.All of the turbulent flux observations were measured by the EC method and the data cover the period from 2000 to 2007.The data were collected from 85 EC flux tower sites and were provided by

Eddy Covariance Observations
The machine learning algorithms were validated and evaluated using ground-measured flux data.All of the turbulent flux observations were measured by the EC method and the data cover the period from 2000 to 2007.The data were collected from 85 EC flux tower sites and were provided by AmeriFlux.These flux towers are located mainly in North America (50 • -170 • W, 10 • -70 • N, Figure 4).The flux tower sites cover five major global land-surface biomes: cropland (CRO; 7 sites), deciduous broadleaf forest (DBF; 11 sites), evergreen needleleaf forest (ENF; 41 sites), grass and other types (GRA; 17 sites), and shrubland (SHR; 9 sites).These data sets include half-hourly or hourly ground-measured incident Rs, RH, Ta, diurnal air temperature range (DT), Ws, vapor pressure (e), sensible heat flux (H), surface net radiation (R n ), ground heat flux (G), and LE.We acquired daily data from aggregated half-hourly or hourly data without using additional quality control [38][39][40] and removed the zero values.For the unclosed energy problem, we used the following method that was developed by Twine et al. [41] to correct the LE for all flux towers: where LE is the corrected latent heat, H ori and LE ori are the uncorrected sensible heat flux and latent heat, respectively.

MODIS and MERRA Data
We used the NDVI values as the input variables and the daily NDVI values were temporally interpolated from the 16 day MODIS NDVI (MOD13A2) product [42] at 1-km spatial resolution using linear interpolation [43].When 16 day NDVI data are missing or not available, we used the closest reliable 16 day values to replace the missing or original data.
We used the daily Rn, RH, Ta, and WS products with a spatial resolution of 1/2° × 2/3°from MERRA data provided by National Aeronautics and Space Administration (NASA) to evaluate the performance of all the LE algorithms for all the flux tower sites in this study.To match MODIS pixels, we used the method proposed by Zhao et al. [43] which is a spatial interpolation method using a cosine function to interpolate coarse-resolution MERRA data to 1 km 2 pixels.Theoretically, this method uses the four MERRA cells surrounding a given pixel to remove sharp changes from one side of a MERRA boundary to the other to improve the accuracy of MERRA data for each 1 km pixel.

MODIS and MERRA Data
We used the NDVI values as the input variables and the daily NDVI values were temporally interpolated from the 16 day MODIS NDVI (MOD13A2) product [42] at 1-km spatial resolution using linear interpolation [43].When 16 day NDVI data are missing or not available, we used the closest reliable 16 day values to replace the missing or original data.
We used the daily R n , RH, Ta, and Ws products with a spatial resolution of 1/2 • × 2/3 • from MERRA data provided by National Aeronautics and Space Administration (NASA) to evaluate the performance of all the LE algorithms for all the flux tower sites in this study.To match MODIS pixels, we used the method proposed by Zhao et al. [43] which is a spatial interpolation method using a cosine function to interpolate coarse-resolution MERRA data to 1 km 2 pixels.Theoretically, this method uses the four MERRA cells surrounding a given pixel to remove sharp changes from one side of a MERRA boundary to the other to improve the accuracy of MERRA data for each 1 km pixel.
To map the final LE product with 0.05 • spatial resolution using three machine learning algorithms, we used Collection 5 MODIS NDVI (MOD13C1: CMG, 0.05 • ) [44] and the daily 0.05 • MERRA data interpolated using the method proposed by Zhao et al. [43] from the original MERRA data.To evaluate the performance of three machine learning algorithms, we used MODIS LE product (MOD16A2) [7] with 0.05 • spatial resolution.

Criteria of Evaluation
We used three different statistical criteria to evaluate the performance of the machine learning algorithms that were used in this study: coefficient of determination (R 2 ); the average deviation of ground-measured LE value and estimated LE value (Bias); and, root mean square error (RMSE) [45,46].R 2 can be defined as the square of correlation coefficient R. The difference between it and the correlation coefficient is to remove |R| = 0 and 1, as it can prevent an exaggerated interpretation of the correlation coefficient.The closer that R 2 is to 1, the more relevant the observations and estimates.The following equation was used to calculated coefficient of determination: where X i and Y i are the observed and estimated values, respectively, X and Y are the average of X i and Y i , and n is the total number of the data.Bias is the average value of the absolute of differences between the predicted and observed LE values.A low bias indicates a good model performance.The bias can be calculated from the following equation: where X i and Y i are the observed and estimated values, respectively, and n is the total number of the data.RMSE can be defined as the square root of the average value of the differences between the predicted and observed LE values.A perfect match between the two values would yield RMSE = 0, and it implies the best performance model.The larger RMSE is, the lower model performance.RMSE can be calculated from the following function: where X i and Y i are the observed and estimated values, respectively, and n is the total number of the data.

Experimental Setup
For each PFT, we chose half of the sites to train, the remaining sites were used to reverse.As for PFT CRO, three sites were for training and four for reversing.For DBF, five sites were for training and six for reversing.ENF/GRA/SHR has 21/9/5 sites for training and 20/8/4 for reversing, respectively.We put the same PFT data together for the training and validation of three machine learning algorithms.For the training step, we chose Ta, RH, Ws, and NDVI as input variables and the ratio of ground-measured LE and R n as output variables, and built the ANN/SVM/MARS model.Then, we put the input variables of testing data to the model and got the predicted LE/R n .To get the predicted LE, we multiplied the results by the R n of the testing data.Finally, we evaluated the predicted LE with the ground-measured LE of the testing data.Meteorology data including Ta, RH, Ws, and R n were acquired from EC flux tower sites and MERRA data, respectively.
For the ANN algorithm, we trained, validated, and tested the data by using feed-forward artificial neural networks (FFNN), which is the most commonly used ANN model.Levenberg-Marquardt algorithms were applied to train the network and a Sigmoid function was used since it gave the best results of the model.The number of hidden layers and neurons in hidden layers was determined by trial and error, and we finally decided that the number of hidden layers was 10.For the SVM algorithm, we implement the SVM on R platform using R language with package e1071, which provided an interface connects libsvm that was based on the C++ language, developed by Chih Chung-Chang and Chih-Jen Lin [47,48].The process of modeling training and inversion was similar to the ANN process.The present study built the MARS model and estimated testing data LE using ARESlab (Adaptive Regression Splines toolbox) in MATLAB, and set the maxfuncs and maxinteractions to 15 and 4, respectively.

Algorithms Evaluation Based on Specific Site Data
At the flux tower site scale, the three machine learning algorithms exhibited substantial differences in LE modeling for different PFTs.The statistical parameters of training and validation of each model are provided in Table 1. Figure 5 shows that the ANN algorithm performed best among the three algorithms for each PFT, while SVM performed better than MARS for CRO, GRA, and SHR, and made a poor performance for DBF land cover type.For CRO, the validation result showed that the R 2 was very close among the three algorithms, the highest R 2 of ANN was 0.81, while the lowest R 2 was 0.03 lower than that and each of the three algorithms had a large RMSE greater than 20 W/m 2 .Meanwhile, ANN had a lower bias than SVM and MARS.ANN for DBF had the best performance with the highest R 2 (0.89/0.88) (p < 0.01) and lower RMSE (14.48/17.17W/m 2 ) in comparison to other PFTs for training and validation, respectively.This may be caused by the characteristic of the DBF.Although the R 2 of SHR was the lowest among the five PFTs for validation results, the bias and the RMSE of SHR were lower than any other PFTs, with the lowest bias and RMSE reaches of −0.17 and 13.35 W/m 2 .For ENF and GRA sites, the lowest RMSE of the estimated LE versus ground observations was approximately 17.74 and 16.14 W/m 2 , respectively, and the R 2 was approximately less than 0.8.Especially, for GRA sites, the ANN algorithm that was driven by tower-specific meteorological variables and satellite-based NDVI had an absolutely higher R 2 and a lower bias and RMSE than the other machine learning algorithms tested.
Remote Sens. 2017, 9, 1326 10 of 20 observed LE in lower values for all of the PFTs in Figure 6, but for GRA land cover type, the ANN and the SVM kept the same trend, while the results of the MARS were absolutely lower than result of the ANN, the SVM and observed LE in high value area.We also found some abnormal observed LE value in 2004 for SHR, which may lead to the inaccuracy of estimation results.Figure 6 shows that the three algorithms exhibited most features of measured LE seasonality in the observed inversion data for different PFTs.We randomly selected an inversion site that has at least three years of complete ground-measured data for each PFT, and then chose three years of complete ground-measured data and the corresponding estimated LE data using three machine learning algorithms.We plotted the eight-day average LE to replace daily LE data.Five sites were US-Ne3 (CRO), US-WCr (DBF), CA-Qcu (ENF), US-FPe (GRA), and US-SO4 (SHR).We compared the estimated LE with the observed LE in 2002-2004 except at US-SO4, for which we used the period 2004-2006.In comparison to the other two algorithms, the ANN algorithm produced seasonal LE variations that were closest to the observed LE.Estimated LE showed great consistency, with observed LE in lower values for all of the PFTs in Figure 6, but for GRA land cover type, the ANN and the SVM kept the same trend, while the results of the MARS were absolutely lower than result of the ANN, the SVM and observed LE in high value area.We also found some abnormal observed LE value in 2004 for SHR, which may lead to the inaccuracy of estimation results.

Algorithms Evaluation Based on MERRA Data
Table 2 shows the statistical parameters of training and validation of each model that is driven by MERRA meteorology.When compared to the results that are driven by tower-specific meteorology, results driven by MERRA meteorology showed poor performance in all PFTs.This may

Algorithms Evaluation Based on MERRA Data
Table 2 shows the statistical parameters of training and validation of each model that is driven by MERRA meteorology.When compared to the results that are driven by tower-specific meteorology, results driven by MERRA meteorology showed poor performance in all PFTs.This may be related to the uncertainty of MERRA data.
Figure 7 shows the training and validation statistics R 2 , bias, and RMSE of the three algorithms that are driven by MERRA meteorology for five PFTs.The value of R 2 was lower and RMSE was higher when compared to the results that are driven by tower-specific meteorology.With similar results being shown in Figure 5, the ANN demonstrated the best performance among the three machine learning algorithms at both the training and validation stages.The biggest difference, however, was that the ANN performed absolutely higher R 2 than any of the other algorithms in Figure 7, while the R 2 is very close among the three algorithms in Figure 5 for CRO and DBF.Since the only difference in input was meteorology data, the most likely reason that the ANN performed better than than SVM and MARS when using inaccurate input variables (besides MERRA meteorology data being more inaccurate than the tower-specific meteorology data) has to do with the ANN's own characteristics.As for bias, ANN performed much better than the others, while SVM showed the poorest performance for most PFTs.As for RMSE, results of SVM may be a little better than results of MARS for some PFTs, but on the whole, their performances were similar.Figure 7 shows the training and validation statistics R 2 , bias, and RMSE of the three algorithms that are driven by MERRA meteorology for five PFTs.The value of R 2 was lower and RMSE was higher when compared to the results that are driven by tower-specific meteorology.With similar results being shown in Figure 5, the ANN demonstrated the best performance among the three machine learning algorithms at both the training and validation stages.The biggest difference, however, was that the ANN performed absolutely higher R 2 than any of the other algorithms in Figure 7, while the R 2 is very close among the three algorithms in Figure 5 for CRO and DBF.Since the only difference in input was meteorology data, the most likely reason that the ANN performed better than than SVM and MARS when using inaccurate input variables (besides MERRA meteorology data being more inaccurate than the tower-specific meteorology data) has to do with the ANN's own characteristics.As for bias, ANN performed much better than the others, while SVM showed the poorest performance for most PFTs.As for RMSE, results of SVM may be a little better than results of MARS for some PFTs, but on the whole, their performances were similar.

Mapping of Terrestrial LE Using Three Machine Learning Algorithms
We applied the ANN algorithm, the SVM algorithm, and the MARS algorithm with MERRA meteorological data and MODIS product to estimate annual LE at a 0.05 • spatial resolution from 2002-2004 over North America.Figure 8 shows maps of annual terrestrial LE averaged for 2002-2004 over North America.

Mapping of Terrestrial LE Using Three Machine Learning Algorithms
We applied the ANN algorithm, the SVM algorithm, and the MARS algorithm with MERRA meteorological data and MODIS product to estimate annual LE at a 0.05°spatial resolution from 2002-2004 over North America.Figure 8 shows maps of annual terrestrial LE averaged for 2002-2004 over North America.All of the algorithms yielded high annual LE over the east and west coasts of the United States.In the low latitudes, the highest annual LE was approximately larger than 90 W/m 2 , appearing on the area with latitudes lower than 20° N. The biggest difference between the three algorithms was the ANN estimated higher LE values on the east coast of the United States and lower LE values in areas of the western United States when compared to the SVM and the MARS, which returned similar results.All of the algorithms yielded high annual LE over the east and west coasts of the United States.In the low latitudes, the highest annual LE was approximately larger than 90 W/m 2 , appearing on the area with latitudes lower than 20 • N. The biggest difference between the three algorithms was the ANN estimated higher LE values on the east coast of the United States and lower LE values in areas of the western United States when compared to the SVM and the MARS, which returned similar results.

Performance of the Machine Learning Algorithms
Our study, based on the estimated daily LE at 85 EC flux tower sites using different machine learning algorithms that were driven by tower-specific and MERRA meteorology, illustrated that the ANN algorithm yielded the best LE estimate.The ANN had the highest R 2 (0.81 and 0.70) (p < 0.01) and lowest RMSE (17.33 and 20.22 W/m 2 ) in comparison to the SVM and MARS algorithms for tower-specific and MERRA meteorological data, respectively (Figures 9 and 10).The results were consistent with the findings of Lu and Zhuang, 2010, which used remote sensing data from the MODIS, meteorological and eddy flux data, and an ANN technique to develop a daily product for the period of 2004-2005 for the conterminous U.S. [49].They found that the ANN predicted daily LE well (R 2 = 0.52-0.86).Validation for 85 EC flux tower sites also indicated that the three machine learning algorithms used were reliable and robust for major land cover types in North America.Figures 9 and 10 both show that the ANN algorithm had no significant LE bias and yielded the closest LE to tower flux data relative to the SVM and MARS algorithms.We also found large inter-biome differences, with better performance at DBF and CRO sites.Several studies have revealed that some algorithms that exhibit strong seasonality for vegetation indices, such as NDVI for accurate capture of information on seasonal changes in vegetation can yield considerably better LE estimates for seasonal vegetation types such as DBF [10,[50][51][52][53][54].Thus, NDVI as an input variable determined the accuracy of LE quantification.All of the machine learning algorithms yielded poor LE estimation for GRA and SHR sites, which may be attributable in part to the fact that seasonal GRA and SHR variation was less evident when satellite-derived vegetation indices (e.g., NDVI) saturate asymptotically.Performance differences among machine learning algorithms may be caused by the different scope for each algorithm, as well as the quality and quantity of input variables.The SVM and MARS highlight global rather than local optima, while the ANN may ensure local optimization and leads to a better performance when compared to SVM and MARS [24,27,30,55].When using MERRA meteorology data as input variables, SVM and MARS showed even worse results, while ANN was slightly worse as compared with results using tower-specific meteorology data as input variables.On the other hand, some studies have indicated that ANN can be considered more suitable to serve as a tool to estimate LE when input meteorological variables are insufficient [29].Thus, we may conclude that ANN performs better than SVM and MARS, especially when there are insufficient input variables or the accuracy of them was not very high.Validation for 85 EC flux tower sites also indicated that the three machine learning algorithms used were reliable and robust for major land cover types in North America.Figures 9 and 10 both show that the ANN algorithm had no significant LE bias and yielded the closest LE to tower flux data Several previous studies have shown that spatial scale mismatch among different data sources, model input errors, and the limitations of the machine learning algorithms itself all affect the accuracy of LE estimation [7,10,56].We used MERRA products with a spatial resolution of 1/2 • × 1/3 • and MODIS products with a resolution of 1 km, their resolution being greater than the footprint for field measurements, which is usually several hundred meters [57,58].This is one of the reasons that estimated LE using tower meteorology as input data performs better than the estimated LE using MERRA meteorology as input data.Such representation of the field measurement footprint may also lead to bias in the machine learning algorithms.In addition, the accuracy of model input variables can also influence the accuracy of the LE estimates.Many studies have demonstrated that there are large errors in the MERRA meteorology and MERRA data tend to underestimate R n at high values when compared with ground measurements [49,58].Recent studies have also revealed errors in MODIS NDVI when compared with ground measurements [49,57].

Comparison between Different LE Products
When compared with the SVM and the MARS algorithms, the ANN algorithm yielded lower annual terrestrial LE in central and southern North America and higher LE in the northern and eastern portions of this study area (Figure 11a,b).The LE values that were estimated using MARS algorithm were lower than the LE values estimated using the SVM algorithm in most areas, but overall, the difference between them was small and less than 10 W/m 2 (Figure 11c).The relevant differences between the flux estimations at the Mexican highlands, which are surrounded by Sierra Madre, may be mostly caused by the land surface heterogeneity and the different plant functional types.The PFT of the highlands is mainly grassland, while coastal and southeastern parts are covered with rainforest.The spatial differences among three machine learning algorithms may be mostly caused by different PFTs.The ANN showed higher LE for ENF and DBF, which are mostly located at the coasts of North America.In the center and west of the study area, due to the relatively high elevation and its own geographic and geomorphic conditions, the PFTs mostly contain SHR or GRA, so the ANN yielded lower LE than SVM and MARS.However, the results that are presented by SVM and MARS show no obvious difference for different PFTs.
Figure 12 shows spatial differences in annual terrestrial LE (2002LE ( -2004) ) between the MODIS LE product and the LE product using three machine learning algorithms.Relative to MODIS LE product, the ANN, SVM, and MARS results yielded lower LE in the north and central study area, and higher LE on the east coast and in the northeast of North America.This may be caused by the precision and quantity of the test data sample, as well as the characteristics of the machine learning algorithms, so the simulation value of machine learning LE is lower than the value of the MODIS LE product over a high LE area.This finding is consistent with other studies that showed that the MODIS LE product overestimated LE at high LE areas [23].The difference between them was less than 20 W/m 2 in most areas.Given the accuracy of MERRA meteorology data and the MODIS LE product, we can conclude that machine learning algorithms are applicable for terrestrial LE mapping and the inversion results have a relatively small gap when compared to the MODIS LE product.
Remote Sens. 2017, 9, 1326 16 of 20 a high LE area.This finding is consistent with other studies that showed that the MODIS LE product overestimated LE at high LE areas [23].The difference between them was less than 20 W/m 2 in most areas.Given the accuracy of MERRA meteorology data and the MODIS LE product, we can conclude that machine learning algorithms are applicable for terrestrial LE mapping and the inversion results have a relatively small gap when compared to the MODIS LE product.(2002)(2003)(2004) between MODIS LE product and LE product using three machine learning algorithms.

Limitations and Recommendations for Future Research
Although the ANN leads to better performance when compared to other machine learning algorithms, it has several limitations.First, ANN requires a relatively long processing time to train a model, especially for ENF and GRA, the input data of which are inaccurate, and thus require more  (2002)(2003)(2004) between MODIS LE product and LE product using three machine learning algorithms.

Limitations and Recommendations for Future Research
Although the ANN leads to better performance when compared to other machine learning algorithms, it has several limitations.First, ANN requires a relatively long processing time to train a model, especially for ENF and GRA, the input data of which are inaccurate, and thus require more learning cycles.Second, the ANN model may trap in local optimization, leading to worse generalization performance.Further, all machine learning algorithms behave relatively unpredictably when used with input ground-measured LE deviating from those presented during the training stage [24,59].Most importantly, we evaluated and validated LE performance using the same ground-measure LE data from EC sites.The machine learning algorithms do not possess the useful information to directly deliver additional confidence LE maps, and we cannot ensure the accuracy of these ground-measure LE data [60].
To make the training samples more applicable over North America, we should add samples from other PFTs, such as evergreen broadleaf forest and deciduous needleleaf forest [59].The EC data of these specific PFTs is few over North America, however.We may develop the machine learning algorithms coupled with semi-empirical and physical methods that do not require training samples in the future to improve the terrestrial LE at more different PFTs.

Conclusions
The primary objective of this paper was to demonstrate the usage of three machine learning algorithms to estimate terrestrial LE for different PFTs over North America.We used tower-specific and MERRA meteorology data and MODIS NDVI product as input data and the ratio of observed LE and meteorology parameter R n as output data to build training models.All three machine learning algorithms proved to be reliable and robust for major land cover types in North America but the ANN algorithm performed better than the SVM and MARS algorithms based on specific site and MERRA data.
The results revealed that the ANN algorithm was more efficient in accounting for the nonlinear relationship between climatic variables, vegetation indices, and the corresponding LE.We found that ANN performed better in most land cover types, for except ENF and GRA, when using MERRA meteorology data as input data in the study area.The validation results for machine learning algorithms driven by tower-specific meteorology and MERRA meteorology showed that the results of SVM were similar to the results of MARS.For DBF sites, the MARS has a higher R 2 and a lower RMSE than SVM, for other sites, SVM perform a little better than MARS.
We concluded that it is reasonable to use machine learning algorithms to estimate LE by building relationships between input variables and outputs (LE).The accuracy of input variables, such as meteorology data and MODIS NDVI, can also influence the accuracy of the LE estimates.It is also applicable to map the terrestrial LE in different regions for different PFTs using any of these three machine learning algorithms; although, the performance of ANN was the best, the differences among them were small.The inversion results have a relatively small gap compared with MODIS LE product, with the difference between them being less than 20 W/m 2 in most areas, and machine learning algorithms performing better at the area where MODIS LE product overestimated LE.

Figure 1 .
Figure 1.Architecture of the neural network model used in this study.

Figure 1 .
Figure 1.Architecture of the neural network model used in this study.

Figure 1 .
Figure 1.Architecture of the neural network model used in this study.

Figure 2 .
Figure 2. One-dimensional linear regression with ε -insensitive band for the support vector machine (SVM) algorithm.

Figure 2 .
Figure 2. One-dimensional linear regression with ε-insensitive band for the support vector machine (SVM) algorithm.

Figure 3 .
Figure 3.The hinge functions and knot location in the multivariate adaptive regression spline (MARS) model.

Figure 3 .
Figure 3.The hinge functions and knot location in the multivariate adaptive regression spline (MARS) model.

Figure 4 .
Figure 4. Location of the 85 eddy covariance flux towers used in this study.INV means the data of this site were used to inverse, TRA means that the data of this site were used for training.

Figure 4 .
Figure 4. Location of the 85 eddy covariance flux towers used in this study.INV means the data of this site were used to inverse, TRA means that the data of this site were used for training.

Figure 5 .
Figure 5. Bar graphs of the training and validation statistics (R 2 , Bias and root mean square error (RMSE)) of three algorithms driven by tower-specific meteorology for five PFTs at the 85 flux tower site.All R 2 values are significant with a 99% confidence.

Figure 5 .
Figure 5. Bar graphs of the training and validation statistics (R 2 , Bias and root mean square error (RMSE)) of three algorithms driven by tower-specific meteorology for five PFTs at the 85 flux tower site.All R 2 values are significant with a 99% confidence.

Figure 5 . 20 Figure 6 .
Figure 5. Bar graphs of the training and validation statistics (R 2 , Bias and root mean square error (RMSE)) of three algorithms driven by tower-specific meteorology for five PFTs at the 85 flux tower site.All R 2 values are significant with a 99% confidence.

Figure 6 .
Figure 6.Examples of the eight-day terrestrial latent heat flux (LE) average as measured and estimated using different machine learning algorithms for the different PFTs.

Figure 7 .
Figure 7. Bar graphs of the training and validation statistics (R 2 , Bias and RMSE) of three algorithms driven by MERRA meteorology for five PFTs at the 85 flux tower sites.All of the R 2 values are significant with a 99% confidence.

Figure 7 .
Figure 7. Bar graphs of the training and validation statistics (R 2 , Bias and RMSE) of three algorithms driven by MERRA meteorology for five PFTs at the 85 flux tower sites.All of the R 2 values are significant with a 99% confidence.

Figure 8 .
Figure 8.The map of mean annual terrestrial LE from 2002 to 2004 at a spatial resolution of 0.05° using three machine learning algorithms driven by MERRA meteorology over North America.

Figure 8 .
Figure 8.The map of mean annual terrestrial LE from 2002 to 2004 at a spatial resolution of 0.05 • using three machine learning algorithms driven by MERRA meteorology over North America.

Figure 9 .
Figure 9.Comparison of daily LE observations for all 85 flux tower sites and LE estimates using different machine learning algorithms driven by tower-specific meteorology.

Figure 9 .
Figure 9.Comparison of daily LE observations for all 85 flux tower sites and LE estimates using different machine learning algorithms driven by tower-specific meteorology.

Figure 9 .
Figure 9.Comparison of daily LE observations for all 85 flux tower sites and LE estimates using different machine learning algorithms driven by tower-specific meteorology.

Figure 10 .
Figure 10.Comparison of daily LE observations for all 85 flux tower sites and LE estimates using different machine learning algorithms driven by MERRA meteorology.

Figure 10 .
Figure 10.Comparison of daily LE observations for all 85 flux tower sites and LE estimates using different machine learning algorithms driven by MERRA meteorology.

Figure 11 .
Figure 11.Spatial differences in the average annual terrestrial LE (2002-2004) between three machine learning algorithms. .

Figure 12 .
Figure 12.Spatial differences in the average annual terrestrial LE (2002-2004) between MODIS LE product and LE product using three machine learning algorithms.

Figure 12 .
Figure 12.Spatial differences in the average annual terrestrial LE (2002-2004) between MODIS LE product and LE product using three machine learning algorithms.

Table 1 .
Training and validation statistics for the different algorithms based on specific site.All of the coefficient of determination (R 2 ) values are significant with a 99% confidence.

Table 2 .
Training and validation statistics for the different algorithms based on MERRA data.All R 2 values are significant with a 99% confidence.