Comprehensive Evaluation of Machine Learning Techniques for Estimating the Responses of Carbon Fluxes to Climatic Forces in Different Terrestrial Ecosystems

Accurately estimating the carbon budgets of terrestrial ecosystems, at scales ranging from flux towers to regions or the globe, is particularly crucial for diagnosing past and future climate change. This research investigated the feasibility of two comparatively advanced machine learning approaches, namely the adaptive neuro-fuzzy inference system (ANFIS) and the extreme learning machine (ELM), for reproducing terrestrial carbon fluxes in five different types of ecosystems. Traditional artificial neural network (ANN) and support vector machine (SVM) models were also utilized as reliable benchmarks against which to measure the generalization ability of these models, according to the following statistical metrics: coefficient of determination (R2), index of agreement (IA), root mean square error (RMSE), and mean absolute error (MAE). In addition, we explored the responses of the generalization performance of all methods to their corresponding intrinsic parameters. It was found that both the newly proposed ELM and ANFIS models achieved highly satisfactory estimates and were comparable to the ANN and SVM models. The modeling ability of each approach depended upon its internal parameters. For example, the SVM model with the radial basis kernel function produced the most accurate estimates and performed substantially better than the SVM models with the polynomial and sigmoid functions. Furthermore, a remarkable difference was found in the estimation accuracy among the different carbon fluxes. Specifically, in the forest ecosystem (CA-Obs site), the optimal ANN model obtained slightly higher performance for gross primary productivity, with R2 = 0.9622, IA = 0.9836, RMSE = 0.6548 g C m−2 day−1, and MAE = 0.4220 g C m−2 day−1, compared with, respectively, 0.9554, 0.9845, 0.4280 g C m−2 day−1, and 0.2944 g C m−2 day−1 for ecosystem respiration and 0.8292, 0.9306, 0.6165 g C m−2 day−1, and 0.4407 g C m−2 day−1 for net ecosystem exchange.
According to the findings in this study, we concluded that the proposed ELM and ANFIS models can be effectively employed for estimating terrestrial carbon fluxes.


Introduction
Terrestrial ecosystems play a vital role in sequestering carbon dioxide from the atmosphere [1,2]. In general, the amount of carbon sequestered in different terrestrial ecosystems varies on seasonal, annual, and inter-annual time scales [3]. To account for this, the responses of terrestrial carbon exchanges to global environmental change, involving climate extremes such as windstorms, heat waves, frosts, and droughts, and a variety of disturbances, mainly including land use changes, fires, nitrogen deposition, and CO2 elevation, have recently attracted much attention [4][5][6]. These drivers may alter the composition, structure, and function of terrestrial ecosystems [7] and subsequently influence the magnitude and behavior of carbon cycles and their feedbacks to global change [8,9]. The inability to sufficiently characterize the ongoing evolution of carbon fluxes using different techniques has been a primary impediment to our understanding of the underlying mechanisms of terrestrial carbon exchange processes [10][11][12]. For this reason, it is crucial to accurately estimate the carbon budgets of terrestrial ecosystems, at scales ranging from flux towers to regions or the globe, for diagnosing past and future climate change.
To date, most prior studies that examine the roles of the aforementioned drivers and assess their induced uncertainties in terrestrial carbon flux estimation have relied primarily upon process-based models that describe complex biological and physical processes [13,14]. In general, the precision and reliability of these models are severely hindered by the underlying nonlinear and nonstationary characteristics of terrestrial ecosystems. In contrast to process-based models, Schindler and Hilborn [15] suggested that mechanism-free models constitute the best alternative technology; such models depend upon constantly changing statistical characteristics of pooled data for short-term projections. Ye et al. [16] demonstrated the superiority of empirical dynamic modeling over static equilibrium-based models with deterministic rules and parameters, because the former is an equation-free mechanistic forecasting technology capable of accommodating both non-equilibrium dynamics and nonlinearities in natural ecosystems. The potential and versatility of machine learning techniques for solving complex nonlinear problems [17,18] has been widely acknowledged in the field of eco-informatics in the context of data-intensive ecological sciences [19]. Motivated by this and by the recent availability of large amounts of data measured by the eddy covariance technique from flux tower networks across different vegetation types, the present research was undertaken to imitate the nonlinear relationships between carbon fluxes and environmental variables using machine learning approaches.
Over the past decades, machine learning techniques have gained great attention for dealing with different nonlinear issues in the ecological, climatological, and environmental fields, such as above-ground biomass estimation [20], methane emission prediction [21], meteorological anomaly detection [22], and biophysical parameter retrieval from remote sensing data [23]. In fact, the carbon exchanges of terrestrial ecosystems are complex nonlinear processes, which are predominantly controlled by environmental variables (e.g., solar radiation, air temperature, wind speed, and relative humidity). For this reason, machine learning techniques were considered for characterizing the carbon exchange processes between the land and the atmosphere. In recent years, several lines of evidence have illustrated that machine learning techniques can perform better than some physically based models in estimating terrestrial carbon fluxes at different spatial and temporal scales [14,24,25]. Among machine learning techniques, two prominent approaches are the artificial neural network (ANN) and the support vector machine (SVM), which have been broadly utilized in previous studies for estimating terrestrial carbon fluxes [26][27][28].
Despite the increased use of machine learning techniques for simulating carbon fluxes, it is important to note that the operability, robustness, and predictability of these techniques, especially the ubiquitous ANN and SVM, may be limited by their respective disadvantages. Specifically, the ANN method still suffers from several drawbacks [29], such as over-fitting, local minima, and many internal parameters that need to be determined (e.g., the number of nodes in the hidden layer and the selection of the training function), while the major drawbacks of SVM include a long computing time during the training process as well as a high memory requirement for solving large-scale quadratic programming problems [30]. To address the shortcomings of these supervised learning-based approaches, a number of novel methods have been proposed in recent years. Among these methods, two advanced modeling techniques, namely the extreme learning machine (ELM) and the adaptive neuro-fuzzy inference system (ANFIS), are considered here owing to their innovative design features, extremely fast convergence rates, and high performance. Recent applications of these two relatively new methods span a wide variety of fields, including solar radiation prediction [31,32], wind power prediction [33][34][35], stream-flow forecasting [36,37], and reference evapotranspiration prediction [38,39]. To the best of our knowledge, no research has yet been conducted that applies both the ELM and ANFIS methods to predict terrestrial carbon fluxes.
Many practical applications of the aforementioned methods in other fields have suggested that their modeling and predictive capability strongly depends on their respective internal parameters, even though they may provide satisfactory results for a given problem. Comparing and evaluating the different parameters or functions of each approach is therefore a prerequisite for removing their influences. For example, Tezel and Buyukyildiz [40] evaluated the effects of four different training algorithms used in the ANN method for predicting monthly pan evaporation from several climatic variables: (1) resilient backpropagation, (2) scaled conjugate gradient (SCG), (3) Levenberg-Marquardt and gradient descent with momentum, and (4) adaptive learning rule backpropagation. They found obvious differences among these training algorithms, with the SCG algorithm providing the best estimates. Moosavi et al. [41] compared the ability of an ANN with six different training algorithms and an ANFIS with four different membership functions to forecast monthly groundwater levels and reported that the Bayesian regularization algorithm and the bell-shaped function generated the best results for the ANN and ANFIS methods, respectively. In short, there is still great interest in identifying the optimal internal algorithm to ensure a method's accuracy and stability for a given application. In particular, the response of the predictive ability of these methods to their respective intrinsic parameters has not yet been appraised.
Taking the above-mentioned considerations into account, the objectives of the present research are fourfold: (1) to explore the adaptability and potential of the ELM and ANFIS methods in estimating daily carbon fluxes across five different types of terrestrial ecosystems; (2) to compare the predictive accuracy of our newly developed approaches with conventional ANN and SVM methods; (3) to evaluate the effects of the internal parameters of each method on its generalization ability; and (4) to probe the modeling differences among the three primary components of carbon exchange, i.e., gross primary production (GPP), ecosystem respiration (R), and net ecosystem exchange (NEE). It should be pointed out that further investigation of the ability of the models used in this study to quantify carbon fluxes at a sub-daily time scale could lead to deeper insights into the versatility and potential of these models. However, this topic is beyond the scope of this article and will be undertaken in our follow-up work.

Study Sites
This study selected five flux tower sites across different vegetation types from the FLUXNET database (http://fluxnet.ornl.gov/). Details of these sites are given in Table 1. Specifically, the AT-Neu site was established at a meadow in the vicinity of the village of Neustift in the Stubai Valley, Austria. The site is mainly dominated by graminoid and forb species and represents a typical temperate mountain grassland ecosystem. The wetland site (SE-Deg) is located in a mixed acid mire system in the Kulbäcksliden Experimental Forest in Västerbotten, Sweden. The climate is defined as cold temperate humid. The observed part of the mire is a poor fen dominated by lawn and carpet plant communities. The cropland site (US-Bo1) is situated near Champaign, Illinois, in the Midwestern part of the United States. The site was established in 1996 on a periodically cultivated no-till agricultural ecosystem with alternating years of maize and soybean crops. The two mature forest sites, CA-Oas and CA-Obs, are located near the southern boundary of the sub-humid boreal plain ecological region of Western Canada. The two sites contain the dominant boreal tree species and are highly representative forest ecosystems. They are exposed to similar meteorological conditions; nevertheless, there are appreciable differences between them in tree stem density, topography, and soil texture. The mature trembling aspen site (CA-Oas), situated in Prince Albert National Park, Saskatchewan, Canada, regenerated from a fire in 1919. This boreal deciduous broadleaf forest ecosystem had a mean stand age of 75 years in 2004 and a mean canopy height of 22 m [42]. The CA-Obs site, located about 80 km east-northeast of CA-Oas, is predominately covered by black spruce with a mean canopy height of 7.2 m.

Data Preparation
At all sites, continuous measurements of half-hourly carbon fluxes have been undertaken using the eddy covariance technique [48]. Other observed variables mainly comprise energy fluxes (latent heat and sensible heat) and meteorological variables, including air temperature (Ta), soil temperature (Ts), net radiation (Rn), relative humidity (Rh), photosynthetically active radiation, volumetric soil moisture content, and wind speed. The NEE data at the AT-Neu, SE-Deg, and US-Bo1 sites were generally gap-filled on the basis of the ANN technique and the marginal distribution sampling (MDS) approach. In this study, we selected the former because of its slight superiority over MDS, as reported by Moffat et al. [49]. GPP and R data were calculated from the partitioning of NEE. Positive values of NEE signify a carbon source, while negative values signify carbon uptake by the ecosystem. For the CA-Oas and CA-Obs sites, the detailed data processing procedures, including quality control, flux correction, and NEE partitioning, can be found in Griffis et al. [46].
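The partitioning identity and sign convention described above can be summarized in a minimal sketch (the function name is ours, purely illustrative):

```python
def net_ecosystem_exchange(gpp, r_eco):
    """NEE = R - GPP: positive values indicate a carbon source,
    negative values indicate carbon uptake by the ecosystem."""
    return r_eco - gpp
```

For example, a day with GPP = 5 and R = 3 g C m−2 day−1 gives NEE = −2, i.e., net uptake.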
Statistical parameters of the data used during the 6-year period at the sites are shown in Table 2. It can be seen that the annual mean Ta ranges from 1.00 °C at CA-Obs to 11.52 °C at US-Bo1; the annual mean Ts is between 3.19 °C at CA-Obs and 12.30 °C at US-Bo1; the annual mean Rn ranges from 2.43 mol m−2 day−1 at SE-Deg to 6.71 mol m−2 day−1 at CA-Obs; and the annual mean Rh varies from 69.88% at CA-Oas to 81.04% at SE-Deg. Other statistical features of each variable are also given in the table. Moreover, Table 2 also shows the correlation coefficients between the environmental variables and each carbon flux. At the five sites, the correlation values for both GPP and R are generally greater than those for NEE, which may make it difficult to estimate NEE accurately. Among the five sites, on the whole, the lowest correlation values for the carbon fluxes occur at US-Bo1.

Adaptive Neuro-Fuzzy Inference System
The ANFIS method has been widely acknowledged as a promising alternative to traditional ANN and SVM techniques in recent years, especially for predicting hydrological time series [50][51][52]. The popularity of the ANFIS modeling technique predominantly stems from its powerful generalization ability and robustness in elucidating nonlinear processes by combining the advantages of both an adaptive ANN and a fuzzy inference system (FIS). In the present study, the most commonly used Sugeno FIS [53] was adopted to calculate the final network outputs of our developed ANFIS models. When a specific ANFIS model is trained, determining the related parameters is of particular importance for ensuring the model's generalization ability. The hybrid-learning algorithm optimizes the premise parameters using the gradient descent procedure and the consequent parameters using the least squares algorithm. In addition, it is worth noting that the applicability and generalization performance of ANFIS strongly rely on the method used to generate the FIS. Therefore, the present study examines not only the effects of different membership functions under the grid partitioning method on the ability of the developed ANFIS models, but also the impacts of different FIS identification algorithms, including fuzzy c-means clustering, grid partitioning, and subtractive clustering. For these purposes, the predictive ability of the ANFIS method was explored with respect to eight different membership functions under the grid partitioning method: Gaussian curve (gaussmf), Gaussian combination (gauss2mf), π-shaped (pimf), difference between two sigmoidal (dsigmf), trapezoidal-shaped (trapmf), product of two sigmoidal (psigmf), triangular-shaped (trimf), and generalized bell-shaped (gbellmf). For the sake of convenience, the ANFIS models with the membership functions gaussmf, gauss2mf, pimf, dsigmf, trapmf, psigmf, trimf, and gbellmf are hereinafter referred to as ANFIS1, ANFIS2, ANFIS3, ANFIS4, ANFIS5, ANFIS6, ANFIS7, and ANFIS8, respectively. Moreover, the ANFIS models with the two other FIS identification methods, subtractive clustering and fuzzy c-means clustering, are abbreviated as ANFIS9 and ANFIS10, respectively.
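To make the membership-function names concrete, three of the eight shapes can be sketched as follows, matching the standard definitions used by the MATLAB Fuzzy Logic Toolbox (an illustration of the functions only, not of the ANFIS training itself):

```python
import numpy as np

def gaussmf(x, c, sigma):
    """Gaussian curve membership function with center c and width sigma."""
    return np.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))

def trimf(x, a, b, c):
    """Triangular-shaped membership function with feet a, c and peak b."""
    return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0)

def gbellmf(x, a, b, c):
    """Generalized bell-shaped membership function; a controls width,
    b the slope, and c the center."""
    return 1.0 / (1.0 + np.abs((x - c) / a) ** (2 * b))
```

Each function maps an input value to a membership grade in [0, 1], with grade 1 at the center of the fuzzy set.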

Extreme Learning Machine
The ELM, a comparatively new machine learning approach, was proposed by Huang et al. [54]. In the last few years, this state-of-the-art approach has been extensively applied to different nonlinear problems in other research fields [55,56]. These applications have demonstrated that the ELM method, with its single hidden-layer feedforward neural network architecture, has many remarkable strengths over conventional machine learning techniques. A noteworthy strength of the ELM method is that the number of hidden nodes as well as their related parameters (weights and biases), which can be randomly generated, are commonly independent of the training dataset and thus do not need to be tuned. In contrast, for a traditional ANN model, these parameters and additional ones, such as the number of hidden layers, the learning epochs, and the learning rate, must be delicately set or updated by a commonly used, time-consuming trial-and-error procedure. Moreover, the learning speed of the ELM method has been found to be thousands of times faster than that of conventional ANN and SVM approaches.
The selection of an appropriate activation function is essential for guaranteeing the generalization performance of the ELM method. The sigmoid, sine, and hard limit algorithms are the three most widely used activation functions for the hidden layer. The detailed impacts of activation functions based on these three algorithms were evaluated in the present study. For convenience, the ELM models with the sigmoid, sine, and hard limit algorithms are abbreviated as ELM1, ELM2, and ELM3, respectively. In addition, the linear algorithm was consistently used as the activation function for the output layer of the developed ELM models.
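The core ELM training step described above (random hidden weights that are never tuned, followed by a one-shot least-squares solve for the output weights) can be sketched as follows; the sigmoid hidden activation corresponds to the ELM1 variant, and all names are ours (a minimal illustration, not the package by Huang et al.):

```python
import numpy as np

def train_elm(X, y, n_hidden=50, seed=0):
    """Single hidden-layer ELM: input weights W and biases b are drawn at
    random and left untuned; only the output weights beta are fitted,
    via the Moore-Penrose pseudoinverse (a least-squares solution)."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1.0, 1.0, (X.shape[1], n_hidden))
    b = rng.uniform(-1.0, 1.0, n_hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))   # sigmoid hidden layer (ELM1)
    beta = np.linalg.pinv(H) @ y              # linear output layer
    return W, b, beta

def predict_elm(model, X):
    W, b, beta = model
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta
```

Because training reduces to a single linear solve, no iterative weight updates are needed, which is the source of the speed advantage noted above.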

Artificial Neural Network
The ANN method is a classical supervised machine learning technique, and much attention has been paid to its use for the modeling and prediction of nonlinear time series in a broad range of fields over the past few decades [57][58][59][60]. Several lines of evidence have demonstrated that a feed-forward ANN with a single hidden layer can universally approximate any continuous target function over a compact set [61,62]. Hence, this type of network architecture was adopted in the current study. The backpropagation algorithm was used to adjust the weights of the trained networks. The neurons in the hidden layer used a hyperbolic tangent sigmoid transfer (tansig) function, while the neurons in the output layer used a linear (purelin) function. More theoretical information and practical applications of the ANN method can be found in Maier and Dandy [29] and Abrahart et al. [63].
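The architecture just described (tansig hidden layer, purelin output) amounts to the following forward pass; tansig is mathematically the hyperbolic tangent, and the weight names here are illustrative:

```python
import numpy as np

def ann_forward(x, W1, b1, W2, b2):
    """Forward pass of a single-hidden-layer feed-forward network:
    tansig (hyperbolic tangent) hidden layer, purelin (identity) output."""
    h = np.tanh(W1 @ x + b1)    # tansig hidden layer
    return W2 @ h + b2          # purelin output layer
```

Backpropagation then adjusts W1, b1, W2, and b2 to minimize the error between this output and the observed flux.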
The present study also examines the effects of different training functions on the ability of the proposed ANN model. MATLAB-based training functions are generally grouped into three categories: quasi-Newton, gradient descent, and conjugate gradient. The quasi-Newton methods include the Broyden-Fletcher-Goldfarb-Shanno quasi-Newton backpropagation (trainbfg) and Levenberg-Marquardt backpropagation (trainlm) algorithms. The gradient descent approaches are based on the gradient descent with momentum and adaptive learning rate backpropagation (traingdx) and resilient backpropagation (trainrp) algorithms. The conjugate gradient methods consist of the conjugate gradient backpropagation algorithms with the Fletcher-Reeves update (traincgf), the Polak-Ribiere update (traincgp), and the Powell-Beale restart (traincgb), as well as the scaled conjugate gradient backpropagation (trainscg) algorithm. In addition, two other training algorithms, namely Bayesian regulation backpropagation (trainbr) and one-step secant backpropagation (trainoss), were also considered in this study. As a matter of convenience, the ANN models with these extensively used training algorithms, trainlm, trainbr, trainbfg, traincgb, traincgf, traincgp, traingdx, trainoss, trainrp, and trainscg, are abbreviated as ANN1, ANN2, ANN3, ANN4, ANN5, ANN6, ANN7, ANN8, ANN9, and ANN10, respectively.

Support Vector Machine
The SVM method was proposed by Cortes and Vapnik [64], and details of its fundamental principles can be found in Vapnik [65] and Smola and Schölkopf [66]. It is widely accepted that the proper selection of the kernel function is a particularly important aspect of using the SVM approach [67] to address a variety of problems, such as classification and regression [30,68]. In view of the practical application to the simulation and prediction of carbon fluxes, the present study first emphasized the importance of different kernel functions for the SVM method. Three widely used kernel algorithms, i.e., the radial basis function (RBF), the polynomial function, and the sigmoid function, were applied and evaluated together. For convenience, the SVM models with these kernel functions are hereinafter referred to as SVM1, SVM2, and SVM3, respectively.
Another noteworthy point is the optimization of the relevant parameters during the training procedure. These parameters primarily include the cost of the training error, the width of the insensitive error band, and the value of the parameter of the respective kernel algorithm, all of which were carefully considered in this work. In order to ensure the performance and robustness of the SVM models, both grid search and threefold cross-validation were employed during the process of seeking the best parameters [69].
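A parameter search of this kind can be sketched with scikit-learn, whose SVR regressor wraps the same LIBSVM library used in this study; the data here are synthetic stand-ins for the (Ta, Ts, Rn, Rh) inputs, and the grid values are illustrative, not the ones used in the paper:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

# Synthetic stand-in data: four environmental inputs -> one carbon flux.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, (200, 4))
y = X @ np.array([1.0, 0.5, 2.0, -0.5]) + 0.05 * rng.standard_normal(200)

# C is the cost of the training error, epsilon the width of the
# insensitive band, and gamma the RBF kernel parameter; each candidate
# combination is scored by threefold cross-validation.
search = GridSearchCV(
    SVR(kernel="rbf"),
    param_grid={"C": [1, 10, 100], "epsilon": [0.01, 0.1], "gamma": [0.1, 1.0]},
    cv=3,
)
search.fit(X, y)
best_svm = search.best_estimator_
```

The best parameter combination found by cross-validation is then refit on the full training set before being applied to unseen data.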

Model Development and Software Availability
Four different machine learning models (ANN, ELM, ANFIS, and SVM) were developed and compared for the purpose of simulating the daily carbon fluxes in five different ecosystems. In order to achieve a reasonable comparison, not only among different carbon fluxes but also among different ecosystems, the same environmental variables (Ta, Ts, Rn, and Rh), which have relatively strong correlations with each carbon flux, were used as inputs to develop the proposed models. At the five sites, the available daily data for the six-year period were divided into three parts. The first four years of data were utilized for training, the fifth-year data were used for validation with the intention of preventing over-training or over-fitting of the training dataset, and the final year of data was employed for testing. Before the training of each applied model, the inputs and outputs were normalized to a range between 0 and 1.
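The data-handling steps above can be sketched as follows; we assume, as is common practice, that the min-max ranges are taken from the training block and reused for the other blocks (the function names are ours):

```python
import numpy as np

def chronological_split(data, year_index):
    """Split a 6-year daily record into 4-year training, 1-year
    validation, and 1-year testing blocks (years numbered 0..5)."""
    train = data[year_index <= 3]
    valid = data[year_index == 4]
    test = data[year_index == 5]
    return train, valid, test

def minmax_normalize(train, other):
    """Rescale to [0, 1] using ranges learned from the training block."""
    lo = train.min(axis=0)
    hi = train.max(axis=0)
    return (train - lo) / (hi - lo), (other - lo) / (hi - lo)
```

Keeping the split chronological, rather than random, mimics the real forecasting situation in which future years are unseen during training.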
In developing the aforementioned ANN and ANFIS models, MATLAB software (version 8.2, The MathWorks, Inc., Natick, MA, USA) and its toolboxes, including Neural Network Toolbox 8.1 and Fuzzy Logic Toolbox 2.2.18, were utilized. In addition, the LIBSVM package written by Chang and Lin [70], one of the most popular SVM packages, was used to develop the SVM models; it can be downloaded from https://www.csie.ntu.edu.tw/~cjlin/libsvm/. The ELM models were developed based on the software package written by Huang et al. [54], which can be downloaded from http://www.ntu.edu.sg/home/egbhuang/.

Model Performance Evaluation
In this study, the performance of all of the developed models in the three periods was evaluated according to the following statistical indices: the coefficient of determination (R2), index of agreement (IA), root mean square error (RMSE), and mean absolute error (MAE). These performance indicators are calculated as

R2 = [ Σ (Yo,i − Ȳo)(Ym,i − Ȳm) ]² / [ Σ (Yo,i − Ȳo)² · Σ (Ym,i − Ȳm)² ]

IA = 1 − Σ (Yo,i − Ym,i)² / Σ ( |Ym,i − Ȳo| + |Yo,i − Ȳo| )²

RMSE = √[ (1/N) Σ (Yo,i − Ym,i)² ]

MAE = (1/N) Σ |Yo,i − Ym,i|

where Yo and Ym denote the observed and modeled values of the daily carbon fluxes, respectively; Ȳo and Ȳm are the means of the observed and modeled values, respectively; N is the number of observed values; and each sum runs over i = 1, …, N.
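These four indices, as defined above, can be computed with a short sketch (our own helper functions, written to match the standard formulas):

```python
import numpy as np

def r2(yo, ym):
    """Coefficient of determination (squared correlation of obs vs. model)."""
    yo, ym = np.asarray(yo, float), np.asarray(ym, float)
    num = np.sum((yo - yo.mean()) * (ym - ym.mean())) ** 2
    den = np.sum((yo - yo.mean()) ** 2) * np.sum((ym - ym.mean()) ** 2)
    return num / den

def ia(yo, ym):
    """Willmott's index of agreement."""
    yo, ym = np.asarray(yo, float), np.asarray(ym, float)
    num = np.sum((yo - ym) ** 2)
    den = np.sum((np.abs(ym - yo.mean()) + np.abs(yo - yo.mean())) ** 2)
    return 1.0 - num / den

def rmse(yo, ym):
    """Root mean square error."""
    yo, ym = np.asarray(yo, float), np.asarray(ym, float)
    return np.sqrt(np.mean((yo - ym) ** 2))

def mae(yo, ym):
    """Mean absolute error."""
    yo, ym = np.asarray(yo, float), np.asarray(ym, float)
    return np.mean(np.abs(yo - ym))
```

For a perfect model, R2 and IA equal 1 while RMSE and MAE equal 0.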

Evaluation of Model Performance for the GPP Flux in Different Ecosystems
In order to assess the influences of intrinsic parameters on the machine learning models (ANN, ELM, ANFIS, and SVM) in the prediction of the carbon fluxes (GPP, R, and NEE), a total of 130 models for each flux were developed, trained, validated, and tested in the present study, corresponding to the application of 26 models (10 ANN, 3 ELM, 10 ANFIS, and 3 SVM models) at each of the five sites. The accuracy of these applied models was evaluated using the performance indices R2, IA, RMSE, and MAE. Specifically, larger R2 and IA values and smaller RMSE and MAE values imply higher model performance. Based on this criterion, each performance index of the applied models was ranked. Overall performance rankings for the employed models were obtained by summing the rankings of the four performance indices and then sorting the summed ranking values.
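The rank-aggregation step described above can be sketched as follows (our own illustrative helper; the metric values in the usage example are hypothetical):

```python
import numpy as np

def overall_ranking(metrics):
    """metrics maps model name -> (R2, IA, RMSE, MAE).
    Each index is ranked separately (rank 1 = best: larger is better for
    R2 and IA, smaller is better for RMSE and MAE); the four ranks are
    summed, and models are sorted by the summed rank."""
    names = list(metrics)
    table = np.array([metrics[m] for m in names], dtype=float)
    totals = np.zeros(len(names))
    for col, higher_is_better in enumerate([True, True, False, False]):
        key = -table[:, col] if higher_is_better else table[:, col]
        totals += key.argsort().argsort() + 1   # per-index ranks, 1 = best
    return [names[i] for i in np.argsort(totals)]
```

A model that is best on every index receives the minimum possible summed rank and thus heads the overall ranking.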
Two types of statistics for the overall performance rankings for GPP prediction in the testing period are illustrated in Figure 1. The numbers shown in the figure reveal the overall rankings of model efficiency, calculated based on the aforementioned approach. A smaller number indicates higher model efficiency. Figure 1a shows the overall model performance rankings of GPP estimation for the five different sites and the four different machine learning methods with their respective internal functions (130 models). It can be clearly seen in the figure that the performance of all models depends on the intrinsic parameters and on the sites. The ANN1, ANN2, ANN5, ANFIS9, and SVM1 models at the CA-Obs site provide better model performance, while the ANFIS8 and SVM3 models at AT-Neu, as well as the ANFIS2, ANFIS7, and SVM3 models at US-Bo1, produce worse model performance. Moreover, except for the SVM2 and SVM3 models, the models generally yield the highest model efficiency at CA-Obs among the five sites, and all models achieve the second highest efficiency at SE-Deg but the lowest efficiency at US-Bo1.
The performance rankings of the GPP prediction for the four different machine learning methods with their respective internal functions (26 models) at each site are displayed in Figure 1b. The numbers represent the rankings of model efficiency at each stand. It is clear from the figure that, among the 26 models, ANN1, ANN2, ANFIS10, and SVM1 perform better than the others at the AT-Neu site. Similarly, the first four best models at the other stands are as follows: ANN3, ANN5, ANN6, and ELM2 at SE-Deg; ANN1, ANN3, ANN4, and ANN9 at US-Bo1; ANN1, ANFIS9, ANFIS10, and SVM1 at CA-Oas; and ANN1, ANN2, ANN5, and ANFIS9 at CA-Obs. In addition, it should be pointed out that the ANN1 model gives the best model performance at both AT-Neu and CA-Obs, and the second and fourth best at US-Bo1 and CA-Oas, respectively. However, the ANN6 model yields the highest model efficiency at SE-Deg.
On the other hand, it can be seen in Figure 1b that the performance of all methods appreciably depends on their internal parameters when they are used for GPP prediction at the different stands. The optimal models for each approach are briefly summarized as follows: for the ANN, the ANN1 model generates the highest model efficiency at AT-Neu, CA-Oas, and CA-Obs, and the second highest at US-Bo1; for the ELM, the ELM1 and ELM2 models achieve similar model efficiency and are both substantially superior to the ELM3 model at all stands; for the ANFIS, ANFIS9 obtains the best model performance at CA-Obs and the second best at US-Bo1, while ANFIS10 attains the best performance at AT-Neu, SE-Deg, and CA-Oas; and for the SVM, SVM1 performs considerably better than the SVM2 and SVM3 models at all stands.
The performance of the models with the best internal functions for GPP forecasting in the training, validation, and testing periods at the five stands is presented in Table 3. It is apparent from the table that the ANN1 model in the testing period attains higher R2 and IA values and lower RMSE and MAE values than the other optimal models at both AT-Neu and CA-Obs, as do the ANN6, ANN9, and ANFIS10 models at SE-Deg, US-Bo1, and CA-Oas, respectively. This is also confirmed by Figure 1b. Additionally, it can be seen in Table 3 that the applied models at CA-Obs generally have higher R2 and IA values and lower RMSE and MAE values than the optimal models at the other sites.
The observed values of the daily GPP and the corresponding values simulated by the optimal models at the five sites are compared in Figures 2 and 3, in the form of scatter plots and time series graphs, respectively. As can be seen in Figure 2, the fit lines of the four best models at the CA-Oas site are closer to the ideal line (1:1 line) than those of the optimal models at the other sites, while the four best models at the CA-Obs site have higher R2 values, indicating less scattered GPP estimates at CA-Obs than at the other sites. It is evident from Figure 3 that most of the estimated values from the optimal models at each site closely follow the corresponding measured values. A noteworthy point is that the ANN1 model at CA-Obs underestimates the peaks during the growing season in the validation and testing periods, whereas the ANN6 model at the SE-Deg site overestimates the peaks in the testing period.

Evaluation of Model Performance for the R Flux in Different Ecosystems
The overall model performance rankings for the R prediction for five different sites and four different machine learning methods with their respective internal functions in the testing period are displayed in Figure 4a. As shown in the figure, the ANN1, ANN4, ANN6, ANFIS8, and ANN10 models at the CA-Obs site gain higher model efficiency, while the SVM3 model at AT-Neu and the ANFIS2, ANFIS6, ANFIS7, and SVM3 models at US-Bo1 obtain lower model efficiency. In addition, except for the SVM2 and SVM3 models, the models generally achieve the best model performance at CA-Obs among the five sites, and all models achieve the second-best efficiency at CA-Oas but provide the worst performance at US-Bo1.
Figure 4b presents the overall model performance rankings of R prediction for four different machine learning methods with their respective internal functions at each site. As shown in the figure, among the 26 models at AT-Neu, the ANN1, ANN2, ANFIS9, and ANFIS10 models are superior. Similarly, at the other four sites, the four best models can be summarized as follows: ANN8, ANN9, ANN10, and ELM1 at SE-Deg; ANN2, ANN5, ANN9, and ELM1 at US-Bo1; ANN1, ANN3, ANN4, and ANN5 at CA-Oas; ANN1, ANN4, ANN6, and ANN10 at CA-Obs. Moreover, it should be noted that the ANN4 model gains the highest model efficiency at CA-Obs and the second highest at US-Bo1 and CA-Oas, whereas the ANN2 and ANN8 models obtain the highest model efficiency at AT-Neu and SE-Deg, respectively.
Furthermore, as illustrated in Figure 4b, the selection of the internal parameters has significant effects on the performance of the corresponding method in the prediction of R among the five stands. The optimal models for each machine learning method are briefly overviewed as follows: for the ANN, the ANN4 model achieves the best model performance at US-Bo1 and CA-Obs and the second best at CA-Oas, while the ANN2 and ANN8 models obtain the best model efficiency at AT-Neu and SE-Deg, respectively; for the ELM, ELM1 generally performs the best at all sites, and the ELM2 model achieves an inferior performance; for the ANFIS, the ANFIS9 and ANFIS10 models achieve similar model performance and both surpass the other ANFIS models in all stands; for the SVM, SVM1 substantially outperforms the SVM2 and SVM3 models at all sites.
The estimated accuracies of the applied methods with the optimal internal functions in the forecasting of daily R among the training, validation, and testing periods at the five stands are displayed in Table 4. As shown in the table, during the testing period, the ANN2 model, with higher R2 and IA values and lower RMSE and MAE values, performs better than the other optimal models at AT-Neu, as do the ANN8, ELM1, ANN5, and ANN4 models at SE-Deg, US-Bo1, CA-Oas, and CA-Obs, respectively. This is confirmed by Figure 4b. In addition, as presented in Table 4, similar to GPP forecasting among the five stands, the models at CA-Obs generally achieve higher prediction precision in terms of the R2, IA, RMSE, and MAE metrics than the optimal models at the other four sites.
Figures 5 and 6 show the measured and predicted daily R by the methods with the best internal functions at the five sites, in the form of scatter plots and time series graphs, respectively. It is clear from Figure 5 that the fit lines of these optimal models at four sites (all except AT-Neu) are reasonably close to the ideal lines (1:1 lines), considering the corresponding equations of these fit lines. The four best models at CA-Obs generally generated less scattered estimates than the optimal models at the other sites. Additionally, it can be clearly seen in Figure 6 that most of the predicted values from each optimal model at the five sites closely match the corresponding measured ones. Nevertheless, the underestimation of the peaks by both ANN2 at AT-Neu and ANN4 at the CA-Obs site, and the overestimation of the peaks by ANN8 at SE-Deg, are clearly illustrated.

Evaluation of Model Performance for NEE flux in Different Ecosystems
Figure 7a presents the overall rankings of model efficiency in the prediction of daily NEE for five different sites and four different machine learning methods with their respective internal functions in the testing period. As illustrated in the figure, the ANN1, ANN2, ANN9, ANFIS2, and SVM1 models at CA-Obs obtain higher model efficiency, while the ANFIS4, ANFIS6, and SVM3 models at AT-Neu and the ANFIS7 and SVM3 models at US-Bo1 achieve lower model efficiency. Moreover, the applied models generally achieve similar performance at both CA-Obs and SE-Deg, and the overall performances at these two sites are superior to those at the other three sites. The worst model performance occurs at AT-Neu.
The model efficiency rankings of the applied methods with their respective internal functions at each site in predicting NEE in the testing period are shown in Figure 7b. At AT-Neu, among the 26 models, the ANN1, ANN2, ANN9, and ANFIS10 models obtain higher model efficiency than the other models. Similarly, for the other four stands, the four highest-efficiency models can be generalized as follows: ANN1, ANN2, ANN3, and ANN7 at SE-Deg; ANN1, ANN3, ANN5, and SVM1 at US-Bo1; ANN1, ANFIS1, ANFIS9, and ANFIS10 at CA-Oas; ANN1, ANN2, ANN9, and SVM1 at CA-Obs. Furthermore, it should be particularly emphasized that the ANN1 model attains the highest model efficiency at AT-Neu and US-Bo1, the second highest at CA-Obs, and the third highest at SE-Deg and CA-Oas.
It can be clearly seen in Figure 7b that the selection of internal parameters is important for the performance of the applied models in forecasting daily NEE at the five sites. The best models for each data-driven approach are briefly generalized as follows: for the ANN, the ANN1 model gains the best model efficiency at AT-Neu and US-Bo1, the second best at CA-Obs, and the third best at both SE-Deg and CA-Oas; for the ELM, except at SE-Deg, ELM1 and ELM2 generally yield similar model performance, and these two models outperform the ELM3 model at the other four sites; for the ANFIS, except at SE-Deg, ANFIS9 outperforms the other ANFIS models at the other four sites; for the SVM, SVM1 appreciably surpasses the SVM2 and SVM3 models at all sites, and the SVM2 and SVM3 models have similar performance in all five stands.
Table 5 exhibits the estimated precision of the employed approaches with the optimal internal functions in the prediction of daily NEE in all stands. It can be clearly seen in the table that the ANN1 model, with higher R2 and IA values and lower RMSE and MAE values in the testing period, is superior to the other optimal models at AT-Neu, as are the ANN3, ANN1, ANFIS10, and ANN2 models at SE-Deg, US-Bo1, CA-Oas, and CA-Obs, respectively. This is confirmed by Figure 7b. In addition, as displayed in Table 5, all optimal models obtain similar accuracy at both the CA-Oas and CA-Obs sites, and these models achieve higher accuracy than the optimal models at the AT-Neu, SE-Deg, and US-Bo1 sites with respect to the R2, IA, RMSE, and MAE statistics.
The observed and estimated daily NEE by the methods with the best internal functions at the five sites are illustrated in Figures 8 and 9. As can be seen in the scatter plots in Figure 8, the fit lines of the ANN3 model at SE-Deg and the ANFIS10 model at CA-Oas are closer to the exact fit lines (1:1 lines) than those of the optimal models at the other sites, taking into account the corresponding equations of the fit lines. The four best models at AT-Neu yield the most scattered estimates compared with the optimal models at the other sites, followed by the models at US-Bo1. In addition, it is found from Table 5 and Figure 9 that the estimated accuracy for daily NEE in the testing period is substantially inferior to that for GPP and R in the five stands. Nonetheless, all of the selected optimal models at SE-Deg, CA-Oas, and CA-Obs provide satisfactory NEE estimates, as shown in Table 5, with R2 values ranging from 0.6747 to 0.8345 and IA ranging from 0.7910 to 0.9540. As illustrated in Figure 9, most of the predicted values from the best models at the five sites closely follow the corresponding measured ones. However, at CA-Obs, during the growing season, the estimated values from the ANN2 model fail to match the corresponding measured ones well. Specifically, the predicted NEE error during the growing season contributes approximately 59% of the annual total NEE error. Nevertheless, the ANN2 model obtained reasonable results during the growing season, with R2 = 0.71 and IA = 0.85, and its cumulative error over the growing season is equal to 47 g C m−2, in comparison with the inter-annual variability of observed NEE from 2003 to 2008 (from 30 to 70 g C m−2).


Modeling Capability of Machine Learning Approaches and Their Comparison
The results of this study demonstrated that the proposed machine learning methods were able to imitate the dynamic processes of carbon exchange between terrestrial ecosystems and the atmosphere. Though these models were developed using only four different climatic forces (Ta, Ts, Rn, and Rh) as inputs, a significant amount of the variation in daily carbon fluxes, especially for GPP and R, was explained by these independent variables (Tables 3-5). For example, among the optimal SVM models at CA-Obs, SVM1 provided the most accurate estimates for all carbon fluxes, with R2 = 0.9574, IA = 0.9818, RMSE = 0.6915 g C m−2 day−1, and MAE = 0.4290 g C m−2 day−1 for GPP; 0.9471, 0.9809, 0.4692 g C m−2 day−1, and 0.3385 g C m−2 day−1 for R; and 0.8175, 0.9255, 0.6426 g C m−2 day−1, and 0.4746 g C m−2 day−1 for NEE. In addition, at CA-Oas, for each carbon flux, the superior models (Tables 3-5) achieved similar estimates of the annual total flux, with errors of the mean annual total equal to 5.74%, 2.15%, and 11.40% for GPP, R, and NEE, respectively. Therefore, it can be concluded that these machine learning models successfully reproduced daily carbon fluxes in terrestrial ecosystems.
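For reference, the four performance criteria reported throughout (R2, IA, RMSE, and MAE) can be computed as in the following sketch. The study's own code is not published, so the definitions assumed here are the conventional ones (R2 as the squared Pearson correlation, IA as Willmott's index of agreement):

```python
import numpy as np

def evaluate(obs, sim):
    """Compute the four criteria used in this study for one flux series.

    R2 and IA are dimensionless; RMSE and MAE carry the flux units
    (g C m-2 day-1 for daily carbon fluxes).
    """
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    err = sim - obs
    rmse = float(np.sqrt(np.mean(err ** 2)))
    mae = float(np.mean(np.abs(err)))
    # Coefficient of determination as the squared Pearson correlation
    r2 = float(np.corrcoef(obs, sim)[0, 1] ** 2)
    # Willmott's index of agreement (bounded above by 1; 1 = perfect)
    om = obs.mean()
    ia = float(1.0 - np.sum(err ** 2)
               / np.sum((np.abs(sim - om) + np.abs(obs - om)) ** 2))
    return {"R2": r2, "IA": ia, "RMSE": rmse, "MAE": mae}
```

A constant offset, for example, leaves R2 at 1 while lowering IA and inflating RMSE and MAE, which is one reason the four criteria are reported together rather than singly.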
Furthermore, comparing the modeling accuracy of multiple machine learning models enables us to determine the most appropriate model for a given application. In recent years, the predictive abilities of both the ANN and SVM methods for carbon flux prediction have been substantiated by many previous studies [27,71,72]. In addition, since the relatively new ELM and ANFIS methods, used here for the first time to forecast daily carbon fluxes, exhibited comparatively satisfactory predictive accuracy in terms of various performance criteria, they can be viewed as promising alternatives to the traditional ANN and SVM approaches. Nevertheless, it should be pointed out that, according to our estimates, among the evaluated approaches the ANN method generally performed the best for each carbon flux at all five sites.

Effects of Different Inner Parameters on Their Corresponding Approaches
It was further discovered that different internal parameters considerably affected the modeling ability of their corresponding methods for the carbon fluxes at all sites. Although our proposed methods showed a strong ability to reproduce carbon fluxes measured by the eddy covariance technique, their generalization performance was still restricted by the corresponding inner functions. To be specific, regarding the ANN method, it is well acknowledged that the Levenberg-Marquardt algorithm (trainlm) is the most widely utilized training function. However, according to our estimates, the ANN model with the Levenberg-Marquardt algorithm (ANN1) was not the universally optimal model for each carbon flux at all sites (Figures 1, 4 and 7). Therefore, it seems difficult to find a single common training function that guarantees the best accuracy in all situations. Similarly, for the ELM method, a noteworthy point is that both the ELM1 (sine) and ELM2 (sigmoid) models were appreciably and consistently superior to the ELM3 (hard limit) model in all cases. Moreover, the ELM1 model did not always perform better than the ELM2 model in terms of the performance metrics for each flux at all sites. Consequently, the modeling capability of both ELM1 and ELM2 depended, to some degree, on the given application. For this reason, in order to accurately estimate daily carbon fluxes, both the sine and sigmoid algorithms should be considered as alternative transfer functions.
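The ELM transfer-function comparison above (sine, sigmoid, and hard limit) can be sketched as follows. This is a minimal single-hidden-layer ELM in the standard formulation (random input weights, output weights solved analytically); the hidden-layer sizes and weight ranges are illustrative assumptions, not the settings tuned in this study:

```python
import numpy as np

# The three transfer functions compared in the ELM models (ELM1-ELM3)
ACTIVATIONS = {
    "sine":    np.sin,
    "sigmoid": lambda z: 1.0 / (1.0 + np.exp(-z)),
    "hardlim": lambda z: (z >= 0.0).astype(float),
}

class ELM:
    """Single-hidden-layer ELM: random input weights and biases are fixed,
    and only the output weights are solved by least squares."""

    def __init__(self, n_hidden=50, activation="sine", seed=0):
        self.n_hidden = n_hidden
        self.act = ACTIVATIONS[activation]
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        return self.act(X @ self.W + self.b)

    def fit(self, X, y):
        # Hidden-layer parameters drawn once at random, never trained
        self.W = self.rng.uniform(-1.0, 1.0, (X.shape[1], self.n_hidden))
        self.b = self.rng.uniform(-1.0, 1.0, self.n_hidden)
        # Output weights via the Moore-Penrose pseudoinverse
        self.beta = np.linalg.pinv(self._hidden(X)) @ y
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta
```

Because only `beta` is fitted, and in closed form, training is much faster than iterative back-propagation, which matches the reduced computational time noted for the ELM below.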
In addition, it was observed that the SVM1 model (RBF kernel) substantially surpassed the SVM2 (polynomial kernel) and SVM3 (sigmoid kernel) models in mapping the complex carbon exchange processes at the five sites, which concurs with previous findings in other fields [73,74]. Moreover, in most cases, the SVM2 and SVM3 models offered the worst modeling accuracy in comparison with the other three methods (ANN, ELM, and ANFIS) with different internal parameters. Therefore, when dealing with a nonlinear regression problem, it is particularly important to determine a proper kernel function when an SVM model is developed.
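The role of the kernel can be illustrated with a compact stand-in: kernel ridge regression shares the SVM's kernel trick while replacing the epsilon-insensitive loss with a squared loss, which keeps the sketch short. All hyperparameter values below (gamma, degree, the ridge term) are illustrative, not the settings tuned in this study:

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Radial basis function kernel: exp(-gamma * ||a - b||^2)
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

def poly_kernel(A, B, degree=3, coef0=1.0):
    # Polynomial kernel: (a . b + c0)^d
    return (A @ B.T + coef0) ** degree

def sigmoid_kernel(A, B, gamma=0.1, coef0=0.0):
    # Sigmoid kernel: tanh(gamma * a . b + c0)
    return np.tanh(gamma * (A @ B.T) + coef0)

def kernel_ridge_fit(X, y, kernel, ridge=1e-3):
    """Fit f(x) = sum_i alpha_i k(x, x_i); return the prediction function."""
    K = kernel(X, X)
    alpha = np.linalg.solve(K + ridge * np.eye(len(X)), y)
    return lambda X_new: kernel(X_new, X) @ alpha
```

On a smooth nonlinear target the RBF kernel typically tracks the curve closely while a low-degree polynomial underfits, mirroring the SVM1 versus SVM2/SVM3 contrast above. It is also worth noting that the sigmoid kernel is not positive semi-definite in general, which is one known reason for its erratic behavior.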
Lastly, the potential and effectiveness of the ANFIS technique for estimating daily carbon fluxes was explored. Investigating the potential impacts of the inner parameters on the predictive performance of the ANFIS was also taken into consideration as a second important task for this method. Specifically, we not only examined the effects of different types of membership functions under the grid partitioning algorithm but also investigated the influence of different methods of generating FISs. Regarding the former, appreciable differences in forecasting ability existed among the eight membership functions for each carbon flux at all sites: the prediction accuracy of these ANFIS models depended on the membership functions, and no single membership function was universally optimal for the grid partitioning algorithm in this study. Based on this finding, trial and error is recommended in order to determine the most appropriate membership function. On the other hand, based on the comparison of three different identification methods for generating FISs, most of the ANFIS models with the subtractive clustering and fuzzy c-means algorithms provided similar modeling precision for each flux at all sites and performed substantially better than the ANFIS models with the grid partitioning algorithm (ANFIS1-8). Therefore, both the subtractive clustering and fuzzy c-means algorithms should be viewed as preferable methods for developing an optimal ANFIS model for predicting daily carbon fluxes.
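The fuzzy c-means step used to generate an FIS can be sketched in a few lines: each data point receives a graded membership in every cluster, and the resulting cluster centers seed the fuzzy rules. This is the textbook algorithm with fuzzifier m = 2, not the implementation used in this study:

```python
import numpy as np

def fuzzy_cmeans(X, n_clusters, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Return (centers, U), where U[i, k] is the membership of point i
    in cluster k (each row of U sums to 1)."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), n_clusters))
    U /= U.sum(axis=1, keepdims=True)          # random fuzzy memberships
    p = 2.0 / (m - 1.0)
    for _ in range(max_iter):
        Um = U ** m
        # Centers are membership-weighted means of the data
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)                  # guard against zero distance
        # Standard membership update: u_ik proportional to d_ik^(-2/(m-1))
        U_new = d ** (-p) / np.sum(d ** (-p), axis=1, keepdims=True)
        if np.abs(U_new - U).max() < tol:
            return centers, U_new
        U = U_new
    return centers, U
```

Unlike grid partitioning, whose rule count grows exponentially with the number of inputs, the clustering-based generators place one rule per cluster, which is consistent with their better performance reported above.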
Given the findings above, our research has confirmed the importance of evaluating the effects of inner parameters on their respective methods in order to characterize terrestrial carbon fluxes more accurately. These effects should be taken into consideration in the estimation of terrestrial carbon and water fluxes and in the interpolation of missing carbon flux data at the ecosystem level. When these issues are addressed, the optimal inner parameters can be determined by trial and error.

Difference in Forecasting Accuracy among Various Fluxes and Sites
Adopting the same inputs for all fluxes, with continuous six-year measurement data, offers a reliable and valuable means of benchmarking the prediction of different carbon fluxes. Among the three carbon fluxes at the five sites, our proposed models generally provided the best estimates for both GPP and R but inferior estimates for NEE, which is in good agreement with previous results found by Dou et al. [27]. Although higher accuracy for both GPP and R was obtained using only four climatic factors as explanatory variables, which made the relevant models more convenient to develop, omitting certain driving factors as model inputs, especially for NEE, may lead to slight under-fitting in the model training and generalization periods. In addition to the variables used in this study, numerous recent studies have focused on the responses of carbon fluxes to other factors, such as drought, precipitation, vapor pressure deficit, and soil moisture content [75-79]. Therefore, identifying the important factors controlling the processes of terrestrial carbon exchange and adding some key factors should enhance the predictive capability of the models.
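The benchmarking design above — identical climatic inputs (Ta, Ts, Rn, Rh) for every flux, with each six-year record split chronologically into training (first four years), validation (fifth year), and testing (final year) — can be sketched as follows; the function name and array layout are illustrative:

```python
import numpy as np

def chronological_split(X, y, years):
    """Split a six-year record by calendar year: the first four years for
    training, the fifth for validation, and the sixth for testing."""
    yrs = np.unique(years)                 # np.unique returns sorted years
    assert len(yrs) == 6, "expects a continuous six-year record"
    train = np.isin(years, yrs[:4])
    valid = years == yrs[4]
    test = years == yrs[5]
    return (X[train], y[train]), (X[valid], y[valid]), (X[test], y[test])
```

Splitting by whole years rather than at random keeps the test year strictly after the training data, so the reported testing-period skill reflects genuine out-of-sample generalization across seasons.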
Furthermore, the forecasting accuracy for the various carbon fluxes is constrained by the quality of the carbon flux measurement data produced by the eddy covariance method. When using these data for diverse applications, there are still many problems and challenges, such as data uncertainty arising mainly from random and systematic errors [80], energy balance closure [81], and the partitioning of the NEE flux into GPP and R [82]. Moreover, appropriately selecting an effective gap-filling method for flux data may enhance the predictive accuracy of carbon fluxes [49,83]. It should be noted that the use of daily flux data obtained from gap-filled half-hourly flux data may introduce additional error. However, at the daily time scale, relatively little time is required to train the applied models, and their estimates can provide helpful insight for carbon budget studies.
Considerable differences among the various sites were found in the simulated accuracy of each flux. More specifically, our developed models generally performed the best for the three fluxes in the forest ecosystems (the CA-Oas and CA-Obs sites) but provided the worst estimates for both GPP and R in a cropland ecosystem (US-Bo1) and for NEE in a grassland ecosystem (AT-Neu). These estimates highlight the importance of discussing the differences among terrestrial ecosystems in the estimation of carbon fluxes. It has been acknowledged that many biophysical factors affect the photosynthetic and respiratory capacity of terrestrial ecosystems, such as stomatal conductance [84] and light use efficiency [85,86]. However, these factors are often either poorly described or omitted in the estimation of carbon fluxes.
In addition, the superior generalization ability in the two mature forest ecosystems may be attributed predominantly to the obvious periodic fluctuation of the carbon fluxes within the continuous multi-year measurements and their relatively stable interactions with changes in environmental factors [87]. In contrast, uncontrolled grazing, vegetation removal, and fire can cause severe damage to grassland ecosystems [88-90]. Under long-term human management, a variety of practices, such as irrigation, fertilization, harvesting, and rotation, substantially affect the ecological processes and functions of cropland ecosystems [91-93]. These disturbance factors should be taken into account and incorporated into the applied machine learning models with the intention of enhancing predictive precision, especially for grassland and cropland ecosystems.

Conclusions
The use of two state-of-the-art machine learning techniques (ELM and ANFIS) was undertaken in this investigation as a major contribution to the study of carbon flux estimation in terrestrial ecosystems. This contribution is expected to help elucidate the biophysical mechanisms that dominate the exchange of carbon dioxide between the land surface and the atmosphere. More specifically, in this study, four modeling techniques were used and compared for estimating daily carbon fluxes in five different types of ecosystems. Moreover, the effects of the internal algorithms on each method were appraised based on four performance indices (R2, IA, RMSE, and MAE).
Our results demonstrated that the machine learning models were able to effectively quantify the nonlinear processes that control the variation of carbon fluxes, especially GPP and R, between the land surface and the atmosphere using a limited set of climatic factors. The estimates from the novel ELM and ANFIS models were highly satisfactory and comparable to those of the conventional ANN and SVM models. On the whole, although the ANN method performed the best for each flux at the five sites, the ELM and ANFIS methods can be strongly recommended, mainly for their robustness and flexibility. In practical terms, the ELM and ANFIS approaches require fewer parameters to be selected and optimized than the traditional ANN and SVM methods; in particular, for the ELM method, the computational time was substantially reduced.
Moreover, the generalization ability of each method was found to depend on the respective internal parameters considered in this study. Specifically, the widely utilized Levenberg-Marquardt algorithm in the ANN method was not the universally optimal choice for any of the carbon fluxes at the study sites. The ELM models with the sine and sigmoid transfer functions provided similar modeling accuracy and were consistently superior to the ELM model with the hard limit function. Among the three kernel functions for the SVM method, the RBF kernel generated the most accurate estimates and substantially outperformed the polynomial and sigmoid kernels. When the grid partitioning algorithm was used to generate the FISs, the modeling performance of the ANFIS approach differed appreciably among the membership functions. In addition, the subtractive clustering and fuzzy c-means algorithms in the ANFIS method generally achieved similar estimates and performed substantially better than the grid partitioning algorithm.
Among the five sites, considerable differences were found in the estimated accuracy of each flux. In general, the models performed the best for the three fluxes in the forest ecosystems (the CA-Oas and CA-Obs sites) and provided the worst estimates of both GPP and R in the cropland ecosystem and of NEE in the grassland ecosystem.
Our findings show that the two newly proposed ELM and ANFIS methods can be considered desirable complements to conventional machine learning methods for characterizing land-atmosphere carbon fluxes. However, these intelligent approaches are likely to be criticized, predominantly owing to the lack of clarity regarding the intrinsic mechanisms of the well-trained models. This criticism reduces their reliability to some extent and thus hinders their application and popularity. Therefore, more effort should be made in future work to uncover the hidden and useful knowledge in these black boxes. Furthermore, another solution can be considered: machine learning methods can be utilized in conjunction with physically based models by correcting the latter. This combination is inspired by the fact that machine learning techniques and physically based methods complement each other: the former have a strong ability to deal with nonlinear problems, while the latter are built on detailed mechanisms controlling the physical processes of carbon exchange and hence can produce more reliable estimates. For this reason, this integration will be carried out in our future studies, with the aim of achieving more accurate carbon flux estimates at the ecosystem and global scales and eventually providing the scientific community with reliable knowledge for diagnosing present and future climate change.

Figure 1. Model performance rankings of gross primary productivity (GPP) estimation in the testing period: (a) for five different sites and four different machine learning methods with their respective internal functions (130 models); (b) for four different machine learning methods with their respective internal functions (26 models) at each site.


Figure 2. Daily GPP observed via the eddy covariance method and modeled by the four different machine learning models with the best internal functions in the testing period at the five sites: (a) AT-Neu; (b) SE-Deg; (c) US-Bo1; (d) CA-Oas; (e) CA-Obs.


Figure 3. Daily GPP observed via the eddy covariance method and modeled by the optimal machine learning models for the five sites in the whole period, including the training (the first four years), validation (the fifth year), and testing (the final year) periods: (a) ANN1 at AT-Neu; (b) ANN6 at SE-Deg; (c) ANN9 at US-Bo1; (d) ANFIS10 at CA-Oas; (e) ANN1 at CA-Obs.


Figure 4. Model performance rankings of ecosystem respiration (R) estimation in the testing period: (a) for five different sites and four different machine learning methods with their respective internal functions (130 models); (b) for four different machine learning methods with their respective internal functions (26 models) at each site.




Figure 5. Daily R observed via the eddy covariance method and modeled by the four different machine learning models with the best internal functions in the testing period at the five sites: (a) AT-Neu; (b) SE-Deg; (c) US-Bo1; (d) CA-Oas; (e) CA-Obs.


Figure 6. Daily R observed via the eddy covariance method and modeled by the optimal machine learning models for the five sites in the whole period, including the training (the first four years), validation (the fifth year), and testing (the final year) periods: (a) ANN2 at AT-Neu; (b) ANN8 at SE-Deg; (c) ELM1 at US-Bo1; (d) ANN5 at CA-Oas; (e) ANN4 at CA-Obs.



Figure 7. Model performance rankings of net ecosystem exchange (NEE) estimation in the testing period: (a) for the five sites and four machine learning methods with their respective internal functions (130 models); (b) for the four machine learning methods with their respective internal functions (26 models) at each site.


Figure 8. Daily NEE observed via the eddy covariance method and modeled by the four machine learning models with the best internal functions in the testing period at the five sites: (a) AT-Neu; (b) SE-Deg; (c) US-Bo1; (d) CA-Oas; (e) CA-Obs.


Figure 9. Daily NEE observed via the eddy covariance method and modeled by the optimal machine learning models for the five sites over the whole period, including the training (first four years), validation (fifth year), and testing (final year) periods: (a) ANN1 at AT-Neu; (b) ANN3 at SE-Deg; (c) ANN1 at US-Bo1; (d) ANFIS10 at CA-Oas; (e) ANN2 at CA-Obs.


Table 1. Site characteristics used in this study. The mean annual temperature (MAT, °C) and total annual precipitation (TAP, mm year−1) are calculated over the corresponding time period (Period). Vegetation types include grassland (GRA), wetland (WET), cropland (CRO), deciduous broadleaf forest (DBF), and evergreen needleleaf forest (ENF).

Table 2. Daily statistical parameters of the eddy covariance tower-measured environmental variables, including air temperature (Ta, °C), net radiation (Rn, mol m−2), relative humidity (Rh, %), and soil temperature (Ts, °C), over the whole period in all five stands. Note: Xmean, Xmax, Xmin, Xsd, Xku, and Xsk refer to the mean, maximum, minimum, standard deviation, kurtosis, and skewness of each variable, respectively; CCGPP, CCR, and CCNEE refer to the correlation coefficients between each variable and GPP, R, and NEE, respectively.
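The summary statistics reported in Table 2 can be computed directly from the daily tower records. The sketch below shows one way to do so with NumPy; the dictionary keys, the choice of sample standard deviation (ddof = 1), and the use of excess kurtosis are illustrative assumptions rather than the conventions of the original analysis.

```python
import numpy as np

def daily_stats(x, flux):
    """Table 2-style summary statistics for one environmental variable
    paired with one carbon flux (GPP, R, or NEE)."""
    x, flux = np.asarray(x, float), np.asarray(flux, float)
    dev = x - x.mean()
    m2 = np.mean(dev ** 2)                          # population variance
    return {
        "X_mean": x.mean(),
        "X_max": x.max(),
        "X_min": x.min(),
        "X_sd": x.std(ddof=1),                      # sample standard deviation
        "X_ku": np.mean(dev ** 4) / m2 ** 2 - 3.0,  # excess kurtosis
        "X_sk": np.mean(dev ** 3) / m2 ** 1.5,      # skewness
        "CC": np.corrcoef(x, flux)[0, 1],           # correlation with the flux
    }
```

Running this once per variable-flux pair (e.g., Ta against GPP, R, and NEE) reproduces the full grid of statistics in Table 2.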

Table 3. Performance of the machine learning models with the best internal functions for gross primary productivity (GPP, g C m−2 day−1) among the training, validation, and testing periods in the five stands.

Table 4. Performance of the machine learning models with the best internal functions for ecosystem respiration (R, g C m−2 day−1) among the training, validation, and testing periods in the five stands.

Table 5. Performance of the machine learning models with the best internal functions for net ecosystem exchange (NEE, g C m−2 day−1) among the training, validation, and testing periods in the five stands.