Comparison of Machine Learning Methods to Up-Scale Gross Primary Production

: Eddy covariance observation is an applicable way to obtain accurate and continuous carbon ﬂux at ﬂux tower sites, while remote sensing technology could estimate carbon exchange and carbon storage at regional and global scales effectively. However, it is still challenging to up-scale the ﬁeld-observed carbon ﬂux to a regional scale, due to the heterogeneity and the unstable air conditions at the land surface. In this paper, gross primary production (GPP) from ground eddy covariance systems were up-scaled to a regional scale by using ﬁve machine learning methods (Cubist regression tree, random forest, support vector machine, artiﬁcial neural network, and deep belief network). Then, the up-scaled GPP were validated using GPP at ﬂux tower sites, weighted GPP in the footprint, and MODIS GPP products. At last, the sensitivity of the input data (normalized difference vegetation index, fractional vegetation cover, shortwave radiation, relative humidity and air temperature) to the precision of up-scaled GPP was analyzed, and the uncertainty of the machine learning methods was discussed. The results of this paper indicated that machine learning methods had a great potential in up-scaling GPP at ﬂux tower sites. The validation of up-scaled GPP, using ﬁve machine learning methods, demonstrated that up-scaled GPP using random forest obtained the highest accuracy.


Introduction
As vegetation productivity is an important part of the terrestrial carbon cycle, accurately estimating this component is significant in research on terrestrial ecosystems, carbon cycles, and climate change [1][2][3][4]. Gross primary productivity (GPP) is one of the main factors to determine the carbon source and sink of the ecosystems, which could reflect the response of the ecosystems to global change [5]. Ground-observed data are most representative in an area around the field instruments; the area that defined the spatial context of the measurement means footprint scale of the field data [6,7]. Remote sensing technology could monitor the land surface carbon exchange and carbon storage in a large area, which is at a regional or global scale. However, it is still a challenge to match the field data with GPP derived from satellite observations data, and to up-scale GPP at flux tower sites to a regional scale [8]. In this condition, studies to exploit the ways of up-scaling ground-observed GPP to regional GPP, and to validate the effectiveness of these methods, is of great significance in revealing vegetation productivity dynamics and carbon budgets [9].
Studies have been carried out to up-scale GPP at flux tower sites to a regional scale or global scale based on remote sensing data in recent years [10][11][12][13]. Some regression Remote Sens. 2021, 13, 2448 2 of 20 models [12][13][14] have been developed based on the assumption that the relationships between field data and the corresponding satellite image pixels are constant. For example, Jung et al. [15,16] up-scaled FLUXNET carbon data to a global scale, with a resolution of 0.5 • , using an integrated regression tree based on the relationship between GPP and fraction of photosynthetically active radiation (FPAR), and validated the results against Moderate Resolution Imaging Spectroradiometer (MODIS) products and GPP, using the Biome bio-geochemical cycle (Biome-BGC) model. Badgley et al. analyzed the relationship between GPP and near-infrared reflectance of vegetation (NIR v ) from FLUXNET eddy covariance sites, and estimated the global annual terrestrial photosynthesis [10]. Virkkala et al. up-scaled CO 2 fluxes at a relatively high spatial resolution (1 km 2 ) across the highlatitude region, using five commonly used statistical models [17]. Gu et al. [18] described a relationship between the satellite-derived growing season averaged normalized difference vegetation index (NDVI) and annual productivity for grasslands in the Greater Platte River Basin of the United States. Gilmanov et al. [19] studied long-term measurements at five Northern Great Plains to obtain relationships between carbon dioxide fluxes and photosynthetically active radiation.
However, studies have also indicated that the regression models based on the relationship between GPP at flux tower sites and other variables (such as field FPAR, vegetation index) had poor robustness when up-scaling to a regional scale [12,13,20]. To solve this problem, some machine learning models, such as the regression tree [21], artificial neural network (ANN) [22], random forest (RF) [23][24][25], and support vector machine (SVM) [26], have been used to retrieve the land surface parameters from remote sensing data. Although the eco-physiological processes of the machine learning models were not clear, and the accuracy relied much on the number and representativeness of the training data. Some researches tried to up-scale GPP from the perspective of data assimilation, which could make the best of the observed data and prior knowledge of the parameters. For example, Desai et al. used the Markov chain Monte Carlo (MCMC) model to up-scale carbon flux data [27]. Xiao et al. up-scaled GPP in the Northern United States, using the diagnostic carbon flux model (DCFM), based on the data assimilation from 17 flux net stations [28]. Chen et al. optimized the parameters of the vegetation photosynthesis model (VPM) to derive GPP from an assimilation of footprint and Landsat data [29]. However, the data assimilation usually occupies a large amount of computation. Some other studies tried to up-scale the inputs of the GPP model firstly, then estimated the regional GPP by using the up-scaled inputs [30,31], but the errors of the input data in the up-scaling process would have an influence on the accuracy of the results, and the uncertainty was difficult to qualify. To sum up, although some up-scaling methods of GPP, using statistical regression, data assimilation and machine learning models, have been proposed in recent years, few studies focused on the difference in the accuracy and feasibility these up-scaled models. In this condition, comparing the precision of these methods and analyzing the applicability of these models is urgent.
It is an effective way to validate the estimated GPP by comparison with the observed data. The flux net observed network, such as FLUXNET, AmeriFLUX, AsiaFLUX, Chi-naFLUX, which contained more than 500 total stations, had been constructed to monitor carbon flux regionally or globally, and to validate the simulative GPP from different kinds of models [32][33][34]. Some other projects, such as Validation of Land European Remote Sensing Instruments (VALERI) [35] and BigFoot [36,37], had also been designed to validate the satellite products. Considering the difference in the footprint between field observed data and the satellite data would bring some uncertainty in direct validation; some indirect validation methods were also proposed to assess the precision of the remote sensing products. For example, Wang et al. validated the estimated biomass and Net Primary Production (NPP) with field data firstly, then up-scaled NPP to 1 km and validated the estimated NPP at a regional scale [38]. Mueller et al. found that the difference in the estimated annual average global GPP could be as high as 50% when validating from field data to regional data [39]. The validation of up-scaled GPP had typically been on training (flux tower) sites, Remote Sens. 2021, 13, 2448 3 of 20 which are limited in number and mostly not independent. Few studies concentrated on the validation of scale transform results at different scales systematically. In this case, studying the validating methods of up-scaled GPP at several scales is of great significance.
The aims of this paper are to up-scale GPP at flux tower sites to a regional scale, by using five machine learning models (Cubist regression tree, RF, SVM, ANN and deep belief network (DBN)), based on remote sensing data (MODIS land surface reflectance products, and derived fractional vegetation cover, land-cover products), and to compare the accuracy of the different up-scaling approaches. The results of this paper demonstrated the applicability and feasibility of machine learning models in up-scaling GPP at flux tower sites.

Study Area
A case study was conducted in Heihe River Basin, which is located in an arid and semi-arid area in Northwestern China (37 • 41 ~42 • 42 N, 96 • 42 ~102 • 00 E), and could be divided into upper basin, mid-basin and lower basin generally according to the natural, ecological and climate characteristics [40]. Land-cover types in this area include deciduous broadleaved forest (DBF), evergreen needle-leaved forest (ENF), mixed forest (MF), cropland, bare land, wetland, shrub land, grass land, water, ice or snow ( Figure 1). Bare lands are mainly distributed in the downstream in the northern areas, forest and cropland constitute the dominant lands in the upstream in southern areas. Additionally, some shrub lands and wetlands are also distributed in the southern areas. The study area could be divided into mainly bare land in the north with an elevation range from 500 m to 1000 m, and mountains in the south with an elevation about 1000 m~5500 m. This area has a temperate climate, with an average annual air temperature about 6.0~8.0 • C, an average annual precipitation about 100~250 mm, and an average annual pan evaporation about 1200~1800 mm [41,42]. The MODIS land surface reflectance (LSR) products (MOD09A1) with a spatial resolution of 500 m were provided on an 8-day basis [43]. In this study, MODIS LSR products from May to September in 2014 were used. These data contained the best possible observation during an 8-day period after atmospheric correction, which could provide an estimate of the surface reflectance as it would be measured at ground level in the absence of atmospheric scattering or absorption [44]. Then 8-day NDVI were derived using the red and near-infrared reflectance from LSR products, and were aggregated to 1 km to train the up-scaling model of GPP. Additionally, 8-day fractional vegetation cover (FVC) from May to September in 2014 were also obtained from 8-day NDVI by using the dimidiate pixel model [45], and were also aggregated to 1 km.

B. Land-Cover Products
The land-cover map, which comprised of 11 land-cover types, including forests, crops, bare lands and wetlands with a spatial resolution of 30 m and temporal resolution of one month in 2014 was adopted in this paper [46]. This map was generated from Chinese HJ-1 satellite images based on the time-series patterns of the land-cover types [47,48]. Studies have shown that the overall classification accuracy of the map could be as high as 92.19% [49]. To make the datasets comparable, the 30 m land-cover data were aggregated to 1 km.

C. MODIS GPP Products
The MODIS primary production products (MOD17A2) were designed to provide an accurate regular measure of the growth of the terrestrial vegetation with a spatial Remote Sens. 2021, 13, 2448 4 of 20 resolution of 1 km and temporal resolution of 8 days [50]. MOD17 GPP outputs, which were generated using a radiation use efficiency model, were useful for global carbon cycle analysis, ecosystem status assessment, and environmental change monitoring. Modification of parameters in the Biome property look-up table (BPLUT) had been made to agree with GPP derived from measurements at eddy flux towers and estimated GPP [51]. The BPLUT contains parameters for temperature and vapor pressure difference (VPD) limits, light use efficiency, specific leaf area and respiration coefficients for representative vegetation in each biome type [51]. In this study, MODIS GPP products from May to September in 2014 were used as the reference data to cross validate the up-scaling results.

A. MODIS Land Surface Reflectance Products
The MODIS land surface reflectance (LSR) products (MOD09A1) with a spatial resolution of 500 m were provided on an 8-day basis [43]. In this study, MODIS LSR products from May to September in 2014 were used. These data contained the best possible observation during an 8-day period after atmospheric correction, which could provide an estimate of the surface reflectance as it would be measured at ground level in the absence of atmospheric scattering or absorption [44]. Then 8-day NDVI were derived using the red

Meteorological Data
Meteorological datasets, including air pressure, precipitation, wind speed and direction, air temperature and humidity, and four radiation components (upward shortwave radiation, downward shortwave radiation, upward longwave radiation, downward longwave radiation) were collected from nine automatic meteorological stations in the study area from May to September in 2014. Meteorological observation instruments were in- Datasets of 2 m surface vapor pressure, vapor-to-liquid ratio, downward shortwave radiation and the precipitation were generated with a spatial resolution of 0.05 • , and temporal resolution of one hour from the weather research and forecasting model (WRF) from May to September in 2014 were also collected [52]. Studies have demonstrated that a good linear relationship (R 2 was higher than 0.90) existed between the datasets and the observed data from China Meteorological Administration and seven Watershed Allied Telemetry Experimental Research (WATER) stations [53,54]. These datasets (2 m surface vapor pressure, vapor-to-liquid ratio, downward shortwave radiation and the precipitation) were interpolated to 1 km firstly, then were used to up-scale GPP from flux tower sites to regional scale.

Field Data
Field carbon flux data used in this study were derived from the Multi-Scale Observation Experiment on Evapotranspiration over heterogeneous land surfaces in 2012 of the Heihe Watershed Allied Telemetry Experiment Research (HiWATER-MUSOEXE) [54][55][56]. Half-hourly carbon flux data from 2 eddy covariance systems in the upstream, 4 eddy covariance systems in the midstream and 3 eddy covariance systems in the downstream (Table 1) were collected. Time period of the flux data was from May to September in 2014. Land-cover types of the 9 eddy covariance systems included cropland (mainly maize), bare land, grass land, shrub land, wetland, and forest. Gaps in the observed datasets, which were caused by the system malfunction, wind and rain were firstly filled. Principles to control data quality and to exclude the outlier of the datasets were as follows: (1) excluding the observed data one hour before and after precipitation; (2) excluding the data that were outside the instruments' measurement range or outside the reasonable range of values; (3) excluding the negative value at night due to there being no assimilation of carbon dioxide in the photosynthesize at night; (4) excluding the datasets when friction velocities were lower than the threshold value [57][58][59] at night. At daytime, gaps in the flux data were filled by using a look-up table [58,59], which was built on flux, air temperature and photosynthetically active radiation. At night, gaps in the flux data were filled based on the relationship between the ecosystem respiration and the air temperature or the soil temperature, which was described as [59] follows: where R eco is ecosystem respiration (mg·m −2 ·s −1 ), a and b are the coefficient, T is the air temperature or the soil temperature ( • C) at night. Then daily carbon flux was obtained by summing the half-hourly carbon flux data. At last, daily GPP were obtained by partitioning the observed net flux into GPP and ecosystem respiration [59][60][61], the process could be described as [59][60][61] follows: where NEE is the net ecosystem carbon dioxide exchange. One advantage of using daily data instead of half-hourly intervals to derive daily GPP is to reduce errors on deriving NEE and GPP using temperature and radiation, because observed fluxes and meteorological conditions such as photosynthetic active radiation have significant variations from hourto-hour [62,63]. Studies indicated that using daily data could reduce the effect of the hour-to-hour variations when deriving NEE and GPP [62,63].

Methods
The flowchart of up-scaling GPP using machine learning method is shown in Figure 2. Firstly, the GPP from eddy covariance systems were up-scaled based on the MODIS NDVI, land-cover types, FVC and the meteorological datasets using five machine learning methods (Cubist regression tree, RF, SVM, ANN, DBN). A footprint source area model (FSAM) [64][65][66][67] was adopted to obtain the footprint of GPP at flux tower sites used in this study. Then the up-scaled GPP were validated at in situ scale, at footprint scale and at regional scale. At last, up-scaling GPP using different machine learning methods were compared, sensitiveness of the input data to the up-scaled GPP was analyzed, and the uncertainty of the methods was discussed. Five machine learning methods were used to up-scale GPP from field to regional scale, then the results were validated at field scale, at footprint scale and at regional scale.

Up-Scaling GPP Using Machine Learning Methods
Studies have shown that GPP is influenced by shortwave radiation (SWR), air temperature (Ta), vapor pressure deficit (VPD), soil moisture and nitrogen availability at canopy scale. While at ecosystem scale, GPP was well related to leaf area index (LAI), NDVI and canopy phenology [68]. In this paper, NDVI and FVC, which were selected as two of the inputs to describe the terrestrial photosynthetic vegetation activity to train the machine learning models to up-scale GPP at flux tower sites. Moreover, considering that SWR, Ta and RH characterized the condition of carbon storage under natural environment [9,69], these factors were also taken into account when up-scaling GPP at flux tower sites. Daily GPP from the field carbon flux data, and corresponding remote sensing data (NDVI, FVC, land cover) and meteorological data (SWR, Ta, RH) were selected as the training datasets of machine learning methods. To make the input parameters comparable, the inputs of the training data were normalized firstly. Then five machine learning methods (Cubist regression tree [70,71], RF [72], SVM [26], ANN [73], DBN [74]) were adopted to up-scale the GPP at flux tower sites.

Footprint of GPP at Flux Tower Sites
To analyze the spatial representativeness of GPP from ground eddy covariance systems, FSAM [64][65][66][67] was adopted to study the real-time footprint of GPP at flux tower Five machine learning methods were used to up-scale GPP from field to regional scale, then the results were validated at field scale, at footprint scale and at regional scale.

Up-Scaling GPP Using Machine Learning Methods
Studies have shown that GPP is influenced by shortwave radiation (SWR), air temperature (Ta), vapor pressure deficit (VPD), soil moisture and nitrogen availability at canopy scale. While at ecosystem scale, GPP was well related to leaf area index (LAI), NDVI and canopy phenology [68]. In this paper, NDVI and FVC, which were selected as two of the inputs to describe the terrestrial photosynthetic vegetation activity to train the machine learning models to up-scale GPP at flux tower sites. Moreover, considering that SWR, Ta and RH characterized the condition of carbon storage under natural environment [9,69], these factors were also taken into account when up-scaling GPP at flux tower sites. Daily GPP from the field carbon flux data, and corresponding remote sensing data (NDVI, FVC, land cover) and meteorological data (SWR, Ta, RH) were selected as the training datasets of machine learning methods. To make the input parameters comparable, the inputs of the training data were normalized firstly. Then five machine learning methods (Cubist regression tree [70,71], RF [72], SVM [26], ANN [73], DBN [74]) were adopted to up-scale the GPP at flux tower sites.

Footprint of GPP at Flux Tower Sites
To analyze the spatial representativeness of GPP from ground eddy covariance systems, FSAM [64][65][66][67] was adopted to study the real-time footprint of GPP at flux tower sites. FSAM was a two-dimensional advection diffusion equation based on the K-theory and the analysis of advection diffusion. Inputs of FSAM included friction velocity (u * ), Obukhov length (L), standard deviation of cross wind velocity fluctuations (δ v ), measurement height (z m ), zeroplane displacement height (z d ) and surface roughness length (z 0 ). We used the meteorological datasets (air pressure, precipitation, wind speed and direction, air temperature and humidity, upward shortwave radiation, downward shortwave radiation, upward longwave radiation, downward longwave radiation) from the automatic meteorological stations to run FSAM and to obtain the near real-time (every 30 min) footprint of the observed data. Then a weighted model was used to calculate the climate footprint (footprint in the growing season from May to September) of GPP in 2014, and the model was described as follows: where i is the step (30 min), N is the number of 30-min intervals of observed GPP in the time step, (x,y) is the location of ground-level point sources, x is in the up-wind direction, and y is in the cross wind direction, f climatology (x, y, z m ) is the climate footprint of GPP, f (x, y, z m ) is the real-time footprint of GPP, Flux(i) is the observed GPP from ground eddy covariance systems.

Validation of Up-Scaled GPP
To assess the performance of the up-scaling GPP using machine learning methods, the up-scaled GPP were validated using GPP at flux tower sites, using weighted GPP in the footprint, and were compared with MODIS GPP products. The determination coefficient (R 2 ) and root mean square error (RMSE) were used to quantify the accuracy of the results.
Firstly, up-scaled GPP pixels were directly compared with corresponding GPP derived from ground eddy covariance systems. Moreover, time series (from May to September in 2014) of up-scaled GPP and GPP at flux tower sites were compared.
Additionally, pixels in the footprint were selected as the validation pixels. Then the GPP at flux tower sites were weighted average in the footprint and were compared with corresponding up-scaled GPP pixels from five machine learning methods. In theory, the footprint of GPP from ground eddy covariance systems is infinite, as the ground-observed flux is the total integral of over an infinite region in the windward direction. In this paper, a 6 km × 6 km area was used to compare the GPP at flux tower sites in the footprint and corresponding up-scaled GPP results. The weight of GPP at flux tower sites in the footprint was calculated by the following: where R is the weight of the pixel, f (x, y, z m ) i is the footprint of i-th pixel, N is the total number of the pixels. At last, the up-scaled GPP was compared to regional MODIS GPP products. Comparing pixel by pixel, Pearson coefficients and difference of up-scaled GPP with MODIS GPP were analyzed. were generally consistent. However, GPP using SVM were higher than those using other models (Cubist, RF, ANN and DBN) in upstream areas. While the opposite situation occurs in the downstream areas; GPP from SVM were lower than those using other models (Cubist, RF, ANN and DBN). In the middle stream, GPP from RF and DBN were higher than those using other models (Cubist, SVM and ANN).
Remote Sens. 2021, 13, x FOR PEER REVIEW 9 of 22 Bajitan (bare land), RMSE were the highest using ANN. On the whole, up-scaled GPP using Cubist and DBN could most accurately reflect the temporal dynamic patterns of GPP at flux tower sites.  In general, a good linear relationship exists between up-scaled GPP and GPP at flux tower sites, as shown in Figure 4. R 2 was as high as 0.86 and RMSE was only 0.99 g C m −2 d −1 of the up-scaled GPP using RF, which demonstrates that up-scaled GPP could obtain the highest accuracy using RF. Followed by up-scaled GPP using Cubist, R 2 was 0.86 and RMSE was 1.08 g C m −2 d −1 . Up-scaled GPP using DBN gained the lowest precision, R 2 was 0.79 and RMSE was 1.45 g C m −2 d −1 . The accuracy of up-scaled GPP using RF were the highest.

Time Series of the Up-Scaled GPP Using Machine Learning Models
Temporal dynamic patterns of the up-scaled GPP using machine learning methods in nine eddy covariance systems are shown in Figure 5. In general, up-scaled GPP using five machine learning methods could reflect the time series of GPP at flux tower sites. The time series of up-scaled GPP agreed well with GPP in Daman (cropland), Shidi (wetland) and Arou (grassland). Up-scaled GPP using RF agree well with GPP in Sidaoqiao (shrub land), Huyanglin (DBF), Arou (grassland) and Dashalong (grassland). In Daman (cropland, mainly maize), RMSE was the least using SVM. In Huazhaizi (bare land) and Bajitan (bare land), RMSE were the highest using ANN. On the whole, up-scaled GPP using Cubist and DBN could most accurately reflect the temporal dynamic patterns of GPP at flux tower sites.

Footprint of GPP at Flux Tower Sites
The climate footprints of GPP at flux tower sites in the study area are shown in Figure 6. In the upstream area, we found that the footprint distances in the northwest to southeast directions were about 500 m to 1000 m, while the footprint distance was less than 300 m in the other directions at Arou (grassland). Moreover, the GPP footprint distances were less than 500 m, and were mainly distributed in the northwest, southeast, and southwest at Dashalong (grassland). In the middle stream area, the prevailing winds were southwest in Huazhaizi (bare land), southeast and northwest in Shidi (wetland), west and northwest in Daman (cropland), and north and southwest in Bajitan (bareland). The GPP footprint distances ranged from 500 m to 1000 m. In the downstream area, the GPP footprint distances at Huyanglin (DBF) and Hunhelin (MF) ranged from 1200 m to 1500 m, and the GPP footprints ranged from 800 m to 1200 m. As the prevailing wind directions were west and northwest in the downstream, the footprints were also mainly distributed in those directions.
The height of the eddy covariance systems was one of the main influence factors of the GPP footprint [7,9]. Heights of the eddy covariance systems were the highest in Huyanglin

Validation of Up-Scaled GPP at Footprint Scale
The weighted GPP in the footprints were obtained according to the contribution rate, and were compared with the up-scaled GPP using machine learning methods (Figure 8). We found that a good linear relationship existed between the up-scaled GPP and GPP in the footprint at flux tower sites (R 2 ranged from 0.80 to 0.88, and RMSE ranged from 0.89 g C m −2 d −1 to 1.24 g C m −2 d −1 ). Compared with the validation against filed GPP directly, we found that the accuracy was higher when validating the up-scaled GPP at the footprint scale. R 2 could be as high as 0.88 and RMSE was only 0.89 g C m −2 d −1 of the up-scaled GPP using RF, which also demonstrates that up-scaled GPP could obtain the highest accuracy using RF. Dashalong, (f) Daman, (g) Shidi, (h) Bajintan, (i) Arou. The x-coordinate is the distance of footprint at east-west direction, y-coordinate is the distance of footprint at north-south direction. The gray level means cumulative weight the contribution of the footprint.

Validation of Up-Scaled GPP at Footprint Scale
The weighted GPP in the footprints were obtained according to the contribution rate, and were compared with the up-scaled GPP using machine learning methods (Figure 8). We found that a good linear relationship existed between the up-scaled GPP and GPP in the footprint at flux tower sites (R 2 ranged from 0.80 to 0.88, and RMSE ranged from 0.89 g C m −2 d −1 to 1.24 g C m −2 d −1 ). Compared with the validation against filed GPP directly, we found that the accuracy was higher when validating the up-scaled GPP at the footprint scale. R 2 could be as high as 0.88 and RMSE was only 0.89 g C m −2 d −1 of the up-scaled GPP using RF, which also demonstrates that up-scaled GPP could obtain the highest accuracy using RF.

Cross Comparison with MODIS Products
We compared the averaged up-scaled GPP in the growing season (from May to September 2014) with corresponding MODIS GPP products, pixel by pixel, as shown in Figure 9. In general, a good linear relationship existed between the up-scaled GPP and MODIS GPP (R 2 ranged from 0.81 to 0.84, and RMSE ranged from 0.68 g C m −2 d −1 to 0.89 g C m −2 d −1 ). The up-scaled GPP using RF had the strongest relationships with MODIS GPP (R 2 was 0.83 and RMSE was 0.68 g C m −2 d −1 ). However, we could also found that most the plots in Figure 9 were distributed above the 1:1 line, which indicated that up-

Cross Comparison with MODIS Products
We compared the averaged up-scaled GPP in the growing season (from May to September 2014) with corresponding MODIS GPP products, pixel by pixel, as shown in Figure 9. In general, a good linear relationship existed between the up-scaled GPP and MODIS GPP (R 2 ranged from 0.81 to 0.84, and RMSE ranged from 0.68 g C m −2 d −1 to 0.89 g C m −2 d −1 ). The up-scaled GPP using RF had the strongest relationships with MODIS GPP (R 2 was 0.83 and RMSE was 0.68 g C m −2 d −1 ). However, we could also found that most the plots in Figure 9 were distributed above the 1:1 line, which indicated that up-scaled GPP using machine learning methods were higher than MODIS GPP products in most cases. To illustrate this, we compared the MODIS GPP with GPP at flux tower sites, as shown in Figure 10. Although a good linear relationship existed between the MODIS GPP and GPP at flux tower sites, most the plots were distributed under the 1:1 line, which demonstrated that the MODIS GPP products were underestimated in the study area, and lead to a high error (RMSE = 3.66 g C m −2 d −1 ). Previous studies [30] have also shown that in the Heihe River Basin, MODIS GPP products were underestimated. Therefore, in this paper, up-scaled GPP using machine learning methods were higher than MODIS GPP in most cases.
Remote Sens. 2021, 13, x FOR PEER REVIEW 14 of 22 in the Heihe River Basin, MODIS GPP products were underestimated. Therefore, in this paper, up-scaled GPP using machine learning methods were higher than MODIS GPP in most cases. Figure 11 demonstrated the difference of up-scaled GPP and MODIS GPP. We found that spatial distribution of up-scaled GPP was consistent with MODIS GPP. The difference of up-scaled GPP with MODIS GPP ranged from 2.00 g C m −2 d −1 to 4.00 g C m −2 d −1 in most of the upstream study area. We also studied the Pearson coefficients between the 8day up-scaled GPP and 8-day MODIS GPP products from May to September in 2014 (Figure 12). The absolute values of Pearson coefficients were higher than 0.60 in most of the upstream study area.

Sensitivity of the Input Data to the Accuracy of Up-Scaled GPP
The uncertainty of the input datasets would have some influence on the accuracy of the up-scaling process. To study how the uncertainty in the output of the up-scaled GPP from machine learning methods can be apportioned to different sources of uncertainty in the inputs data (NDVI, FVC, SWR, RH, Ta), global sensitivity analysis was carried out by using an extended Fourier amplitude sensitivity test (EFAST) [75,76] in this paper. From Table 2, we found that NDVI was the most sensitive variable to the up-scaled GPP, using the five machine learning methods; it was followed by FVC and SWR. RH had the least sensitivity to the up-scaled results. Local sensitivity analysis of the input data to the upscaled GPP was shown in Table 3. A 0.60-0.86% change would happen in the up-scaled GPP when RH increased by 10% or decreased by 10%, and a 5.77-7.35% change would happen when RH increased by 50% or decreased by 50%. More than 5% change would happen with the corresponding change of 10% in NDVI, SWR and FVC, and about 16.21-36.81% change would happen with the corresponding change of 50% in NDVI, SWR and FVC. From Tables 2 and 3, we found that up-scaled GPP was sensitive to NDVI and FVC, which were considered to be the indicators to describe the vegetation phenology, and to monitor the terrestrial photosynthetic vegetation activity. SWR had an impact on the photosynthetic intensity; therefore, it was well related to the temporal changes in GPP. While RH and Ta were not sensitive to the changes in GPP compared to NDVI, FVC and SWR.

Sensitivity of the Input Data to the Accuracy of Up-Scaled GPP
The uncertainty of the input datasets would have some influence on the accuracy of the up-scaling process. To study how the uncertainty in the output of the up-scaled GPP from machine learning methods can be apportioned to different sources of uncertainty in the inputs data (NDVI, FVC, SWR, RH, Ta), global sensitivity analysis was carried out by using an extended Fourier amplitude sensitivity test (EFAST) [75,76] in this paper. From Table 2, we found that NDVI was the most sensitive variable to the up-scaled GPP, using the five machine learning methods; it was followed by FVC and SWR. RH had the least sensitivity to the up-scaled results. Local sensitivity analysis of the input data to the up-scaled GPP was shown in Table 3. A 0.60-0.86% change would happen in the up-scaled GPP when RH increased by 10% or decreased by 10%, and a 5.77-7.35% change would happen when RH increased by 50% or decreased by 50%. More than 5% change would happen with the corresponding change of 10% in NDVI, SWR and FVC, and about 16.21-36.81% change would happen with the corresponding change of 50% in NDVI, SWR and FVC. From Tables 2 and 3, we found that up-scaled GPP was sensitive to NDVI and FVC, which were considered to be the indicators to describe the vegetation phenology, and to monitor the terrestrial photosynthetic vegetation activity. SWR had an impact on the photosynthetic intensity; therefore, it was well related to the temporal changes in GPP. While RH and Ta were not sensitive to the changes in GPP compared to NDVI, FVC and SWR.

Uncertainty Analysis
In the process of up-scaling GPP and assessing the accuracy of the results, the uncertainty of the observed data and the reference data would have some influence on up-scaling and validating the results. Firstly, the ways to derive GPP from the eddy covariance systems would bring some uncertainties in the up-scaling process. The method to process the missing carbon flux data, such as the look-up table and ANN, would induce some errors when interpolating the eddy covariance data. Second, the footprint of field observation data was related with the heights of the eddy covariance instruments, air conditions, and spatial heterogeneity. The errors of calculating the footprint of GPP at the flux sites would certainly bring some uncertainty in validating the up-scaled results. Third, the errors of the input data (NDVI, FVC, SWR, RH, Ta) used in the machine learning models to up-scaled GPP would also have some influence on the precision of the results. The high accuracy of the machine learning methods relied on the high quality of the training data. Although MODIS LSR products, and land-cover products and meteorological datasets have high accuracy, the errors of the MODIS LSR products, land-cover products and meteorological datasets would have some impact on the accuracy of the training data, and also have some influence on the up-scaling GPP using machine learning models.

Limitations and Future Work
The bias of the training datasets to train the machine learning methods may have introduced some errors into the up-scaling of GPP. In this paper, NDVI, FVC, SWR, RH and Ta were used to train the machine learning method. In the future, some other variables, such as soil moisture, may also be taken into account in the model training. Moreover, we may carry out some research about the error transfers in GPP up-scaling when using the machine learning models.
The errors induced by scale transformation of the input data would bring some uncertainty to the up-scaled results. Interpolation of the meteorological data, aggregation of the land-cover data, and the method to match the field observation data and satellite data would cause some errors in the up-scaling of GPP. Improving the method to match the scales of the training data in the up-scaling of GPP, would be another way to improve the accuracy of the up-scaling results.
Moreover, it is still challenging to assess the performance of the scale transform models. In this paper, up-scaled GPP were validated at the field scale, footprint scale and regional scale. Some cross validation methods would also be adopted in the future to analyze the effects of the up-scaling models.

Conclusions
Up-scaling methods in remote sensing provide a new observational approach to extend the scale of GPP estimates. In this paper, GPP from field eddy covariance systems were up-scaled to the regional scale using five machine learning methods (Cubist, RF, SVM, ANN, DBN). Then, footprints of the GPP at flux tower sites were obtained by using an FSAM model. At last, the up-scaled GPP were validated at the field scale, footprint scale and regional scale. The results of this paper demonstrated the applicability and reliability of up-scaling GPP at flux tower sites, with machine learning methods, and up-scaled GPP using RF could obtain the highest accuracy.
Generally, the spatial distributions of up-scaled GPP using five machine learning methods were consistent. Direct validation with GPP at flux tower sites demonstrated that, although up-scaled GPP using machine learning methods (Cubist, RF, SVM, ANN, DBN) could obtain a high accuracy (R 2 ranged from 0.79 to 0.86, and RMSE ranged from 0.99 g C m −2 d −1 to 1.45 g C m −2 d −1 ), up-scaled GPP using RF obtained the highest accuracy (R 2 = 0.86, RMSE = 0.99 g C m −2 d −1 ). Second, compared with validating with GPP at flux tower sites data, better linear relationships were obtained when validating the up-scaled GPP at the footprint scale (R 2 ranged from 0.80 to 0.88, and RMSE ranged from 0.89 g C m −2 d −1 to 1.24 g C m −2 d −1 ). Moreover, the precision of up-scaled GPP using RF was the highest (R 2 = 0.88, RMSE = 0.89 g C m −2 d −1 ). Third, cross comparison with MODIS GPP products demonstrated that a good linear relationship existed between up-scaled GPP and MODIS GPP (R 2 ranged from 0.81 to 0.84, and RMSE ranged from 0.68 g C m −2 d −1 to 0.89 g C m −2 d −1 ). However, up-scaled GPP from machine learning methods were higher than MODIS GPP in most areas.