Three Decades of Gross Primary Production (GPP) in China: Variations, Trends, Attributions, and Prediction Inferred from Multiple Datasets and Time Series Modeling

: The accurate estimation of gross primary production (GPP) is crucial to understanding plant carbon sequestration and grasping the quality of the ecological environment. Nevertheless, due to the inconsistencies of current GPP products, the variations, trends and short-term predictions of GPP have not been sufﬁciently well studied. In this study, we explore the spatiotemporal variability and trends of GPP and its associated climatic and anthropogenic factors in China from 1982 to 2015, mainly based on the optimum light use efﬁciency (LUEopt) product. We also employ an autoregressive integrated moving average (ARIMA) model to forecast the monthly GPP for a one-year lead time. The results show that GPP experienced an upward trend of 2.268 g C/m 2 per year during the studied period, that is, an increasing rate of 3.9% per decade since 1982. However, these trend changes revealed distinct heterogeneity across space and time. The positive trends were mainly distributed in the Yellow River and Huaihe River out of the nine major river basins in China. We found that the dynamics of GPP were concurrently affected by climate factors and human activities. While air temperature and leaf area index (LAI) played dominant roles at a national level, the effects of precipitation, downward shortwave radiation (SRAD), carbon dioxide (CO 2 ) and aerosol optical depth (AOD) exhibited discrepancies in terms of degree and scope. The ARIMA model achieved satisfactory prediction performance in most areas, though the accuracy was inﬂuenced by both data values and data quality. The model can potentially be generalized for other biophysical parameters with distinct seasonality. Our ﬁndings are further veriﬁed and corroborated by four widely used GPP products, demonstrating a good consistency of GPP trends and prediction. Our analysis provides a robust framework for characterizing long-term GPP dynamics that shed light on the improved assessment of the environmental quality of terrestrial ecosystems.


Introduction
Gross primary production (GPP), the summation of carbon fixed by plants during photosynthesis, is of great significance in a variety of ecosystem functions such as growth and respiration [1]. As the main driving force of terrestrial carbon sequestration, GPP has a profound impact on global carbon balance and can partially regulate the atmospheric carbon dioxide (CO 2 ) emissions from fossil fuel consumption [2,3]. Meanwhile, GPP is a key element for ecological regulatory behaviors and an important indicator of ecosystem health [4]. It affects food, fiber, energy, and provides ecosystem services that alleviate the increasingly serious greenhouse effects [5,6].
Characterizing the spatiotemporal variations and trends of GPP is vital for a deeper understanding of the carbon cycle between terrestrial ecosystems and the atmosphere [7]. Due to the profound impact of GPP on terrestrial vegetation, its variations and trends reflect ecological environmental quality [8]. More importantly, GPP dynamics integrate geochemical, ecological and human impacts into terrestrial ecosystems. As a result, the continuous monitoring of GPP will enhance our ability to explain the complex mechanisms of the biosphere [9]. However, since current GPP products rely on assumptions and are based on different estimation models, there is not a set of GPP products that can perform better consistently in different ecosystems and external conditions. As a result, the variations and trends obtained from different GPP products are often differential [10], and thus the research scope of GPP dynamics is limited to the local scale. In previous studies, Xie et al. [11] analyzed the GPP dynamics in China's Three-North Region over the last forty years by comparing the trend differences and the underlying causes in four sub-regions. Ge et al. [12] evaluated the spatiotemporal pattern of the phenological index and the GPP from 2001 to 2017 in the Yungui Plateau, and explored its response to droughts. The aforementioned research has improved the understanding of GPP dynamics in specific areas to varying degrees. However, the external conditions of these study areas were relatively monotonic, i.e., they were at a local scale. The climatic and hydrothermal environments of a study area which can be called large scale are generally not unified, and the research results obtained from single GPP products are also unconvincing [13]. With that being said, local-scale analysis is not conducive to understanding the macro patterns of vegetation productivity, and it is therefore urgent to investigate the variations and trends of GPP at a large scale. At the same time, in order to ensure the accuracy of the analysis results, multiple sets of GPP products should be used to verify each other.
Climate change affects ecosystem productivity by changing the absorption of light, heat, and water during plant growth and development [14,15]. The effects of climate factors on vegetation productivity have significant spatiotemporal differences [16], and the response of the productivity of different vegetation types to climate factors is also inconsistent [17]. In the past few decades, the global climate has experienced major changes represented by frequent extreme weather catastrophes and continuous warming [18]. Ecosystems and terrestrial carbon sinks will be significantly affected by climatic variation. For example, the plant productivity of Europe has decreased by about 30% because of droughts and heat waves [19]. On the other hand, GPP is substantially impacted by human activities [20]. Although a massive implementation of afforestation is conducive to the restoration of vegetation, which could increase GPP, population growth and rapid economic development have intensified urbanization and undermined the balance of terrestrial ecosystems. These human interventions have a significant impact on the formation of GPP dynamics [21,22]. Nevertheless, a thorough impact analysis of climatic variation and human interventions on GPP is necessary and still worth studying, owing to the complex interactions between the carbon cycle and environmental factors.
Since GPP provides important ecological services for human survival [23,24], grounded insights into the short-term prediction of GPP lays the basis for untangling how the terrestrial carbon cycle will evolve in the near future. The autoregressive integrated moving average (ARIMA) model is recognized as a functional statistical tool for analyzing and predicting time series data. The model is valued for its powerful performance [25], with its strength mainly reflected in the recognition and consideration of seasonality and sequence correlation in data [26]. At present, the ARIMA model has been applied in many disciplines such as aerosols [27,28], energy [29], climatology [30], and hydrology [31]. However, the currently available GPP products are inconsistent in terms of periodicity change, space distribution, and data availability [32], and such inconsistencies makes it a challenge to obtain a coherent prediction analysis. Accordingly, whether the ARIMA model can reasonably delineate and predict spatial-temporal patterns of GPP remains unknown.
In recent decades, China's ecological construction has been increasingly conducted because of the rapid growth of the economy and population [33,34]. In view of the important role of GPP in the ecosystem, the study of GPP in China is of great significance in deepening the understanding of the terrestrial ecosystem and providing basic support for China's ecological construction and policy-making. In light of this, our study aims to analyze the variations, trends, and impact factors of GPP dynamics over more than three decades (from 1982 to 2015) in China, and further predict short-term GPP dynamics. The specific objectives of our study are: (1) to characterize the GPP dynamics in nine major river basins in China at both seasonal and annual scales; (2) to assess the response of GPP tendencies to climatic variations and human activities; and (3) to examine the feasibility of the ARIMA model in simulating and forecasting GPP profiles and to pinpoint the potential sources of uncertainty associated with model performance.

Study Area
China is located in the middle and low latitudes of the eastern hemisphere, with a huge land span and a vast territory. In eastern China, the continental monsoon that is dry and cold prevails in winter, while the maritime monsoon that is rainy and humid prevails in summer. As for western China, the climate is drought due to the distance from the sea [35].
Due to the complex geographical condition of China, the distribution of GPP has substantial spatiotemporal variability. To better examine such variability, we divide China into nine sub-regions based on its river systems [36] and watershed ranges. They are the Songhua and Liaohe River Basin (SRB), the Continental Basin (CB), the Haihe River Basin (HRB), the Huaihe River Basin (HuRB), the Yellow River Basin (YERB), the Yangtze River Basin (YRB), the Southwest Basin (SWB), the Southeast Basin (SEB) and the Pearl River Basin (PRB). Figure 1a shows the geographical distribution of each basin and the corresponding vegetation types.

GPP Product
As mentioned in the introduction, since large inconsistencies exist in current GPP products, five sets of GPP products were collected to ensure the robustness of our research. The available data included the optimum light use efficiency (LUEopt) product, the GPPNIRv product, the Global Land Surface Satellite (GLASS) product, the FluxCom product, and the global Orbiting Carbon Observatory-2 solar-induced chlorophyll fluorescence (GOSIF) product. The individual product information for each product used in this study can be found in Table 1, as detailed below.
The LUEopt product is obtained from The Oak Ridge National Laboratory (https: //earthdata.nasa.gov/, accessed on 17 October 2021). Based on the light use efficiency (LUE) models, this product is improved by the optimum LUE originating from the global FLUXNET network [37] and is proven to obtain a better understanding of the response of primary productivity to climatic variation [38]. Zhang et al. [10] analyzed and compared 45 global GPP products and found that the estimated value of LUEopt on the seasonal scale was closer to the average value of the 45 sets of products compared with the other products we collected. Considering that analysis on a seasonal scale is an important part of this study, and LUEopt itself has relatively high spatio-temporal resolution, it was mainly used for analysis. The spatial distribution of the annual average GPP of this product during the period 1982-2015 is shown in Figure 1b.
Among the remaining GPP products for verifying the reliability of the analysis results, the GPPNIRv product was obtained from the National Tibetan Plateau Data Center (http: //data.tpdc.ac.cn/en/, accessed on 17 October 2021). This dataset provides global longterm GPP based on the strong correlation between near-infrared reflectance (NIRv) and GPP, demonstrating the ability of NIRv in capturing the long-term trends of GPP [39].

GPP Product
As mentioned in the introduction, since large inconsistencies exist in current GPP products, five sets of GPP products were collected to ensure the robustness of our research. The available data included the optimum light use efficiency (LUEopt) product, the GPPNIRv product, the Global Land Surface Satellite (GLASS) product, the FluxCom product, and the global Orbiting Carbon Observatory-2 solar-induced chlorophyll fluorescence (GOSIF) product. The individual product information for each product used in this study can be found in Table 1, as detailed below. The GLASS product was provided by the National Earth System Science Data Center (http://www.geodata.cn/, accessed on 25 October 2021). With the characteristics of satisfactory precision, high resolution, and a wide range of spatial-temporal coverage, this product provides powerful assistance for the study of global ecological and environmental changes [40].
The FluxCom product was downloaded from the FLUXCOM initiative (http://fluxcom. org/, accessed on 29 October 2021). It combines remote sensing and meteorological data based on a variety of machine learning algorithms, such as artificial neural network and Remote Sens. 2022, 14, 2564 5 of 22 random forest, and generates global flux tower GPP products through eddy covariance technology [41]. The GOSIF product is available from the Global Ecology Group (https://globalecology. unh.edu/data/GOSIF.html, accessed on 2 November 2021). This product is generated based on the solar-induced chlorophyll fluorescence (SIF) retrieved from the Orbiting Carbon Observatory-2 (OCO-2). SIF is regarded as an indicator of the functional state of actual plant photosynthesis [42]. The GOSIF product is suitable for studying carbon cycle and ecosystem function at various temporal and spatial scales with high spatial resolution, a good time range, and globally continuous coverage [43].

Climate and Anthropogenic Variables
The impact factors considered in this study included climatic and anthropogenic factors. Because of the impact of the hydrothermal environment and solar radiation on plant growth, the influencing factors of climate variation were selected as air temperature, precipitation, and downward shortwave radiation (SRAD). In terms of human activities, the influencing factors included carbon dioxide (CO 2 ), aerosol optical depth (AOD), and leaf area index (LAI). Among them, CO 2 and AOD can partly reflect industrial production activities and LAI can partly reflect the degree of vegetation restoration and protection.
Air temperature, precipitation and SRAD data were provided by the Terraclimate dataset (https://www.climatologylab.org/terraclimate.html, accessed on 4 November 2021). Using climatically aided interpolation, the Terraclimate dataset combines the World-Clim dataset with multiple sources of coarse resolution data, generating monthly climate data at a 4 km spatial resolution. These datasets provide an important basis for large-scale ecological environment research [44].
LAI data are available from the Advanced Very-High-Resolution Radiometer. The data are based on the National Oceanic and Atmospheric Administration's climate data record (CDR) program and are produced by an artificial neural network algorithm, providing high-quality time series data for climate research [45]. CO 2 data are available from the greenhouse gases observing satellite (GOSAT) "IBUKI" (https://www.gosat.nies.go.jp/en/, accessed on 7 November 2021). Launched in 2019, the satellite is specially designed for the observation of greenhouse gases. The data obtained by the satellite were compared with the ground station data and proven to be reliable [46]. Robotics Network (AERONET), the Multi-angle Imaging SpectroRadiometer (MISR), and the Advanced Very-High-Resolution Radiometer (AVHRR) [47].

Trend Analysis
The significance of time series trends can be detected by parametric or non-parametric methods. Since GPP generally does not follow normal distribution, commonly used parametric methods, such as the t-test, were not applicable [48]. In light of this, we used non-parametric methods that only required the data to be independent to detect the trends of GPP.

Mann-Kendall Test
The Mann-Kendall test reflects the variation of the latter part of the observed value compared with the previously observed value [49]. It can be described as follows: where n is the number of research years, x i and x j are the GPP values in year i and j (j > i), respectively, and the definition of sgn x j − x i is: The variance Var(S) is calculated as: where n is the length of years, m represents the number of tied groups, and k i is the number of ties in extent i. The standard normal deviate (Z statistics) is calculated as: A positive value of Z S means an upward trend while a negative Z S value indicates a downward trend. We tested the trend at the significance α level. If |Z S | > Z 1−α/2 , it means rejecting the null hypothesis at the α level and that there is a significant trend in the GPP time series.

Sen's Slope Estimator
Sen proposed a non-parametric method for calculating the slope of trend in time series data with the sample of N pairs [50]: where x j and x j represent the GPP values at year j and k (j > k), respectively. If each period has only one datum, then N = n(n − 1)/2, where n represents the number of data points.
If there are multiple values in periods, then N < n(n − 1)/2, where n represents the total number of values.
Values of Q i are ranked from large to small, and the median of the slope estimator is calculated as: , if N is even The Q med sign reflects the variation of data, and its value shows the gradient of the trend. We need to get the confidence interval of Q med at the α level to judge whether the values of Q med are statistically different from zero.
The confidence interval of the slope of time series can be calculated as: We define that M 1 = (N − C α )/2 and M 2 = (N + C α )/2. Q max and Q min represent the range of the confidence interval corresponding to the (M 2 + 1)th largest and the M 1 th largest values in the N order slope estimation, respectively. The slope Q med is different than zero statistically if Q min and Q max have the same sign.

Correlation Analysis
The GPP is a coupling process of multiple influencing factors in its growing environment. To analyze the relationship between GPP and these influencing factors, the correlation coefficients between GPP and air temperature, precipitation, SRAD, LAI, CO 2 , and AOD are calculated separately.
The correlation value is [-1, 1], and the calculation formula is as follows: where R xy is the correlation coefficient between factor x and y, indicating the correlation degree between the two factors; n is the number of research years; x i is the GPP of vegetation in year i; and y i is the mean value of correlation factor in year i. R xy > 0 indicates a positive correlation whereas R xy < 0 indicates a negative correlation, and x and y are the average values of the sample values of the two elements, respectively. The significance of the correlation is further tested afterwards.

Autoregressive Integrated Moving Average (ARIMA) Model
The time series ARIMA model is used for analysis and prediction. The time series data formed by samples changing with time are regarded as a random series. Its profile is simulated by fitting a best-fit ARIMA model according to the past and current values of the time series, and the built model is used to forecast future GPP values. Broadly, the establishment of the ARIMA model mainly has three processes: model identification, parameter and diagnostic checking, and prediction. After completing the three processes, we carefully selected the most suitable model from the above stages to predict the future values. Figure 2 shows the schematic workflow of the ARIMA model.

Identification
The first step of the ARIMA model is to estimate the stationarity of the time series, which can be examined by the autocorrelation function and partial autocorrelation function. The attenuation rate of the autocorrelation function plot reflects whether the data are stable or not. Non-stationary time series data sets require a different process to make them stationary. The ARIMA model can be defined by three operators (p, d, q), where p, d, q denotes the orders of the auto regressive (AR) process, the differencing process, and the moving average (MA) process, respectively, and is given by: where X t is the GPP at time point t; X t is the backshift operator that follows BX t = X t−1 , is the MA operator and θ q (B) = 1 + θ 1 B + θ 2 B 2 + . . . + θ q B q ; and ε t represents the random errors that resemble a white noise process ε t ∼ W N 0, σ 2 . Remote Sens. 2022, 14, x FOR PEER REVIEW 9 of 24

Analysis Framework
In this study, the trends of GPP at annual and seasonal scales were first analyzed in China during 1982-2015 by using the Mann-Kendall test and Sen's method. The Mann-Kendall test was used to evaluate the reliability of the results, and Sen's slope method was used to quantify the trends of GPP. We reorganized the original GPP products into seasonal time series data and annual time series data, which were used as the input of the Mann-Kendall test and Sen's method, respectively. We divided the twelve months into four seasons according to the following definitions: March-April-May representing spring; June-July-August representing summer; September-October-November representing autumn; and December-January-February representing winter.
The Pearson correlation coefficient was calculated to ascertain the correlation degree between GPP and the impacting factor. We calculated the annual mean of each impact factor and then computed its correlation with the annual mean of GPP. Potential causes that accounted for the trend results were proposed from the perspectives of climatic variation and human activities.
The ARIMA model was applied to forecast the monthly GPP in 2016. Taking RSME, MAE, rRMSE and rMAE as indicators, the predicted values were compared against the reference values. We also assessed the model's uncertainty from the perspectives of data range and data availability.
To investigate the robustness of our framework, we further compared the variations, trends, and prediction results from LUEopt GPP against those from the other three products, i.e., the FluxCom product, the GPPNIRv product, and the GLASS product. The GOSIF product was not involved in the trend analysis because it did not contain the data before 2001. However, all sets of the products were used for the input of the ARIMA model. Similarly, we took RMSE and MAE as indicators to assess the reliability of the ARIMA prediction results.

Parameter and Diagnostic Checking
After passing the stationarity test of time series data, we fit the ARIMA model by estimating the values of p and q. For a model that can adequately describe the trends of the GPP sequence, the distribution of the residuals should be similar to white noise. Meanwhile, the goodness of fit statistics of the model should be the most accurate among the candidate models. Based on this, we considered the following test statistics: mean absolute error (MAE), root mean squared error (RMSE), relative mean absolute error (rMAE), and relative root mean square error (rRMSE), the specific definitions of these statistics can be found in the Appendix A. Taken together, we used these metrics to assess the fitting effect of the ARIMA model on the GPP series.

Prediction and Evaluation
The last procedure is to establish a prediction model. Specifically, the model that provides the best diagnostic statistics (the values of MAE, RMSE, rMAE, and rRMSE are the lowest) is used to predict the GPP values of the next 12 months. In our study, we divided China into 410 × 684 pixels, and each pixel of GPP data was regarded as a separate model. The above three stages were used to iteratively fit the model to the individual grid cells and finally generate the national prediction map.

Analysis Framework
In this study, the trends of GPP at annual and seasonal scales were first analyzed in China during 1982-2015 by using the Mann-Kendall test and Sen's method. The Mann-Kendall test was used to evaluate the reliability of the results, and Sen's slope method was used to quantify the trends of GPP. We reorganized the original GPP products into seasonal time series data and annual time series data, which were used as the input of the Mann-Kendall test and Sen's method, respectively. We divided the twelve months into four seasons according to the following definitions: March-April-May representing spring; June-July-August representing summer; September-October-November representing autumn; and December-January-February representing winter.
The Pearson correlation coefficient was calculated to ascertain the correlation degree between GPP and the impacting factor. We calculated the annual mean of each impact factor and then computed its correlation with the annual mean of GPP. Potential causes that accounted for the trend results were proposed from the perspectives of climatic variation and human activities.
The ARIMA model was applied to forecast the monthly GPP in 2016. Taking RSME, MAE, rRMSE and rMAE as indicators, the predicted values were compared against the reference values. We also assessed the model's uncertainty from the perspectives of data range and data availability.
To investigate the robustness of our framework, we further compared the variations, trends, and prediction results from LUEopt GPP against those from the other three products, i.e., the FluxCom product, the GPPNIRv product, and the GLASS product. The GOSIF product was not involved in the trend analysis because it did not contain the data before 2001. However, all sets of the products were used for the input of the ARIMA model. Similarly, we took RMSE and MAE as indicators to assess the reliability of the ARIMA prediction results. Figure 3 exhibits the spatial distribution of the annual and seasonal trends of GPP nationwide while Figure 4 shows the GPP trends at the basin scale. During the last thirtyfour years, the GPP of terrestrial vegetation in China has shown an increasing trend, consistent with the studies of Anav et al. [32] that focused on multiple sets of GPP data. This can be explained by the effect of CO 2 concentration and air temperature rise on vegetation production in recent years [51]. On the other hand, in the 21st century, the Chinese government has issued a series of policies to restore the ecological environment, such as the Grain to Green Program [52,53]. These policies may facilitate an increase in GPP [54,55]. However, specific to basin scale, certain differences are detected in the trends of varying river basins at different time scales.

Annual and Seasonal Trends of GPP Calculated from LUEopt Products
Annually, except for the SEB which showed a decreasing trend of −0.19 g C/m 2 /year), the remaining basins all exhibited an increasing trend. Among them, the HRB showed the largest average ascending trend (3.43 g C/m 2 /year), and the YRB ranked the second highest (2.67 g C/m 2 /year), followed by the SRB, the HuRB, the YERB, and the SWB, with the four being nearly equal (2.19, 2.07, 2.06 and 1.82 g C/m 2 /year). Because of the large variability within pixels (the standard deviation is 5.48 g C/m 2 /year), the PRB showed an increasing trend of only 1.05 g C/m 2 /year. With an inherently small mean GPP value, the CB only showed a slight increase (0.24 g C/m 2 /year). Relatively significant upward trends were mainly distributed in the HRB, the YERB, and the border areas of the YRB, the SWB, and the PRB. Meanwhile, the significant downward trends were mainly distributed in the eastern region of the PRB. Our results are consistent with the conclusions of Ge et al. [12].
Seasonally, the four northern basins (SRB, CB, HRB and YERB) revealed an increasing trend in spring, summer and autumn, despite a stagnant period in winter, which was probably because of the low air temperature and SRAD in winter that leads to the stagnation of plant photosynthesis [56]. In the HuRB and the YRB, which are located in the central part of China, a significant increasing trend was detected in spring. Although there was a downward trend in summer, the ascending trends in the other seasons rendered the annual trend of the two basins to be rising. In the SEB and the PRB, statistically significant decreasing trends were found in summer. This can be attributed to the fact that high air temperature is more likely to occur in summer, which inhibits vegetation growth and is not conducive to increases in GPP [57]. Due to the increasing trend of GPP in spring and winter, the annual trend of the PRB exhibited an increasing state. The SEB was the only basin with a downward annual trend because of the decreasing trend in summer (−2.56 g C/m 2 /year). Nevertheless, the SWB maintained an upward trend consistent with the other three seasons in summer when the other basins in the south of China generally showed a downward trend. This can be explained by the fact that the SWB area has a high altitude and the terrain is mainly mountainous, which is not favorable to having extremely high air temperatures in summer [58]. On the whole, the noticeable discrepancies of the GPP trends in the nine river basins mainly occurred in summer, which mostly dominated the annual trend pattern of the basin.    Figure 5 displays the spatial pattern of primary impacting variables and Figure 6 shows the Pearson correlation coefficient between the individual factor and GPP annual variation. At a national scale, the correlation coefficients (R) were 0.318 (LAI), 0.225 (air temperature), 0.16 (CO2), 0.053 (precipitation), −0.012 (SRAD) and −0.056 (AOD), respectively. At the basin scale, different factors had obvious regional discrepancies in the impacts of the annual variation of GPP [9,59]; these regional discrepancies were mainly reflected in the differences in hydrothermal conditions and the strength of human activities. The average value and standard deviation of the Pearson correlation coefficient in each basin are summarized in Table 2.

Attributions of Long-Term Changes in GPP
The hydrothermal conditions of water and air temperature and their interaction are generally considered the main climate factors affecting vegetation growth and distribution [60]. In our study, GPP and air temperature did show a relatively positive correlation in most parts of China, except the PRB and the SEB, which can be related to the fact that high air temperature in summer is beyond the optimal growth range of plants, leading to an increase in evapotranspiration and a decrease in soil water availability for vegetation, and thus inhibiting the growth of plants [57,59]. Notably, the positive correlation between GPP and air temperature in the SRB was not as substantial as in the other basins, probably owing to the reduction in precipitation that weakens the positive effect of warming on GPP [61]. A discrepancy in the GPP-precipitation correlation between the north and south regions was observed (Figure 6c), i.e., the correlation decreased from north to south. Areas with positive correlation were mainly distributed in the CB, the HRB and the YERB, which possess relatively little precipitation. This implies that, in arid areas, the increase in precipitation had a positive effect on GPP, while the sensitivity of GPP to precipitation was less prominent in humid areas. Unlike the driving force of photosynthesis [62], our study did not show a strong positive correlation between SRAD and GPP, and some basins in the north, such as the CB and the HRB, even showed a negative correlation (R < −0.2). One explanation for this phenomenon is that strong solar radiation often reflects a lack of cloud cover, resulting in a reduction in precipitation and the enhancement of soil water evaporation [63]. As mentioned above, in arid regions such as northern China, vegetation is very sensitive to water changes, and the promotion of photosynthesis by radiation is not sufficient to offset the limitations brought by drought.  Figure 5 displays the spatial pattern of primary impacting variables and Figure 6 shows the Pearson correlation coefficient between the individual factor and GPP annual variation. At a national scale, the correlation coefficients (R) were 0.318 (LAI), 0.225 (air temperature), 0.16 (CO 2 ), 0.053 (precipitation), −0.012 (SRAD) and −0.056 (AOD), respectively. At the basin scale, different factors had obvious regional discrepancies in the impacts of the annual variation of GPP [9,59]; these regional discrepancies were mainly reflected in the differences in hydrothermal conditions and the strength of human activities. The average value and standard deviation of the Pearson correlation coefficient in each basin are summarized in Table 2. Table 2. Area-averaged (mean) correlation coefficient and its standard deviation (STD) in each basin.

Region
Air The hydrothermal conditions of water and air temperature and their interaction are generally considered the main climate factors affecting vegetation growth and distribution [60]. In our study, GPP and air temperature did show a relatively positive correlation in most parts of China, except the PRB and the SEB, which can be related to the fact that high air temperature in summer is beyond the optimal growth range of plants, leading to an increase in evapotranspiration and a decrease in soil water availability for vegetation, and thus inhibiting the growth of plants [57,59]. Notably, the positive correlation between GPP and air temperature in the SRB was not as substantial as in the other basins, probably owing to the reduction in precipitation that weakens the positive effect of warming on GPP [61]. A discrepancy in the GPP-precipitation correlation between the north and south regions was observed (Figure 6c), i.e., the correlation decreased from north to south. Areas with positive correlation were mainly distributed in the CB, the HRB and the YERB, which possess relatively little precipitation. This implies that, in arid areas, the increase in precipitation had a positive effect on GPP, while the sensitivity of GPP to precipitation was less prominent in humid areas. Unlike the driving force of photosynthesis [62], our study did not show a strong positive correlation between SRAD and GPP, and some basins in the north, such as the CB and the HRB, even showed a negative correlation (R < −0.2). One explanation for this phenomenon is that strong solar radiation often reflects a lack of cloud cover, resulting in a reduction in precipitation and the enhancement of soil water evaporation [63]. As mentioned above, in arid regions such as northern China, vegetation is very sensitive to water changes, and the promotion of photosynthesis by radiation is not sufficient to offset the limitations brought by drought.  Human activities can have substantial effects on vegetation productivity. Influenced by the policies of the Chinese government, for example, during 1999-2010, 16,000 square kilometers of farmland were turned into forest or grass by the Grain to Green Program [64]. LAI in most parts of China have shown an upward trend, which has a significant correlation with the trends of GPP. In the YERB, the northern HRB, the western SRB, and the eastern CB, the correlation was particularly evident (R > 0.3), implying that the government's efforts in improving the environmental quality of ecosystems have been effective.

Short-Term Prediction of GPP by ARIMA Model
Time series analysis provides an effective tool for GPP modeling. In this study, the ARIMA model was used to simulate and predict GPP based on the monthly dataset during 1982-2015. Figure 7 shows the comparison between the annual average of reference GPP and the ARIMA-predicted GPP in 2016. The spatial pattern of the predicted value is basically consistent with that of the reference values, especially in the northern regions such as the SRB, the HRB and the YERB. Areas with a large degree of uncertainty were mainly located in the YRB and the SEB. Overall, the model has a strong capacity for predicting short-term GPP time series. Another factor regarding human activities is the increase in CO 2 emissions. For example, in 2011, the use of fossil fuels related to human activities resulted in 33.2 billion tons of CO 2 emissions worldwide [65]. The increase in atmospheric CO 2 concentration can improve vegetation productivity. Devaraju et al. [66] found that global primary productivity increases by about 2.3 Pg C each year. Schimel et al. [6] reported that about 60% of the carbon absorbed by land comes from CO 2 emissions. Our results show that the rise in CO 2 emissions in most parts of China can promote the growth of GPP. However, the correlation between CO 2 and GPP is not significant in the SWB, the southern CB, the western YRB, or the northern SRB. Among them, the SWB was even negatively correlated (R = −0.101). This may have been due to the air temperature, which is low in these areas, and the GPP dynamic is more affected by hydrothermal conditions rather than CO 2 .
Furthermore, due to the human activities caused by rapid urbanization and industrial expansion, the air pollution level of more than 90% of China's cities exceeds the standards of the World Health Organization [67]. Since the amplitude of aerosols reflects the degree of air pollution to a certain extent, AOD is used to quantify aerosols [68,69]. In this study, we analyzed the correlation between AOD and GPP. A negative correlation between AOD and GPP was observed on a national scale. This is acceptable, since the absorption of atmospheric aerosols reduces the photosynthetically active radiation in plants [70,71], which affects vegetation productivity. However, this correlation was weak, regardless of whether it was positive or negative. For example, even where there was a positive correlation in some areas, such as the HRB, the correlation was still very weak. This is probably because factors such as air temperature and precipitation often have a more obvious impact on vegetation growth, which dampens the response of vegetation productivity to AOD. Another possible explanation is the complex climatic effects of aerosol species. The net effect might be small due to their counteract responses to GPP. In future work, the multivariate linear regression that can be used to control other factors will be considered for critical analysis.

Short-Term Prediction of GPP by ARIMA Model
Time series analysis provides an effective tool for GPP modeling. In this study, the ARIMA model was used to simulate and predict GPP based on the monthly dataset during 1982-2015. Figure 7 shows the comparison between the annual average of reference GPP and the ARIMA-predicted GPP in 2016. The spatial pattern of the predicted value is basically consistent with that of the reference values, especially in the northern regions such as the SRB, the HRB and the YERB. Areas with a large degree of uncertainty were mainly located in the YRB and the SEB. Overall, the model has a strong capacity for predicting short-term GPP time series.   24.8%, and 16.5%, respectively. Regionally, the ARIMA model showed relatively stable results for the northern basins such as the SRB, the HRB, the CB and the YERB (the averaged values of RMSE and MAE in these four basins were 0.383 g C/m 2 /d and 0.248 g C/m 2 /d, respectively), whereas its performance for southern basins with high GPP values such as the YRB and the PRB tended to be more volatile (the average values of RMSE and MAE in these two basins were 0.822 g C/m 2 /d and 0.606 g C/m 2 /d, respectively). This is probably related to the high correlation between low accuracies and high GPP values.  Regionally, the ARIMA model showed relatively stable results for the northern basins such as the SRB, the HRB, the CB and the YERB (the averaged values of RMSE and MAE in these four basins were 0.383 g C/m 2 /d and 0.248 g C/m 2 /d, respectively), whereas its performance for southern basins with high GPP values such as the YRB and the PRB tended to be more volatile (the average values of RMSE and MAE in these two basins were 0.822 g C/m 2 /d and 0.606 g C/m 2 /d, respectively). This is probably related to the high correlation between low accuracies and high GPP values.
respectively. Regionally, the ARIMA model showed relatively stable results for the northern basins such as the SRB, the HRB, the CB and the YERB (the averaged values of RMSE and MAE in these four basins were 0.383 g C/m 2 /d and 0.248 g C/m 2 /d, respectively), whereas its performance for southern basins with high GPP values such as the YRB and the PRB tended to be more volatile (the average values of RMSE and MAE in these two basins were 0.822 g C/m 2 /d and 0.606 g C/m 2 /d, respectively). This is probably related to the high correlation between low accuracies and high GPP values.  Earlier studies have demonstrated that the availability of data greatly affects the prediction ability of the ARIMA model [72]. The uncertainty of the ARIMA model regarding the data value range and data availability is shown in Figure 9. It is observed that pixels with poor availability are mainly distributed in the CB and the SWB, and the GPP values of these two regions are relatively low. The results indeed show that the model capability is closely related to data availability. The more available the data are, the better the forecast result will be, and vice versa. Moreover, the ability of the ARIMA model is noticeably affected by GPP values (Figure 9a,b). The curves of Figure 9c,d both show an upward trend in the availability of 100%. This is because the pixels with availability below 100% have lower GPP values than those with almost 100% data coverage, leading to their comparatively high accuracy. In other words, despite some places with poor data quality, the precision of GPP prediction is relatively high because of the low GPP values. All in all, the precision of the model is affected by the value and quality of the data together [73].
the forecast result will be, and vice versa. Moreover, the ability of the ARIMA model is noticeably affected by GPP values (Figure 9a,b). The curves of Figure 9c,d both show an upward trend in the availability of 100%. This is because the pixels with availability below 100% have lower GPP values than those with almost 100% data coverage, leading to their comparatively high accuracy. In other words, despite some places with poor data quality, the precision of GPP prediction is relatively high because of the low GPP values. All in all, the precision of the model is affected by the value and quality of the data together [73].

Comparison of Multi-Source Data
Considering the inconsistencies in the values of current GPP products among different regions, seasons and vegetation types [10], multiple sets of GPP products were selected to further corroborate our results. It should be mentioned that our study merely focuses on consistencies and discrepancies in GPP variation, trends and predication, rather the intrinsic difference between the GPP products. Figure 10 shows the temporal and spatial comparison of the monthly mean values of different GPP products. It is observed that five sets of GPP products are consistent in seasonality, i.e., the value is smaller in winter while the value reaches the peak in summer. Spatially, the coincident patterns occur in the SRB, the HRB and the YRB regions.

Comparison of Multi-Source Data
Considering the inconsistencies in the values of current GPP products among different regions, seasons and vegetation types [10], multiple sets of GPP products were selected to further corroborate our results. It should be mentioned that our study merely focuses on consistencies and discrepancies in GPP variation, trends and predication, rather the intrinsic difference between the GPP products. Figure 10 shows the temporal and spatial comparison of the monthly mean values of different GPP products. It is observed that five sets of GPP products are consistent in seasonality, i.e., the value is smaller in winter while the value reaches the peak in summer. Spatially, the coincident patterns occur in the SRB, the HRB and the YRB regions. Remarkable differences are found in the northwest inland region represented by the CB and the southeast coastal region represented by the SEB. The differences in the CB may be related to the lesser vegetation coverage and the inconsistencies of the GPP anomalies among different products. As for the SEB, the discrepancy in GPP values can be explained by the fact that different products are discordant in the quantification of the impact of the hydrothermal environment on GPP.
The trends of GPP in each basin obtained by different products are shown in Figure 11. The overall trends calculated by each set of products are consistent, which demonstrates the growing trend of GPP in China. However, there are certain differences among basins. Such differences mainly appear in the SEB and the PRB where the GPPNIRv product shows a positive trend (0.112 g C/m 2 /year and 0.447 g C/m 2 /year), and the GLASS product a negative one (−0.338 g C/m 2 /year and −0.028 g C/m 2 /year). The LUEopt product shows a negative trend in the SEB (−0.186 g C/m 2 /year), and a positive trend in the PRB (1.051 g C/m 2 /year). Compared with the above products, the results generated by FluxCom products were more conservative, i.e., the maximum value occurred in the SEB (0.07 g C/m 2 /year) and the minimum value happened in the HuRB (−0.014 g C/m 2 /year). Remarkable differences are found in the northwest inland region represented by the CB and the southeast coastal region represented by the SEB. The differences in the CB may be related to the lesser vegetation coverage and the inconsistencies of the GPP anomalies among different products. As for the SEB, the discrepancy in GPP values can be explained by the fact that different products are discordant in the quantification of the impact of the hydrothermal environment on GPP. The trends of GPP in each basin obtained by different products are shown in Figure  11. The overall trends calculated by each set of products are consistent, which demonstrates the growing trend of GPP in China. However, there are certain differences among basins. Such differences mainly appear in the SEB and the PRB where the GPPNIRv product shows a positive trend (0.112 g C/m 2 /year and 0.447 g C/m 2 /year), and the GLASS product a negative one (−0.338 g C/m 2 /year and −0.028 g C/m 2 /year). The LUEopt product shows a negative trend in the SEB (−0.186 g C/m 2 /year), and a positive trend in the PRB (1.051 g C/m 2 /year). Compared with the above products, the results generated by FluxCom products were more conservative, i.e., the maximum value occurred in the SEB (0.07 g C/m 2 /year) and the minimum value happened in the HuRB (−0.014 g C/m 2 /year).  Figure 12 shows the accuracy of the predicted GPP using ARIMA obtained by the four products. The prediction results of each set of products have good accuracy. It is observed that the FluxCom product showed better accuracy, which may be related to its low GPP value. The GPPNIRv product had the most variable accuracy. Consistent with the above conclusions that the performance of the ARIMA model is affected by the GPP values, the predicted value of the model had high uncertainty in areas with high GPP values, such as the PRB (the averaged RMSE and MAE of the four sets of products were 0.36 g C/m 2 /d and 0.289 g C/m 2 /d, respectively). In contrast, in areas with low GPP values such as the CB, the predicted value of the model had better accuracy (the averaged RMSE and MAE of the four sets of products were 0.03 g C/m 2 /d and 0.025 g C/m 2 /d, respectively).  Figure 12 shows the accuracy of the predicted GPP using ARIMA obtained by the four products. The prediction results of each set of products have good accuracy. It is observed that the FluxCom product showed better accuracy, which may be related to its low GPP value. The GPPNIRv product had the most variable accuracy. Consistent with the above conclusions that the performance of the ARIMA model is affected by the GPP values, the predicted value of the model had high uncertainty in areas with high GPP values, such as the PRB (the averaged RMSE and MAE of the four sets of products were 0.36 g C/m 2 /d and 0.289 g C/m 2 /d, respectively). In contrast, in areas with low GPP values such as the CB, the predicted value of the model had better accuracy (the averaged RMSE and MAE of the four sets of products were 0.03 g C/m 2 /d and 0.025 g C/m 2 /d, respectively). Figure 12 shows the accuracy of the predicted GPP using ARIMA obtained by the four products. The prediction results of each set of products have good accuracy. It is observed that the FluxCom product showed better accuracy, which may be related to its low GPP value. The GPPNIRv product had the most variable accuracy. Consistent with the above conclusions that the performance of the ARIMA model is affected by the GPP values, the predicted value of the model had high uncertainty in areas with high GPP values, such as the PRB (the averaged RMSE and MAE of the four sets of products were 0.36 g C/m 2 /d and 0.289 g C/m 2 /d, respectively). In contrast, in areas with low GPP values such as the CB, the predicted value of the model had better accuracy (the averaged RMSE and MAE of the four sets of products were 0.03 g C/m 2 /d and 0.025 g C/m 2 /d, respectively).

Implications for Other Biophysical Studies
Our previous work proposed the use of the ARIMA model in estimating atmospheric parameters such as aerosols at local and global scales [27,28,72]. In a broader sense, we are motivated to apply the time series model to biophysical parameters such as GPP. The reasonable forecasts of the present study demonstrate the generalization of the model, and that it was able to capture the trends and periodicity inherent in the time series. We hope that the stochastic ARIMA model will benefit the simulation and prediction of biophysical parameters that are characterized by strong seasonality, such as those related to the vegetation patterns and dynamics. To make it more appropriate for planning and regulation purposes, as well as suitable for reducing uncertainties in global vegetation

Implications for Other Biophysical Studies
Our previous work proposed the use of the ARIMA model in estimating atmospheric parameters such as aerosols at local and global scales [27,28,72]. In a broader sense, we are motivated to apply the time series model to biophysical parameters such as GPP. The reasonable forecasts of the present study demonstrate the generalization of the model, and that it was able to capture the trends and periodicity inherent in the time series. We hope that the stochastic ARIMA model will benefit the simulation and prediction of biophysical parameters that are characterized by strong seasonality, such as those related to the vegetation patterns and dynamics. To make it more appropriate for planning and regulation purposes, as well as suitable for reducing uncertainties in global vegetation models [32], our future work plans to capitalize on cutting-edge computing platforms such as the Google Earth Engine (GEE), which enables automatic and mass production.
On the other hand, it should also be noted that this trend analysis of GPP was mainly based on long-term variations of time series, and the ability to capture short-term mutations caused by extreme phenomena was insufficient. Gampe et al. [74] identified local extremes at the grid level and joined them with a flood-filling method to capture extreme GPP events, which provides a good strategy for our further work. In addition, the correlation analysis between impacting factors and GPP was conducted only at an annual scale, but not a seasonal scale, and there was no specific quantification of the relative contribution of each factor. Another deficiency in the correlation analysis was that, subject to the problem of spatio-temporal resolution, we selected LAI, AOD, and CO 2 as the influencing factors to reflect human activities. However, these data can only partially reflect human activities. How to quantify the importance and contribution of each impact factor seasonally and how to extract information that can fully reflect human activities will be another focus of our future work. Moreover, in terms of the discrepancies in the results obtained by different GPP products, an exploration of the reasons for these inconsistencies could be given more attention. The production principles and input parameters for each set of data should be further considered to clarify the source of these discrepancies.

Conclusions
This study used the Mann-Kendall and Sen's methods to analyze the trends of GPP dynamics in China from 1982 to 2015 and explore the impacting factors behind these variations. Our aim was to contribute to understanding the response of vegetation growth to climatic variation and human activities at a national scale. On this foundation, we further used the ARIMA model to forecast the one-year lead GPP values in China and assess the potential and uncertainty of the model. To corroborate the reliability of our results, we also extended our trend and prediction analyses into multiple sets of GPP products. Our main findings were that: (1) Over the whole of China, an upward GPP trend was observed with an average trend of 2.268 g C/m 2 /year during 1982-2015. However, the temporal pattern of GPP trends showed certain heterogeneity, and a more obvious downward trend occurs in the summer. Spatially, the GPP trends in most regions were mainly increasing, but in the SEB and the PRB the trends calculated by five sets of products were not consistent; (2) Climate factors and human interventions affect long-term GPP jointly. On a national scale, air temperature is a major climatic factor impacting GPP and the rise in LAI is a major human factor. The effect of precipitation on GPP is mainly reflected in the arid areas in the north and the effect of CO 2 on GPP is reflected in areas with better hydrothermal conditions in the east. AOD and SRAD also affect GPP, but not as obviously as the above factors; (3) The ARIMA model performs well in most parts of China, as evidenced by the relatively low values of MAE, RMSE, rMAE and rRMSE, as well as the good consistency with the reference GPP from 2016. Moreover, the performance of the ARIMA model was affected by the values and quality of the GPP data, indicating that the low value and high quality will produce more accurate predictions; (4) Further multiple GPP analysis showed consistent seasonality and trends among four sets of products that contained the data from 1982 to 2015. However, there were certain differences in spatial distribution, especially in the CB, the SEB, and the PRB. Using the ARIMA model, five GPP products all achieved satisfactory prediction results which were consistent in space. The comparison of multi-source products showed certain discrepancies in GPP trends and predictions at the basin scale but, on the whole, the conclusions were consistent.