Exploring the Individualized Effect of Climatic Drivers on MODIS Net Primary Productivity through an Explainable Machine Learning Framework

: Along with the development of remote sensing technology, the spatial–temporal variability of vegetation productivity has been well observed. However, the drivers controlling the variation in vegetation under various climate gradients remain poorly understood. Identifying and quantifying the independent effects of driving factors on a natural process is challenging. In this study, we adopted a potent machine learning (ML) model and an ML interpretation technique with high ﬁdelity to disentangle the effects of climatic variables on the long-term averaged net primary productivity (NPP) across the Amazon rainforests. Speciﬁcally, the eXtreme Gradient Boosting (XGBoost) model was employed to model the Moderate-resolution Imaging Spectroradiometer (MODIS) NPP data, and the Shapley addictive explanation (SHAP) method was introduced to account for nonlinear relationships between variables identiﬁed by the model. Results showed that the dominant driver of NPP across the Amazon forests varied in different regions, with temperature dominating the most considerable portion of the ecoregion with a high importance score. In addition, light augmentation, increased CO 2 concentration, and decreased precipitation positively contributed to Amazonia NPP. The wind speed for most vegetated areas was under the optimum, which beneﬁts NPP, while sustained high wind speed would bring substantial NPP loss. We also found a non-monotonic response of Amazonia NPP to VPD and attributed this relationship to the moisture load in Amazon forests. Our application of the explainable machine learning framework to identify the underlying physical mechanism behind NPP could be a reference for identifying relationships between components in natural processes.


Introduction
To grapple with climate change, mitigating the increasing carbon dioxide (CO 2 ) concentration in the atmosphere becomes of great urgency [1,2].Terrestrial ecosystems greatly benefit climate change alleviation since they absorb about one-third of CO 2 released by fossil fuel emissions and land use [2].Therefore, it is important to broaden our understanding of the spatial and temporal dynamics of terrestrial carbon fluxes for model simulations to optimize policymakers' decisions on forest carbon sequestration measures.Vegetation productivity is the driving force in the terrestrial carbon cycle and a staple regulator of the natural carbon sequestration process of the terrestrial ecosystem.The Amazon rainforest is the largest intact tropical forest worldwide [3] and one of the regions with the most vigorous carbon exchange.It accounts for 50% of carbon stocks in tropical forests [4].Thus, exploring the Amazonia vegetation productivity and the driving mechanism behind it is essential.
Exploring the dominant factors and their driving pattern of Amazonia NPP is important in identifying the future direction and measures to improve productivity and address climate change.Previous studies have shown that under natural conditions, climatic factors are usually the major factors affecting the spatial variability of vegetation growth [5][6][7][8].Generally, temperature, precipitation, and radiation are the dominating climatic factors of ecosystem carbon exchange [9][10][11], but at the regional level, the specific dominant environmental factors may vary [12][13][14][15].For example, radiation dominated the productivity increase in 1982-1999 in the Amazon rainforest due to the reduced cloud cover and the improved light conditions [12].CO 2 concentration is argued to dominate the changes in vegetation productivity worldwide [16], and VPD also appears to have played a significant role [17].However, in recent years, studies have shown that the CO 2 fertilization effects in South America have increased [18], and the VPD has increased in many places worldwide [19].These changes have prompted us to explore the dominant driving factors behind NPP and how it is affected by changes in various climate factors.
In recent years, the effects of atmospheric dryness and soil water dryness on vegetation have been studied in the context of global warming, sparking debate on the dominant factors and driving mechanisms of plant water stress.They present different responses of vegetation to atmospheric dryness in humid ecosystems.For example, Chen et al. and Green et al. [20,21] found that Amazonia vegetation growth responded positively to VPD in seasonal dynamics.Chen et al. [22] also suggested a positive correlation between vegetation productivity and VPD in humid ecosystems on daily data exploration.However, Yuan et al. [17] showed a negative correlation between VPD and vegetation productivity in the southern hemisphere on a longer interannual scale.
Explainable machine learning (XML) techniques may provide a valuable tool to reveal the complex relationship between physical quantities.XML has irresistible advantages that combine the robust data fitting ability of machine learning without shouldering empirical assumptions of biophysical mechanisms [23] while making complex relationships transparent and interpretable [24].At present, many XML technologies have been developed.Some XML models' structures are relatively simple and inherently interpretable such as Cubist.On the contrary, some models' structures are complex.Still, they can be interpreted by model-specific interpretation techniques such as knowledge distillation or by modelagnostic techniques such as Local Interpretable Model-agnostic Explanations (LIME) or Shapley Additive exPlanations (SHAP).So far, XML has had broad applications in Earth science [25][26][27].
In this study, we adopted net primary productivity (NPP) to characterize Amazonia vegetation productivity since it indicates the rate at which carbon accumulates in plants after considering the losses from plant respiration and other metabolic processes [28,29].We employed a machine learning method fed with climate data to establish a prediction model with high accuracy to model the continuous long-term averaged MODIS NPP in the Amazon ecoregion.We then adopted a high-fidelity model interpretation method to attribute NPP to the individualized effect of various driving factors.Our goals were (1) to determine the main drivers of NPP spatial variability in the Amazon region and (2) to reveal the individualized patterns of how each driver impacts NPP in the Amazon ecoregion.Our findings could benefit ecosystem model development through the improvements in determining model drivers and their influence patterns on carbon exchange.

Study Area
The spatial extent of the Amazon ecoregion (Figure 1) is based on the Terrestrial Ecoregions of the World [30].The vegetation of the Amazon is a mostly moist old-growth forest, and the broadleaf forest is the prevailing plant function type across the Amazon Rainforest.The Amazon ecoregion is located at an elevation below 1000 m [31], bounded by the Andes Mountains to the west, the Lanos Mountains and the Atlantic Ocean to the north and east, and the Cerrado (the vast tropical savanna ecoregion in Brazil), and dry forests to the east and south [32], with its annual mean temperature above 20 The spatial extent of the Amazon ecoregion (Figure 1) is based on the Terrestrial Ecoregions of the World [30].The vegetation of the Amazon is a mostly moist old-growth forest, and the broadleaf forest is the prevailing plant function type across the Amazon Rainforest.The Amazon ecoregion is located at an elevation below 1000 m [31], bounded by the Andes Mountains to the west, the Lanos Mountains and the Atlantic Ocean to the north and east, and the Cerrado (the vast tropical savanna ecoregion in Brazil), and dry forests to the east and south [32], with its annual mean temperature above 20 °C and mean annual precipitation more than 2000 mm in the period ranging from 2001 to 2020.

Data
We adopted Moderate Resolution Imaging Spectroradiometer (MODIS) NPP (MOD17A3H, version 6.0) [33] for analysis of vegetation productivity.The product is a cumulative 8-day composite of values with 500 m pixel size and was derived from a light use efficiency photosynthesis model fed with MODIS fPAR (fraction of absorbed photosynthetically active radiation).Taking vegetation maintenance and respiration into account [13,34], it accounted for respiratory costs and environmental stress on vegetation growth separately [35] by a Biome Parameter Lookup Table (BPLUT).MODIS NPP shows consistency (R 2 = 0.77) with an extensive worldwide NPP dataset assembled from more than 1000 field-observed data [34] and also shows close agreement with the Bigfoot NPP products, which were measured at nine different ecosystem flux towers throughout the world [36].

Data
We adopted Moderate Resolution Imaging Spectroradiometer (MODIS) NPP (MOD17A3H, version 6.0) [33] for analysis of vegetation productivity.The product is a cumulative 8-day composite of values with 500 m pixel size and was derived from a light use efficiency photosynthesis model fed with MODIS fPAR (fraction of absorbed photosynthetically active radiation).Taking vegetation maintenance and respiration into account [13,34], it accounted for respiratory costs and environmental stress on vegetation growth separately [35] by a Biome Parameter Lookup Table (BPLUT).MODIS NPP shows consistency (R 2 = 0.77) with an extensive worldwide NPP dataset assembled from more than 1000 field-observed data [34] and also shows close agreement with the Bigfoot NPP products, which were measured at nine different ecosystem flux towers throughout the world [36].
Here, TS is derived from a monthly MODIS product with a 0.05 • × 0.05 • spatial resolution (MOD11C3, version 6.0) [39].We chose land surface temperature as an explainable variable instead of air temperature because the former more approaches the canopy temperature [40].SR, P, and WS were extracted from the TerraClimate (Table 1) product that provides a 1/24-degree gridded monthly meteorological dataset.The TerraClimate dataset was derived from a combination of high-spatial-resolution climatological normal from the WorldClim dataset, the time-varying data from CRU Ts4.0, and the Japanese 55-year Reanalysis (JRA55) by using climatically aided interpolation [41].We adopted this data product mainly due to its high spatial resolution and verified accuracy on a global scale.SM10 and SM200 were collected from FLDAS Noah Land Surface Model L4 Global Monthly 0.1 • × 0.1 • reanalysis dataset for its high resolution and long available period compatible with our study [42].VPD is an indicator illustrating atmospheric dryness and is defined as the difference between the saturation vapor pressure at air temperature and the actual vapor pressure of the air.VPD was calculated from air temperature and specific humidity data from the Famine Early Warning Systems Network (FEWS NET) Land Data Assimilation System (FLDAS) data in Equation (1) [43,44].CO 2 was extracted from the CarbonTracker dataset (CT2019B), which estimates the land biosphere net CO 2 fluxes on the global 1 • × 1 • grid [45].TN, TP, and CLAY were obtained from the gridded Global Soil Dataset for use in Earth System Models (GSDE) dataset with 30 arc-seconds resolution [46].
where TA is the air temperature (K) and RH is the relative humidity (kg kg −1 ).In addition, we applied land cover data to filter pixels that had plant function types changed to exclude the effects of land use change.Land cover data were extracted from the MODIS dataset (MCD12C1, version 6.0) [47], where we employed the International Geosphere-Biosphere Program (IGBP) classification layer to determine the land cover type.Moreover, we used the quality layer for all the applied MODIS data to filter pixels with poor quality.All the selected driving datasets were reprojected to the World Geodetic System 1984 and resampled to 0.05 • × 0.05 • spatial resolution using the bilinear resampling method (for downsampling) and the average composite method (for upsampling).To eliminate the inter-annual fluctuation and avoid distractions from short-term phenomena, we averaged all data over 20 years from 2001 to 2020, except for CO 2 , of which the available period only ranged from 2001 to 2018.After filtering the raw data according to the conditions applicable for all variables, there were 171,819 samples for model inputs.

Machine Learning Model
Tree-based models consistently outperform neural networks in tackling tabular-style datasets whose features lack strong multiscale structures and have individual meanings [48].Therefore, we developed our prediction model by adopting the tree-based eXtreme Gradient Boosting (XGBoost) algorithm [49].XGBoost is an enhanced version of GBDT (Gradient Boosting Decision Tree) with its core algorithm based on the idea of "boosting"-a stepwise forward addictive strategy.Specifically, the process of "boosting" in XGBoost is explained below.As a powerful estimator (ensembled estimator), XGBoost executes its output prediction by aggregating all the predictions of a set of the weak estimators (single estimator, usually decision trees) that are orderly constructed in the fitting process.The first estimator constructed is fitted to the whole data during the process.A later estimator is then built and fitted to the residuals of the previous estimator prediction.The significant improvement in XGBoost compared to GBDT draws on the regularization term added in the target function, the more precise loss function, and a set of related regularization work which can reduce overfitting and errors, such as shrinkage and column subsampling [49].
We used the XGBoost package and the Scikit-learn library to implement the model construction in the Python 3.8 environment.Among the 171,819 available samples for model construction, 120,273 for training and 51,546 for testing were randomly sampled.Hyperparameter tuning before model training was performed using a 10-fold cross-validation method based on the training set.The performance of the model trained after each round of parameter update was evaluated with the averaged root mean square error (RMSE) of the 10 cross-validation results.
The model hyperparameters were optimized by a stochastic hill-climbing algorithm, a stochastic local search optimization algorithm.Specifically, the hyperparameter adjustment range for each parameter was set first, thus defining the parameter searching space.Then, the initial search value and step size for each hyperparameter were set.When the hyperparameter tuning started, the parameter stepped forward from the initial value in the search space until the local optimum point (the point corresponding to the minimum RMSE) was met, which is the so-called "hill climbing" process.The search for each parameter was performed sequentially.Once the optimal value of the last parameter is obtained, it is fixed, the parameter combination is updated, and then the search traversal of the latter parameter is performed.We mainly tuned nine hyperparameters and confirmed the optimized hyperparameters as follows: n_estimators = 700, learning_rate = 0.15, max_depth = 10, colsample_bylevel = 0.9, colsample_bynode = 0.9, colsample_bytree = 0.9, reg_alpha = 1000, reg_lambda = 150, and gamma = 0.
We also used another two tree-based ML algorithms-Random Forest and Cubist-for modeling as a comparison.The former is also an ensemble tree algorithm based on the bagging concept.On our dataset, the results of Random Forest after hyperparameter optimization are not as good as XGBoost.The essence of Cubist is a piecewise linear decision tree [50,51].Cubist has a simple model structure with inherent interpretability.The main principle is to divide the whole response variables into subsets by different rules expressed by linear formula between the response variables and their explanatory variable.Our test results show that on our data, at least establishing 500 linear regression models (n_rules = 500) can achieve a fitting score comparable to XGBoost.However, excessive rule establishment sacrifices the interpretability of the model.Therefore, we finally adopted XGBoost as the fitting model.

Model Interpretation
Many machine learning interpretation techniques with high fidelity have been developed [52][53][54][55].In this study, we adopted the SHaley Addictive exPlanation (SHAP) method for its mature application technology and series of good qualities such as addictive property and fairness in attribution [56,57].SHAP has its solid theoretical foundation-Shapley value [58], which came from the coalitional game theory [59,60].A SHAP value is the amount of mathematical deformation of the Shapley value within the SHAP framework.SHAP quantifies the contributions of each sample to the predicted value and decomposes the contributions into piece contributions from each feature.When SHAP begins to interpret a prediction model, it first obtains the "base value" of the model, which indicates the value that the model would have predicted if no knowledge of the features was provided for the current output (the average of the predicted values by the model).For each sample, our prediction model predicts the corresponding predicted value.Then, SHAP translates the difference between the base value and the predicted value into the sum of the attribution value for each input feature, where the attribution values are the SHAP values [61].
The explanation could be specified as: where g is the explanation model, ϕ 0 is the average of the prediction, and ϕ i ∈R is the feature attribution for a feature I. z ∈ {0, 1} M is the coalition vector, M is the number of the features input, N is the set of all input features, S is the set of non-zero indexes, and f x is the expected value of the function conditioned on a subset of the input features.Figure 2 visualizes the decomposition of an instance prediction into each feature.Based on the feature attributions by SHAP, we could do some further analyses such as global evaluation of feature importance, exploring the relationship between the value of a feature and the impact on the prediction and the interaction effect between features.reg_alpha = 1000, reg_lambda = 150, and gamma = 0.
We also used another two tree-based ML algorithms-Random Forest and Cubistfor modeling as a comparison.The former is also an ensemble tree algorithm based on the bagging concept.On our dataset, the results of Random Forest after hyperparameter optimization are not as good as XGBoost.The essence of Cubist is a piecewise linear decision tree [50,51].Cubist has a simple model structure with inherent interpretability.The main principle is to divide the whole response variables into subsets by different rules expressed by linear formula between the response variables and their explanatory variable.Our test results show that on our data, at least establishing 500 linear regression models (n_rules = 500) can achieve a fitting score comparable to XGBoost.However, excessive rule establishment sacrifices the interpretability of the model.Therefore, we finally adopted XGBoost as the fitting model.

Model Interpretation
Many machine learning interpretation techniques with high fidelity have been developed [52][53][54][55].In this study, we adopted the SHaley Addictive exPlanation (SHAP) method for its mature application technology and series of good qualities such as addictive property and fairness in attribution [56,57].SHAP has its solid theoretical foundation-Shapley value [58], which came from the coalitional game theory [59,60].A SHAP value is the amount of mathematical deformation of the Shapley value within the SHAP framework.SHAP quantifies the contributions of each sample to the predicted value and decomposes the contributions into piece contributions from each feature.When SHAP begins to interpret a prediction model, it first obtains the "base value" of the model, which indicates the value that the model would have predicted if no knowledge of the features was provided for the current output (the average of the predicted values by the model).For each sample, our prediction model predicts the corresponding predicted value.Then, SHAP translates the difference between the base value and the predicted value into the sum of the attribution value for each input feature, where the attribution values are the SHAP values [61].
The explanation could be specified as: where g is the explanation model, 0 is the average of the prediction, and i∈R is the feature attribution for a feature I. ′ ∈ {0,1} is the coalition vector, M is the number of the features input, N is the set of all input features, S is the set of non-zero indexes, and   is the expected value of the function conditioned on a subset of the input features.Figure 2 visualizes the decomposition of an instance prediction into each feature.Based on the feature attributions by SHAP, we could do some further analyses such as global evaluation of feature importance, exploring the relationship between the value of a feature and the impact on the prediction and the interaction effect between features.The author of SHAP has also developed an efficient implementation of SHAP on treebased models, TreeSHAP [61], where it provides the concept of a SHAP interaction value to assume the dependence relationship of features.A SHAP interaction value characterizes the magnitude of the interaction effect of paired features.According to the same principle of SHAP value, the sum of the SHAP interaction values of the feature with all other variables (including the interaction with itself) constitutes the SHAP value of this feature [61].
For SHAP interaction values, the explanation could be specified as: when i = j.The SHAP interaction value between feature i and feature j is equally split between each feature.Thus ϕ i,j = ϕ j,i and the total interaction effect is ϕ i,j + ϕ j,i .
The main effects for a prediction can be defined as the interaction of the feature with itself, which can be calculated as:

Model Evaluation
Model evaluation was based on a testing subset.The constructed model obtained a coefficient of determination (R 2 ) of 0.923, a mean absolute error (MAE) of 30.85, and a root mean square error (RMSE) of 64.28 on the testing subset (mean value and median value of the testing data were 1152.12g•C/m 2 and 1133.47 g•C/m 2 , respectively). Figure 3a compares the modeled NPP with the MODIS NPP. Figure 3b is a Quantile-Quantile plot (QQ plot) embedded by frequency histograms, which compares their overall distributions.We can see that there are some large deviations at the beginning and the end of the fitted data distribution, while the vast majority of points are consistently fitted.Overall, the constructed model predicted NPP was consistent with most of the testing MODIS NPP with evaluation metrics indicating good performance, but in a local view, the model overestimated small values and underestimated large values.In particular, the deviations were significant for the extreme values.Only samples that have residuals less than three times the standard deviation of the testing data joined the SHAP interpretation.

Relative Importance of Drivers of NPP
SHAP attribution was conducted (Figure 4).Feature importance was quantified as Residual analysis was further performed.Figure 3c shows a scatter plot of residuals (y-axis) versus the observed (x-axis) values of the dependent variable.For a well-performed model, we expect a symmetric spread of points around the horizontal line at zero, indicating random deviations of predictions from the observed values.The residuals are positive for the small observed values of the dependent variable, while for large values, they are negative.Especially the outlier dots that have exceeded the threshold of three standard deviations, mostly distributed at both ends of the distribution.Thus, the plot suggests that the predictions inclined to the average.Figure 3d shows a comparison between the QQ plot of the residual distribution and a set of normally distributed data with zero mean.Theoretically, the residual should be random and unpredictable.One of the manifestations of this randomness could be that the bias fits well with the normal distribution.As seen from the Figure 3d, most of the residuals predicted by the model are close to the normal distribution, but the residual values of the beginning and the end have a relatively large deviation.
Overall, the constructed model predicted NPP was consistent with most of the testing MODIS NPP with evaluation metrics indicating good performance, but in a local view, the model overestimated small values and underestimated large values.In particular, the deviations were significant for the extreme values.Only samples that have residuals less than three times the standard deviation of the testing data joined the SHAP interpretation.

Relative Importance of Drivers of NPP
SHAP attribution was conducted (Figure 4).Feature importance was quantified as each feature's mean absolute SHAP values for all pixels.Then, the features were ranked according to these importance indexes, as shown in Figure 5. TS appeared to be the most crucial driver for the long-term averaged NPP distribution, followed by SR, whose mean SHAP value was less than half of TS's.SM200, VPD, SM10, WS, P, and CO 2 orderly came next, with their feature importance gradually decreasing and being 26.In addition to measuring the magnitude of the force that a feature impacted on the NPP output, the primary drivers of each 0.05-degree pixel were also identified and In addition to measuring the magnitude of the force that a feature impacted on the NPP output, the primary drivers of each 0.05-degree pixel were also identified and mapped in Figure 6.The dominant driver of NPP varied widely across the ecoregion.Therefore, we calculated the percentage of the spatial area dominated by the drivers at all grid cells and ranked the drivers in light of this count.TS was the most significant impact factor of NPP across variable areas.Radiation also has a considerable dominant sphere.Moisture condition has great clout in the west of the ecoregion where precipitation is abundant.
Remote Sens. 2022, 14, x FOR PEER REVIEW 10 of 20 mapped in Figure 6.The dominant driver of NPP varied widely across the ecoregion.Therefore, we calculated the percentage of the spatial area dominated by the drivers at all grid cells and ranked the drivers in light of this count.TS was the most significant impact factor of NPP across variable areas.Radiation also has a considerable dominant sphere.Moisture condition has great clout in the west of the ecoregion where precipitation is abundant.Overall, the results indicate that TS was the major driving factor of the spatial variability of NPP in the Amazon ecoregion, both in terms of force and dominant space.In a more local analysis, the dominant driver varied across the Amazon rainforest.SR and moisture conditions (VPD, SM200, and SM10) also contributed to the spatial pattern of Amazonia NPP.WS and CO2 played a minor role.Precipitation almost made a negligible effect.

Impact of Individualized Climatic Drivers on Spatial Variability of NPP
We plotted the dependencies between each factor's values and its corresponding SHAP values for all samples to explore how these drivers affect the final model outputs Overall, the results indicate that TS was the major driving factor of the spatial variability of NPP in the Amazon ecoregion, both in terms of force and dominant space.In a more local analysis, the dominant driver varied across the Amazon rainforest.SR and moisture conditions (VPD, SM200, and SM10) also contributed to the spatial pattern of Amazonia NPP.WS and CO 2 played a minor role.Precipitation almost made a negligible effect.

Impact of Individualized Climatic Drivers on Spatial Variability of NPP
We plotted the dependencies between each factor's values and its corresponding SHAP values for all samples to explore how these drivers affect the final model outputs (Figure 7).The plots show the main effects of individual variables on the NPP, eliminating the interactions between variables (SHAP dependence plots confounded interaction effect are shown in Figure A1).The zero line of SHAP value (horizontal red line) distinguishes the positive and negative driving force exerted on NPP compared to the multi-year average forecast (base value).Overall, almost all of the factors influenced NPP non-monotonically, suggesting that the influence of the environmental factors is regionally variable.Only SM200 exerted a relatively linear and monotonical effect on NPP that enhanced SM200 directly decreased NPP.NPP had a similar non-monotonic relationship with TS, VPD, and P. With the enhancement of the three factors, NPP reversed its original increasing trajectory after reaching a certain level.Areas with extreme values of P even experienced substantial NPP loss.The effects of SR, SM10, and CO 2 on NPP were also similar, except for SR less than 150 W/m 2 , which directly increases the NPP gain, but the threshold effects all occurred until they reached a considerably large value.Most areas of low WS (WS < 2 m/s) failed to contribute NPP to the average level (SHAP value = 0), and high WS directly impairs NPP.The effect of CLAY on NPP mostly fluctuated in a small scope along the gradient of CLAY but exerted a negative impact on NPP when CLAY is relatively low or when CLAY is relatively high.Samples of high TN and high TP were scarce, but it can be seen that high TP brought NPP loss and high TN brought NPP gain.the gradient of CLAY but exerted a negative impact on NPP when CLAY is relatively low or when CLAY is relatively high.Samples of high TN and high TP were scarce, but it can be seen that high TP brought NPP loss and high TN brought NPP gain.

Dominant drivers of Spatial Variability of Amazonia NPP
In a humid forest ecosystem, vegetation productivity is generally dominated by ra-

Dominant Drivers of Spatial Variability of Amazonia NPP
In a humid forest ecosystem, vegetation productivity is generally dominated by radiation or temperature [62,63].Our SHAP analysis suggests that the temperature, dwarfing other limitation elements, is the most potent driver of the spatial heterogeneity of Amazonia NPP in terms of strength and space dominance (Figures 5 and 6).This is reasonable because the temperature is the greatest limiting factor for the photosynthesis rate in well-lit, warm leaves.To be specific, the photosynthetic efficiency of the canopy mainly depends on the gas exchange rate of the sunlit leaves rather than the gas flux of the underlit leaves [64].Moreover, most of the Amazon forests are tall, old-growth forests with large canopies, reasonable light-receiving structures, with abundant water supply.Additionally, in recent decades, as the dry season has extended, cloud cover has decreased, and radiation has increased [12,65], thereby canopy leaves have received more sunlight.Thus, the temperature became the dominant limiting factor.
The response of NPP to temperature was in tune with that in the leaf scale [66][67][68].It was worth noting that the temperatures of most of the vegetation had exceeded the optima for photosynthesis, after which NPP showed negative feedback to increasing temperature (Figure 7a) [40,64].Considering the current upward trajectory of temperature and such a high sensibility of NPP, it should be noted that the nucleus density center may move down and the Amazonia vegetation productivity is likely to wind down soon.
Solar radiation also contributed to the spatial pattern of Amazonia NPP.Most of the Amazonia vegetation complied with the fact that enhanced light intensity directly promotes increased photosynthesis [69,70] (Figure 7b).However, in our SHAP analysis, the west of the Amazon ecoregion covered with high precipitation made an exception for low radiation to play a relatively positive impact on NPP (Figures 7b, 4b, and A2b).This may be because under the low radiation conditions caused by the obstruction of thick clouds brought by the large water vapor in the atmosphere [69,71]; although, the vegetation receives less direct beam and total irradiance, the diffuse irradiance that penetrates canopies more efficiently than direct beam irradiance would be increased, thereby increasing the area of photosynthetic activity [72,73].Therefore, the effect of radiation on vegetation productivity may also depend on the trade-off between total irradiance, diffuse irradiance, and radiation utilization efficiency [72].More positive effects of solar radiation enhancement can be obtained for well-structured stand and canopy structures.
Our SHAP analysis suggested an optimal wind speed of about 1.8 m/s (Figure 7f).The mechanisms of how wind speed influences vegetation are complex.Wind movement adjusted the photosynthesis of plants by affecting the leaf structure of the canopy [74,75].For example, an appropriate increase in wind speed can increase the light receiving area of the blade and the radiation absorption efficiency.However, the lower wind speed had more respiratory CO 2 recycled [76].Generally, wind speed greater than 2 m/s can be called strong wind.Sustained strong winds will hinder local vegetation leaf reproduction and growth [77].Moreover, high wind speed promoted the speed of fire spread [78].It is notable that the Amazon Rainforest suffered frequent fires in recent years in places with high wind speeds.
In addition, an elevated CO 2 concentration promoted Amazonia NPP (Figure 7h), but it also had a threshold.A previous study reported that the CO 2 fertilization effect in Southern America has increased [18].This is beneficial to the NPP growth in Amazon forests.However, the increasing CO 2 was always accompanied by increasing temperature and other variables changes, but the effect of CO 2 on Amazonia NPP seemed relatively limited (Figures 5 and 6).

Amazonia Vegetation Responds to Moisture
Our study demonstrated a non-monotonic response of Amazonian vegetation to VPD, with NPP initially being slightly promoted and then strongly suppressed in response to increasing drying conditions (Figure 7d).Excluding phenological and seasonal dynamic factors such as leaf turnover [21], the moisture load of the humid rainforest itself can better explain the long-term vegetation-VPD response mechanism of the Amazon rainforest.On the one hand, the interaction of VPD with almost all magnitudes of shallow soil water content shows a relatively more positive effect on NPP than that of low VPD (Figure 8b).On the other hand, high VPD showed a suppressive impact on NPP in regions with low deep soil moisture content but relaxed its negative effect in areas with high deep soil moisture content (Figure 8c,d), indicating that the vegetation-VPD relationship may be regulated by soil moisture.Amazon rainforests have a strong water-holding capacity and deep root systems, which determines its water use strategy, in that the physiological activities of vegetation rely more on stable and abundant deep soil water.However, excessive water load may stress vegetation photosynthesis by impairing enzyme activity, chlorophyll, and soluble protein levels [79].Therefore, the reduction in deep soil moisture could alleviate the pressure caused by the oversupply of moisture (Figure 7c).
Remote Sens. 2022, 14, x FOR PEER REVIEW 13 of 20 level (Figure 7e).Similarly, further consistent precipitation increases may also impair the Amazonia NPP (Figure 7g).In this context, appropriately increasing dryness may help mitigate the water overload by promoting leaf stomata openness and evapotranspiration.Moreover, as dryness increases, Stomatal stomata gradually become more adaptable and less sensitive to VPD [80,81], thereby enhancing photosynthesis ability.However, when VPD increases to a certain extent, the water loss caused by intensified transpiration exceeds the benefit of photosynthesis.At the same time, the reduction in the temperature by evaporative cooling also increases the cost of photosynthesis.Therefore, over-high VPD shifts its positive effect toward the suppressive effect on vegetation productivity [17,82,83].Generally, the optimal VPD for plants ranges from 0.3 to 1.0 kPa [84,85], and water stress occurs in plants when VPD is over 1 kPa [86], which is consistent with our findings (Figure 7d).It is worth noting that most of the Amazonia vegetation is under or exceeding optimal VPD conditions, indicating that further increments of VPD may reduce vegetation productivity.

Benefits and Uncertainty of Explainable Machine Learning
Machine learning has profound applications due to its notable advantages, among which the most notable is the powerful ability to explore complex relationships behind data.However, data-driven methods are intrinsically constrained by data and algorithms [87].Moreover, machine learning fails to handle problems of extreme values well.Thus, carefully screening and processing for data quality and integrity is a crucial step to reducing uncertainty in ML model construction.
A distinct drawback of SHAP values is that they are provided as the property of the Furthermore, shallow soil water could act as a backup reservoir when water supplements become urgent due to its strong resilience.Therefore, enhanced shallow soil water content conduced NPP gain but also brought stress on NPP when it exceeded a certain level (Figure 7e).Similarly, further consistent precipitation increases may also impair the Amazonia NPP (Figure 7g).
In this context, appropriately increasing dryness may help mitigate the water overload by promoting leaf stomata openness and evapotranspiration.Moreover, as dryness increases, Stomatal stomata gradually become more adaptable and less sensitive to VPD [80,81], thereby enhancing photosynthesis ability.However, when VPD increases to a certain extent, the water loss caused by intensified transpiration exceeds the benefit of photosynthesis.At the same time, the reduction in the temperature by evaporative cooling also increases the cost of photosynthesis.Therefore, over-high VPD shifts its positive effect toward the suppressive effect on vegetation productivity [17,82,83].Generally, the optimal VPD for plants ranges from 0.3 to 1.0 kPa [84,85], and water stress occurs in plants when VPD is over 1 kPa [86], which is consistent with our findings (Figure 7d).It is worth noting that most of the Amazonia vegetation is under or exceeding optimal VPD conditions, indicating that further increments of VPD may reduce vegetation productivity.

Benefits and Uncertainty of Explainable Machine Learning
Machine learning has profound applications due to its notable advantages, among which the most notable is the powerful ability to explore complex relationships behind data.However, data-driven methods are intrinsically constrained by data and algorithms [87].Moreover, machine learning fails to handle problems of extreme values well.Thus, carefully screening and processing for data quality and integrity is a crucial step to reducing uncertainty in ML model construction.
A distinct drawback of SHAP values is that they are provided as the property of the additive contribution of explanatory variables [56].In other words, SHAP assumes that the variables are independent of each other, which would make the model mixed with high interaction less credible.However, the situation that the characteristics are entirely independent rarely exists.Especially in Earth Science, various environmental factors have coupling or interaction effects.Despite TreeSHAP [61] assuming less feature independence and explaining some feature dependencies, it still fails to fully explain [57], because, for example, SHAP interaction value is also endowed with the attribute of additive contribution.We carefully observed correlations between features and made tradeoffs before model input.Moreover, the model performance and consistency of the model results with known physical mechanisms (Figure 3) lend us confidence in the constructed model.
Note that XML does not capture the complete behavior of a physical system but rather provides a rough approximation of the system's behavior [24].Therefore, the model's credibility relies more on the rigorous and comprehensive selection of features during model construction to approach the actual situation as closely as possible.Despite all the uncertainty and drawbacks, XML remains a valuable method boasting a large potential to comprehend the recondite functionality behind phenomena in Earth Science, not only for prediction [25,88,89] but also for contrastive explanations [15,90].

Conclusions
As NPP is driven by multiple climatic variables, we adopted an explainable machine learning technique to understand how the long-term averaged MODIS NPP responds to the climate variables in the Amazon rainforest.The specific operation included: first, NPP was learned through a potent machine learning model; then, the model was fed into the SHAP framework to explain the mechanisms.Conclusions based on the SHAP analyses are as follows: 1.
Relative contributions of each driver were identified, showing that the temperature outperformed other climatic variables in contributing to Amazonia NPP variability.Radiation and vapor pressure deficit also made a considerable contribution.Wind speed, CO 2 concentration, and precipitation were also responsible.

2.
Individualized feature attribution was detected.In most areas of Amazon forests, the temperature exceeded the optimal value for NPP growth.Generally, elevated radiation and increased CO 2 concentration promote NPP gain monotonically, while high precipitation impairs NPP.In addition, for most vegetation, the wind speed did not reach the optimum value that benefits NPP, and sustained high wind speed would bring substantial NPP loss.

3.
Amazonia NPP responded to VPD non-monotonically.Considering the distinct response of NPP to soil water content under different layers, the relationship between NPP and VPD was highly connected to the water use policy and moisture overload conditions in Amazon forests.Further increases in VPD largely impaired NPP despite the moisture overload conditions in Amazon forests.
Continuous and high-covered remote sensing data allow us to make large-scale and high-precision vegetation analyses, and the explainable machine learning technology enables us to better understand the response mechanism of MODIS NPP to its climatic factors.Applying a combination of remote sensing data and explainable machine learning in Earth Science could serve as an effective reference for comprehending natural dynamics in the ecosystem.

Figure 1 .
Figure 1.Amazon ecoregion based on the definition from the Terrestrial Ecoregions of the World.

Figure 1 .
Figure 1.Amazon ecoregion based on the definition from the Terrestrial Ecoregions of the World.

Figure 2 .
Figure 2. Decomposed SHAP values for the prediction of an example individual pixel.The base value indicates the averaged output of the predictions.f(x) indicates the specific prediction of this sample.Features that increase the value of the prediction are shown in red; those that lower the prediction value are shown in blue.

Figure 3 .
Figure 3. Model evaluation plots based on the testing data: (a) The scatter plot depicts a detailed comparison of the two data.(b) A quantile-quantile plot of the modeled and MODIS NPP laced with the histogram of distribution, respectively.(c) Residuals versus the observed NPP.(d) A comparison plot of model residual distribution and a normal distribution.σ indicates the standard deviation of the testing data of MODIS NPP.

Figure 3 .
Figure 3. Model evaluation plots based on the testing data: (a) The scatter plot depicts a detailed comparison of the two data.(b) A quantile-quantile plot of the modeled and MODIS NPP laced with the histogram of distribution, respectively.(c) Residuals versus the observed NPP.(d) A comparison plot of model residual distribution and a normal distribution.σ indicates the standard deviation of the testing data of MODIS NPP.

Figure 5 .
Figure 5. Feature importance plot.The feature importance was quantified as the average of absolute SHAP values (mean |SHAP values|) of all 0.05° × 0.05° pixels.Units of SHAP values: g C/m².

Figure 5 .
Figure 5. Feature importance plot.The feature importance was quantified as the average of absolute SHAP values (mean |SHAP values|) of all 0.05 • × 0.05 • pixels.Units of SHAP values: g C/m 2 .

Figure 6 .
Figure 6.Spatial variability of primary drivers for each pixel with a resolution of 0.05° × 0.05°.The heights of the bars indicate the percentage of the cells occupied by the corresponding drivers.Soil indicates the combination of CLAY, TN, and TP.

Figure 6 .
Figure 6.Spatial variability of primary drivers for each pixel with a resolution of 0.05 • × 0.05 • .The heights of the bars indicate the percentage of the cells occupied by the corresponding drivers.Soil indicates the combination of CLAY, TN, and TP.

Figure 7 .
Figure 7. SHAP dependence plots depicting the SHAP main values along the gradient of (a) TS, (b) SR, (c) SM200, (d) VPD, (e) SM10, (f) WS, (g) P, (h) CO2, (i) TP, (j) CLAY, and (k) TN.The lateral axis indicates the gradient of variable values.The vertical axis indicates the magnitude of the SHAP main value.Positive SHAP values indicate the positive force on NPP output while negative SHAP values indicate the opposite.The colors indicate the density.SHAP main values units: g C/m².

Figure 7 .
Figure 7. SHAP dependence plots depicting the SHAP main values along the gradient of (a) TS, (b) SR, (c) SM200, (d) VPD, (e) SM10, (f) WS, (g) P, (h) CO 2 , (i) TP, (j) CLAY, and (k) TN.The lateral axis indicates the gradient of variable values.The vertical axis indicates the magnitude of the SHAP main value.Positive SHAP values indicate the positive force on NPP output while negative SHAP values indicate the opposite.The colors indicate the density.SHAP main values units: g C/m 2 .

Figure 8 .
Figure 8. Coupling effect and interaction effect between soil moisture content and vapor pressure deficit.Units of SHAP and SHAP interaction values: g C/m²: (a) Coupling effect between SM10 and VPD.(b) Interaction effect between SM10 and VPD.(c) Coupling effect between SM200 and VPD.(d) Interaction effect between SM200 and VPD.

Figure 8 .
Figure 8. Coupling effect and interaction effect between soil moisture content and vapor pressure deficit.Units of SHAP and SHAP interaction values: g C/m 2 : (a) Coupling effect between SM10 and VPD.(b) Interaction effect between SM10 and VPD.(c) Coupling effect between SM200 and VPD.(d) Interaction effect between SM200 and VPD.
• C and mean annual precipitation more than 2000 mm in the period ranging from 2001 to 2020.

Table 1 .
Summary of data products used in this study.