Modeling Nutrition Quality and Storage of Forage Using Climate Data and Normalized-Difference Vegetation Index in Alpine Grasslands

Quantifying forage nutritional quality and pools at various spatial and temporal scales is a major challenge in quantifying global nitrogen and phosphorus cycles and the carrying capacity of grasslands. In this study, we modeled forage nutritional quality and storage in the alpine grasslands of Tibet using four methods (i.e., multiple linear regression, random-forest models, support-vector machines and recursive-regression trees). Climate data were used as predictors under fencing conditions, and climate data plus the growing-season maximum normalized-difference vegetation index were used as predictors under grazing conditions. Our results indicated that random-forest models have greater potential for modeling forage nutritional quality and storage than the other three methods. The relative biases between the simulated (random-forest) and observed nutritional quality, and between the simulated (random-forest) and observed nutritional storage, were lower than 2.00% and 6.00%, respectively. The corresponding RMSE values were no more than 0.99% and 4.50 g m−2, respectively. Therefore, random-forest models based on climate data and/or the normalized-difference vegetation index can be used to model forage nutritional quality and storage in the alpine grasslands of Tibet.

Author Contributions: G.F. and F.H.; methodology, G.F.; software, C.Y.; validation, G.F.; formal analysis, G.F. and F.H.; investigation, G.F.; resources, C.Y.; data curation, G.F.; writing (original draft preparation), G.F. and F.H.; writing (review and editing), G.F. and F.H.; visualization, S.W.; supervision, S.W.; administration, C.Y.; C.Y.


Introduction
Forage nutritional quality and pools can affect the quality and size of livestock and wildlife populations, as well as the nutrient-carrying capacity of various grassland ecosystems [1-3]. Crude protein (CP), ether extract (EE), crude ash (Ash), acid detergent fiber (ADF), neutral detergent fiber (NDF) and water-soluble carbohydrate (WSC) contents and pools are often treated as indicators of forage nutritional quality and nutritional pools, respectively [4,5]. Quantifying their variation at various spatial and temporal scales is a major challenge in quantifying the global nitrogen and phosphorus cycles and the carrying capacity of grasslands [2,6]. An increasing number of studies have estimated forage nutritional quality and pools at various spatial and temporal scales [2,7-10]. However, some uncertainties remain. First, compared to plant biomass and/or production, forage nutritional quality and pools have been modeled less often at various spatial and temporal scales [11,12]. Second, compared to CP, the other five variables (i.e., EE, Ash, ADF, NDF and WSC) have been quantified less often. […] The objectives of this study were to compare the accuracies of multiple linear regressions, random-forest models, support-vector machines and recursive-regression trees in predicting forage CP, EE, Ash, ADF, NDF and WSC contents and pools in alpine grasslands.

Plant Sampling and Analyses
In July-August of 2018-2020, we clipped the aboveground biomass of all plants in 190 quadrats under fencing conditions and 190 quadrats under grazing conditions in the alpine grasslands of Tibet. The sampling sites are illustrated in Figure 1. Quadrat sizes were 0.50 m × 0.50 m for alpine meadows and 1.00 m × 1.00 m for alpine steppes. The aboveground biomass was weighed after oven-drying at 65 °C for 48 h. We then measured the CP, EE, Ash, ADF, NDF and WSC contents using the Kjeldahl method, Soxhlet extraction, complete combustion, the Van Soest method (for both ADF and NDF) and the anthrone-based method, respectively [4].
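Nutrient pools are the product of nutrient content and aboveground biomass, so quadrat biomass must first be scaled to a per-square-metre basis. The sketch below shows that conversion under stated assumptions: contents are expressed as percent of dry mass, quadrat dry mass is in grams, and the function and dictionary names are hypothetical. The quadrat areas are taken from the sampling design above.

```python
# Quadrat areas (m^2) from the sampling design: 0.50 m x 0.50 m for alpine
# meadows and 1.00 m x 1.00 m for alpine steppes.
QUADRAT_AREA_M2 = {
    "alpine_meadow": 0.50 * 0.50,
    "alpine_steppe": 1.00 * 1.00,
}


def biomass_per_m2(dry_mass_g: float, grassland_type: str) -> float:
    """Scale the oven-dried aboveground biomass of one quadrat (g) to g m^-2."""
    return dry_mass_g / QUADRAT_AREA_M2[grassland_type]


def nutrient_pool(content_percent: float, biomass_g_m2: float) -> float:
    """Nutrient pool (g m^-2) = content (% of dry mass) x biomass (g m^-2)."""
    return content_percent * biomass_g_m2 / 100.0


if __name__ == "__main__":
    # Hypothetical meadow quadrat: 30 g dry mass with 12% crude protein.
    biomass = biomass_per_m2(30.0, "alpine_meadow")
    print(f"biomass = {biomass:.1f} g m^-2, CP pool = {nutrient_pool(12.0, biomass):.2f} g m^-2")
```

The same two functions apply to every content variable (CP, EE, Ash, ADF, NDF, WSC), which is why errors in predicted biomass carry over into every predicted pool.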

Normalized-Difference Vegetation Index and Climate Data
The growing-season (May-September) maximum normalized-difference vegetation index was obtained from the Moderate Resolution Imaging Spectroradiometer (MODIS) product MOD13A3 (Collection 6, 1 km × 1 km, monthly). Monthly air temperature, precipitation and radiation data were obtained from interpolated climate data with a spatial resolution of 1 km × 1 km [31]. Following previous studies [11,32], we assumed that the CP, EE, Ash, ADF, NDF and WSC contents and pools under fencing conditions were affected only by climate (air temperature, precipitation and radiation) and therefore represented potential forage nutritional quality and pools. By contrast, we assumed that the CP, EE, Ash, ADF, NDF and WSC contents and pools under grazing conditions were affected by both climate and human activities and therefore represented actual nutritional quality and pools. Growing-season mean air temperature, total precipitation and total radiation were used to simulate the potential CP, EE, Ash, ADF, NDF and WSC contents and pools under fencing conditions using multiple linear regressions, random-forest models, support-vector machines and recursive-regression trees (Tables 1-4). Growing-season mean air temperature, total precipitation, total radiation and the maximum normalized-difference vegetation index were used to simulate the actual contents and pools of the same six variables under grazing conditions using the same four methods (Tables 1-4).
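The predictor sets described above can be sketched as an aggregation of monthly values for one site. This is a minimal illustration, assuming months are numbered 1-12 and each month is stored as a record with hypothetical field names (`temp`, `precip`, `rad`, `ndvi`) that do not come from the source data. Temperature is averaged, precipitation and radiation are summed, and NDVI is maximized over May-September; the NDVI term is added only for the grazing models.

```python
from __future__ import annotations

from statistics import fmean

GROWING_SEASON = (5, 6, 7, 8, 9)  # May-September, as defined in the text


def growing_season_predictors(monthly: dict[int, dict[str, float]],
                              include_ndvi: bool) -> list[float]:
    """Aggregate monthly records for one site into model predictors.

    `monthly` maps month number (1-12) to a record with hypothetical keys
    'temp' (air temperature), 'precip' (precipitation), 'rad' (radiation)
    and 'ndvi' (monthly NDVI).
    """
    months = [monthly[m] for m in GROWING_SEASON]
    predictors = [
        fmean(rec["temp"] for rec in months),   # growing-season mean air temperature
        sum(rec["precip"] for rec in months),   # growing-season total precipitation
        sum(rec["rad"] for rec in months),      # growing-season total radiation
    ]
    if include_ndvi:
        # Grazing models additionally use the growing-season maximum NDVI.
        predictors.append(max(rec["ndvi"] for rec in months))
    return predictors
```

Under this layout, a fencing model receives three predictors and a grazing model four, matching the two predictor sets described in the text.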

Statistical Analysis
We validated the models using a random hold-out split of the 190 samples available for each condition (fencing or grazing). The 190 samples were randomly divided into two groups. The first group (n = 170) was used to build the multiple linear regressions, random-forest models, support-vector machines and recursive-regression trees for each of the six variables under fencing or grazing conditions. The second group (n = 20) was used to validate these models. The relative bias, root-mean-square error (RMSE), relative RMSE, coefficient of determination (R²) and linear slope between simulated and observed data were used as indicators of model accuracy. All four methods were implemented in R 4.1.2.
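The 170/20 split and the accuracy indicators listed above can be sketched as follows. The text does not give formulas, so the definitions here are common ones and should be read as assumptions: relative bias and relative RMSE are normalized by the observed mean, R² is the squared Pearson correlation between simulated and observed values, and the slope is from an ordinary least-squares regression of simulated on observed values. The function names and the seed are hypothetical; the original analysis was run in R, not Python.

```python
from __future__ import annotations

import math
import random
from statistics import fmean


def holdout_split(n: int = 190, n_train: int = 170, seed: int = 1):
    """Randomly split sample indices into a training set and a validation set."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return idx[:n_train], idx[n_train:]


def accuracy_metrics(sim: list[float], obs: list[float]) -> dict[str, float]:
    """Agreement indicators between simulated and observed values.

    Assumed definitions (not stated in the source):
      relative bias (%) = (mean(sim) - mean(obs)) / mean(obs) * 100
      RMSE              = sqrt(mean((sim - obs)^2))
      relative RMSE (%) = RMSE / mean(obs) * 100
      R^2               = squared Pearson correlation of sim and obs
      slope             = OLS slope of sim regressed on obs
    Inputs must be non-constant and have a non-zero observed mean.
    """
    n = len(obs)
    mean_obs, mean_sim = fmean(obs), fmean(sim)
    rmse = math.sqrt(sum((s - o) ** 2 for s, o in zip(sim, obs)) / n)
    sxx = sum((o - mean_obs) ** 2 for o in obs)
    syy = sum((s - mean_sim) ** 2 for s in sim)
    sxy = sum((o - mean_obs) * (s - mean_sim) for s, o in zip(sim, obs))
    return {
        "relative_bias_pct": (mean_sim - mean_obs) / mean_obs * 100,
        "rmse": rmse,
        "relative_rmse_pct": rmse / mean_obs * 100,
        "r2": sxy ** 2 / (sxx * syy),
        "slope": sxy / sxx,
    }
```

Because the validation group has only 20 samples, a single split gives one realization of these indicators; repeating the split with different seeds would show how sensitive they are to which samples are held out.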

Model Building
The key parameters of the models relating the nutritional quality and pool variables (i.e., CP, EE, Ash, ADF, NDF and WSC contents and pools) to growing-season climate data (i.e., mean air temperature, total precipitation and total radiation) and/or the maximum normalized-difference vegetation index under fencing or grazing conditions are shown in Tables 1-4. The four methods report different model parameters (Tables 1-4). For example, the multiple linear regressions, random-forest models and recursive-regression trees directly provide R² values (Tables 1, 2 and 4), whereas the support-vector machines do not (Table 3). Among these three methods, the multiple linear regressions explained the least variation in all the nutritional quality and pool variables, and the random-forest models explained the most variation for most of these variables (Tables 1, 2 and 4). The proportion of variation explained by the environmental variables differed among the indices of forage nutritional quality and pool and between land-use types (Tables 1, 2 and 4).
Under fencing conditions, climate data explained about 73-93%, 5-75% and 24-92% of the variation in the forage nutritional quality and pool variables based on the random-forest models, multiple linear regressions and recursive-regression trees, respectively (Tables 1, 2 and 4). Under grazing conditions, climate data and the growing-season maximum normalized-difference vegetation index together explained about 76-96%, 8-57% and 42-91% of the variation in these variables based on the same three methods, respectively (Tables 1, 2 and 4). The ntree and mtry parameters of the random-forest models were neither fixed nor set to their default values (Table 2). Likewise, no fixed support-vector parameter was used for the support-vector machines (Table 3).

Model Validation
The RMSE and relative RMSE values between the simulated and observed nutritional quality and pool variables under fencing or grazing conditions are shown in Tables 5 and 6. Under both fencing and grazing conditions, the random-forest models had the lowest RMSE and relative RMSE values among the four methods. For the random-forest models, the RMSE and relative RMSE values for nutritional quality were no more than 0.99% and 7.23%, respectively, and those for nutritional storage were no more than 4.50 g m−2 and 35.32%, respectively. The relative biases between the simulated and observed nutritional quality and pool variables under fencing or grazing conditions are shown in Table 7. Among the four methods, the largest absolute relative biases under both fencing and grazing conditions were obtained with multiple linear regressions for Ash contents, EE contents, ADF contents and NDF pools; with support-vector machines for CP pools and WSC pools; and with recursive-regression trees for WSC contents. The smallest absolute relative biases under both conditions were obtained with random-forest models for Ash pools, EE contents, WSC pools and NDF pools.
For the random-forest models, all the relative biases between the simulated and observed nutritional quality and pool variables were lower than 6.00%, whereas for the other three methods they exceeded 6.00% (and even 20.00%) in some cases. The linear slopes between the simulated and observed nutritional quality and pool variables under fencing or grazing conditions are shown in Table 8. Generally, the slopes for the random-forest models were the closest to 1 among the four methods. The linear slopes between the observed and simulated nutritional quality and pool variables were within 0.55-1.12, 0.94-1.02, 0.47-1.04 and 0.77-1.30 for the multiple linear regressions, random-forest models, support-vector machines and recursive-regression trees, respectively. The R² values between the simulated and observed nutritional quality and pool variables under fencing or grazing conditions are shown in Figures 2-9. Generally, the R² values for the random-forest models were the closest to 100% among the four methods. The simulated nutritional quality and pool variables from the multiple linear regressions, random-forest models, support-vector machines and recursive-regression trees explained 47-100%, 93-100%, 56-100% and 57-100% of the variation in the observed variables, respectively.

[Figures 2-9, captions partially recovered: simulated versus observed nutritional quality and pool variables, including (f) NDF content and (g)-(l) CP, Ash, EE, WSC, ADF and NDF pools; solid lines indicate fitted lines. Recovered fragments refer to random-forest models (grazing conditions), support-vector machines (fencing and grazing conditions) and recursive-regression trees (fencing conditions).]
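The statement that random-forest slopes were "closest to 1" can be checked directly from the reported slope ranges. The short calculation below uses only the ranges given in the text and takes, for each method, the largest distance from the 1:1 slope within its range; the dictionary and function names are illustrative.

```python
# Reported ranges of linear slopes between observed and simulated values.
SLOPE_RANGES = {
    "multiple linear regression": (0.55, 1.12),
    "random forest": (0.94, 1.02),
    "support-vector machine": (0.47, 1.04),
    "recursive-regression tree": (0.77, 1.30),
}


def worst_case_deviation(lo: float, hi: float) -> float:
    """Largest distance from a slope of 1 within a reported slope range."""
    return max(abs(1.0 - lo), abs(hi - 1.0))


# Rank methods from the smallest to the largest worst-case deviation.
ranking = sorted(SLOPE_RANGES, key=lambda m: worst_case_deviation(*SLOPE_RANGES[m]))
for method in ranking:
    print(f"{method}: {worst_case_deviation(*SLOPE_RANGES[method]):.2f}")
```

The worst-case deviation is about 0.06 for the random-forest models, compared with 0.30, 0.45 and 0.53 for recursive-regression trees, multiple linear regressions and support-vector machines, respectively.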

Discussion
The R² values of the random-forest models, multiple linear regressions and recursive-regression trees were within 0.73-0.96, 0.05-0.75 and 0.24-0.92, respectively (Tables 1, 2 and 4), indicating that random-forest models generally have a greater ability to explain forage nutritional quality and pools than multiple linear regressions and recursive-regression trees. The R² values of the random-forest models under grazing conditions were generally greater than those under fencing conditions (Table 2). That is, combining climate data with the growing-season maximum normalized-difference vegetation index may better model CP, EE, Ash, ADF, NDF and WSC contents and pools in the alpine grasslands of Tibet under grazing conditions. However, in this study the random-forest models under fencing and grazing conditions were built from different data sources, so their R² values are not directly comparable. Moreover, the R² values of the multiple linear regressions and recursive-regression trees under grazing conditions were lower than those under fencing conditions in some cases (Tables 1 and 4). Therefore, further studies are needed to determine whether the combination of climate data and the growing-season maximum normalized-difference vegetation index is more closely related to forage nutritional quality and pools than climate data alone in the alpine grasslands of Tibet.
Under fencing conditions, climate data were more closely related to forage nutritional quality than to nutritional storage. Moreover, the prediction accuracies for forage nutritional pools were lower than those for forage nutritional quality in most cases. These findings may be related to the fact that forage nutritional pools are the product of forage nutritional quality and aboveground plant biomass. Aboveground plant biomass is generally directly related to air temperature, precipitation and radiation in alpine grasslands on the Tibetan Plateau [12,20,21,23,33,34]. However, to the best of our knowledge, climate data alone cannot explain all of the variation in aboveground plant biomass in alpine grasslands on the Tibetan Plateau [12,31]. In addition, both the proportion of variation in aboveground plant biomass explained by climate data and the prediction accuracy of climate-based biomass models may be lower than those for nutritional quality in alpine grasslands on the Tibetan Plateau [28].
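The argument that pool predictions inherit errors from both of their components can be made explicit with first-order error propagation for a product. This is an illustrative approximation, not an analysis performed in the study. It assumes that the prediction errors of nutritional quality Q (content) and aboveground biomass B are independent and small relative to their means; P denotes the nutritional pool and σ the standard deviation of each prediction error.

```latex
P = Q\,B
\quad\Longrightarrow\quad
\left(\frac{\sigma_P}{P}\right)^{2} \approx
\left(\frac{\sigma_Q}{Q}\right)^{2} + \left(\frac{\sigma_B}{B}\right)^{2}
\quad\Longrightarrow\quad
\frac{\sigma_P}{P} \gtrsim \max\!\left(\frac{\sigma_Q}{Q},\ \frac{\sigma_B}{B}\right)
```

Under these assumptions, the relative error of a predicted pool is at least as large as the larger of the two component relative errors, which is consistent with pools being predicted less accurately than quality in most cases.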
Our findings indicated that the four methods differed in prediction accuracy, which may be mainly due to differences in their algorithms. Firstly, the multiple-linear-regression method assumes that the relationships between the dependent and independent variables are linear. In contrast, random-forest models do not assume any particular (linear or nonlinear) form for the relationships among independent variables or between independent and dependent variables. In fact, forage nutritional quality variables are not always linearly related to climate data [4,12,35]; this may explain the lower prediction accuracies of forage nutritional quality and pools for the multiple linear regressions and the higher prediction accuracies for the random-forest models in this study. Secondly, ntree (the number of trees) and mtry (the number of variables randomly sampled as candidates at each split) are two key parameters of random-forest models, and they are generally not fixed values. Users can search for a relatively optimal combination of ntree and mtry by adjusting both parameters, which helps ensure that a relatively optimal random-forest model is obtained. Similarly, the number of support vectors, a key parameter of a support-vector machine, is generally not a fixed value. By contrast, the multiple-linear-regression method has no adjustable parameters. Thirdly, randomness (bootstrap sampling of observations and random selection of candidate variables at each split) is an obvious characteristic of random-forest models but not of multiple linear regression.
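The search over ntree and mtry described above can be sketched as a simple grid search. This is a hypothetical Python illustration, not the authors' R code: `score` stands for any user-supplied function that fits a random forest with the given ntree and mtry and returns a validation error (for example, out-of-bag RMSE), and the candidate ntree values are placeholders. The only constraint encoded is that mtry cannot exceed the number of predictors, which is three under fencing (temperature, precipitation, radiation) and four under grazing (plus maximum NDVI). For reference, R's randomForest package defaults to ntree = 500 and, for regression, mtry = max(floor(p/3), 1), which equals 1 for both p = 3 and p = 4.

```python
from __future__ import annotations

from itertools import product
from typing import Callable


def default_mtry(n_predictors: int) -> int:
    """Regression default of R's randomForest: max(floor(p / 3), 1)."""
    return max(n_predictors // 3, 1)


def tune_ntree_mtry(score: Callable[[int, int], float],
                    n_predictors: int,
                    ntree_grid: tuple[int, ...] = (100, 300, 500, 1000)
                    ) -> tuple[int, int, float]:
    """Return (ntree, mtry, error) with the lowest error over a grid.

    `score(ntree, mtry)` is a hypothetical user-supplied function that fits
    a random forest and returns a validation error (lower is better).
    mtry ranges over 1..n_predictors, the only admissible values.
    """
    best = None
    for ntree, mtry in product(ntree_grid, range(1, n_predictors + 1)):
        err = score(ntree, mtry)
        if best is None or err < best[2]:
            best = (ntree, mtry, err)
    return best
```

Because random forests are stochastic, fixing the random seed (set.seed in R) before each fit makes the comparison between ntree/mtry combinations reproducible.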
Our findings suggest that, among the four methods, random-forest models have the greatest potential to predict forage nutritional quality and pools in alpine grasslands. The prediction accuracies of the CP, EE, Ash, ADF, NDF and WSC contents and pools from the random-forest models in this study were greater than those reported by previous studies [2,25,36-39]. For example, a previous modeling study reported relative biases of 5.97%, 2.30%, 3.82% and 3.82% for CP, EE, ADF and NDF, respectively, in alpine grasslands of the Qilian Mountain [36], which were greater than those in this study (Table 7). The R² values for the CP pool during model building in alpine grasslands of the Haibei region were 0.23-0.76 [36], which were lower than those in this study (0.83-0.93). Moreover, in this study, the default ntree and mtry values were not used; instead, a relatively optimal combination of ntree and mtry was used. Therefore, the random-forest models established in this study can be used to predict changes in forage nutritional quality and pools under climate change and grazing in alpine grasslands on the Tibetan Plateau.

Conclusions
In this study, we established and evaluated models based on four methods (i.e., random-forest models, multiple linear regression, support-vector machines and recursive-regression trees) for the CP, EE, Ash, ADF, NDF and WSC contents and pools in the alpine grasslands of Tibet under fencing or grazing conditions. The prediction accuracies of the random-forest models were higher than those of the other three methods. The nutritional quality simulated using random-forest models had high accuracy, with relative biases of <2.00% and an RMSE of no more than 0.99%. The nutritional storage simulated using random-forest models also had high accuracy, with relative biases of <6.00% and an RMSE of no more than 4.50 g m−2. The linear slopes between the observed and simulated nutritional quality and pool variables were within 0.55-1.12, 0.94-1.02, 0.47-1.04 and 0.77-1.30 for multiple linear regressions, random-forest models, support-vector machines and recursive-regression trees, respectively. The simulated nutritional quality and pool variables from multiple linear regressions, random-forest models, support-vector machines and recursive-regression trees explained 47-100%, 93-100%, 56-100% and 57-100% of the variation in the observed variables, respectively. Therefore, the established random-forest models can be used to model nutritional quality and storage in alpine grasslands on the Tibetan Plateau.

Data Availability Statement:
The datasets generated for this study are not publicly available but can be obtained from the corresponding author on request.