Spatio-Temporal Variation Characteristics of Aboveground Biomass in the Headwater of the Yellow River Based on Machine Learning

: Accurate estimation of the aboveground biomass (AGB) of grassland is a key link in understanding the regional carbon cycle. We used 501 aboveground measurements, 29 environmental variables, and machine learning algorithms to construct and verify a custom model of grassland biomass in the Headwater of the Yellow River (HYR) and selected the random forest model to analyze the temporal and spatial distribution characteristics and dynamic trends of the biomass in the HYR from 2001 to 2020. The research results show that: (1) the random forest model is superior to the other three models (R 2val = 0.56, RMSE val = 51.3 g/m 2 ); (2) the aboveground biomass in the HYR decreases spatially from southeast to northwest, and the annual average value and total values are 176.8 g/m 2 and 20.73 Tg, respectively; (3) 69.51% of the area has shown an increasing trend and 30.14% of the area showed a downward trend, mainly concentrated in the southeast of Hongyuan County, the northeast of Aba County, and the north of Qumalai County. The research results can provide accurate spatial data and scientiﬁc basis for the protection of grassland resources in the HYR.


Introduction
The grassland ecosystem is the largest terrestrial biosystem on the earth's surface, accounting for about 40% of the total land area [1], and plays an important role in global carbon cycle and climate regulation [2]. Grassland plays a role in modulating climate, windresistant sand, maintaining soil and ecological balance, and the economic development of the pastoral area, thus maintaining the sustainable development of animal husbandry [3,4]. The current status and changing trend of grassland aboveground biomass reflects whether the utilization of grassland is scientific and reasonable, which is the focus of ecological environment protection and the sustainable development of animal husbandry [5,6].
The aboveground biomass (AGB) can be predicted by direct methods (by harvesting the biomass) and by indirect methods (including the use of remote sensing tools). The direct harvest method is more accurate in a small area, but it is time-consuming and labor-intensive, difficult to achieve on a large-scale and long-term sequence, and will cause certain ecological damage to the grassland [7]. In contrast, remote sensing technology has low-cost and can monitor the current status and dynamic changes of grassland resources. In the 1980s, NOAA/AVHRR data were used, by previous authors, to estimate grassland production in grasslands. Diallo et al. [8] used AVHRR data to predict natural grasslands in the Sahel region of Africa. In recent years, the method of establishing a biomass statistical regression model through the vegetation index is widely used. The vegetation index most commonly used for aboveground biomass monitoring is the normalized difference vegetation index (NDVI), but it has many problems [9], such as saturation and the impact on the soil background in low vegetation coverage areas. Therefore, scholars have proposed other vegetation indexes, such as EVI, MSAVI, NDVGI, SAVI, etc. Different vegetation indexes have their own characteristics, and it is difficult to prove which vegetation index is superior. Statistical models are divided into parametric and non-parametric models. According to different regression variables, parameter models can be divided into linear and nonlinear regression models. Paruelo et al. [10] combined NDVI data with the measured biomass on the ground to monitor the grassland biomass in the grasslands of the central United States and found that the power function regression model is more suitable for local biomass monitoring than the linear regression model. The simple one-dimensional curve model is usually unable to achieve the high-precision fitting of biomass, while multivariate linear simulations do not allow for large correlations between variables; otherwise, it will cause multicollinearity problems and affect the modeling effect. Because there are often significant correlations between vegetation indices, it is difficult for multiple linear models to accurately estimate biomass.
There are many vegetation indices that can be used for AGB estimation; however, the variation of AGB is not only influenced by a single factor but also by a variety of factors, such as soil, climate, and topography factor, etc. In previous studies, estimating AGB using only a single type of factor may have introduced errors and uncertainty. Idowu et al. [11] found that for models with a non-unique number of variables, machine learning algorithms may be more effective than ordinary regression models. Machine learning methods, such as random forest (RF) regression, can integrate multiple factors and learn highly complex nonlinear mappings for estimating AGB. Wang et al. [12] estimated the AGB of the Loess Plateau using the RF algorithm, combining 233 field observations and their corresponding climate and remote sensing data from 2011-2013, and compared the two methods, showing better accuracy from the neural network, compared to the multiple linear regression model. Xie et al. [13] used neural network model and multiple linear regression model to estimate the aboveground biomass of grassland and compared the two methods. The neural network model is better than the multiple linear. Research by Yang et al. [14] showed that the accuracy of the BP-ANN model for grassland AGB inversion was significantly higher than that of the traditional multi-factor inversion model in the THR. Zeng et al. [4] developed an AGB estimation model suitable for the Qinghai-Tibet Plateau, based on the random forest algorithm. The R 2 of the model is equal to 0.86.
The HYR is located in the northeastern part of the Qinghai-Tibet Plateau, which is called "the water tower of the Yellow River" [15]. In recent years, desertification, due to overgrazing and other grassland problems, has become more and more serious. Effective monitoring of regional grassland is an urgent problem to be solved. Achieving the rapid and effective monitoring of grassland changes is not only an urgent need, to determine a reasonable grassland stock carrying capacity, but also a realistic need to ensure the development of animal husbandry. Measured data are the key parameter for constructing the optimal model in this study. The reasonable setting of sample points will directly affect the results of model simulation [16]. The high altitude, harsh natural conditions, inaccessibility, high mountains, and wetlands in the HYR made the collection of samples difficult. Therefore, the distribution of sample points should be representative of the general characteristics of the HYR, but also take into account the difficulty of sample collection and sampling costs.
The study uses 501 measured AGB data, combined with factors for climate, vegetation indices, soil texture, and topography, to construct a data set of environmental variables affecting AGB in the HYR. The main goals are: (1) comparing and analyzing the accuracy of various machine learning model algorithms to build an inversion model suitable for estimating grassland AGB in the HYR; (2) simulate the spatial distribution and temporal trend of grassland AGB in the HYR from 2001 to 2020.

Study Area
The HYR is located at 32 • 30 -35 • 0 N, 95 • 50 -103 • 30 E (Figure 1), covering an area of approximately 12.37 × 10 4 km 2 [17], with an elevation between 2680-6248 m. The roads are rugged and few, and the average temperature for many years was concentrated between −12.7 • C and 5.6 • C (Figure 2a). The average rainfall from 2001 to 2020 was 579.50 mm, and the rainfall decreased from southeast to northwest. (Figure 2b), mainly concentrated in June to September, accounting for 90% of the total annual rainfall. Grassland is an important land cover type. Alpine meadows account for 61.40% of the entire study area; followed by alpine steppes (12.04%), mountain meadows (9.29%), wetlands (6.56%), bare land (5.52%), and arable land (2.11%); other land use types account for less than 1.0% ( Figure 1). The main characteristics of the soil are thin soil layer, coarse soil quality, more gravel, and coarse sand in the soil (14.50% clay, 39.90% powder, and 38.17% sand) (Figure 2j-l); animal husbandry is the main activity.
curacy of various machine learning model algorithms to build an inversion model suitable for estimating grassland AGB in the HYR; (2) simulate the spatial distribution and temporal trend of grassland AGB in the HYR from 2001 to 2020.

Study Area
The HYR is located at 32°30′-35°0′N, 95°50′-103°30′E (Figure 1), covering an area of approximately 12.37 × 10 4 km 2 [17], with an elevation between 2680-6248 m. The roads are rugged and few, and the average temperature for many years was concentrated between −12.7 °C and 5.6 °C (Figure 2a). The average rainfall from 2001 to 2020 was 579.50 mm, and the rainfall decreased from southeast to northwest. (Figure 2b), mainly concentrated in June to September, accounting for 90% of the total annual rainfall. Grassland is an important land cover type. Alpine meadows account for 61.40% of the entire study area; followed by alpine steppes (12.04%), mountain meadows (9.29%), wetlands (6.56%), bare land (5.52%), and arable land (2.11%); other land use types account for less than 1.0% (Figure 1). The main characteristics of the soil are thin soil layer, coarse soil quality, more gravel, and coarse sand in the soil (14.50% clay, 39.90% powder, and 38.17% sand) (Figure 2j-l); animal husbandry is the main activity.

AGB Data Source
We use the conditional Latin hypercube sampling (cLHS) method [18], which is restricted by the cost layer, to arrange the sampling points as evenly as possible in the HYR. The restricted variables, when using cLHS to lay out sample points, include meteorological data (such as rainfall and temperature), topographic data (such as elevation and slope), and soil data (such as powder and sand); the above variables are uniformly resampled to a spatial resolution of 500 m [18,19].
Road, elevation, and slope were considered the main environmental factors limiting the sample collection; weights of 0.5, 0.3, and 0.2 were assigned to the three variables, respectively, and cost layers were generated ( Figure A1). The cost layers were created using ArcGIS software, and the cLHS sample points were laid out in R Studio software using the data package cLHS. The aboveground biomass data were collected in the growing seasons (July-August) in 2005, 2006, 2015, 2018, and 2020; one or two 0.5 × 0.5 m sample boxes were set up in sample plots with good vegetation community consistency, and three or four 0.5 × 0.5 m sample boxes were averaged in areas with more complex and uneven vegetation distribution. Samples were selected to represent, as much as possible, the vegetation growth of the whole area; all its aboveground biomass was collected, including  Table 1).  Table 1).

Environmental Variable Data Source
Environmental data parameters considered in this study are described in Table 1. The meteorological data comes from the spatial interpolation data calculated by our team. First, we downloaded the daily temperature, daily precipitation, daily wind speed, sunlight, longitude, latitude, and elevation data of 836 weather stations across the country, from 2001 to 2020 (http://data.cma.cn, accessed on 18 February 2021) ( Figure A2). Then, the daily value data are calculated as monthly value data, according to the longitude, latitude, and altitude of each weather station; the weather station point data was interpolated into raster data, with a spatial resolution of 1 km, using ANUSPLIN software [20]. THE NDVI, EVI, and LAI were obtained from MODIS data products, and the data were downloaded through Google Earth Engine (https://code.earthengine.google.com/, GEE, accessed on 27 January 2021). The soil-related data were obtained from the Harmonized World Soil Database (HWSD) (https://data.isric.org/, accessed on 13 January 2021). The ε, PAR, and FPAR data were calculated using Zhu Wenquan's improved CASA model [21]. All raster images in this paper use Krasovsky 1940 Albers projection. The main environmental variable factors are shown in Figure 2. According to the division of environmental variables in the STEP-AWBH model, the environmental variables collected in this study were included in four categories (Table 1)  LASSO is a variable screening method with the advantage of statistical accuracy of variable selection and its computational feasibility. LASSO has the ability to handle multicollinearity data, by automatically selecting the most important independent variables and narrowing down the less important predictor variables to zero, so as to retain only the useful features [22].

Modeling Methods
The four modeling methods include partial least squares regression (PLSR), support vector machines (SVM), RF, and back-propagation artificial neural network (BP-ANN). PLSR is an extension of multiple linear regression. Compared with ordinary least squares regression, PLSR calculations are more reliable [23,24]. SVM was originally used for classification and has been widely used to solve the classification and regression problems of nonlinear and high-dimensional data [25]. In this paper, the radial basis function was used as the kernel function and the genetic algorithm was used to optimize two key parameters (gamma and cost) [26]. The most important parameters in the BP-ANN model are the number of neurons and hidden layers, which need to be repeatedly tested and continuously tuned [27]. RF is a machine learning algorithm that trains classification samples through decision trees and makes predictions based on the results of the classification [28]. The two most important parameters in the RF algorithm are the number of regression trees and the number of predictors at each node [29], which need to be optimized [3]. All machine learning models in this paper were trained, parameter tuned, and simulated in the Matlab 2017a software.

Model Accuracy Evaluation
Five-fold cross validation was used to evaluate the predictive performance of the model results. The coefficient of determination of the training dataset (R 2 train ), root mean square error of the training dataset (RMSE train ), R 2 of the validation dataset (R 2 val ), and the validation dataset RMSE (RMSE val ) were used to evaluate the predictive ability of each model.

Trend Analysis
The Theil-Sen (SEN) median trend analysis and Mann-Kendall test were used to determine the significance of the trend of AGB [30,31]. When SEN slope > 0, it reflected an increasing trend, and there was a decreasing trend for when the opposite was true (SEN slope < 0). The significance of the changing trend of AGB was tested at the confidence level α = 0.05 (Table 2). When Z > |1.96| means confidence level α < 0.05, the change trend was significant, Z < |1.96| means confidence level α > 0.05, the change trend was not significant [32].

Correlation Analysis between AGB and Environmental Variables
The minimum value of all samples was 6.5 g/m 2 (located in Maduo County) and the maximum value was 428.02 g/m 2 (located in Hongyuan County). The average value of all samples was 171.39 g/m 2 . Overall, the average value of all samples decreased from the southeast to the northwest, and the county with the largest average value was Hongyuan County (with 292.59 g/m 2 ), and the county with the smallest average value was Maduo County (with 102.02 g/m 2 ) ( Table 3).

Correlation Analysis between AGB and Environmental Variables
The correlation matrix represents the environmental variables and AGB, shown by the correlation coefficient (Figure 3). Positive correlations are shown in yellow, and negative correlations are shown in green. Among them, FPAR, XMSAVI, and XSAVI had the highest correlation with AGB (R = 0.59), except that there was a good correlation between various vegetation indices and the measured AGB; the correlations were all greater than 0.5, indicating that the vegetation index can better characterize the grassland in the HYR. Among the geographic location factors, longitude (R = 0.44) was more correlated with AGB than latitude (R = −0.28), elevation (R = −0.25), and slope (R = 0.05). Among the meteorological factors, annual rainfall (R = 0.46) and illumination (R = −0.44) better responded to the information of AGB, compared to mean annual TEM (R = 0.32) and K (R = 0.13). Among the soil factors, the correlation between SOC and AGB was significantly higher than that between CLAY and SAND. In addition, there were also strong correlations among the environmental variables; for example, the correlation coefficient between NDVI and coverage was 1, while the correlation between NDVI and EVI also reached 0.89. Therefore, if all variables are included in the machine learning algorithm for simulation, it will lead to the generation of the multicollinearity problem. Based on this problem, we did variable screening to determine the best input feature set, which is crucial to reduce model overfitting and improve model performance.
than that between CLAY and SAND. In addition, there were also strong correlations among the environmental variables; for example, the correlation coefficient between NDVI and coverage was 1, while the correlation between NDVI and EVI also reached 0.89. Therefore, if all variables are included in the machine learning algorithm for simulation, it will lead to the generation of the multicollinearity problem. Based on this problem, we did variable screening to determine the best input feature set, which is crucial to reduce model overfitting and improve model performance.

Model Accuracy Evaluation
LASSO selected 8 environmental variables. Longitude was selected in the geographic location factor, EVI, XMSAVI, and XNDVGI were selected in the vegetation index factor, illumination was selected in the meteorological factor, and CLAY2 and SOC were selected in the soil factor. In the CASA model, the FPAR variable was also selected. In the validation set of the four grassland AGB estimation models (Figure 4), RF had the highest R 2 val and lowest RMSE val of 0.56 and 51.3 g/m 2 , respectively. The second was BP-ANN, which were 0.41 and 67.4 g/m 2 , respectively. The R 2 val of SVM and PLSR were both 0.39, and the RMSE val was 67.35 g/m 2 and 66.78 g/m 2 , respectively. The reason may be that (compared with models, such as BP-ANN and PLSR) the RF model introduces randomization to deal with the decision tree problem, which significantly improves the model's resilience to noise [33,34]. Therefore, the RF method was used to estimate the AGB of the HYR from 2001 to 2020.

Spatial and Temporal Dynamic Distribution of AGB
In terms of spatial distribution, the spatial distribution of AGB in the HYR from 2001 to 2020 showed an obvious heterogeneity (Figure 5), showing a clear trend of gradual decline from southeast to northwest. The 2001-2020 AGB annual mean and AGB annual total values were 176.8 g/m 2 and 20.73 Tg, respectively. The maximum biomass was 306 g/m 2 , mainly concentrated between 50-250 g/m 2 , accounting for 76.13%, distributed in Tongde, Zeku, Henan, Maqu, Jiuzhi, Gand, Dari, and MaQin counties. The percentage of those exceeding 250 g/m 2 was 23.49% were distributed in Hongyuan, Aba, and Ruoerge counties. The smallest AGB values were distributed in Qumarai, Maduo, and Chengduo counties, in the northwestern region of the HYR. The average values from 2001 to 2020 were 82.46 g/m 2 , 98.23 g/m 2 , and 116.71 g/m 2 . The AGB values decreased spatially from southeast to northwest, and this trend was closely related to the rainfall, elevation, and vegetation distribution types in the region.
in the soil factor. In the CASA model, the FPAR variable was also selected. In the validation set of the four grassland AGB estimation models (Figure 4), RF had the highest R 2 val and lowest RMSEval of 0.56 and 51.3 g/m 2 , respectively. The second was BP-ANN, which were 0.41 and 67.4 g/m 2 , respectively. The R 2 val of SVM and PLSR were both 0.39, and the RMSEval was 67.35 g/m 2 and 66.78 g/m 2 , respectively. The reason may be that (compared with models, such as BP-ANN and PLSR) the RF model introduces randomization to deal with the decision tree problem, which significantly improves the model's resilience to noise [33,34]. Therefore, the RF method was used to estimate the AGB of the HYR from 2001 to 2020.

Spatial and Temporal Dynamic Distribution of AGB
In terms of spatial distribution, the spatial distribution of AGB in the HYR from 2001 to 2020 showed an obvious heterogeneity (Figure 5), showing a clear trend of gradual decline from southeast to northwest. The 2001-2020 AGB annual mean and AGB annual total values were 176.8 g/m 2 and 20.73 Tg, respectively. The maximum biomass was 306 g/m 2 , mainly concentrated between 50-250 g/m 2 , accounting for 76.13%, distributed in Tongde, Zeku, Henan, Maqu, Jiuzhi, Gand, Dari, and MaQin counties. The percentage of those exceeding 250 g/m 2 was 23.49% were distributed in Hongyuan, Aba, and Ruoerge

Trend Analysis of AGB Changes
The AGB in the HYR showed an increasing trend in most areas from 2001 to 2020, with 69.51% of the area increasing, 0.35% of the area remaining stable, and 30.14% of the area decreasing in AGB ( Figure 6). Among them, the significantly increased areas accounted for 19.2%, mainly in the northern parts of Tongde, Zeku, Henan, and Maqu coun-

Trend Analysis of AGB Changes
The AGB in the HYR showed an increasing trend in most areas from 2001 to 2020, with 69.51% of the area increasing, 0.35% of the area remaining stable, and 30.14% of the area decreasing in AGB ( Figure 6). Among them, the significantly increased areas accounted for 19.2%, mainly in the northern parts of Tongde, Zeku, Henan, and Maqu counties, as well as the southern parts of Maduo (Table 4). A total of 50.31% of the regional AGB showed a slight increase in trend, mainly distributed in Maqin and Dari counties. In addition, nearly 25.87% of the regional AGB showed a slight downward trend, mainly concentrated in northern Hongyuan county and southern Ruoerge, as well as southern Maqu, northern Quemalai, and Maduo counties. A significant decrease was observed in 4.27% of the areas, which were distributed in the southeastern part of Jiuzhi county, the northwestern part of Maqin, and the northern parts of Qumalai and Maduo counties ( Figure 6).

Compared with Traditional Univariate Model
The aboveground biomass in the HYR has a positive correlation with the rem sensing vegetation index (Table 5), indicating that it is basically feasible to use the vege tion index to monitor grassland biomass. Meanwhile, there were differences in the re

Compared with Traditional Univariate Model
The aboveground biomass in the HYR has a positive correlation with the remote sensing vegetation index (Table 5), indicating that it is basically feasible to use the vegetation index to monitor grassland biomass. Meanwhile, there were differences in the relationship between different vegetation indices and biomass, among which NDVI and AGB had the highest cubic correlation coefficient (R = 0.46); the curvilinear model better reflected the relationship between vegetation indices and measured biomass, compared to the univariate linear regression model. Overall, the correlation coefficient fit of NDVI and AGB was better than that of EVI, MSAVI, NDVGI, and SAVI. It shows that the use of NDVI cubic polynomial model is a simple, effective, and practical method to monitor AGB. Generally speaking, compared with the univariate model, RF improves the simulation accuracy of grassland AGB in the HYR, has higher stability, better predictive ability, strong application prospects, and advantages. Although the random forest model is significantly better than the traditional model, the custom model constructed in this study is only applicable to the HYR, for the time being. Whether it is applicable to other grassland types, such as cold desert, temperate grassland, and tropical grassland, remains to be further verified.

Reasons for AGB Changes
In the past 20 years, the vegetation status in the HYR has shown an overall improvement trend, which is consistent with existing research results [35][36][37]. Existing studies have shown that [38], affected by global warming, the HYR has generally shown a warm and humid trend in recent years, which provides favorable climatic conditions for vegetation restoration in the HYR. In this study, although temperature and rainfall were not screened as important environmental variables for simulating grassland AGB, the correlation coefficients of longitude with rainfall and temperature were 0.71 and 0.82, respectively; rainfall and temperature showed a gradual decrease from west to east, due to the geographical location of the Yellow River source area, so longitude better represented the climate influence factors in the region. The areas with the least AGB are located in Maduo County and the northern part of Chenduo County. The area is less affected by the southwest monsoon, has high altitude, poor water, and heat conditions, and therefore, poor vegetation growth. Lowering temperature will inhibit the growth and development of vegetation in this area, and rising temperature is conducive to the accumulation of dry matter quality of plants, which is a favorable condition for vegetation growth. This is the climatic factor of AGB decreasing from southeast to northwest in the HYR.
Human activities are equally important drivers of changes in grassland dynamics [39][40][41]. Since 2003, the region began to implement the policy of returning grazing to the grasslands. The purpose of the plan is to improve the grassland ecological environment, promote a virtuous cycle of grassland ecology, and maintain national ecological security. In 2011, the first phase of the ecological award policy was implemented in the region, and the second phase of the ecological award policy was started in 2016, which increased the amount of grass storage balance incentives and subsidies for grazing ban, compared to the first phase; in terms of policy ecological effects, the study showed that the ecological supplementation policy played a positive role in the ecological restoration and improvement of the Yellow River source area [42]. Thus, favorable policies may be the artificial reason why 69.51% of the regional AGB in the study area showed an increase. As the AGB in Zeku, Tongde, and Magu have shown an increasing trend in recent years ( Figure 6), this is due to the fact that the area has adopted water and soil conservation prevention and protection projects and comprehensive desertification control projects, through fences, policy closures, water and soil conservation monitoring, and improvement of the legal system. It also carries out ecological restoration treatment of mines, power stations, and road construction projects, so that the regional ecological environment is significantly improved, and the cover of severely degraded areas is significantly increased [43]. On the other hand, areas of AGB degradation, occurring in this study, were mainly concentrated in Aba, Hongyuan, and Maduo counties; according to research by Xu et al. [44], there are overgrazing situations in Aba and Hongyuan, while the total number of livestock in Henan, Tongde, Xinghai, and Zeku counties has shown a downward trend. Therefore, the increase in grazing pressure will reduce the carrying capacity of the grassland, which will degrade the grassland. In degraded areas of AGB, since animal husbandry is the main source of income for local herders, it is obviously unrealistic to implement large-scale grazing prohibitions. We should graze scientifically, determine animals with grass, manage scientifically, use grassland rationally, and restore natural grassland productivity [45], focus on the construction of pasture fences, determine a reasonable pasture carrying capacity, scientifically configure the herd structure, implement zoning and grazing in turns, so that natural pastures can be recuperated, and develop grassland irrigation and fertilization in areas where conditions permit, and improve natural pastures [46].

Advantages and Limitations of Custom Models
First, the machine learning algorithm can incorporate a variety of environmental variables that affect AGB into the model simulation study. In addition, the input variable data are easy to obtain, and the model can independently learn and adjust parameters, which has strong applicability. The model can be adapted to the study of different research areas, and even with the support of sufficiently complete data, the prediction range can be further improved. Although our research provides a comprehensive assessment of the AGB in the HYR, there are still some limitations. First, due to inconvenient transportation, there are relatively few sampling points in the western region; this may result in inaccurate AGB estimates for the region. Secondly, as a black box operation, the learning process of a machine learning model is uncontrollable. Third, when the random forest is performing regression, it cannot make predictions that exceed the range of the training set data; it does not perform as well in classification, because it cannot give a continuous output [47].

Conclusions
The simulation results of the RF algorithm are better than the SVM, BP-ANN, PLSR, and univariate vegetation index model (R 2 val = 0.56, RMSE val = 51.3 g/m 2 ). From 2001 to 2020, the AGB of the HYR showed a spatially decreasing trend from the southeast to northwest, and the proportion of the area with increasing grassland AGB reached 69.51% in the past 20 years, while the proportion of the area with decreasing grassland AGB was 30.14%, mainly in Hongyuan, Ruoerge, Jiuzhi, and Qumalai counties. This study has a high sampling site density and small model deviation, which accurately simulates the spatial distribution pattern of soil erosion in the HYR. The research results can not only provide a scientific basis for the grassland management and protection policies in the HYR, but also extend the application of such modeling methods to the study of grassland AGB in other areas of the Qinghai-Tibet Plateau.

Acknowledgments:
We are very grateful for the constructive comments provided by the three anonymous reviewers and academic editor, who gave us a huge help during the article publication process.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A
In Figure A1, we showed the cost map of using the cLHS method to lay out the sample points, thus laying the foundation for the scientific collection of samples to make belts. In Figure A2, we showed the distribution of meteorological stations, which is the basis for all meteorological data in this paper.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A
In Figure A1, we showed the cost map of using the cLHS method to lay out the sample points, thus laying the foundation for the scientific collection of samples to make belts. In Figure A2, we showed the distribution of meteorological stations, which is the basis for all meteorological data in this paper.