Estimation of the Grassland Aboveground Biomass of the Inner Mongolia Plateau Using the Simulated Spectra of Sentinel-2 Images

Abstract: An accurate assessment of grassland aboveground biomass (AGB) is important for analyzing terrestrial ecosystem structure and function, estimating grassland primary productivity, and monitoring climate change and carbon/nitrogen cycling on a global scale. Multispectral satellites with wide swaths, such as Sentinel-2, have become the natural choice for monitoring grassland biomass at regional and global scales. However, the spectral resolution of multispectral satellites is generally low, which limits the inversion accuracy of grassland AGB and restricts further application in large-scale grassland monitoring. For this reason, a satellite-scale simulated spectra method was proposed to enhance the spectral information of Sentinel-2 data, and a simulated spectrum (SS) was constructed using this algorithm. The raw spectrum (RS) of Sentinel-2 and the SS were then used as data sources to calculate vegetation indices (RS-VIs and SS-VIs, calculated from RS and SS data, respectively), and the multi-granularity spectral segmentation (MGSS) algorithm was employed to extract spectral segmentation features (RS-SF and SS-SF, extracted from RS and SS data, respectively). These spectral features (RS-SF, SS-SF, RS-VIs, and SS-VIs) were used to estimate AGB with partial least-squares regression (PLSR) and multiple stepwise regression (MSR) models. Finally, the spatial distribution pattern of AGB on the Inner Mongolia Plateau and the causes of its latitudinal zonation were analyzed, based on precipitation, average temperature, topography, etc. The conclusions are as follows. Firstly, the SS carries more spectral information and is more sensitive to biomass than the RS of Sentinel-2 in some bands, and the correlation between the SS-VIs and biomass is higher than that of the RS-VIs. Secondly, among the spectral features, the most accurate AGB estimation was obtained with SS-SF, which gave R² = 0.95. The root mean square error (RMSE) was 10.86 g/m² and the estimation accuracy (EA) was 82.84% in the MSR model, while RMSE = 10.89 g/m² and EA = 82.78% in the PLSR model. Compared with the traditional estimation methods using RS and VIs, R² increased by at least 0.2, RMSE decreased by at least 14.08 g/m², and EA increased by 22.26%. Therefore, the simulated spectra method can help improve the estimation accuracy of AGB and offers a new approach to regional and global large-scale biomass acquisition.


Introduction
Aboveground biomass (AGB) refers to the mass of plant organic matter per unit area, which is a key trait indicating plant health and ecosystem function [1]. Grassland, as one of the most widely distributed terrestrial ecosystem types in the world, accounts for 26% of the global land surface area [2]. Timely and accurate monitoring of AGB changes in grasslands is of great significance for maintaining the terrestrial ecosystem structure and function, maintaining the balance of vegetation productivity and surface energy, monitoring the carbon/nitrogen circulation of global ecosystems, and global climate change [3].
Traditional manual methods of measuring AGB, which involve field collection and laboratory measurement, cause great damage to vegetation. Moreover, they are time- and labor-consuming, which makes it difficult to obtain AGB data on a large scale and cannot meet the needs of scientific research [4]. Remote sensing provides a non-destructive, more efficient and convenient, and time-continuous biomass survey method [5]. The plant physiological structure, chemical composition, and environmental stress (water, salt, temperature, etc.) all affect the reflectivity of leaves and the canopy. Therefore, information such as the chlorophyll, C, N, and P content and AGB can be obtained from spectral characteristics [6,7]. Remote sensing methods for estimating AGB are mainly divided into physically-based models and empirical regression techniques [8]. Physically-based models (such as SAIL, PROSPECT, and PROSAIL) consider plant leaf structural parameters, the chlorophyll content, the leaf area index, soil factors, and other factors [9] to simulate AGB, but many of the parameters in these models are often not readily available, which limits their applicability. Vegetation indices such as NDVI correlate strongly with biomass [10], and empirical regression techniques based on that correlation have become a common means of estimating AGB [11,12]. For example, Cabrera-Bosquet et al. [13] used NDVI to successfully estimate wheat biomass and nitrogen content, and improved the efficiency of genetic breeding. Furthermore, Wang et al. [14] calculated vegetation indices such as NDVI, RVI, GNDVI, EVI, and SAVI from HJ-CCD images, and then used the random forest regression algorithm to estimate the biomass of wheat in South China, greatly reducing the cost of biomass estimation.
Although remote sensing technology can efficiently estimate AGB, selecting appropriate remote sensing data can reduce both the estimation error and the estimation cost. Commonly used remote sensing platforms are airborne and spaceborne. UAVs can be flexibly equipped with a variety of hyperspectral cameras; have the advantages of high spatial, spectral, and temporal resolution [15,16]; and play an important role in precision agriculture, such as in crop yield estimation [17], growth monitoring [18], and genetic breeding [19,20]. However, the limited endurance of UAVs and other airborne platforms greatly restricts the acquisition of airborne hyperspectral, multispectral, and other data. In addition, large-scale grassland monitoring still relies on satellites (hyperspectral and multispectral) to observe the ground. Compared with multispectral satellites, hyperspectral satellites are few in type, narrow in swath, and expensive, so they are also difficult to apply on large scales [21]. Dube et al. [22] compared the ability of ETM+ and OLI to invert AGB, and concluded that the medium-resolution satellite Landsat 8 has a higher precision in biomass prediction on a regional scale. Li et al. [23] used Landsat 8 and Sentinel-2 images to successfully predict the chlorophyll content (CHL) and dry matter content (DMC) of alpine grassland in Tibet, China. These examples show that multispectral satellites have great potential in biomass prediction on a large scale. Nevertheless, multispectral images lack the spectral detail of hyperspectral data, which affects the prediction accuracy of plant traits. Immitzer et al. [24] classified farms and forests and identified species with Sentinel-2 data, and the cross-validation results showed that the accuracy of crop type identification was 76%, while the accuracy of tree species identification only reached 65%. Ustin et al. [25] argue that the relationship between vegetation traits and reflectivity weakens when moving from the leaf scale to the canopy scale, and Schaepman [26] and Vohland [27] believe that there are uncertainties with regard to the inversion accuracy of AGB on the canopy scale due to the canopy structure and observation angle.
Therefore, enhancing image spectral detail information is an effective way of improving the estimation accuracy of grassland traits such as biomass when grassland is monitored with multispectral satellites that have a wider coverage and a lower resolution.
Improving the ability of multispectral satellites to invert biomass has become a hot spot in the accurate estimation of grassland biomass on a large scale. At present, there are three main directions for improving the accuracy of biomass estimation: (i) combining the vegetation index with physiological structure parameters, such as the grass height and image texture [8,28,29]; (ii) combining the vegetation index with other data sources, such as SAR [30]; and (iii) enhancing the sensitivity of the vegetation index to AGB. Regarding the first direction, image texture features improve the prediction accuracy of biomass in forestry and precision agriculture [31], but grassland and woodland have different structural features: the grass is relatively low and the grass canopy is uniform. In medium-resolution images, the grass texture features become inconspicuous, and it is difficult to obtain centimeter-level grass height data or digital elevation model (DEM) information. The contribution of grassland texture features to biomass inversion under large-scale conditions is therefore still unclear and requires further exploration. Regarding the second direction, SAR is suitable for all-weather ground monitoring, even on rainy days. Combining SAR data with the vegetation index can reduce the saturation problem of the vegetation index in high coverage areas and improve the estimation accuracy of biomass [32]. However, SAR data are expensive and not suited to large-scale grassland monitoring needs. Regarding the third direction, the estimation accuracy of AGB can be improved by constructing narrow-band hyperspectral vegetation indices [8]. Spectral indices (NDVI, etc.) constructed from wide-band satellites saturate in high vegetation coverage areas, which increases the difficulty of biomass inversion, while narrow-band vegetation indices can effectively alleviate this shortcoming [33], so the improvement is mostly achieved by replacing multispectral data with hyperspectral data. Furthermore, continuous wavelet decomposition, spectral segmentation, and fractional differential methods can also enhance the sensitivity of the spectrum to some extent [34-36] and thereby improve the accuracy of the model. However, multispectral satellites are limited by wavelength and bandwidth, and it is difficult to directly construct a narrow-band vegetation index or perform derivative analysis with them. How to construct derivative spectra suitable for multispectral satellites is key to further improving the accuracy of large-scale biomass inversion.
In recent years, the Mongolian Plateau has been affected by global warming, increased evaporation, and decreased precipitation, and its grassland ecosystem has degraded. It is a sensitive area of the northern grassland under global climate change [37]. Timely monitoring of AGB is therefore of great significance for tracking global climate change and protecting grassland ecosystems. The objectives of this study are to propose a satellite-scale simulated spectral method to improve the ability of multispectral satellites to obtain AGB on a large scale, and to evaluate the accuracy of the results. This article is divided into three main parts: (i) a simulated spectral method is proposed to enhance the spectral information of Sentinel-2 images; (ii) biomass-related spectral features are extracted from the raw spectrum and the simulated spectrum, namely segmentation features obtained with the multi-granularity spectral segmentation algorithm, wide-band vegetation indices, and narrow-band vegetation indices, and partial least-squares regression (PLSR) and multiple stepwise regression (MSR) biomass estimation models are established; and (iii) the best model for estimating AGB is selected, its accuracy is evaluated, and the biomass distribution is mapped.

Study Area
The study area (Figure 1, 109

Field Data
The field measurements were collected from 3-14 July 2018, at the growth peak of the grasslands. To investigate the AGB of grassland in the Inner Mongolia Plateau, 10 sampling areas were established from east to west, each with an area of about 0.5 ha. Each sampling area produced eight samples, resulting in a total of 80 sampling points, and the distribution of sampling points is shown in Figure 1. All sampling points were scattered, and the distance between the sampling points was in the range of 15-55 m, so that the sampling points corresponded to the Sentinel-2 (10 m) pixels. Each sampling point was a 1 m × 1 m quadrat. After the ASD spectrometer had acquired the sample spectrum, the quadrat was harvested in full, and the samples were bagged, labeled, and refrigerated; the center position of each sampling point was recorded by GPS. The dry matter content was measured in the laboratory after 48 h of oven drying at 65 °C.
The field grassland canopy hyperspectral reflectance was measured with a field spectrometer (ASD Field Spec 2 spectrometer, Analytical Spectral Devices, Boulder, CO, USA). The ASD spectrometer provides continuous visible and near-infrared (325-1075 nm) data with a spectral resolution of <3.0 nm and a field of view of 25°. Before the measurement, the white reference panel supplied with the instrument was used to calibrate the ASD radiance and switch the ASD to the relative reflectance mode. Measurements were made under clear sky conditions between 10:00 and 14:00, at a height of 1.3 m above the ground (the footprint of the spectrometer on the ground is about 0.26 m²). The pure spectra of bare soil and of grass with a coverage of more than 95% were measured, and the measurement was repeated three times for each point, with the average value being taken as the final reflectance.
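The stated ground footprint can be checked with simple geometry. The sketch below assumes a nadir-looking sensor with a circular field of view over flat ground; the function name `footprint_area` is illustrative and not part of the original workflow.

```python
import math


def footprint_area(height_m: float, fov_deg: float) -> float:
    """Area (m^2) of the circular ground footprint of a nadir-looking sensor."""
    # The footprint radius follows from the half-angle of the field of view.
    radius = height_m * math.tan(math.radians(fov_deg / 2.0))
    return math.pi * radius ** 2


# Values from the text: 1.3 m measurement height, 25 degree field of view.
area = footprint_area(1.3, 25.0)
print(f"{area:.2f} m^2")
```

Under these assumptions the result agrees with the footprint of about 0.26 m² quoted above.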

Remote Sensing Data
Sentinel-2 is increasingly used in environmental monitoring. Sentinel-2A was launched in 2015 with a swath width of 290 km, and its spatial resolution is divided into three levels of 10, 20, and 60 m. Sentinel-2 has a high temporal resolution: in cooperation with Sentinel-2B, it revisits the same area every 5 days. It has 13 bands in the visible to short-wave infrared range, and three "red-edge" bands are set at 705-783 nm to capture vegetation reflectance characteristics [39]. The spatial and temporal resolution of Sentinel-2 is better than that of Landsat 8, so Sentinel-2A/B images were selected as the data source. Limited by conditions such as the sampling time and cloud occlusion, nine scenes of Sentinel-2 images were downloaded from the European Space Agency (ESA) (website: https://scihub.copernicus.eu/dhus/#/home), and the image acquisition time was between 1 July and 13 July. There was no suitable image for the S2 sampling area. The images were pre-processed in SNAP software, including radiometric calibration, atmospheric correction, geometric correction, and resampling, in which the Sentinel-2 data were resampled to a 10 m spatial resolution.

The Simulated Spectrum Method
Multispectral satellites have obvious advantages in earth observation at regional and global scales. However, they have fewer spectral details to reflect the trait information of grassland compared with the hyperspectrum, which limits the prediction accuracy of biomass [40]. The hyperspectrum of the pure endmembers of green grass was acquired on the ground scale. This involved 751 bands in the visible near-infrared (325-1075 nm) range, and could more accurately reflect the grassland details than the multispectrum. If the hyperspectral advantage of the ground-scale is combined with the spatial advantage of multispectral satellites, the biomass estimation accuracy can be effectively improved on a large-scale.
Based on this, this research attempted to obtain the hyperspectrum of the pure endmembers of the main surface objects. According to the principle of mixed pixel decomposition, the fitted spectrum of the multispectral satellite is reversely derived from the pure endmembers and the abundance weights of the ground features. The study area is mainly natural grassland in the growing season, and the proportion of green grass is relatively high, so it can be assumed that the main endmember types on the ground are bare soil and green grass. The fitted spectrum can then be calculated as follows:

$$\mathrm{Fit}_{\lambda} = \sum_{j} f_j \, p_{\lambda,j} \tag{1}$$

where λ is the wavelength, Fit_λ is the hyperspectrum fitted according to the abundance of ground objects, f_j is the weight of class j in the unit pixel, and p_{λ,j} is the reflectivity of the class j pure pixel at λ. For large-scale grassland, it can be assumed that the mixed spectrum of a Sentinel-2 unit pixel is only affected by grass and bare soil, and the vegetation coverage can be calculated by Formula (2) [41]:

$$f_v = \frac{\mathrm{NDVI} - V_{soil}}{V_{veg} - V_{soil}}, \qquad f_s = 1 - f_v \tag{2}$$

where f_v is the vegetation coverage (%); f_s is the proportion of bare soil in the unit pixel (%); NDVI is the normalized difference vegetation index calculated from multispectral satellite imagery; and V_veg and V_soil are the NDVI values calculated for pure pixels of grass and bare soil, respectively. In this article, the pure endmember spectra were observed with the ASD spectrometer. Hyperspectral images (e.g., EO-1 Hyperion) can also be used to obtain pure endmember spectra in unknown areas.
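The two-endmember case described above can be sketched in a few lines. This is a minimal illustration, assuming the linear mixing form Fit = f_v · p_grass + f_s · p_soil with f_s = 1 − f_v and the NDVI-based coverage f_v = (NDVI − V_soil) / (V_veg − V_soil). The endmember reflectances, the NDVI thresholds, and the clipping of f_v to [0, 1] are hypothetical choices for the example, not values or steps taken from the study.

```python
def vegetation_fraction(ndvi, v_soil, v_veg):
    """Vegetation coverage from NDVI, using pure-soil and pure-grass NDVI values."""
    f_v = (ndvi - v_soil) / (v_veg - v_soil)
    # Clipping to [0, 1] is an assumption of this sketch, not stated in the text.
    return min(1.0, max(0.0, f_v))


def fitted_spectrum(f_v, grass, soil):
    """Abundance-weighted sum of the grass and bare-soil endmember spectra."""
    f_s = 1.0 - f_v
    return [f_v * g + f_s * s for g, s in zip(grass, soil)]


# Hypothetical endmember reflectances at four wavelengths (not measured values).
grass = [0.04, 0.08, 0.05, 0.45]
soil = [0.10, 0.15, 0.20, 0.30]

f_v = vegetation_fraction(ndvi=0.60, v_soil=0.10, v_veg=0.85)
fit = fitted_spectrum(f_v, grass, soil)
print(round(f_v, 3), [round(x, 3) for x in fit])
```

Because the weights sum to one, every value of the fitted spectrum lies between the two endmember reflectances at that wavelength.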
In Formula (2), V_veg and V_soil are used as threshold values of vegetation and soil, respectively, to calculate the vegetation coverage of the entire area. NDVI is a vegetation index based on the spectral characteristics of vegetation, namely high reflection in the near-infrared band and strong absorption in the red band. It is highly sensitive to vegetation changes and can effectively distinguish vegetation from soil [42]. It is calculated as follows:

$$\mathrm{NDVI} = \frac{R_{nir} - R_{red}}{R_{nir} + R_{red}} \tag{3}$$

where R_nir and R_red are the reflectivity of the multispectral satellite in the near-infrared and red bands, respectively. The Fit can then be calibrated with the multispectral satellite data. The reason for this is that the Fit is fitted from the hyperspectral endmembers and their abundance weights, so in essence it is point data reflecting the characteristics of the endmembers. In contrast, the reflection characteristics of a Sentinel-2 image pixel (10 m × 10 m) are determined by the ground objects and environment within 100 m², such as texture characteristics, the ground object abundance, topographic undulations, and vegetation types, so in essence it is a synthesis of area data. Therefore, using the multispectral satellite to calibrate the Fit can reflect the overall characteristics of the pixel scale while adding the spectral detail of the main endmembers. The simulated spectral process is shown in Formulas (4)-(7).
$$SS_{\lambda} = \mathrm{Fit}_{\lambda} + D_{\lambda} \tag{4}$$

Here, SS_λ is the simulated spectrum at λ, which has 751 bands and adds hyperspectral detail to the raw multispectrum, and D_λ is the correction coefficient at λ, which resolves the inconsistency between the reflectivity trend of the SS and the true spectral information of Sentinel-2. The correction coefficient is calculated by Formula (5):

$$D_{\lambda} = VL_{\lambda} - TL_{\lambda} \tag{5}$$

Here, VL_λ is the vector line of the multispectral satellite reflectance, which can be considered a continuous piecewise function of the Sentinel-2 reflectivity, and TL_λ is the trend line of Fit_λ, which is segmented at the central wavelengths of the multispectral satellite:

$$TL_{\lambda} = a_k \lambda + b_k, \qquad \lambda \in [W_{k-1}, W_k] \tag{6}$$

Here, W_k is the central wavelength of the k-th multispectral band (k ≥ 2), and a_k and b_k are the slope and intercept of TL in the interval [W_{k−1}, W_k], respectively, calculated as follows:

$$a_k = \frac{\mathrm{Fit}_{W_k} - \mathrm{Fit}_{W_{k-1}}}{W_k - W_{k-1}}, \qquad b_k = \mathrm{Fit}_{W_k} - a_k W_k \tag{7}$$

where Fit_{W_k} is the value of Fit at the central wavelength W_k, and Fit_{W_k} ∈ Fit.
The satellite reflectivity better reflects the spectral information of the canopy on a large-scale, and the main purpose of this study was to increase the spectral details of the multispectrum by the simulated spectrum method. As shown in Figure 2, it was assumed that VL is the raw reflectance obtained by the satellite and Fit is the spectral curve fitted according to the weight of the ground objects. The Fit was divided based on the center wavelength of the multispectrum, and the TL was formed by connecting the first and last points of each interval. D (D = VL − TL) represents the distance between the trend line and the satellite reflectivity. When D > 0, TL is too low; when D < 0, TL is too high; and when D = 0, TL is close to VL. The Fit was modified based on distance D, and the SS was finally obtained.
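The correction described above can be sketched as follows. This is a minimal illustration, assuming that the SS is obtained by adding the distance D = VL − TL to the Fit, that VL is a straight-line interpolation between band reflectances at the band center wavelengths, and that TL connects the Fit values at those same centers. Holding VL and TL constant outside the first and last band centers is an assumption of this sketch, and the band values and the Fit curve are hypothetical.

```python
import math
from bisect import bisect_right


def interp(x, xs, ys):
    """Piecewise-linear interpolation; values are held constant outside [xs[0], xs[-1]]."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    k = bisect_right(xs, x)                         # xs[k-1] <= x < xs[k]
    a = (ys[k] - ys[k - 1]) / (xs[k] - xs[k - 1])   # slope of the segment
    b = ys[k] - a * xs[k]                           # intercept of the segment
    return a * x + b


def simulated_spectrum(wavelengths, fit, centers, band_reflectance):
    """Shift the fitted spectrum by D = VL - TL at every wavelength."""
    fit_at_centers = [interp(w, wavelengths, fit) for w in centers]
    ss = []
    for lam, f in zip(wavelengths, fit):
        vl = interp(lam, centers, band_reflectance)  # satellite vector line
        tl = interp(lam, centers, fit_at_centers)    # trend line of the Fit
        ss.append(f + (vl - tl))
    return ss


# Hypothetical inputs: a smooth Fit curve on a 5 nm grid and three band centers.
wavelengths = list(range(490, 666, 5))
fit = [0.06 + 0.02 * math.sin((w - 490) / 25.0) for w in wavelengths]
centers = [490, 560, 665]
band_reflectance = [0.04, 0.07, 0.05]

ss = simulated_spectrum(wavelengths, fit, centers, band_reflectance)
```

With this construction TL equals the Fit at each band center, so D there is exactly the gap between the satellite reflectance and the Fit, and the SS reproduces the satellite reflectance at the band centers while keeping the shape of the Fit in between.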

Spectral Segmentation Features
The multi-granularity spectral segmentation method [43] is a new spectral feature extraction technique that was derived by Kang et al. from the spectral high-order binary coding (HOBC) method. HOBC is an encoding method that realizes data compression and restoration by transforming the data storage format, such as transforming 16-bit data into 12-bit data. Many details are lost in the conversion process, and these easily lost spectral details may be more sensitive to biomass. Based on this, the data segmentation method is used to extract spectral information; in essence, it repeatedly de-averages the spectrum to highlight weak spectral information within a certain wavelength range. It is assumed that the spectral vector can be approximated by the sum of the products of the M (M > 0) orders of segment values and their coefficients, and the residual error decreases as M increases:

$$V = \sum_{i=1}^{M} \beta_i H_i + R_M(V) \tag{8}$$

where H_i is the i-th order segment value of the spectral vector V, H_i ∈ {−1, 1}^N; β_i is the coefficient of H_i, β_i > 0; and R_M(V) is the residual vector of the M-order quantized estimation of V. Through convex optimization, the analytical solution of Formula (8) is [44]

$$H_i = \mathrm{sign}\left(R_{i-1}(V)\right), \qquad \beta_i = \frac{\lVert R_{i-1}(V) \rVert_{L1}}{N} \tag{9}$$

where i = 1, 2, ..., M; R_0(V) = V; N is the number of spectral bands; L1 denotes the 1-norm; and sign() is the sign function, with sign(T) = 1 when T ≥ 0 and sign(T) = −1 when T < 0.
According to Formulas (8) and (9), the spectral segmentation process can be summarized as follows:

$$SL_i = \beta_i \times H_i, \qquad SF_i = SF_{i-1} - SL_i, \qquad AS_i = AS_{i-1} + SL_i \tag{10}$$

where SF_i is the segmentation feature (SF) after the i-th spectral segmentation, which provides the features used to construct the biomass estimation model in this article; SL_i is the segment line used to divide the SF at the i-th spectral segmentation; and AS_i is the approximate spectrum after the i-th spectral segmentation, which gradually approaches the original hyperspectrum by accumulating the segment values. The AS is mainly used for data compression of the hyperspectrum.
Unlike the AS, the SF repeatedly de-averages the original hyperspectrum to highlight the detailed information in the spectrum, which helps to improve the estimation accuracy of biomass. A vegetation spectrum is taken as an example to illustrate the multi-granularity spectral segmentation algorithm (MGSS) (Figure 3), with a focus on the segmentation features. Figure 3a presents a typical vegetation hyperspectral curve; at this stage, the segmentation feature is the spectrum itself (SF_0), and the approximate spectral vector is 0. Figure 3b shows the segment line (SL_1 = β_1 × H_1) of the first segmentation. The first segmentation feature (SF_1 = SF_0 − SL_1) is then calculated (Figure 3d), and Figure 3c displays the approximate spectrum of the original spectrum after one segmentation. Next, β_2 and H_2 are calculated according to Formula (9), and the second segment line (SL_2 = β_2 × H_2) is calculated according to Formula (10), after which the second segmentation feature (SF_2) is extracted. SF_1 to SF_30 were extracted by repeating the above process. In this study, the MGSS method was used to extract the spectral segmentation features of the raw spectrum and the simulated spectrum at different scales. These features are effective for estimating biomass and element content.
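The repeated de-averaging can be sketched as a short loop. This is a minimal illustration, assuming that at each order the segment value is the sign of the current segmentation feature and its coefficient is the mean absolute value of that feature (its 1-norm divided by the number of bands), with SF_0 equal to the spectrum and AS_0 equal to zero. The function name `mgss` and the example spectrum are hypothetical.

```python
def mgss(spectrum, orders):
    """Return the segmentation features SF_1..SF_orders and the final approximate spectrum."""
    n = len(spectrum)
    sf = list(spectrum)    # SF_0: the spectrum itself
    approx = [0.0] * n     # AS_0: zero vector
    features = []
    for _ in range(orders):
        beta = sum(abs(v) for v in sf) / n            # coefficient: 1-norm / N
        h = [1.0 if v >= 0 else -1.0 for v in sf]     # segment value: sign vector
        sl = [beta * s for s in h]                    # segment line SL_i
        sf = [v - l for v, l in zip(sf, sl)]          # SF_i = SF_{i-1} - SL_i
        approx = [a + l for a, l in zip(approx, sl)]  # AS_i = AS_{i-1} + SL_i
        features.append(sf)
    return features, approx


# Hypothetical six-band vegetation-like reflectance spectrum.
spectrum = [0.05, 0.08, 0.04, 0.30, 0.45, 0.48]
features, approx = mgss(spectrum, orders=3)
print([round(v, 4) for v in features[0]])
```

For an all-positive reflectance spectrum the first pass simply subtracts the mean, which is the de-averaging described in the text. At every order the approximate spectrum and the segmentation feature add back up to the original spectrum, and the squared length of the feature vector does not grow from one order to the next.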

Selection of Vegetation Indices
The vegetation indices (VIs) are correlated with AGB, and according to previous studies and relevant references, a total of 19 VIs were used to estimate AGB (Table 1). The RS-VIs were calculated based on the raw spectrum (RS) of Sentinel-2 and the SS-VIs were calculated based on the SS.
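A few of the indices named in this article can be sketched as plain functions of band reflectance. Table 1 is not reproduced in this text, so the definitions below are the widely used textbook forms and may differ from the exact variants and band choices in Table 1; the reflectance values are hypothetical. The same functions apply to RS bands or to narrow bands picked from the SS.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index."""
    return (nir - red) / (nir + red)


def rvi(nir, red):
    """Ratio vegetation index."""
    return nir / red


def dvi(nir, red):
    """Difference vegetation index."""
    return nir - red


def savi(nir, red, soil_factor=0.5):
    """Soil-adjusted vegetation index; 0.5 is the commonly used soil factor."""
    return (1.0 + soil_factor) * (nir - red) / (nir + red + soil_factor)


def mtci(r_nir_edge, r_red_edge, r_red):
    """MERIS terrestrial chlorophyll index from three red-edge region reflectances."""
    return (r_nir_edge - r_red_edge) / (r_red_edge - r_red)


# Hypothetical reflectances for one grass pixel.
red, red_edge, nir_edge, nir = 0.08, 0.15, 0.30, 0.40
indices = {
    "NDVI": ndvi(nir, red),
    "RVI": rvi(nir, red),
    "DVI": dvi(nir, red),
    "SAVI": savi(nir, red),
    "MTCI": mtci(nir_edge, red_edge, red),
}
print({name: round(value, 3) for name, value in indices.items()})
```

With RS data each argument is a single Sentinel-2 band; with SS data the argument can be the simulated reflectance at any wavelength inside the sensitive range, which is what makes narrow-band variants possible.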

Above-Ground Biomass Estimation and the Accuracy Evaluation Index
The flowchart in Figure 4 illustrates the experimental methodology used in the study, including the data, method, results, and evaluation. The data mainly include the measured AGB and the endmember spectrum data. The simulated spectrum method was proposed to increase the amount of information in the multispectrum, and the spectral features RS-SF, SS-SF, RS-VIs, and SS-VIs were extracted from the RS and SS. These features were then used to estimate grassland AGB with PLSR and MSR models. To evaluate how well the simulated spectrum method improves the ability of the multispectral satellite to predict grassland AGB, the coefficient of determination (R²), the estimation accuracy (EA), the ratio of performance to deviation (RPD), and the root mean square error (RMSE) with its bias-standard deviation decomposition [59,60] were used.
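The evaluation indices can be sketched as below. The formulas are not given in this section, so the sketch relies on common conventions that should be treated as assumptions: R² as one minus the ratio of residual to total sum of squares, RPD as the sample standard deviation of the observations divided by RMSE, EA as (1 − RMSE / mean observed AGB) × 100%, and the decomposition RMSE² = bias² + SDE², where SDE is the standard deviation of the errors. The observed and predicted values are hypothetical.

```python
import math


def evaluate(observed, predicted):
    """Accuracy indices for a set of observed and predicted AGB values (g/m^2)."""
    n = len(observed)
    mean_obs = sum(observed) / n
    errors = [p - o for p, o in zip(predicted, observed)]
    bias = sum(errors) / n
    sse = sum(e * e for e in errors)
    rmse = math.sqrt(sse / n)
    sde = math.sqrt(sum((e - bias) ** 2 for e in errors) / n)  # spread of the errors
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    sd_obs = math.sqrt(ss_tot / (n - 1))                       # sample standard deviation
    return {
        "r2": 1.0 - sse / ss_tot,
        "rmse": rmse,
        "bias": bias,
        "sde": sde,
        "rpd": sd_obs / rmse,
        "ea": (1.0 - rmse / mean_obs) * 100.0,
    }


# Hypothetical field and model values, for illustration only.
observed = [40.0, 55.0, 63.0, 72.0, 90.0]
predicted = [45.0, 50.0, 66.0, 70.0, 85.0]
metrics = evaluate(observed, predicted)
print({name: round(value, 3) for name, value in metrics.items()})
```

The bias term captures systematic over- or under-estimation, while SDE captures the scatter that remains after the bias is removed, so the decomposition shows which of the two dominates the RMSE.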

Satellite-Scale Simulated Spectrum
As shown in Figure 5a, the blue and red lines represent the RS and SS at wavelengths of 375-1000 nm, respectively. The RS is the overall response within a unit pixel (10 m × 10 m), and its reflectivity is closely related to the surface environment, such as the vegetation coverage and soil moisture. The SS therefore does not change the reflectance of the multispectrum at the band center wavelengths; instead, spectral detail is supplemented according to the surface object types and their abundance. Compared with the RS, the SS has two advantages: (i) the SS contains more spectral information from the endmembers, as shown in the black dashed box in Figure 5a; and (ii) owing to the design of the sensor, Sentinel-2 does not collect data in the range of 375-445 nm, whereas the pure endmembers obtained by ASD have continuous reflection information in this range, so the spectral simulation algorithm supplements the Sentinel-2 data there. All these reflection features may contain useful information for biomass estimation. Moreover, the correlations of the SS and the RS with biomass were analyzed separately, and the results showed that the SS correlates better than the RS in some wavelength ranges (Figure 5b). For example, in the range of 560-665 nm, the correlation coefficient (R) of the SS is closer to −1 than that of the RS and is stronger than that of band 3; in the range of 865-945 nm, both the SS and the RS are positively correlated with biomass, and the R of the SS is higher than that of the RS. This also means that the SS offers more options when different band combinations are used to estimate biomass. Therefore, the simulated spectrum method can increase the sensitivity of Sentinel-2 data to biomass.
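The band-by-band comparison behind Figure 5b amounts to a Pearson correlation between biomass and the reflectance at each wavelength across the sampling points. The following is a minimal sketch of that calculation; the helper names and the four-sample data set are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def band_correlations(spectra, biomass):
    """Correlation of biomass with reflectance, computed separately for each band."""
    n_bands = len(spectra[0])
    return [pearson_r([s[j] for s in spectra], biomass) for j in range(n_bands)]


# Hypothetical data: four sampling points, three bands (red-like, NIR-like, other).
spectra = [
    [0.10, 0.30, 0.20],
    [0.08, 0.35, 0.22],
    [0.06, 0.40, 0.19],
    [0.04, 0.45, 0.21],
]
biomass = [20.0, 40.0, 60.0, 80.0]

r_per_band = band_correlations(spectra, biomass)
print([round(r, 3) for r in r_per_band])
```

Running the same function on the RS bands and on the SS wavelengths gives two correlation curves that can be compared range by range, as in the text.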

Correlation Analysis of Biomass and the Vegetation Index
RS-VIs and SS-VIs were calculated from the RS and SS to test the effectiveness of the simulated spectrum method, and the correlations of the two sets of vegetation indices with AGB were compared and analyzed. The results are shown in Figure 6. Except for TSAVI and GNDVI, the indices correlate well with AGB (R > 0.6), so the vegetation index is a relatively concise means of estimating biomass. Additionally, the correlation of the SS-VIs is generally better than that of the RS-VIs, which makes the SS-VIs more suitable for AGB estimation. Among the 19 vegetation indices, the correlation coefficients of the RS-VIs and SS-VIs are equivalent for RVI and RDVI, and those of the RS-VIs are slightly higher than those of the SS-VIs for NDVI705 and MSAVI. For the other indices, however, the SS-VIs are better than the RS-VIs, with R increasing by more than 0.1 for MTCI, DVI, EVI, and SAVI. This is because the SS has more spectral information related to the surface, so suitable bands within a sensitive range can be selected to construct a narrow-band vegetation index. Therefore, the simulated spectrum method can increase the selectivity of spectral features and improve the estimation ability of multispectral satellites when estimating biomass with a single vegetation index.
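To make the idea of a narrow-band index concrete, the sketch below computes a few of the indices named above from a red and a near-infrared reflectance. It uses the commonly cited forms of NDVI, DVI, RVI, and SAVI (with a soil adjustment factor of 0.5), which may differ from the exact formulations used in this study. The helper `nearest_band`, the wavelength grid, and the example wavelengths are hypothetical and only show how a band could be picked from a continuous spectrum such as the SS.

```python
def nearest_band(wavelengths, spectrum, target_nm):
    """Reflectance of the band whose wavelength is closest to target_nm."""
    idx = min(range(len(wavelengths)), key=lambda i: abs(wavelengths[i] - target_nm))
    return spectrum[idx]


def vegetation_indices(red, nir, soil_factor=0.5):
    """Common red/NIR vegetation indices (standard textbook forms)."""
    return {
        "NDVI": (nir - red) / (nir + red),
        "DVI": nir - red,
        "RVI": nir / red,
        "SAVI": (1.0 + soil_factor) * (nir - red) / (nir + red + soil_factor),
    }


if __name__ == "__main__":
    # Made-up spectrum on a coarse wavelength grid, for illustration only.
    wavelengths = [560, 665, 705, 740, 842]
    spectrum = [0.08, 0.10, 0.20, 0.40, 0.50]
    red = nearest_band(wavelengths, spectrum, 665)   # example red wavelength
    nir = nearest_band(wavelengths, spectrum, 842)   # example NIR wavelength
    print(vegetation_indices(red, nir))
```

With a simulated spectrum, the two target wavelengths can be moved within the sensitive range instead of being fixed to the sensor's band centers.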

Segmentation Feature Extraction of the Spectrum
The SS enhances the spectral information of multispectral satellites by the simulated spectrum method, but it also has certain drawbacks. On the one hand, due to the addition of a large number of spectral details after being simulated, the difficulty of image storage and processing is increased; on the other hand, SS has certain data redundancy, which restricts the estimation accuracy of biomass.
The MGSS method is an effective means of solving these problems. As shown in Figure 7, RS represents the raw spectrum of Sentinel-2, which contains 12 bands, only two of which have correlation coefficients with absolute values above 0.6; the best R is −0.79. SS represents the simulated spectrum, which contains 731 band features, 61 of which have |R| above 0.6, and there are four high-correlation intervals at 575-703, 1025-1075, 1610, and 2190 nm. Therefore, RS has fewer spectral features than SS. However, the reflectance characteristics of the SS bands are extremely similar to one another, and this redundancy reduces the data quality, so the one-dimensional SS was divided into 30 features (SF1-SF30) through the MGSS method. In RS and SS, the spectrum and AGB are mainly negatively correlated, whereas the correlations on SF1-SF30 show a mixed distribution of positive and negative values, which is conducive to reducing data redundancy and improving the spectral sensitivity. As the segmentation scale increases, the difference between adjacent bands gradually increases. The wavelength ranges with similar correlations gradually break up and shrink, and finally three highly correlated narrow bands at 705-725, 925-975, and 2190 nm are formed, which become the sensitive bands related to biomass. The sequential forward selection (SFS) method not only selects the bands with a higher correlation with biomass, but also takes into account the combined effect of the selected features. The sensitive bands were selected by SFS for RS, SS, and SF1-SF30, and those bands exhibited a high correlation with biomass and a low correlation with each other. As shown in Figure 8, the horizontal axis is the wavelength, the vertical axis is the spectral feature type, the circles represent the selected sensitive bands, and the different colors indicate the priority order of band selection at a single scale.
Among the 12 raw bands of RS, only eight bands meet the requirements after SFS selection, and these are mainly concentrated near the red edge (675-875 nm). Compared with RS, SS has 16 sensitive bands selected, of which four are additional visible bands and six are additional bands in the near-infrared and shortwave infrared ranges. SF1-SF30 can have more than 16 bands that meet the standard, and these bands are discretely distributed between 375 and 2190 nm, indicating that the MGSS method can reduce the redundancy and mine the weak spectral information related to biomass. In short, the MGSS algorithm reduces the redundancy between the SS bands, and the SFS band selection method selects and stores only the sensitive bands related to biomass, which effectively reduces the amount of SS data.
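The greedy logic of sequential forward selection can be sketched as follows. This is an illustrative implementation under stated assumptions, not the procedure used in this study: the selection criterion here is the residual sum of squares of an ordinary least-squares fit on the training data, whereas the criterion actually used is not specified in this section, and the function names `fit_rss` and `sequential_forward_selection` are hypothetical. In practice a cross-validated criterion would usually be preferable.

```python
def fit_rss(columns, y):
    """Residual sum of squares of an OLS fit of y on the given columns plus intercept."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Augmented normal equations [X'X | X'y], solved by Gauss-Jordan elimination.
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        if abs(A[col][col]) < 1e-12:
            return float("inf")  # degenerate (collinear) candidate set: treat as unusable
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    return sum((y[r] - sum(b * x for b, x in zip(beta, X[r]))) ** 2 for r in range(n))


def sequential_forward_selection(features, y, max_features):
    """Greedily add the feature that most reduces the RSS of the joint fit.

    features: list of feature columns (each a list with one value per sample).
    Returns the indices of the selected features in order of selection.
    """
    selected, remaining = [], list(range(len(features)))
    best_rss = float("inf")
    while remaining and len(selected) < max_features:
        scores = {j: fit_rss([features[i] for i in selected + [j]], y) for j in remaining}
        j_best = min(scores, key=scores.get)
        if scores[j_best] >= best_rss - 1e-12:
            break  # no candidate improves the fit any further
        selected.append(j_best)
        remaining.remove(j_best)
        best_rss = scores[j_best]
    return selected


if __name__ == "__main__":
    # Made-up data: feature 1 is a noisy near-copy of feature 0 (redundant band).
    f0 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    f1 = [1.3, 2.3, 2.4, 3.4, 5.3, 6.3]
    f2 = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
    y = [2 * a + 3 * b for a, b in zip(f0, f2)]
    print(sequential_forward_selection([f0, f1, f2], y, max_features=2))
```

Because each candidate is scored jointly with the bands already chosen, a band that is highly correlated with biomass but nearly duplicates a selected band adds little and tends to be skipped, which matches the behavior described above.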

Biomass Estimation Model and Accuracy Evaluation
The accuracy of biomass prediction is the basis for determining whether the simulated spectrum method can effectively improve the utilization of Sentinel-2. In this paper, SS, RS, SS-VIs, and RS-VIs were used to construct MSR and PLSR biomass estimation models, and a 10-fold cross-validation approach was used to verify the accuracy. To avoid modeling errors caused by unreasonable sample partitioning, all of the AGB data were sorted in descending order and divided into 10 equal parts. One-tenth of the data were used for verification, and the remaining points were used to build the model. After 10 cycles of this, the estimation accuracy was as shown in Table 2. The biomass estimation results of SS were better than those of RS. Compared with RS, the coefficient of determination (R²) of SS increased from 0.75 to 0.77 in the MSR model, and from 0.75 to 0.81 in the PLSR model. The accuracy of SS in the three evaluation indicators of RMSE, EA, and RPD was also improved. Among the vegetation index features, SS-VI is better than RS-VI when a single vegetation index is used to estimate biomass, and the highest R² of SS-VI is 0.70. However, the accuracy of SS-VI is lower than that of RS-VI when multiple vegetation indices are used as parameters for estimating biomass. This may be because the selection of sensitive bands for constructing the SS-VI focused on the correlation between a single vegetation index and biomass, while ignoring the collinearity between vegetation indices. Therefore, SS-VI is better than RS-VI in a single-index model, and RS-VI is better than SS-VI in a multiple-index model. Two factors may limit the advantages of SS here. On the one hand, the VIs are simple to calculate and use only two or three dominant bands, which cannot fully exploit the advantages of hyperspectral data; on the other hand, there is collinearity among the VIs.
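The sorted partitioning used for the 10-fold cross-validation can be sketched as follows. This is one possible reading, stated as an assumption: after sorting by AGB in descending order, every tenth sample is assigned to the same fold, so that each fold spans the full biomass range. The function names (`sorted_folds`, `fit_line`, `cross_validated_rmse`) are hypothetical, and a single-predictor linear fit stands in for the MSR and PLSR models purely to keep the example short.

```python
import math


def sorted_folds(agb, n_folds=10):
    """Sort sample indices by AGB (descending) and deal them into n_folds folds.

    ASSUMPTION: interleaved assignment (every n_folds-th sample in the sorted
    order goes to the same fold), so each fold covers low and high biomass.
    """
    order = sorted(range(len(agb)), key=lambda i: agb[i], reverse=True)
    return [order[k::n_folds] for k in range(n_folds)]


def fit_line(x, y):
    """Least-squares slope and intercept for a single predictor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def cross_validated_rmse(x, agb, n_folds=10):
    """RMSE over all held-out predictions from n_folds train/validate cycles."""
    squared_errors = []
    for test_idx in sorted_folds(agb, n_folds):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(agb)) if i not in held_out]
        slope, intercept = fit_line([x[i] for i in train_idx], [agb[i] for i in train_idx])
        squared_errors += [(agb[i] - (slope * x[i] + intercept)) ** 2 for i in test_idx]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


if __name__ == "__main__":
    # Made-up predictor and biomass values, for illustration only.
    x = [float(v) for v in range(20)]
    agb = [3.0 * v + 5.0 + (1.0 if i % 2 else -1.0) for i, v in enumerate(x)]
    print(sorted_folds(agb)[:2], round(cross_validated_rmse(x, agb), 3))
```

Every sample is held out exactly once, so the pooled errors can be passed to whichever accuracy metrics are being reported.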
Among the segmentation features, the RS-SF estimation accuracy is poor because of the limited amount of spectral information. However, SS-SF fully utilizes the spectral advantages of SS, and its estimation accuracy is better than that of RS, SS, RS-VIs, and SS-VIs. In the MSR and PLSR estimation models, R² reaches 0.95, RMSE is less than 11 g/m², EA is greater than 82.5%, and RPD is above 4.6, which shows that the model has high stability. The relationship between the measured AGB and the AGB estimated using RS, SS, and SS-SF with the MSR and PLSR models is shown in Figure 9. The results indicate the following: (i) the estimation accuracy of RS is the lowest, the error is large (RMSE = 24.94-25.08 g/m²), the scattered points are distributed on both sides of the 1:1 line, and EA < 60.60%; (ii) compared with RS, the accuracy of SS inversion is improved, RMSE is reduced by 1.08-3.32 g/m², and EA is increased by 1.71-5.25%. The accuracy of the SS model is higher in low biomass regions, but the error is still large in high biomass regions; and (iii) SS-SF provides a better estimation accuracy in the MSR and PLSR models: R² > 0.95; EA > 82.84%; RMSE reduced by 14.08-14.19 g/m² in comparison to RS; and the scattered points are concentrated on the 1:1 line. The inclination of the SS-SF fitting line relative to the 1:1 line indicates that the model estimation is more accurate in medium and low biomass environments, and that AGB is easily underestimated in high biomass environments. This may be because there are fewer sampling points in the high biomass area. In the RS and SS models, AGB is easily overestimated in low biomass areas and underestimated in high biomass areas, which also causes the error variance and the error bias to be generally greater than those of the SS-SF models.
In the comparison of the model accuracy, the MSR and PLSR models have similar results, indicating that the simulated spectrum and the spectral segmentation method can improve the accuracy of large-scale biomass estimation in Sentinel-2 images.

Spatial Variation of Biomass in Longitude
In this study, MSR and PLSR had similar estimation accuracy on the validation set. Taking into account the computational difficulty of the two models, the SS-SF + MSR model was used to construct the AGB estimation model. At the same time, to verify the ability of the simulated spectrum system to estimate AGB on a large scale, the AGB within a circle of 1 km radius (Z1-Z10) around each sampling area (S1-S10) was estimated and mapped. The original images (RGB true-color composites) and the biomass distributions are shown in Figure 10. The colors from green to red represent biomass from low to high, and the white hollow areas in the figure are thick clouds or roads (masked). It can be seen that the spatial distribution of biomass agrees closely with the location of green vegetation. Taking Z2 as an example, the areas of green vegetation in the true-color image have a higher biomass (>150 g/m²), whereas areas such as shadows and roads have a low biomass (<60 g/m²), which is significantly different from that of vegetation. It can be qualitatively concluded that the estimated biomass has a certain accuracy. Furthermore, the reliability of the estimation was analyzed and evaluated based on the spatial location and climatic conditions of Z1-Z10. Figure 11a shows the average temperature and precipitation in the sampling area from the period of grass turning green to the sampling cut-off period (1 May to 14 July), Figure 11b shows the altitudes of Z1-Z10, and Figure 11c shows the measured AGB in the sampling area and the estimated AGB in the 1 km buffer zone. The latitudes of Z1-Z10 are similar, with obvious longitude changes from east to west, and the climate characteristics transition from those of a monsoon climate to those of a continental climate. Therefore, in theory, grassland biomass should gradually decrease from east to west, and the satellite-scale estimation of AGB is consistent with this expectation.
Z1, Z2, and Z3 are located in warm and humid plains and hilly areas, with an altitude of <660 m, an average temperature of >20 °C, precipitation of >150 mm, and high biomass (>120 g/m²). Z4-Z10 are located on the Inner Mongolia Plateau, where the growth conditions for forage grass are poor: the altitude is >950 m, the temperature is between 18 and 22 °C, there is little precipitation, and the biomass is generally below 40 g/m².
Figure 11. The changes of weather, elevation, and AGB in the sampling area and buffer zone: (a) The average temperature and precipitation in the sampling area from the period of greening to the sampling cut-off period; (b) the elevation of each sampling point from east to west; and (c) the mean value of the measured AGB from the S1-S10 area vs. the estimated AGB from Z1-Z10.
Although there is a certain gap between the measured average biomass in the sampling areas (S1-S10) and the estimated values in the buffer zones (Z1-Z10), it must be highlighted that the S1-S10 averages are based on only eight samples of 1 × 1 m² each. The selection of these plots is random and localized, whereas Z1-Z10 represent the AGB within 1 km of each plot of S1-S10, so the estimation results are more global. Among them, S2 and Z2, S3 and Z3, and S6 and Z6 are quite different. The main reason for this is that the influence of thin clouds in these areas is more serious than in the other sample areas, which restricts the estimation accuracy. In summary, the SS-SF + MSR model can accurately estimate biomass information on a large scale.

Biomass Estimation Model Based on the Simulated Spectrum
Multispectral satellites have the advantages of a large swath width, low cost, and high efficiency in ground monitoring, and are still the main means of large-scale monitoring of grassland traits [21]. However, their low spectral resolution is the key factor restricting the accuracy with which multispectral satellites can invert grassland characteristics. In the past, satellite image selection was often aimed at either efficiency or precision. When efficiency is pursued, the inversion error caused by the low spectral resolution of multispectral images is often ignored. When inversion accuracy is pursued, selecting hyperspectral imagery as the data source does improve the estimation accuracy [61]. However, hyperspectral satellite images are expensive and scarce and have a narrow swath width [62], so they are difficult to apply over a wide area. Additionally, many scholars focus on studying the scale effect of remote sensing images to improve the inversion accuracy of low-resolution satellites [63]. In satellite scale conversion, scaling up is easy to operate, but scaling down remains an unsolved problem and a research hot spot in remote sensing [64].
Compared with the multispectrum, the simulated spectrum achieves a higher inversion accuracy because it has more spectral details, which help distinguish ground objects at different wavelengths. Plants of the same species have similar spectral characteristics, so the spectral details of pure pixels can be fitted to the satellite-scale multispectrum according to the grassland vegetation type and its vegetation coverage. In this way, multispectral satellites can obtain more detailed spectral information, and their earth-monitoring ability is improved. The experiment in this paper preliminarily shows that this simulated spectrum method is a feasible way to improve the precision of satellite image inversion.
The vegetation index is a key parameter for many vegetation trait estimation models [13,65], and limits the estimation accuracy of AGB. VIs calculated from wide-band multispectral satellite data tend to saturate in areas of high vegetation coverage [33]. The simulated spectrum algorithm can enrich the spectral information of multispectral images so that sensitive bands can be selected according to the characteristics of plants in the pseudo-hyperspectral range, and the optimal spectral index can be constructed, which can improve the prediction accuracy of key traits such as biomass.
The experiment in this paper preliminarily shows that the simulated spectrum method is feasible for improving the accuracy of satellite image inversion. Compared with RS used directly for biomass estimation, the estimation accuracy of SS-SF for biomass is improved by 7-17%. The method may be even more effective when applied to Landsat, which provides less spectral information than Sentinel-2, and the next step will be an in-depth study to investigate this.

Uncertainties and Sources of Error
There are certain difficulties and uncertainties in the large-scale inversion of grassland traits based on remote sensing. The grassland ecosystem itself is complex and the grass species type, growth height, and growth stages (greening, heading, flowering, maturing, and yellowing) all lead to differences in grassland characteristics [66]. Moreover, the topography, climatic environment, nutrient content (N and P), water stress, etc. can also affect the spectral reflectivity of grasses, so the construction and applicability of the biomass inversion model are challenging.
Errors in the data also affect the prediction accuracy, and these mainly comprise field measurement error and image error. In the process of measuring biomass on the ground, instrument errors and operational errors are inevitable but relatively small. There is uncertainty in the scale correspondence between the spatial resolution of the remote sensing image and the field measurements [67], because it is difficult to obtain field measurement data corresponding to Landsat (30 m) or Sentinel-2 (10 m) pixels over a large area. Clouds have always been a key constraint on image quality, especially for satellites with a low temporal resolution (Landsat, etc.). Data gaps easily occur during the grassland growth stage, so the imagery is out of step with the time of the field measurements, which increases the difficulty of grassland biomass inversion.
In the process of large-scale grassland trait prediction, these uncertainties and data errors are inevitable. In this study, the simulated spectrum method was selected to improve the precision of satellite-scale biomass inversion. In future research, the spatial scale factor will be introduced into the spectrum simulation process to explore the precision of grassland biomass inversion in different growth stages.

Conclusions
Using Sentinel-2 imagery as the data source, this paper proposes a biomass estimation method under the satellite-scale simulated spectrum system. This method was used to estimate the grassland biomass in the 44° N latitude zone of the Inner Mongolia Plateau, and the estimation accuracy was evaluated qualitatively and quantitatively. The main conclusions are as follows.
To improve the spectral resolution of multispectral satellite images, a simulated spectrum method is proposed, which combines the advantages of endmember hyperspectral information and the spatial coverage of Sentinel-2 to construct a satellite-scale pseudo-hyperspectral image (SS-Sentinel). Compared with RS, SS has more spectral details, the reflectance features of some of its bands correlate better with biomass, and the SS-VIs based on SS are better than the RS-VIs.
Based on SS, the segmentation feature SS-SF, which was extracted by the MGSS method, was employed to construct MSR and PLSR AGB estimation models, and the estimation accuracy of AGB was greatly improved. The AGB was estimated by RS features, resulting in R² = 0.75 and RMSE = 24.94-25.08 g/m² upon verification. Compared with RS, the R² of SS-SF increased by 0.2, RMSE decreased by 14.08-14.19 g/m², and EA increased by 22.26-22.42%. Therefore, the simulated spectrum method can improve the inversion accuracy of grassland biomass by multispectral satellites, and provide a new idea for the accurate inversion of regional and global large-scale biomass.