Is Spectral Unmixing Model or Nonlinear Statistical Model More Suitable for Shrub Coverage Estimation in Shrub-Encroached Grasslands Based on Earth Observation Data? A Case Study in Xilingol Grassland, China

Abstract: Due to the effects of global climate change and altered human land-use patterns, shrub encroachment has become one of the most prominent ecological problems in grassland ecosystems. Shrub coverage quantitatively indicates the degree of shrub encroachment in grasslands; therefore, real-time and accurate monitoring of shrub coverage over large areas has important scientific significance for the protection and restoration of grassland ecosystems. As shrub-encroached grasslands (SEGs) are a type of grassland in which shrubs and grasses grow continuously and alternately, estimating shrub coverage differs from estimating overall vegetation coverage. It is necessary to consider not only the differences between vegetation and non-vegetation but also the differences between shrubs and herbs, which makes the estimation challenging. There is a scientific need to estimate shrub coverage in SEGs to improve our understanding of the process of shrub encroachment in grasslands. This article discusses the spectral differences between herbs and shrubs and points out the possibility of distinguishing between them. We use Sentinel-2 and Gao Fen-6 (GF-6) Wide Field of View (WFV) images as data sources to build a linear spectral mixture model and a random forest (RF) model via space–air–ground collaboration, and investigate the effectiveness of different data sources, features, and methods in estimating shrub coverage in SEGs, which provides promising ways to monitor the dynamics of SEGs. The results showed that (1) the linear spectral mixture model can hardly distinguish between shrubs and herbs in medium-resolution images of the SEG. (2) The RF model showed high estimation accuracy for shrub coverage in the SEG; for the Sentinel-2 image, the coefficient of determination (R²) was 0.81 and the root-mean-square error (RMSE) was 0.03, and for the GF6-WFV image, R² was 0.72 and RMSE was 0.03. (3) Texture features introduced into the RF models help to estimate shrub coverage in SEGs. (4) Regardless of whether the linear spectral mixture model or the RF model was employed, the Sentinel-2 image yielded better estimates than the GF6-WFV image; thus, Sentinel-2 data have great potential for monitoring shrub encroachment in grasslands. This research aims to provide a scientific basis and reference for remote sensing-based monitoring of SEGs.


Introduction
Fractional vegetation cover is a direct quantitative indicator reflecting the growth status of vegetation; it is an important parameter for estimating and monitoring the physical and chemical characteristics of vegetation [1,2]. In recent decades, due to factors such as climate change and human-induced alterations in land-use patterns, grasslands in the arid and semi-arid areas of Inner Mongolia, China, have gradually degraded, causing shrub vegetation to expand over large areas. On the grassland substrate, patches of shrub vegetation have gradually formed, producing what are known as shrub-encroached grasslands (SEGs) [3][4][5][6]. At present, there are 5.1 million hectares of Caragana microphylla SEGs in Inner Mongolia [7]. With a further increase in shrub vegetation, the species composition of the original grassland ecosystem and the competition between shrub and herbaceous vegetation for nutrients will change. This is expected to significantly affect the structure and function of the entire ecosystem in arid and semi-arid regions. Shrub coverage quantitatively represents the degree of shrub encroachment in grasslands [8]. Therefore, accurate, real-time, and large-scale estimation of shrub coverage in SEGs is of great significance for the protection and restoration of the original ecosystem in SEGs.
Traditional methods to measure shrub coverage entail field measurements, such as the needling method [9], which not only require a large amount of manpower and material resources but are also slow and can damage vegetation. Additionally, they cannot estimate shrub coverage on a large scale, which limits their practical application. With the continuous development of remote sensing technology, multispectral and hyperspectral remote sensing imaging has opened the possibility of real-time and large-scale measurement of shrub coverage. At the same time, remote sensing with unmanned aerial vehicles (UAVs) serves as a bridge between satellite remote sensing and ground surveys, providing a large number of samples for constructing vegetation coverage estimation models [10]. In recent years, many studies have used drone aerial images to construct vegetation coverage estimation models [11][12][13][14]. Liu et al. [15] used DJI drones to obtain large sample data for the Hulunbuir Grassland; they estimated vegetation coverage using maximum entropy and genetic algorithms and validated their findings with ground survey data, which indicated high estimation accuracy. This method can yield large volumes of ground validation data for remote sensing-based estimation of shrub coverage. The most commonly available remote sensing images are medium-resolution multispectral images, which are prone to spectral mixing and other issues. Therefore, scholars may choose to use mixed spectral decomposition models to estimate vegetation coverage. Cao et al. [16] estimated shrub coverage in the Mu Us Sandland of Wushen Banner, Inner Mongolia. They used spectral mixing models and linear regression models with HJ-1B images for shrub coverage estimation. The results showed that the spectral mixing model performed better than the linear regression models. Yang et al. [17] used Landsat TM data to analyze the relationship between different components obtained from mixed spectral models and vegetation coverage. Their results indicated that it is feasible to estimate the coverage of desert grasslands using a linear spectral mixing model, especially for mixed pixels. Liu et al. [18] used Landsat OLI image data to study Xianghuang Banner, Inner Mongolia; they employed multiple endmember spectral mixture analysis and linear models integrating shrub phenological characteristics to estimate shrub coverage in SEGs. The results indicated certain spectral differences between shrub and herbaceous vegetation in SEGs, and these differences make it possible to distinguish between them. Currently, machine learning methods have achieved good results in estimating vegetation coverage [19,20]. Yang et al. [21] took the Baiyangdian-Daqinghe Basin as their study area; they combined multi-source data, such as Sentinel-2, Sentinel-1, and DEM data, to achieve vegetation coverage inversion based on a random forest regression model. The results showed that, compared with the inversion results obtained with the traditional NDVI pixel dichotomy model, the accuracy of vegetation coverage estimated via the random forest regression model was higher. Ge et al. [22] estimated vegetation coverage in alpine areas using support vector regression (SVR) algorithms based on MODIS data. The results showed that the SVR model based on nine factors performed best for alpine grassland cover in the study area, with R² = 0.75 and RMSE = 6.85%. Previous research focused more on the distinguishability between herbaceous vegetation and bare soil and paid less attention to the distinguishability between different vegetation types. However, in SEGs, spectral heterogeneity between different vegetation types needs to be considered. Therefore, the feasibility of spectral mixing models and machine learning methods for estimating shrub coverage under different shrub coverage levels needs to be further explored.
Sentinel-2 and GF6-WFV have been applied in vegetation coverage estimation [23][24][25][26]. Compared to low-resolution satellites, Sentinel-2 and GF6-WFV have higher spatial resolution and band characteristics that help distinguish between shrubs and herbs. Additionally, Sentinel-2 and GF6-WFV data are easier to obtain and have a shorter revisit period, which avoids timing errors caused by inconsistent sample and image acquisition times. When images are affected by clouds and mist, it is also easier to obtain replacement images from a similar time period. Based on these factors, this study used UAV remote sensing images to obtain shrub coverage as ground verification data, and extracted band reflectance, vegetation indices, and texture features from Sentinel-2 and GF6-WFV remote sensing image data. A spectral mixture model and a random forest model were established through a space-air-ground collaboration approach, and their accuracy was compared and analyzed across different coverage levels. Ultimately, the optimal model for estimating shrub coverage via remote sensing was determined. The overall goal of these efforts was to provide a theoretical and practical basis for large-scale, real-time monitoring of SEGs and for sustainable utilization of grassland resources in the local area.

Study Area
Xilingol is in the middle of Inner Mongolia (41  [27][28][29] (Figure 1). The terrain of Xilingol is dominated by high plains; it is high in the south and low in the north, and the height above mean sea level is approximately 1000 m. The climate is an arid and semi-arid continental climate of the middle temperate zone. The annual average temperature is 0-4 °C, and the annual average precipitation is 150-400 mm, decreasing from southeast to northwest [30]. The natural grassland in this area is dominated by Stipa capillata, Artemisia frigida, Cleistogenes squarrosa, and Allium bidentatum; the main tree species is Ulmus pumila; shrubs are dominated by Caragana microphylla, which is mainly distributed in Zhengxiangbai Banner and Xianghuang Banner. With the increase in shrub coverage, the grassland in this area gradually deteriorates, which results in significant ecological protection pressure.

Spectral Data Measurements
This study used an ASD FieldSpec Pro FR 2500 spectrometer (Analytical Spectral Devices, United States) from 26 July to 13 August 2022 to obtain the spectral endmembers of bare soil, shrub vegetation, herbaceous vegetation, and shadows for each sample plot. The instrument collected spectra in the wavelength range of 350-2500 nm, totaling 2150 wavebands. To limit the influence of different solar zenith angles on reflectance and to ensure sufficient sunlight, spectra were measured between 10:00 and 14:00 in clear weather with low wind. The main steps were as follows: the probe was placed above the typical ground objects of interest (such as Caragana microphylla shrub vegetation, herbaceous vegetation, quicksand, and shadow) in each sample plot, and spectra were collected looking downward at 0.1-0.2 m above the surface [31]. To ensure the representativeness and applicability of the spectral endmembers in the study area, at least 10 spectra were collected for each endmember type wherever possible. In the end, a total of 47 spectra of shrub vegetation, 45 spectra of herbaceous vegetation, 22 spectra of bare soil, and 10 spectra of shadow were obtained.

Shrub Coverage Data
In this study, shrub coverage was measured from drone imagery acquired from 26 July to 13 August 2022. The drone was a DJI Phantom 4 Multispectral (P4M), which is equipped with a multispectral sensor (FC6360) produced by DJI, with 2.08 million effective pixels. It acquires five multispectral bands simultaneously, namely blue, green, red, red edge, and near-infrared [32][33][34]. A total of 15 drone images were obtained at a flight altitude of 50 m, with a forward overlap rate of 80% and a side overlap rate of 65%. During the flight, network real-time kinematic (RTK) positioning was used, and Agisoft PhotoScan Professional 1.1 software was used for preprocessing. Finally, a digital orthophoto map was obtained from the drone imagery, and the normalized difference vegetation index (NDVI) was calculated. By combining visual interpretation and repeated experimental adjustment, an NDVI threshold was set for each scene. Pixels whose NDVI exceeded this threshold were classified as shrub vegetation, and the remaining pixels as non-shrub. A series of 30 m × 30 m quadrats was then set on the UAV orthophoto image. To avoid homogeneity caused by very small distances between quadrats, one quadrat was set every 30 m on average. Shrub coverage was obtained by calculating the proportion of shrub pixels in each quadrat. A total of 129 shrub coverage sample plots were obtained, with an average coverage of 7.7%; most plots had a shrub coverage within 10%. Based on the literature and field investigations, this study divides the shrub coverage in the study area into three categories. The specific information is shown in Table 1.

Sentinel-2
Sentinel-2 data were downloaded from the Copernicus data center of the European Space Agency (https://scihub.copernicus.eu, accessed on 20 April 2023). The data contain 13 spectral bands with a maximum spatial resolution of 10 m, a swath width of 290 km, and a revisit period of 5 d [37]. This study obtained six Sentinel-2 L2A level datasets based on the location and time of the field survey points. L2A level data are derived from L1C level data through atmospheric and terrain corrections and directly provide bottom-of-atmosphere reflectance [38]. This study used SNAP 9.0 software to resample the images to a resolution of 10 m. Finally, the images were exported to a format recognizable by the Environment for Visualizing Images (ENVI) software to facilitate the later extraction of reflectance and calculation of vegetation indices.
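The quadrat-based shrub coverage calculation described in the Shrub Coverage Data section (NDVI thresholding followed by counting shrub pixels per quadrat) can be illustrated with a minimal Python sketch. The function names, the use of NumPy, and the simplification of tiling the image into contiguous, non-overlapping quadrats are assumptions made for illustration; the study placed quadrats at intervals on the orthophoto and set the threshold per scene by visual interpretation.

```python
import numpy as np


def ndvi(nir, red):
    """Normalized difference vegetation index, (NIR - red) / (NIR + red)."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    denom = nir + red
    # Return 0 where both bands are 0 to avoid division by zero.
    return np.divide(nir - red, denom, out=np.zeros_like(denom), where=denom != 0)


def quadrat_shrub_coverage(ndvi_img, threshold, quadrat_px):
    """Fraction of shrub pixels in each quadrat.

    ndvi_img   : 2-D array of NDVI values from the UAV orthophoto
    threshold  : scene-specific NDVI threshold (hypothetical value chosen by the user)
    quadrat_px : quadrat side length in pixels (e.g. 30 m divided by the pixel size)
    Simplification: quadrats are contiguous tiles; edge pixels that do not
    fill a whole quadrat are discarded.
    """
    shrub = np.asarray(ndvi_img) > threshold
    rows = shrub.shape[0] // quadrat_px
    cols = shrub.shape[1] // quadrat_px
    trimmed = shrub[: rows * quadrat_px, : cols * quadrat_px]
    blocks = trimmed.reshape(rows, quadrat_px, cols, quadrat_px)
    return blocks.mean(axis=(1, 3))
```

Each value in the returned array is a shrub coverage fraction between 0 and 1 that could serve as a ground verification sample for the satellite-based models.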

GF6-WFV
GF6-WFV data were downloaded from the Land Observation Satellite Data Service Platform of the China Resources Satellite Application Center. The data include eight spectral bands with a spatial resolution of 16 m, a swath width of 850 km, and a revisit period of 4 d [39]. This study obtained three GF6-WFV level 1A datasets based on the location and time of the field survey points. ENVI 5.3 software was used to preprocess the images, mainly for radiometric calibration, atmospheric correction, orthorectification, and precise geometric correction (using Sentinel-2 L2A level data, which had undergone sub-pixel-level precise geometric correction, as the reference image, with the error controlled within 0.5 pixels) [40].

Methods
Shrub vegetation was the research object in this study. First, we obtained UAV images and spectral endmember data of ground objects in the field, calculated the shrub coverage, and analyzed the differences between the spectral endmember data. We obtained two kinds of satellite remote sensing image data acquired at the same time as the field data, calculated vegetation indices and texture features, and then built linear spectral mixture models and random forest models. The goal was to evaluate the accuracy of the models, determine the optimal model for remote sensing estimation of shrub cover, and explore the estimation accuracy of the models under different shrub cover levels. The overall workflow of the study is shown in Figure 2.

Remote Sens. 2023, 15, x FOR PEER REVIEW

Linear Spectral Mixing Model
The linear spectral mixture model (LSMM) is a simplification of the spectral mixing that occurs in an actual scene. It offers the advantages of simplicity, convenient calculation, and clear physical meaning; therefore, it is widely used in mixed pixel decomposition of remote sensing images [41,42]. It assumes that there is no interaction between the reflectance spectra collected by the sensor; that is, the mixed spectrum is a linear combination of the spectral endmembers weighted by their abundances in the sample plot. The resultant model can be described with Equation (1):

$$R_i = \sum_{j=1}^{m} a_j b_{ij} + \varepsilon_i, \quad i = 1, 2, \ldots, n \tag{1}$$

Here, $0 \le a_j \le 1$ and $\sum_{j=1}^{m} a_j = 1$; $R_i$ represents the mixed spectrum of the i-th band; $a_j$ represents the abundance of the j-th spectral endmember; $b_{ij}$ represents the spectrum of the j-th endmember in the i-th band; $\varepsilon_i$ represents the error term, reflecting the random noise of the data; n represents the number of effective spectral bands; and m represents the number of endmembers. Given the mixed spectral vector R and the endmember spectra b, the endmember abundances a can be obtained using the fully constrained least squares (FCLS) algorithm, i.e., Equation (2) [43,44]. Ji [45] reported that the LSMM is feasible and performs well for estimating vegetation coverage in arid areas, and that the unmixing accuracy of the FCLS algorithm was higher than that of other algorithms. Therefore, this study uses the FCLS algorithm for LSMM unmixing to estimate shrub vegetation coverage.
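To make the unmixing step concrete, the following is a minimal Python sketch of fully constrained unmixing for a single pixel. It is not the study's implementation: it assumes NumPy and SciPy, and it enforces the sum-to-one constraint approximately by appending a heavily weighted row of ones to a non-negative least squares problem, which is one common way of realizing FCLS. The function name and the weight `delta` are hypothetical.

```python
import numpy as np
from scipy.optimize import nnls


def fcls_unmix(endmembers, pixel, delta=100.0):
    """Estimate endmember abundances for one mixed pixel.

    endmembers : array of shape (n_bands, m), column j is the spectrum b_ij
    pixel      : array of shape (n_bands,), the mixed spectrum R_i
    delta      : weight of the sum-to-one row (assumed value); larger values
                 enforce sum(a) = 1 more strictly
    Returns an array of m non-negative abundances summing approximately to 1.
    """
    endmembers = np.asarray(endmembers, dtype=float)
    pixel = np.asarray(pixel, dtype=float)
    m = endmembers.shape[1]
    # Augment the system with one extra equation: delta * sum(a) = delta.
    A = np.vstack([endmembers, delta * np.ones((1, m))])
    b = np.concatenate([pixel, [delta]])
    # Non-negative least squares enforces a_j >= 0.
    abundances, _residual = nnls(A, b)
    return abundances
```

With four endmembers (shrub, herb, bare soil, shadow), the abundance returned for the shrub column would be read as the estimated shrub coverage of the pixel.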

Extraction and Selection of Remote Sensing Feature Variables
This study extracted vegetation indices, reflectance, and texture features from Sentinel-2 and GF6-WFV images as feature variables to estimate shrub coverage. The extracted vegetation indices mainly included the normalized difference vegetation index (NDVI), ratio vegetation index (RVI), enhanced vegetation index (EVI), and soil-adjusted vegetation index (SAVI). The formulas used to calculate the indices are presented in Table 2. For Sentinel-2 images, reflectance was extracted in the blue (B2), green (B3), red (B4), and near-infrared (B8) bands, and textural features (Table 3) were extracted for the same bands using the gray-level co-occurrence matrix [46][47][48]; for GF6-WFV images, textural features and reflectance were extracted in the blue (B1), green (B2), red (B3), near-infrared (B4), red edge 1 (B5), and red edge 2 (B6) bands. In summary, 4 reflectance features, 4 vegetation index features, and 28 single-band textural features were extracted from the Sentinel-2 images, totaling 36 feature variables; 6 reflectance features, 4 vegetation index features, and 42 single-band textural features were extracted from the GF6-WFV images, totaling 52 feature variables.

Note: ρnir, ρred, ρgreen, and ρblue represent the reflectance of the near-infrared, red, green, and blue bands, respectively. The value of L is 0.5.

This study used recursive feature elimination (RFE) with cross-validation to screen the feature variables input into the model, thereby reducing data redundancy, reducing model complexity, and improving model running speed [52,53]. RFE is a greedy algorithm that seeks the optimal feature subset. The basic idea is to train an underlying model on the initial feature set and assign a weight to each feature, then remove the feature with the smallest weight, form a new feature subset from the remaining features, and train again. The process is repeated recursively until the required number of features is reached [54][55][56]. In this study, RFE was conducted based on the random forest (RF) model with default parameters; within 5-fold cross-validation, the number of features and the feature subset corresponding to the smallest validation root-mean-square error (RMSE) were selected. The above process was implemented with the caret package in the R language.
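The RFE procedure described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the study's code: it uses scikit-learn's `RFECV` with a `RandomForestRegressor` in place of the caret package in R, and the function name, the number of trees, and the random seed are hypothetical choices. scikit-learn maximizes a score, so the smallest RMSE is selected by maximizing the negative RMSE.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFECV


def select_features(X, y, feature_names, n_splits=5, n_estimators=100, seed=0):
    """Recursive feature elimination with k-fold cross-validation.

    X             : array of shape (n_samples, n_features)
    y             : measured shrub coverage, shape (n_samples,)
    feature_names : list of names, one per column of X
    Returns the names of the features in the subset with the lowest
    cross-validated RMSE.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    rf = RandomForestRegressor(n_estimators=n_estimators, random_state=seed)
    selector = RFECV(
        estimator=rf,
        step=1,                                 # drop one feature per iteration
        cv=n_splits,                            # 5-fold cross-validation by default
        scoring="neg_root_mean_squared_error",  # higher score means lower RMSE
        min_features_to_select=1,
    )
    selector.fit(X, y)
    return [name for name, keep in zip(feature_names, selector.support_) if keep]
```

Because random forest importances and fold assignments depend on the random seed, the selected subset may differ slightly between runs and between this sketch and a caret-based workflow.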

Construction of RF Model
RF is a parallel ensemble machine learning algorithm proposed by Breiman in 2001, which combines the bagging method with classification and regression trees [57][58][59]. Compared with traditional parametric models, this method typically achieves a higher coefficient of determination (R²) and a lower mean square error. At the same time, it has strong generalization ability and stable performance, and can avoid overfitting to a certain extent. It can also directly output the importance ranking of each feature variable during model construction, making the model more interpretable. The RF model in this study was implemented using the RF package in the R language; 70% of the samples were used as the training set and 30% as the test set.
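A minimal Python sketch of this modeling step is given below for illustration. It assumes scikit-learn rather than the R package used in the study, and the function name, the number of trees, and the random seed are hypothetical; only the 70%/30% split and the use of feature importance ranking come from the text.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split


def fit_rf(X, y, feature_names, n_estimators=500, seed=0):
    """Train an RF regressor on 70% of the samples and hold out 30% for testing.

    Returns the fitted model, the held-out (X_test, y_test) pair, and the
    features sorted by decreasing importance.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=seed
    )
    model = RandomForestRegressor(n_estimators=n_estimators, random_state=seed)
    model.fit(X_train, y_train)
    # Impurity-based importances sum to 1 across features.
    importance = sorted(
        zip(feature_names, model.feature_importances_),
        key=lambda item: item[1],
        reverse=True,
    )
    return model, (X_test, y_test), importance
```

Predictions on the held-out set, `model.predict(X_test)`, can then be compared with `y_test` to assess accuracy.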

Accuracy Evaluation
To compare the performance of the LSMM and the RF model in estimating shrub coverage, the coefficient of determination (R²) and the root-mean-square error (RMSE) between measured and estimated shrub coverage were used to evaluate model accuracy. They were calculated as follows:

$$R^2 = 1 - \frac{\sum_{i=1}^{n} (p_i - \hat{p}_i)^2}{\sum_{i=1}^{n} (p_i - \bar{p})^2}, \qquad RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (p_i - \hat{p}_i)^2} \tag{3}$$

Here, $p_i$ represents the measured value of shrub coverage; $\hat{p}_i$ denotes the predicted value of shrub coverage; $\bar{p}$ denotes the average of the measured values of shrub coverage; and n represents the number of samples. R² reflects the degree of fit of the model to the measured data, with a range of 0-1; the larger the R², the higher the explanatory power of the model for the dependent variable. RMSE reflects the dispersion between the measured shrub coverage and the value predicted by the model; the smaller the RMSE, the better the prediction quality of the model for the dependent variable [54].
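The two accuracy metrics can be computed with a few lines of Python. The sketch below assumes NumPy and the common definition of R² as one minus the ratio of the residual sum of squares to the total sum of squares; the function names are illustrative.

```python
import numpy as np


def r_squared(measured, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    measured = np.asarray(measured, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((measured - predicted) ** 2)
    ss_tot = np.sum((measured - measured.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def rmse(measured, predicted):
    """Root-mean-square error between measured and predicted values."""
    measured = np.asarray(measured, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((measured - predicted) ** 2)))
```

Note that, under this definition, R² computed on held-out samples can fall below 0 when a model predicts worse than the mean of the measured values.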

Spectral Feature Analysis
We acquired the spectra of shrubs, herbs, bare soil, and shadows in the study area through ground spectral observation. To reduce the impact of changes in time and space on the spectra of the various types of ground objects, five sets of spectral data were obtained for each type of ground object, i.e., shrubs, herbs, bare soil, and shadows, and their average was taken as the final spectrum. At the same time, the bands severely affected by the atmosphere and the water vapor absorption bands were removed, retaining the spectral ranges of 350-1350 nm, 1450-1750 nm, and 2000-2350 nm [60]. The spectral curves are shown in Figure 3. Figure 3 shows that the spectral curves of bare soil, vegetation, and shadow differ significantly, making it easy to distinguish between them. Due to the highly reflective nature of sandy land, the reflectance of bare soil is significantly higher than that of vegetation and shadows. Shadows have relatively low reflectance throughout the entire wavelength range, with little fluctuation, and thus their spectral characteristics differ significantly from those of the other types of ground objects. Both shrubs and herbs are vegetation, and their spectral curves have obvious peak and trough characteristics. Their waveforms are relatively similar: in the red edge band, the reflectance of shrubs and herbs is almost the same, but in the range of 350-700 nm and above 800 nm, the reflectance of shrubs is slightly higher than that of herbs. These differences make it possible to use a spectral mixture model to distinguish the two.
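The averaging and band-removal steps described above can be sketched in Python as follows. The retained ranges come from the text; the function names, the use of NumPy, the treatment of range boundaries as inclusive, and the choice to mark removed bands as NaN rather than delete them are assumptions made for illustration.

```python
import numpy as np

# Spectral ranges retained in the text, in nanometres (boundaries assumed inclusive).
KEEP_RANGES_NM = [(350, 1350), (1450, 1750), (2000, 2350)]


def mean_spectrum(replicates):
    """Average replicate spectra; `replicates` has shape (n_replicates, n_wavelengths)."""
    return np.asarray(replicates, dtype=float).mean(axis=0)


def mask_absorption_bands(wavelengths, spectrum, keep_ranges=KEEP_RANGES_NM):
    """Set reflectance outside the retained ranges to NaN."""
    wavelengths = np.asarray(wavelengths)
    masked = np.asarray(spectrum, dtype=float).copy()
    keep = np.zeros(wavelengths.shape, dtype=bool)
    for low, high in keep_ranges:
        keep |= (wavelengths >= low) & (wavelengths <= high)
    masked[~keep] = np.nan
    return masked
```

Keeping the removed bands as NaN preserves the wavelength axis, so that masked spectra of different ground objects remain aligned when plotted together.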
At the same time, to explore the differences in the spectral characteristics of shrubs under different vegetation coverage, the spectral reflectance of Sentinel-2 and GF6-WFV corresponding to each interval was obtained according to the intervals indicated in Section 2.2.1. The results are presented in Figure 4. With an increase in shrub vegetation coverage (Figure 4), absorption by the Caragana microphylla shrub community gradually increased in the green and red bands, and the slope of the red edge band (i.e., the first-order derivative value of the reflectance curve corresponding to the red edge position) gradually increased. The study area was in an arid and semi-arid region with sparse vegetation coverage. For both Sentinel-2 and GF6-WFV images, the spectral curves of shrub vegetation under the various coverage levels were relatively smooth (without obvious peaks or valleys). However, the fluctuation of the spectral curves was greater at coverage levels >7% than in the coverage range of 0-7%.

Estimation of Shrub Coverage for Linear Spectral Unmixing
Based on the spectral differences noted among bare soil, shadow, herbaceous vegetation, and shrub vegetation, the endmember spectra shown in Section 3.1 were resampled to the spectral ranges corresponding to Sentinel-2 and GF6-WFV. The FCLS algorithm was used for linear spectral unmixing of the four endmembers, and the estimated shrub coverage is shown in Table 4. Table 4 indicates that when shrub coverage was estimated using GF6-WFV image data, R² was 0.09 and RMSE was 0.18; when it was estimated using Sentinel-2 image data, R² was 0.23 and RMSE was 0.13, a slight improvement in accuracy compared to the GF6-WFV image data. Overall, the estimation results for both data sources were poor. Therefore, using spectral unmixing to estimate shrub coverage in SEGs with medium-resolution multispectral images is not ideal. To explore the ability of the spectral mixture model to estimate shrub coverage under different shrub coverage levels in SEGs, this paper further analyzed its estimation performance at each coverage level; the results are summarized in Table 5. For Sentinel-2 (Table 5), when the shrub coverage was less than 3%, the spectral mixing model did not recognize the shrubs because of the low shrub coverage. When the shrub coverage ranged from 3% to 7%, half of the shrub coverage was successfully estimated. When the shrub coverage was above 7%, four-fifths of the shrub coverage was successfully estimated, a considerable improvement in estimation accuracy. For GF6-WFV images, when the shrub coverage was less than 3%, the spectral mixing model likewise could not recognize the shrubs. When the shrub coverage was between 3% and 7%, one-third of the shrub coverage was successfully estimated. When the shrub coverage was above 7%, one-half of the shrub coverage was successfully estimated. In summary, shrub coverage affected the estimation accuracy of the LSMM to a certain extent. With an increase in shrub coverage, the probability of the LSMM identifying the shrubs increased, although shrubs and herbs were still clearly difficult to differentiate.
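The resampling of field endmember spectra to satellite bands mentioned at the start of this section can be approximated with a simple band-averaging sketch in Python. This is an assumption-laden illustration: the band limits below are approximate Sentinel-2 values entered by hand, a uniform (boxcar) response within each band is assumed, and the function name is hypothetical. A rigorous workflow would weight the spectrum by each sensor's published spectral response function.

```python
import numpy as np

# Approximate Sentinel-2 band limits in nanometres (assumed for illustration).
SENTINEL2_BANDS_NM = {
    "B2": (458, 523),   # blue
    "B3": (543, 578),   # green
    "B4": (650, 680),   # red
    "B8": (785, 900),   # near-infrared
}


def resample_to_bands(wavelengths, reflectance, band_limits):
    """Average a field spectrum within each band's wavelength limits.

    wavelengths : 1-D array of wavelengths in nm (e.g. 350..2500)
    reflectance : 1-D array of reflectance at those wavelengths
    band_limits : dict mapping band name -> (lower_nm, upper_nm)
    Returns a dict mapping band name -> mean reflectance in that band.
    """
    wavelengths = np.asarray(wavelengths, dtype=float)
    reflectance = np.asarray(reflectance, dtype=float)
    resampled = {}
    for name, (low, high) in band_limits.items():
        in_band = (wavelengths >= low) & (wavelengths <= high)
        resampled[name] = float(reflectance[in_band].mean())
    return resampled
```

Applying the same function to the shrub, herb, bare soil, and shadow spectra yields the columns of the endmember matrix used for unmixing.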

Remote Sensing-Based Selection of Feature Variables
The feature variables extracted from Sentinel-2 and GF6-WFV images were selected using RFE with 5-fold cross-validation; the results are shown in Figure 5. As per Figure 5, as the number of features increases, the RMSE first decreases and then increases; at the same time, increasing the number of features increases the complexity of the model. Therefore, it was necessary to optimize the feature variables while ensuring model accuracy. For the Sentinel-2 image data, seven feature variables were selected, namely SAVI, B2_mean, B2, B8_mean, NDVI, RVI, and B3_mean; i.e., the number of features was reduced by 80.6% compared with the original number. For the GF6-WFV images, 12 feature variables were selected, namely B1_mean, B2_mean, B3_mean, B5_mean, B6_mean, B4_mean, B1, B6, B5, B4, B2, and B5_correlation; here, the number of features was reduced by 76.9% compared with the original number.
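As a reader's consistency check on the reported feature reductions, the reduction can be written as one minus the ratio of selected features k to candidate features n. The candidate counts used below (n = 36 for Sentinel-2 and n = 52 for GF6-WFV) are not stated in this section; they are back-calculated here under the assumption that the reported percentages are rounded to one decimal place.

```latex
\[
\text{reduction} = 1 - \frac{k}{n}, \qquad
1 - \frac{7}{36} \approx 0.806 \;(80.6\%), \qquad
1 - \frac{12}{52} \approx 0.769 \;(76.9\%).
\]
```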

Estimation of Shrub Coverage Using the RF Model
Characteristic variables selected from the Sentinel-2 and GF6-WFV images were used to build the RF estimation model of shrub coverage, as shown in Figure 6. Figure 6 shows that when the Sentinel-2 image data were used to build the random forest-based estimation model for shrub coverage, the R² was 0.81 and the RMSE was 0.03. When the GF6-WFV image data were used, the R² was 0.72 and the RMSE was 0.03. Overall, both approaches achieved good estimation accuracy, but the estimation accuracy with the Sentinel-2 images was higher than that with the GF6-WFV images. Building a random forest-based regression model to estimate shrub coverage from medium-resolution multispectral images was therefore feasible, and the improved spatial resolution of the images enhanced model accuracy. The importance ranking of each variable involved in the estimation of the RF model is shown in Figure 7.
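The two accuracy measures reported above can be computed as in the following sketch. It assumes that R² denotes the coefficient of determination (one minus the ratio of residual to total sum of squares); the section does not state whether this or a squared correlation coefficient was used, and the sample values are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted coverage."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_observed = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical plot-level shrub coverage fractions (not data from this study).
observed_cover = [0.02, 0.05, 0.08, 0.11, 0.15]
predicted_cover = [0.03, 0.05, 0.07, 0.12, 0.13]
print(f"R2 = {r_squared(observed_cover, predicted_cover):.2f}")
print(f"RMSE = {rmse(observed_cover, predicted_cover):.3f}")
```

Coverage is expressed as a fraction here, so an RMSE of 0.03 corresponds to three percentage points of shrub coverage.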
From Figure 7, it can be seen that for Sentinel-2 images, the blue band (B2) was the most important, with a contribution of 32.4%, followed by the green band textural mean feature (B3_mean), with a contribution of 15.3%. Among the seven feature variables, three were textural features, with a total contribution of 43.4%. For GF6-WFV images, the green band textural mean (B2_mean) had the highest importance, with a contribution of 19.2%, followed by the red edge 1 band textural mean (B5_mean), with a contribution of 18%. Among the 12 feature variables, 7 were textural features; the top-ranked features were textural features, which accounted for a total contribution of 77.7%.
At the same time, Section 3.2 showed that the magnitude of the shrub coverage affected the estimation accuracy of the LSMM to a certain extent. Therefore, the ability of the RF model to estimate shrub coverage in different shrub coverage intervals was examined further. The results are shown in Table 6 and Figure 8. As per Table 6, for the Sentinel-2 image data, the model exhibits good estimation accuracy, with an R² of 0.66, in areas of low shrub coverage (<3%). In areas of moderate shrub coverage (3-7%), the R² decreased and the RMSE value improved. In areas of high shrub coverage (>7%), the model reaches the highest R² of 0.67. For the GF6-WFV image data, the estimation accuracy of the model in areas with moderate shrub coverage was also poor. Additionally, the estimation accuracy of the model in areas with high shrub coverage was higher than that in areas with low shrub coverage. In summary, the Sentinel-2 and GF6-WFV images exhibit a common pattern: their estimation models are more accurate in areas with low and high shrub coverage. In areas with moderate shrub coverage, however, the model accuracies are not as good, and these areas lower the accuracy of estimating the overall shrub coverage.
The importance ranking of each variable also varies under different coverage levels (Figure 8). For the Sentinel-2 images, the importance of the blue band (B2) was the highest in areas with high, medium, and low coverage. For the GF6-WFV images, in addition to low coverage areas, the importance of the blue band (B1) was also at a relatively high level in both medium and high coverage areas. The blue band thus made a significant contribution toward explaining the changes in shrub coverage and has potential for application in estimating shrub coverage. The contributions of textural features vary under different levels of shrub coverage. In areas with low coverage, the textural proportions for the Sentinel-2 and GF6-WFV images were 32.3% and 48.7%, respectively. In areas with medium coverage, they were 34.1% and 54.9%, respectively. In areas with high coverage, they were 36.9% and 61.4%, respectively. As the shrub coverage increases, the planar features of the shrubs become more evident; therefore, the influence of textural features on shrub coverage estimation also increases significantly. The red edge bands of the GF6-WFV images are rich in information. With the increase in shrub coverage, the red edge bands and their textures also contribute to better explaining shrub coverage.
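To make the textural features concrete, the sketch below computes a gray-level co-occurrence matrix (GLCM) and two statistics of the kind named above (a band's `_mean` and `_correlation` features) for one small window. It uses common textbook definitions; the window size, pixel offset, and number of gray levels used in this study are not given in this section, so the values here are assumptions.

```python
def glcm(window, levels, dx=1, dy=0):
    """Symmetric, normalized GLCM of a 2D window of integer gray levels."""
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(window), len(window[0])
    for y in range(rows):
        for x in range(cols):
            ny, nx = y + dy, x + dx
            if 0 <= ny < rows and 0 <= nx < cols:
                i, j = window[y][x], window[ny][nx]
                counts[i][j] += 1.0
                counts[j][i] += 1.0  # count each pair in both directions
    total = sum(sum(row) for row in counts)
    return [[c / total for c in row] for row in counts]


def glcm_mean(p):
    """GLCM mean: gray level weighted by co-occurrence probability."""
    n = len(p)
    return sum(i * p[i][j] for i in range(n) for j in range(n))


def glcm_correlation(p):
    """GLCM correlation; returns NaN for a window with a single gray level."""
    n = len(p)
    mu = glcm_mean(p)  # row and column means coincide for a symmetric GLCM
    variance = sum((i - mu) ** 2 * p[i][j] for i in range(n) for j in range(n))
    if variance == 0.0:
        return float("nan")
    covariance = sum((i - mu) * (j - mu) * p[i][j]
                     for i in range(n) for j in range(n))
    return covariance / variance


# Hypothetical window quantized to two gray levels (0 = dark, 1 = bright).
window = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
matrix = glcm(window, levels=2)
print(glcm_mean(matrix), glcm_correlation(matrix))
```

A window that contains a compact shrub patch next to grass produces a different co-occurrence pattern from a uniform grass window, which is the kind of structural contrast the textural features are meant to capture.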

Comparison between Ground-Measured and Aerial-Photo-Derived Shrub Coverage
The training and validation of the spectral mixture models and random forest models in this study were supported by aerial photographs. The traditional way of obtaining shrub coverage relies mainly on manual ground surveys, such as the sample line needling method. In this method, a central position is selected in each sample plot, its coordinates are obtained using RTK, and, starting from due north, three 30 m long sample lines are laid out at 60-degree intervals. Along each sample line, a ranging pole is placed vertically on the ground every 0.5 m (if the vegetation coverage is uniform, the interval can be appropriately increased), and the type of ground object it touches is recorded; the final shrub coverage is the ratio of the number of vegetation touches to the total number of sampling points [27]. However, when this method is applied to sample plots with sparse and uneven shrub distribution, the sample lines are prone to errors, difficult to control, and unlikely to represent the true situation of the whole sample plot accurately.
Drones are widely used in ecological monitoring due to their low cost, high flexibility, and ease of operation [61]. In this study, aerial photographs were used to calculate the shrub coverage of the sample plots. First, the vegetation index was calculated from the aerial photographs; then, the sample plots were delineated on the aerial photographs to determine the threshold for distinguishing between shrubs and herbs. Finally, the number of shrub pixels within each sample plot was counted. From Figure 3, it can be seen that there is a significant spectral difference between the herbaceous and shrub vegetation canopies within the visible light range. Therefore, thresholds can be reliably determined on high-resolution aerial photographs to distinguish shrubs from herbaceous vegetation. Fu et al. [62] calculated the vegetation index in the visible light band based on drone images and analyzed the reliability of using threshold values to estimate vegetation coverage. The results showed that the vegetation index threshold in the visible light band is highly sensitive in desert grasslands and can achieve high accuracy in vegetation coverage estimation. Nevertheless, variations in shooting time and weather conditions among different aerial photographs result in distinct thresholds.
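The aerial-photo workflow above (compute a visible-band vegetation index, apply a threshold, count shrub pixels within the plot) can be sketched as follows. The specific index is not named in this section, so the excess green index is used here purely as an assumed stand-in; the threshold, the rule that shrub pixels lie above it, and the pixel values are likewise hypothetical.

```python
def excess_green(red, green, blue):
    """Excess green index on chromatic coordinates (assumed index choice)."""
    total = red + green + blue
    if total == 0:
        return 0.0
    return (2 * green - red - blue) / total


def shrub_coverage(rgb_pixels, threshold):
    """Fraction of plot pixels whose index exceeds the threshold.

    rgb_pixels: iterable of (red, green, blue) tuples inside one sample plot.
    threshold: per-photograph value separating shrubs from herbs (assumed to
    be chosen by inspecting that photograph, since lighting differs by flight).
    """
    pixels = list(rgb_pixels)
    shrub_count = sum(1 for r, g, b in pixels
                      if excess_green(r, g, b) > threshold)
    return shrub_count / len(pixels)


# Hypothetical plot with two greener (shrub-like) and two paler pixels.
plot_pixels = [(50, 120, 40), (120, 130, 100), (40, 110, 30), (130, 135, 110)]
coverage = shrub_coverage(plot_pixels, threshold=0.2)
print(f"shrub coverage: {coverage:.2f}")
```

Keeping the threshold as an explicit argument reflects the point made above that each photograph may need its own value.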

Factors Influencing the Estimation of Shrub Coverage via LSMM and Comparison of Estimation Accuracy of the Different Remote Sensing Images
When the LSMM was used to estimate shrub coverage in this paper, the estimation accuracy of Sentinel-2 was higher than that of GF6-WFV. The main reason may be that Sentinel-2 has more bands and can therefore distinguish shrubs from herbs using more bands; Sentinel-2 also has a higher spatial resolution, so the degree of spectral mixing in each pixel is relatively low and it is easier to obtain endmember abundances using the LSMM. Ji et al. [45] used a linear spectral mixing model to perform three-endmember spectral unmixing on images with different spatial resolutions. The main images used were Landsat8-OLI (spatial resolution 30 m), GF1-WFV (spatial resolution 16 m), and Sentinel-2 (spatial resolution 10 m). The R² values were 0.39, 0.54, and 0.71, respectively. The improvement in image spatial resolution contributed to the estimation accuracy of spectral mixing models [63]. The Sentinel-2 sensor, with its narrower bands, performs better in estimating vegetation coverage than GF6-WFV, as narrow bands are more sensitive to changes in vegetation characteristics and can reduce the impact of the soil background in areas of sparse vegetation coverage [64].
In this study, the accuracy of shrub coverage estimation with the spectral mixture model was low. In contrast, Chen et al. [65] estimated vegetation coverage with the pixel dichotomy model and the spectral mixture model based on Landsat8 OLI images; their research showed that the spectral mixture model had the higher accuracy, with a vegetation coverage extraction accuracy of 85.31%, mainly because the mixed spectrum to be decomposed consisted of vegetation and bare soil, which are different ground object types. This study aimed to estimate the coverage of shrubs in SEGs, and the mixed spectra to be decomposed comprised shrubs and herbs, which both belong to the same ground object type (vegetation). The research results of Asner [66] and Smith [67] also indicated that community structure and species composition are important factors that affect the unmixing of mixed spectra. At the same time, due to the limitations of the field surveys, the measured canopy spectra could not exhaust the spatial heterogeneity of herbaceous and shrub vegetation in the study area [18], which may be another reason for the poor estimation quality of the LSMM.

Factors Influencing the Estimation of Shrub Coverage with the RF Model and Comparison of Estimation Accuracy of the Different Remote Sensing Images
This study successfully estimated shrub coverage in SEGs with the RF model and achieved good estimation accuracy. A comparison of the modeling features shows that, for both the Sentinel-2 and the GF6-WFV images, texture features make a substantial total contribution to the model. This was mainly because shrub and herbaceous vegetation differ in structure and growth morphology, leading to significant differences in the texture of these two vegetation types in the remote sensing images. Therefore, textural features as the preferred feature variables can help establish a model for estimating shrub coverage in SEGs and improve the accuracy of model estimation. This is consistent with the research results of Zhang et al. [35]. The fact that model performance did not improve steadily with increasing shrub coverage (Table 6) may result from the different infrared scattering mechanisms of different land covers. Previous research demonstrated that the scattering from soil and vegetation is mixed and difficult to separate when the vegetation coverage is moderate. Because the scattering proportions of shrubs and herbs are similar, the difference between them is no longer significant, which makes shrub coverage more difficult to extract [51]. However, for vegetation with high and low coverage, due to the lack of sufficient soil information in the former and insufficient vegetation to transmit canopy scattering and soil reflection information in the latter, the information received by the sensor is relatively less affected by the soil background [68], and there is a significant spectral difference between shrubs and herbs in this range, which is conducive to the extraction of shrub coverage.
As Figure 8 shows, the blue band contributes significantly to the estimation of shrub coverage at different levels of shrub coverage. Studies have shown that chlorophyll absorbs strongly in the blue band [69]. In this study, the canopy of the shrub vegetation in the study area was darker in color and contained more chlorophyll than that of the herbaceous vegetation. As a result, the blue band was more sensitive to the coverage of shrubs. Compared with the spectral mixture model, the estimation accuracy of the RF model was higher, regardless of whether it was based on image features from Sentinel-2 or GF6-WFV. At the same time, with the RF model, the estimation accuracy of Sentinel-2 was also higher than that of GF6-WFV. Therefore, in the SEG, the RF model was more suitable for estimating shrub coverage and can effectively improve the estimation accuracy of shrub coverage over large areas.

Limitations and Future Research Directions
When the spectral mixture model is used to estimate shrub coverage in SEGs, the differentiation between shrubs and herbs is insufficient. Future research should attempt to obtain hyperspectral data, such as those from the ZY-1 02D [70][71][72], to distinguish shrubs from herbs using a greater number of bands and improve the accuracy of shrub coverage estimation. Some studies show that the withering and yellowing period of the Caragana microphylla shrub in the study area differs from that of the herbaceous vegetation. The Caragana microphylla shrub withers and yellows in early October, whereas the herbaceous vegetation withers and yellows in early September [18]. The next steps of research can utilize the differences in vegetation phenological characteristics during this period to establish a shrub coverage estimation model, which may show better results. In addition, shrub coverage is also affected by factors such as precipitation, soil, and shape [73]. In the future, variables such as precipitation, soil, and shape can be considered in the construction of the model, which may help to optimize the existing RF model.

Conclusions
This study takes the SEG in Xilingol as the study area and extracts feature variables from Sentinel-2 and GF6-WFV remote sensing images. Based on a discussion of the spectral differences between herbaceous vegetation and shrub vegetation, a spectral mixture model and an RF model were constructed, and their estimation accuracies were compared and analyzed. The main conclusions were as follows: (1) In the SEG, the LSMM estimated shrub coverage from medium-resolution images only with low accuracy (R² of 0.23 for Sentinel-2 and 0.09 for GF6-WFV). Compared with the RF model, its estimation performance leaves room for improvement; furthermore, it shows an obvious lack of differentiation between shrubs and herbs. In addition, the shrub coverage affects the estimation accuracy of the LSMM to a certain extent. (2) The RF model showed high estimation accuracy when estimating shrub coverage in the SEG using Sentinel-2 images, with an R² of 0.81 and an RMSE of 0.03. For the GF6-WFV images, the R² was 0.72 and the RMSE was 0.03. The model accuracy was highest in areas with high shrub coverage. (3) In the Sentinel-2 images, the contribution of texture features is 43.4%; in the GF6-WFV images, the contribution of texture features is 77.7%. Therefore, texture features as the preferred feature variables can help to construct a model for estimating shrub coverage in SEGs and improve the estimation accuracy of the model. (4) With both the LSMM and the RF model, the Sentinel-2 images provided better estimates than the GF6-WFV images; these data have great potential for monitoring SEGs.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author. The data are not publicly available due to confidentiality.

Figure 1. (a) Location and sample distribution of the study area; (b) shrub-encroached grasslands (SEGs) captured on the ground; (c) SEGs captured via an unmanned aerial vehicle.

Figure 3. ASD FieldSpec Pro FR 2500: the spectra of each ground object. Note: NIR represents the near-infrared band, and SWIR represents the short-wave infrared band.

Figure 4. Spectral curves of shrubs under different levels of vegetation cover. Note: (a) shows the spectral curves of shrubs obtained under different vegetation cover levels with Sentinel-2. (b) shows the spectral curves of shrubs obtained under different vegetation cover levels with GF6-WFV.

Figure 6. Results of estimation and precision tests of shrub coverage with the random forest model.
Remote Sens. 2023, 15, x FOR PEER REVIEW

Figure 7. Distribution of variable importance in the random forest model.

Figure 8. Importance ranking of variables under different levels of coverage.
°35′-46°46′ N, 111°09′-120°01′ E). It borders Mongolia in the north with a 1098 km long borderline. It is adjacent to Hebei Province in the south, and to Chifeng, Hinggan League, and Tongliao in the east. Its grassland area accounts for approximately 176,000 km²

Table 3. Gray-level co-occurrence matrix features.

Table 5. Estimation of different shrub cover categories using Sentinel-2 and GF6-WFV images.

Table 6. Results of random forest estimation under different shrub coverage levels.
Author Contributions:
Conceptualization, B.S. and Z.X.; methodology, B.S.; software, Z.X.; validation, W.Z., Z.G. and H.W.; formal analysis, W.Z.; investigation, W.Y.; data curation, Z.X.; writing-original draft preparation, Z.X.; writing-review and editing, B.S.; visualization, S.T.; supervision, Z.W.; funding acquisition, B.S. All authors have read and agreed to the published version of the manuscript.

Funding:
This work was supported by the National Natural Science Foundation of China (grant number: 42001386, 42271407, 42161059), the Fundamental Research Funds for the Central Non-profit Research Institution of CAF (grant number: CAFYBB2019ZB004), and the special fund for Science and Technology Innovation Teams of Shanxi Province (grant number: 202204051001010).