Assessing the Impact of Soil on Species Diversity Estimation Based on UAV Imaging Spectroscopy in a Natural Alpine Steppe

Grassland species diversity monitoring is essential to grassland resource protection and utilization. The spectral variation hypothesis (SVH) provides a remote sensing method for monitoring grassland species diversity at the pixel scale by calculating spectral heterogeneity. However, in natural grassland the pixel spectrum is easily affected by soil and other background factors. Unmanned aerial vehicle (UAV)-based imaging spectroscopy makes soil information removal possible because of its high spatial and spectral resolution. In this study, UAV imaging spectroscopy data with a spatial resolution of 0.2 m, obtained at two sites of typical alpine steppe within the Sanjiangyuan National Nature Reserve, were used to analyze the relationships between four spectral diversity metrics (coefficient of variation based on NDVI (CV NDVI), coefficient of variation based on multiple bands (CV Multi), minimum convex hull volume (CHV) and minimum convex hull area (CHA)) and two species diversity indices (species richness and the Shannon–Wiener index). Two soil removal methods (an NDVI threshold and a linear spectral unmixing model) were also used to investigate the impact of soil on species diversity estimation. The results showed that the Shannon–Wiener index responded better to spectral diversity than species richness did, and that, after soil information was removed with the linear spectral unmixing model, CV Multi showed the best correlation with the Shannon–Wiener index among the four spectral diversity metrics. This indicates that the ability of spectral diversity to estimate species diversity improved significantly after soil information was removed. Our findings demonstrate the applicability of the spectral variation hypothesis in natural grassland and illustrate the impact of soil on species diversity estimation.


Introduction
Grassland biodiversity is critical for the long-term restoration and support services of ecosystem functions [1][2][3] and is directly related to human society [4,5]. However, biodiversity is being lost due to climate change and human activities, and this trend is likely to continue [6,7]. Grasslands bear the brunt of biodiversity loss because of their ecosystem vulnerability and severe environmental pressure [8,9]. The Aichi Biodiversity Targets, set in response to the continuing loss of global biodiversity, have not been fully achieved [10]. Grassland biodiversity can be divided into three levels: genetic diversity, species diversity and ecosystem diversity [11]. As the primary component of biodiversity, grassland species diversity remains a top priority in biodiversity monitoring.
Species diversity refers to biodiversity at the species level and can be considered from at least three perspectives: taxonomic, functional and phylogenetic diversity [12][13][14]. In terms of species richness based on variation in NDVI in the Hood River region. Gholizadeh et al. [32] used the averaged CV of 142 bands (427-914 nm) as a proxy of α-diversity to estimate grassland species diversity within the Central Platte River ecosystem. Vegetation indices (VIs) can be obtained from commonly used satellite remote-sensing data and applied to monitor grassland species diversity over a large geographical extent [55,59]. In contrast, spectral diversity metrics based on multiple spectral bands take advantage of more spectral information, but may also suffer from data redundancy [37,60]. In addition, spectral diversity metrics are vulnerable to interference from soil information within pixels, which can significantly increase local spectral heterogeneity and thus lead to overestimated species diversity [37,52]. Under the influence of soil, spectral diversity has even been reported to correlate negatively with species diversity, contrary to expectation [61].
One commonly used method for removing soil information is to set an NDVI threshold. For example, Gholizadeh et al. [52] classified pixels with NDVI greater than 0.4 as vegetation in proximal imaging spectrometry data with 0.001 m spatial resolution, while Zhao et al. [62] applied a threshold of 0.2 to UAV imaging spectrometry data with 0.03 m spatial resolution. Spectral unmixing is another method for soil removal; for example, Gholizadeh et al. [52] extracted vegetation information with a fully constrained least squares spectral unmixing approach applied to UAV imaging spectroscopy data with 0.75 m spatial resolution. Setting an NDVI threshold is relatively simple, but thresholds vary among studies, and removing entire pixels can discard nonsoil information. Spectral unmixing can retain vegetation information within each pixel, but the pure spectra of vegetation and soil are difficult to obtain from the image [63]. Moreover, regardless of the method, soil removal also depends on the spatial resolution of the observations.
Therefore, the objective of our study is to identify the optimal spectral metrics for monitoring the species diversity of natural alpine grassland while considering the impact of the soil background. We compared two species diversity indices, four spectral diversity metrics and two approaches to soil removal using UAV imaging spectroscopy with 0.2 m spatial resolution. The results may provide a reference for selecting parameters when monitoring grassland species diversity by remote sensing under natural conditions.

Study Area
This study area includes two UAV flight regions (center coordinates: 34 • 19 23 E) in Maduo county in the Sanjiangyuan National Nature Reserve, Qinghai Province, China (Figure 1). Sanjiangyuan National Nature Reserve is one of the most biodiverse areas in the world, with 760 species of vascular plants belonging to 241 genera and 50 families [64]. Based on the monthly standard weather station dataset (1957-2015) [65], the average annual temperature is about −4°C and the average annual precipitation is 303.9 mm. Maduo county, the source of the Yellow River, is a high plain (altitude > 4000 m) with 80% of its area covered by grassland, and has been regarded as a key conservation area for grassland diversity. Surveys show that the county is dominated by alpine meadow and alpine steppe; the dominant species include Elymus dahuricus, Leontopodium leontopodioides, Kobresia myosuroides, Potentilla chinensis and Oxytropis ochrocephala [66]. We set two UAV flight regions (No. 1: 400 m × 300 m; No. 2: 400 m × 400 m) in natural alpine steppe in Maduo county, where complex species composition (more than 20 dominant species) ensured a heterogeneity gradient among the sample plots.
Figure 1. ... Maduo county (middle) with land cover data at 10 m spatial resolution from ChinaCover2020 [67], and two UAV imaging spectroscopy images (NIR: 860 nm, red: 660 nm, green: 560 nm) with 18 field-measured sample plots and photographs (right).

Imaging Spectroscopy Data and Preprocessing
The imaging spectroscopy data were collected on 20 August 2018 from 11:00 to 14:00 local time under cloudless conditions in Maduo county using the M600 UAV platform (DJI, Shenzhen, China). The grassland was in the growing season, which avoided spectral heterogeneity caused by dead or withered plants. The UAV carried a ZK-VNIR-FPG480 push-broom hyperspectral sensor (ZKYD Data Technology Co., Ltd., Beijing, China), which covers a spectral range of 390-1020 nm with 2.3 nm spectral resolution and a total of 270 spectral channels. The spatial resolution of the imaging spectroscopy data and the flight altitude were set to 0.2 m and 150 m, respectively, so that detailed ground information could be obtained while minimizing the ground sample distance errors associated with higher flight altitudes.
The preprocessing of the imaging spectroscopy data consisted of three steps. First, spectral and radiometric calibration was performed to determine the central wavelength and bandwidth of each band, as well as the relationship between the sensor response and the true spectral radiance. Then, the reflectance spectrum of the grassland was calculated from the spectral radiance and the reference spectrum of a reflectance whiteboard. Finally, hyperspectral reflectance data with geographic coordinates were obtained after geometric correction based on ground control points and a curved-surface spline function [68].
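The whiteboard step can be sketched as a simple ratio to the reference panel, assuming calibrated radiance is already available and the panel's reflectance is known for each band. This is a minimal illustration rather than the processing chain actually used: the function name is hypothetical, and dark-current correction and illumination changes during the flight, which the text does not describe, are ignored.

```python
import numpy as np


def radiance_to_reflectance(radiance, panel_radiance, panel_reflectance):
    """Convert a radiance cube to reflectance with a white reference panel.

    radiance:          (rows, cols, bands) calibrated spectral radiance.
    panel_radiance:    (bands,) radiance measured over the reference panel.
    panel_reflectance: scalar or (bands,) known reflectance of the panel
                       (an assumption; in practice taken from the panel's
                       calibration data).
    """
    radiance = np.asarray(radiance, dtype=float)
    panel_radiance = np.asarray(panel_radiance, dtype=float)
    panel_reflectance = np.asarray(panel_reflectance, dtype=float)
    # Dividing by the panel radiance cancels the illumination band by band;
    # multiplying by the panel's own reflectance rescales to target reflectance.
    # NumPy broadcasting applies the (bands,) vectors to every pixel.
    return radiance / panel_radiance * panel_reflectance
```

For example, `radiance_to_reflectance(cube, panel_spectrum, 0.99)` would treat the panel as a uniform 99% reflector, which is only a placeholder value.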

Field Measurements
The field measurements were conducted simultaneously with UAV hyperspectral image acquisition. We set 9 sample plots (1 m × 1 m) in each flight region, following the criterion that most dominant species within the flight regions should be included. The central coordinates of each plot were recorded with a Trimble GeoXH 3000 handheld GPS, and differential correction was applied to minimize positioning errors. The survey of each sample plot recorded the grassland species present and the coverage and number of individuals of each species. In total, 18 sample plots and 22 grassland species were investigated, with the number of species per plot ranging from 3 to 8. Since most species in the flight regions were about 5-10 cm tall, there was no significant vertical structure, so shadows and species overlap were not expected to interfere with the calculation of spectral diversity metrics.

Species Diversity Indices
We used species richness and the Shannon-Wiener index to represent species diversity and calculated both within each 1 m² sample plot. Species richness is the total number of different grassland species in a plot. The Shannon-Wiener index (H′) combines two components: the number of species and the equitability (evenness) of their distribution [23,59,69]. It was calculated as follows:
$$H' = -\sum_{i=1}^{n} p_i \ln(p_i) \tag{1}$$
where $n$ is the total number of different species in a sample plot, $p_i$ is the proportional abundance of species $i$, and $\ln(p_i)$ is the natural logarithm of this proportion. H′ increases from zero, when a plot contains only one species, to its maximum value of $\ln(n)$, when all $n$ species have the same abundance [70].
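Species richness and the Shannon-Wiener formula above can be computed per plot as in the following minimal sketch. The text does not state whether $p_i$ was derived from species coverage or from the number of individuals, so the functions accept any non-negative abundance values; the function names are illustrative only.

```python
import math


def species_richness(abundances):
    """Number of species with a non-zero abundance in the plot."""
    return sum(1 for a in abundances if a > 0)


def shannon_wiener(abundances):
    """Shannon-Wiener index H' = -sum(p_i * ln p_i) for one plot.

    abundances: per-species abundance (e.g., coverage or number of
    individuals; which one to use is an assumption here).
    Zero abundances are skipped because p * ln(p) -> 0 as p -> 0.
    """
    total = float(sum(abundances))
    if total <= 0:
        raise ValueError("plot must contain at least one species")
    h = 0.0
    for a in abundances:
        if a > 0:
            p = a / total
            h -= p * math.log(p)
    return h
```

A single-species plot gives 0, and four equally abundant species give the maximum, ln(4), matching the range described in the text.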

Spectral Diversity Metrics
CV, CHA and CHV were selected as the spectral diversity metrics in this study after comprehensively analyzing the principles and applicability of several frequently used metrics. CV can be calculated from either a vegetation index (CV NDVI) or multiple bands (CV Multi), whereas CHA and CHV necessarily require multiple bands. All spectral diversity metrics were calculated within a 5 × 5 pixel window to match the 1 m × 1 m ground sample plot (5 pixels × 0.2 m = 1 m).
CV NDVI was calculated as the ratio of the standard deviation to the mean of NDVI (Equation (2)) over all pixels within a sample plot (Equation (3)), whereas CV Multi was the mean of the coefficients of variation of all bands from 390 nm to 1020 nm (Equation (4)).
$$\mathrm{NDVI} = \frac{\rho_{860} - \rho_{660}}{\rho_{860} + \rho_{660}} \tag{2}$$
$$CV_{NDVI} = \frac{\mathrm{std}(\mathrm{NDVI})}{\mathrm{mean}(\mathrm{NDVI})} \tag{3}$$
$$CV_{Multi} = \frac{1}{n}\sum_{\lambda=1}^{n}\frac{\mathrm{std}(\rho_{\lambda})}{\mathrm{mean}(\rho_{\lambda})} \tag{4}$$
where NDVI is calculated from the reflectance at the central wavelengths of the near-infrared (860 nm) and red (660 nm) bands; std(NDVI) and mean(NDVI) are the standard deviation and mean of NDVI over all pixels in a ground sample plot; std($\rho_\lambda$) and mean($\rho_\lambda$) are the standard deviation and mean of the reflectance at band $\lambda$; $\rho_\lambda$ is the reflectance of band $\lambda$; and $n$ is the number of bands (270 in this study).
CHV is another spectral diversity metric, representing the volume of the minimum convex hull that encloses the pixels of a plot in a three-dimensional spectral space. This three-dimensional space is usually obtained by reducing the dimensionality of the n-dimensional imaging spectroscopy data. According to the scree plot, the first three components of a principal component analysis (PCA) explained 85-90% of the total variance, so we used the first three principal components of the reflectance values to calculate CHV (with the MATLAB convhull function).
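Equations (2)-(4) and the CHV step can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the study's MATLAB code: each plot's 5 × 5 window is assumed to be flattened to an (m, n_bands) array; the population standard deviation (ddof=0) is assumed, whereas MATLAB's std defaults to the sample form (ddof=1); PCA is assumed to be fitted once on all image pixels and reused for every plot; and SciPy's ConvexHull stands in for MATLAB's convhull. All function names are hypothetical.

```python
import numpy as np
from scipy.spatial import ConvexHull


def band_index(wavelengths, target_nm):
    """Index of the band whose centre wavelength is closest to target_nm."""
    wl = np.asarray(wavelengths, dtype=float)
    return int(np.argmin(np.abs(wl - target_nm)))


def ndvi(pixels, nir_idx, red_idx):
    """Equation (2) for an (m, n_bands) array of reflectance spectra."""
    nir, red = pixels[:, nir_idx], pixels[:, red_idx]
    return (nir - red) / (nir + red)


def cv_ndvi(pixels, nir_idx, red_idx, ddof=0):
    """Equation (3): std(NDVI) / mean(NDVI) over the pixels of one plot."""
    v = ndvi(pixels, nir_idx, red_idx)
    return float(np.std(v, ddof=ddof) / np.mean(v))


def cv_multi(pixels, ddof=0):
    """Equation (4): mean over bands of the per-band coefficient of variation."""
    cv_per_band = np.std(pixels, axis=0, ddof=ddof) / np.mean(pixels, axis=0)
    return float(np.mean(cv_per_band))


def fit_pca(all_pixels, k=3):
    """PCA via SVD on an (N, n_bands) array of image pixels.

    Returns the band means, the top-k principal axes (k, n_bands) and the
    fraction of total variance they explain (the scree-plot quantity).
    """
    mean = all_pixels.mean(axis=0)
    _, s, vt = np.linalg.svd(all_pixels - mean, full_matrices=False)
    var = s ** 2
    return mean, vt[:k], float(var[:k].sum() / var.sum())


def chv(pixels, mean, axes):
    """Volume of the convex hull of one plot's pixels in PC space.

    Needs at least k + 1 points not lying in a common hyperplane
    (4 points for k = 3); otherwise Qhull raises an error.
    """
    scores = (pixels - mean) @ axes.T
    return float(ConvexHull(scores).volume)
```

In use, `nir_idx = band_index(wl, 860)` and `red_idx = band_index(wl, 660)` select the NDVI bands, and `mean, axes, frac = fit_pca(image_pixels)` lets one check the explained-variance fraction before calling `chv(plot_pixels, mean, axes)` for each plot.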
CHA represents the minimum area of the convex hull in a two-dimensional space formed by the reflectance of each pixel and the mean reflectance of all pixels in a sample plot [52]. It maximizes the spectral separation between each pixel and the sample plot as a whole. The CHA of a sample plot is the average CHA of all valid pixels, calculated as follows:
$$CHA_L = \frac{1}{m}\sum_{K=1}^{m} CHA\left(R_{K,L}, \bar{R}_L\right) \tag{5}$$
where $CHA_L$ is the spectral diversity of the $L$th sample plot; $m$ is the number of valid pixels in the $L$th plot; $R_{K,L}$ and $\bar{R}_L$ define the two dimensions of the convex hull, where $R_{K,L}$ is the spectrum of the $K$th pixel in the $L$th plot and $\bar{R}_L$ is the mean spectrum of all pixels in the $L$th plot, both being $n \times 1$ vectors with $n$ the number of bands; and $CHA(R_{K,L}, \bar{R}_L)$ is the convex hull area between the $K$th pixel and the mean of the $L$th plot, which approaches 0 when the spectrum of a pixel is very similar to the mean spectrum of the plot.
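One reading of the plot-level CHA described above is sketched below, under our interpretation of the text: for each pixel, every band contributes one 2-D point (pixel reflectance, plot-mean reflectance), and the pixel's CHA is the area of the convex hull of those points, which collapses to zero when the pixel spectrum equals the plot mean (all points fall on a line). The hull is computed with Andrew's monotone chain algorithm and the shoelace formula so the sketch needs only NumPy; this is a swapped-in implementation, not the one used in the study, and the function names are hypothetical.

```python
import numpy as np


def _cross(o, a, b):
    """z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def hull_area_2d(points):
    """Area of the convex hull of 2-D points (monotone chain + shoelace)."""
    pts = sorted({(float(x), float(y)) for x, y in points})
    if len(pts) < 3:
        return 0.0
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    hull = lower[:-1] + upper[:-1]  # collinear input leaves 2 vertices -> area 0
    area = 0.0
    for i, (x1, y1) in enumerate(hull):
        x2, y2 = hull[(i + 1) % len(hull)]
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0


def cha(pixels):
    """Mean CHA between each pixel and the plot-mean spectrum.

    pixels: (m, n_bands) spectra of the valid pixels of one plot.
    """
    mean_spectrum = pixels.mean(axis=0)
    areas = [hull_area_2d(np.column_stack([pix, mean_spectrum])) for pix in pixels]
    return float(np.mean(areas))
```

Because the hull is built from one point per band, its area grows as the pixel spectrum departs from the plot mean across many bands, which is the behaviour the text attributes to CHA.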

Soil Filtering
We applied two methods to filter soil information. The first was to set NDVI thresholds: based on previous studies [52,62], we used three NDVI thresholds (0.2, 0.3 and 0.4). Pixels with NDVI below the threshold were removed, and the four spectral diversity metrics (i.e., CV NDVI, CV Multi, CHA and CHV) of each sample plot were recalculated from the remaining pixels.
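The threshold filter can be sketched as a boolean mask on each plot's pixels, assuming the same (m, n_bands) layout and NIR/red band indices as for NDVI. The function names are hypothetical; the comparison keeps pixels at or above the threshold, matching "pixels with NDVI below the threshold were removed".

```python
import numpy as np


def remove_soil_by_ndvi(pixels, nir_idx, red_idx, threshold=0.4):
    """Keep only pixels whose NDVI is at or above `threshold`.

    pixels: (m, n_bands) reflectance spectra of one plot's 5 x 5 window.
    nir_idx, red_idx: indices of the bands nearest 860 nm and 660 nm.
    """
    pixels = np.asarray(pixels, dtype=float)
    nir, red = pixels[:, nir_idx], pixels[:, red_idx]
    ndvi = (nir - red) / (nir + red)
    # Very few surviving pixels make CV unstable, and CHV needs at least
    # 4 points, so plots that lose most pixels may need special handling.
    return pixels[ndvi >= threshold]


def surviving_counts(pixels, nir_idx, red_idx, thresholds=(0.2, 0.3, 0.4)):
    """Number of vegetation pixels left in a plot for each candidate threshold."""
    return {t: len(remove_soil_by_ndvi(pixels, nir_idx, red_idx, t)) for t in thresholds}
```

Comparing `surviving_counts` across plots is one simple way to see how strictly each of the three thresholds removes soil before the metrics are recalculated.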
The second method was the spectral unmixing model, which decomposes a mixed pixel into its basic components (i.e., endmembers) to filter out soil information. The linear spectral unmixing model treats the spectrum of a mixed pixel as the sum of the vegetation and soil spectra weighted by their respective fractional covers (Equations (6) and (7)).
To determine the pure soil spectrum, we marked three bare soil ground samples (0.5 m × 0.5 m) in each flight region and measured their spectra with ASD field spectroradiometers (ASD Co., Alpharetta, GA, USA). We then selected a total of 54 pixels from the UAV hyperspectral imagery whose spectra were similar to the field spectra of bare soil, and used their average spectrum as the pure soil spectrum for the spectral unmixing model. The fractions of vegetation and soil were obtained from FVC (fractional vegetation coverage) products based on the dimidiate pixel model [71,72]. The pure vegetation spectrum of each pixel could therefore be extracted by inverting the linear spectral unmixing model [73]:
$$R_j = R_{veg,j} F_{veg} + R_{soil,j} F_{soil}, \quad j = 1, 2, \ldots, p \tag{6}$$
$$F_{veg} + F_{soil} = 1 \tag{7}$$
where $R_j$ is the reflectance of the mixed pixel in band $j$; $R_{veg,j}$ and $R_{soil,j}$ are the pure reflectances of the vegetation and soil components in band $j$; $p$ is the number of bands; and $F_{veg}$ and $F_{soil}$ are the fractional coverages of vegetation and soil within the pixel, whose sum equals 1. Given $R_{soil,j}$ and $F_{veg}$, the vegetation spectrum follows as $R_{veg,j} = (R_j - R_{soil,j}F_{soil})/F_{veg}$.
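The FVC step and the inversion of Equations (6) and (7) can be sketched per pixel as follows. This is a minimal illustration, not the study's implementation: the dimidiate pixel model is written in its common NDVI form, the choice of the bare-soil and full-cover NDVI endmembers is left to the user because the text does not give them, and the `min_fveg` cut-off for near-bare pixels is a hypothetical safeguard. Function names are illustrative.

```python
import numpy as np


def fvc_dimidiate(ndvi, ndvi_soil, ndvi_veg):
    """Fractional vegetation cover from the dimidiate pixel model.

    FVC = (NDVI - NDVI_soil) / (NDVI_veg - NDVI_soil), clipped to [0, 1].
    ndvi_soil and ndvi_veg are bare-soil and full-cover NDVI endmembers
    (how they are chosen is an assumption left to the user).
    """
    fvc = (np.asarray(ndvi, dtype=float) - ndvi_soil) / (ndvi_veg - ndvi_soil)
    return np.clip(fvc, 0.0, 1.0)


def unmix_vegetation(pixels, soil_spectrum, f_veg, min_fveg=0.05):
    """Invert Equation (6) under Equation (7) for each pixel.

    R_veg,j = (R_j - R_soil,j * F_soil) / F_veg, with F_soil = 1 - F_veg.
    pixels: (m, p) mixed spectra; soil_spectrum: (p,); f_veg: (m,).
    Pixels with F_veg < min_fveg (hypothetical cut-off) are set to NaN,
    because dividing by a near-zero fraction amplifies noise.
    """
    pixels = np.asarray(pixels, dtype=float)
    soil = np.asarray(soil_spectrum, dtype=float)[None, :]
    f_veg = np.asarray(f_veg, dtype=float)[:, None]
    f_soil = 1.0 - f_veg
    with np.errstate(divide="ignore", invalid="ignore"):
        veg = (pixels - soil * f_soil) / f_veg
    veg[f_veg[:, 0] < min_fveg] = np.nan
    return veg
```

The NaN rows can then be skipped with NaN-aware statistics (e.g., `np.nanstd`, `np.nanmean`) when the spectral diversity metrics are recalculated.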

Results
Table 1 lists the relationships between the two species diversity indices and the four spectral diversity metrics before and after soil removal with the two methods.

Responses of Spectral Diversity to Species Diversity
The correlations between the four spectral diversity metrics (CV NDVI, CV Multi, CHV and CHA) and the two species diversity indices (species richness and the Shannon-Wiener index) are shown in Figure 2. The Shannon-Wiener index showed a significant positive correlation with all four spectral diversity metrics (p < 0.05), whereas species richness did not. Species richness only reveals how many species occur in an area and cannot reflect how evenly they are distributed, whereas the Shannon-Wiener index accounts for both species richness and relative abundance (evenness). The Shannon-Wiener index therefore responded better to spectral diversity, which is consistent with the previous study of Oldeland [70]. Moreover, Figure 2a-d show that spectral diversity varies among plots with the same species richness. For example, the difference between the maximum and minimum CV Multi was 0.10 among plots with six species and 0.11 among plots with eight species; both exceeded 0.02, the difference between the mean CV Multi of the six-species and eight-species plots. This within-richness variation may partly explain the nonsignificant relationships between species richness and the spectral diversity metrics.
Notes: *, 0.01 < p-value < 0.05, significant; **, p-value < 0.01, extremely significant. "With soil" denotes spectral diversity metrics before soil removal; "NDVI threshold" denotes spectral diversity metrics after soil removal based on the NDVI threshold; "unmixing" denotes spectral diversity metrics after soil removal based on the linear spectral unmixing model.
Although the Shannon-Wiener index was more strongly related to spectral diversity than species richness was, the four spectral diversity metrics performed similarly (Figure 2e-h, R² from 0.24 to 0.28). With soil information included, there was therefore little difference among these metrics for monitoring species diversity, and the optimal spectral diversity metric could not be determined.
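The text reports R² and p-values but not the regression procedure. The sketch below assumes an ordinary least-squares fit of H′ on one spectral diversity metric with a two-sided test on the slope, as provided by `scipy.stats.linregress`, and maps p-values to the significance codes used in the table notes. The function names are hypothetical, and the sketch does not reproduce the reported values.

```python
from scipy import stats


def diversity_regression(spectral_metric, shannon):
    """Least-squares fit of H' on one spectral diversity metric.

    Both inputs hold one value per sample plot (18 plots in the study).
    Returns slope, intercept, R^2 and the two-sided p-value of the slope.
    """
    res = stats.linregress(spectral_metric, shannon)
    return {
        "slope": res.slope,
        "intercept": res.intercept,
        "r2": res.rvalue ** 2,
        "p": res.pvalue,
    }


def significance_stars(p):
    """Significance codes from the table notes: ** p < 0.01, * p < 0.05."""
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""
```

Running `diversity_regression` once per metric and per soil condition would give a grid of R² values analogous to the comparison described in the text.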

Impact of Soil on Spectral Diversity Metrics
We compared the relationships between the Shannon-Wiener index and spectral diversity calculated after soil removal with the two methods. For the NDVI threshold method, we used 0.4 in the following analysis because, after comparing the results of the three thresholds (Table 2), it removed soil information most rigorously. The relationships between the Shannon-Wiener index and all four spectral diversity metrics improved after soil information was removed (Figure 3). When soil pixels were filtered with the NDVI threshold, R² between the Shannon-Wiener index and the four spectral diversity metrics ranged from 0.33 to 0.44. In contrast, when vegetation information was extracted by inverting the linear spectral unmixing model, R² ranged from 0.37 to 0.61, and CV Multi performed best (R² = 0.61, Figure 3f).

Figure 2. The relationships between species diversity and spectral diversity with soil information. The left column indicates the relationships between four spectral diversity metrics and species richness (a-d) and the right column indicates the relationships between four spectral diversity metrics and Shannon-Wiener index (e-h).
Figure 4 shows that the median values of the four spectral diversity metrics across the 18 sample plots generally decreased after soil removal. This suggests that soil information can increase spectral heterogeneity to a certain extent, weakening the relationships between spectral diversity metrics and species diversity. However, with the NDVI threshold method the maximum values of all spectral diversity metrics changed only slightly, possibly because the 0.4 threshold is not suitable for removing soil from all sample plots. By contrast, with the linear spectral unmixing model the maximum values of the spectral diversity metrics decreased markedly. Because this model extracts the vegetation information of each pixel according to that pixel's own fractional vegetation coverage, rather than applying a uniform threshold to all pixels, it may be more suitable for different grassland conditions. This shows that spectral diversity based on the linear spectral unmixing model estimated the Shannon-Wiener index better than spectral diversity based on an NDVI threshold.
Additionally, the relationships between species richness and spectral diversity changed from nonsignificant to significant after soil removal (Table 1), although they remained weaker than those with the Shannon-Wiener index. This result also suggests that soil information strongly affects the response of spectral diversity to species diversity.
Figure 4. The variation of four spectral diversity metrics among the 18 sample plots before and after soil information removal. Each box shows the maximum, the upper quartile, the mean, the lower quartile and the minimum. Each spectral diversity metric includes three conditions: with soil information (blue); without soil information, removed by setting an NDVI threshold (red); and without soil information, removed by the linear spectral unmixing model (green).
Based on the above results, CV Multi computed after soil removal with the linear spectral unmixing model was selected as the optimal spectral diversity metric for mapping the Shannon-Wiener index in this study area. We used a moving window of 5 × 5 pixels to match the 1 m × 1 m ground sample plots and calculated CV Multi across each flight region. Finally, the Shannon-Wiener index was mapped over both UAV flight regions at 1 m spatial resolution (Figure 5).
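The mapping step can be sketched as a block-wise CV Multi over the image followed by a linear model. Because the output map has 1 m cells, the sketch assumes non-overlapping 5 × 5 windows (stride 5); it also assumes a linear relation between CV Multi and H′ whose slope and intercept come from the plot-level fit, since no coefficients are given in the text. Function names are hypothetical.

```python
import numpy as np


def cv_multi_map(cube, win=5, ddof=0):
    """CV Multi on non-overlapping win x win blocks of a (rows, cols, bands) cube.

    With 0.2 m pixels and win = 5 each output cell is 1 m x 1 m. NaN pixels
    (e.g., spectra discarded during unmixing) are ignored; an all-NaN block
    yields NaN with a RuntimeWarning.
    """
    rows, cols, bands = cube.shape
    r, c = rows // win, cols // win  # partial edge blocks are dropped
    blocks = cube[: r * win, : c * win].reshape(r, win, c, win, bands)
    blocks = blocks.transpose(0, 2, 1, 3, 4).reshape(r, c, win * win, bands)
    cv = np.nanstd(blocks, axis=2, ddof=ddof) / np.nanmean(blocks, axis=2)
    return np.nanmean(cv, axis=-1)  # average the per-band CVs -> (r, c)


def map_shannon(cv_map, slope, intercept):
    """Apply a fitted linear model H' = slope * CV_Multi + intercept.

    slope and intercept must come from the plot-level regression.
    """
    return slope * np.asarray(cv_map) + intercept
```

The reshape and transpose group each 5 × 5 block into 25 pixel spectra without an explicit loop; an overlapping (stride-1) moving window would instead give a 0.2 m output grid.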
Remote Sens. 2021, 13, x FOR PEER REVIEW

Methods for Grassland Species Diversity Estimation
Our findings demonstrated that the spectral variation hypothesis can be applied to natural alpine steppe, where spectral diversity was significantly and positively correlated with species diversity. We also found that removing soil information from high-resolution UAV imaging spectroscopy data significantly improved the performance of spectral diversity metrics, as was also observed in previous simulated and real-world grassland diversity studies [37,52]. Mixed pixels of soil and grass are ubiquitous in imaging spectroscopy data, and removing soil information is challenging. Compared with filtering soil or soil-dominated pixels using a fixed NDVI threshold, the inverted linear spectral unmixing model is more suitable for removing soil information (Figure 3). However, determining the pure soil spectrum is difficult, because bare soil pixels are hard to find in natural alpine steppe and soil types vary over large regions. Multitemporal and hyperspectral remote sensing data could therefore be considered in future work to extract bare soil pixels during the nongrowing season and to distinguish soil types more accurately using finer spectral detail [74].
We used all bands from 390 nm to 1020 nm to calculate CV Multi (R² = 0.61), which performed much better than the CV of any single band (R² = 0.02–0.45) and CV NDVI (R² = 0.37) in estimating the Shannon-Wiener index after soil removal based on the linear spectral unmixing model (Figure 6). This indicates that multiple spectral bands explain species heterogeneity better than a single band or individual vegetation indices (VIs). However, redundant and highly correlated bands still require further analysis. For example, correlation analysis and simple linear or random forest regression could be used to select the most important, nonredundant bands, and principal component analysis (PCA) could be applied to extract the major axes representing spectral heterogeneity. Additionally, the full spectral range from visible (380–780 nm) to near and shortwave infrared (780–2500 nm) could be explored to identify effective band combinations.

Spectral heterogeneity essentially reflects differences among grassland species in physiological and biochemical characteristics. Previous studies demonstrated that many biochemical characteristics of grassland, usually considered as functional diversity, can be accurately estimated using remote sensing, such as chlorophyll a and b, β-carotene and lutein from visible bands, leaf nitrogen and carbon content from near infrared bands, and cellulose from shortwave infrared bands [75][76][77][78]. Spectral diversity calculated from bands that may be sensitive to specific biochemical characteristics even outperformed NDVI for grassland diversity monitoring (Figure 6). As the relationships among spectral diversity, functional diversity and species diversity of grassland are further explored [35], the optimal functional components could also be used to indicate species diversity.
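The window-level metrics compared above can be sketched in NumPy. The definitions are assumptions rather than the paper's code: CV Multi is taken here as the mean of the per-band coefficient of variation (standard deviation divided by mean) over all bands in a window, CV NDVI as the same ratio for NDVI, and the PCA helper illustrates the suggested handling of redundant bands by summing the variance of the leading principal components. All helper names are hypothetical.

```python
import numpy as np


def cv_multi(window):
    """Mean per-band coefficient of variation of a (pixels, bands) window.

    NaN pixels (e.g. soil-masked ones) are ignored.
    """
    mean = np.nanmean(window, axis=0)
    std = np.nanstd(window, axis=0)
    return float(np.mean(std / mean))


def cv_ndvi(ndvi_values):
    """Coefficient of variation of NDVI within one window."""
    return float(np.nanstd(ndvi_values) / np.nanmean(ndvi_values))


def pca_spectral_variance(window, n_components=3):
    """Variance captured by the first principal components of a window.

    Correlated bands collapse onto a few components before heterogeneity is measured.
    """
    x = window[~np.isnan(window).any(axis=1)]
    x = x - x.mean(axis=0)
    # Singular values of the centred data give the principal-component variances.
    s = np.linalg.svd(x, compute_uv=False)
    var = s**2 / (x.shape[0] - 1)
    return float(var[:n_components].sum())


def shannon_wiener(abundances):
    """H' = -sum(p_i * ln p_i) over species with non-zero abundance."""
    a = np.asarray(abundances, dtype=float)
    p = a[a > 0] / a.sum()
    return float(-(p * np.log(p)).sum())


def r_squared(x, y):
    """R^2 of an ordinary least-squares line y ~ a + b * x."""
    r = np.corrcoef(x, y)[0, 1]
    return float(r**2)
```

Regressing per-plot Shannon-Wiener values on per-window CV Multi values with r_squared would give a figure comparable in spirit to the R² values quoted above.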
Besides biochemical characteristics, the functional diversity of grassland also includes structural characteristics [45,79], and vertical canopy structure may even mediate the link between spectral diversity and species diversity [80]. Although retrieving grassland canopy structure remains uncertain, several structural characteristics of grassland have been estimated by terrestrial laser scanning. For instance, Guimarães-Steinicke et al. [81] used mean height and LAI as metrics of vertical structure, and community stand gaps, canopy surface variation and emergent flowers as metrics of horizontal structure. Airborne or UAV-based LiDAR data could also be used to estimate the canopy structure of grassland over large areas [82]. However, detecting grassland structure remains more challenging than in forests, owing to the more complex canopy structure and the smaller size of individual plants.
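As a rough illustration of how such structural metrics might be derived from laser-scanning data, the sketch below rasterises ground-normalised LiDAR points into a canopy height model (CHM) and computes mean height, canopy surface variation and a stand-gap fraction. It is a simplification built on assumptions: the 0.2 m cell size, the 0.05 m gap height and the metric definitions are hypothetical, and LAI and emergent flowers, which need other inputs, are not covered.

```python
import numpy as np


def points_to_chm(x, y, z, cell=0.2):
    """Rasterise points (z = height above ground, m) into a max-height CHM.

    Row 0 corresponds to the minimum y; orientation does not affect the metrics below.
    """
    cols = np.floor((x - x.min()) / cell).astype(int)
    rows = np.floor((y - y.min()) / cell).astype(int)
    chm = np.full((rows.max() + 1, cols.max() + 1), np.nan)
    for r, c, h in zip(rows, cols, z):
        if np.isnan(chm[r, c]) or h > chm[r, c]:
            chm[r, c] = h  # keep the highest return per cell
    return chm


def canopy_structure_metrics(chm, gap_height=0.05):
    """Simple vertical and horizontal structure metrics from a CHM (NaN = no data).

    gap_height is a hypothetical height (m) below which a cell counts as a stand gap.
    """
    h = chm[~np.isnan(chm)]
    vegetated = h[h >= gap_height]
    return {
        # vertical structure: mean height of vegetated cells
        "mean_height": float(vegetated.mean()) if vegetated.size else 0.0,
        # horizontal structure: roughness of the canopy surface
        "surface_variation": float(h.std()),
        # horizontal structure: share of cells lower than the gap height
        "gap_fraction": float((h < gap_height).mean()),
    }
```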
In addition, methods for mapping grassland species diversity usually assume a simple linear relationship between spectral diversity metrics and species diversity. Nevertheless, once the fundamental biochemical, structural and spectral characteristics related to grassland species diversity can be retrieved, machine learning clustering would be an alternative approach for monitoring grassland diversity by remote sensing. For example, Zhao et al. [35] proposed a grassland species diversity estimation model based on the self-adaptive fuzzy c-means clustering algorithm, which integrated spectral diversity with the optimal biochemical components as functional diversity.
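For readers unfamiliar with the clustering family mentioned above, a plain fuzzy c-means (FCM) loop is sketched below. This is the standard Bezdek formulation with a fixed number of clusters, not the self-adaptive variant of Zhao et al. [35], whose details are not reproduced here; the per-pixel input features, the fuzzifier m = 2 and the function name are assumptions.

```python
import numpy as np


def fuzzy_c_means(X, n_clusters, m=2.0, max_iter=300, tol=1e-6, seed=0):
    """Standard fuzzy c-means. Returns (centers, memberships).

    X : (n_samples, n_features) array, e.g. per-pixel spectral or trait features
    m : fuzzifier (> 1); larger values give softer memberships
    """
    rng = np.random.default_rng(seed)
    u = rng.random((X.shape[0], n_clusters))
    u /= u.sum(axis=1, keepdims=True)  # memberships sum to 1 for each sample
    for _ in range(max_iter):
        um = u**m
        centers = (um.T @ X) / um.sum(axis=0)[:, None]
        # Distance from every sample to every center, floored to avoid division by zero.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)
        # u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1))
        ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))
        u_new = 1.0 / ratio.sum(axis=2)
        converged = np.abs(u_new - u).max() < tol
        u = u_new
        if converged:
            break
    return centers, u
```

Hard labels from `u.argmax(axis=1)` could, for instance, be counted within each window to give a cluster-based diversity proxy; how a self-adaptive version chooses the number of clusters is not captured by this sketch.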

Scales for Grassland Diversity Mapping
At the individual or leaf scale, spectral differences between species are sufficient to distinguish one species from another. In natural grassland, however, it is difficult to separate individual tufts of different species using approaches applied in forests, such as the improved watershed algorithm [83,84]. Currently, most studies mapping grassland diversity over larger regions have been performed at the pixel scale; only a few studies have used proximal imaging spectrometry with millimeter resolution, close to the individual scale, to estimate grassland species diversity [52,69]. Therefore, the spatial resolution of the acquired imagery and the window size used for estimation determine the accuracy of grassland species diversity mapping.
Commonly used Sentinel-2, Landsat TM and MODIS satellite data are mostly applied to retrieve grassland biophysical parameters and to estimate grassland types, communities or habitats with regression or classification models [85,86]. However, monitoring grassland species diversity from these data based on the spectral variation hypothesis is difficult, because at such coarse spatial resolutions spectral variation is not caused by species alone [40]. UAV or airborne data with sub-meter spatial resolution make it feasible to estimate the α-diversity of the dominant grassland species and could also bridge the gap between ground and satellite observations [36]. However, a mixture of several species within one pixel is the most common problem, even with the 0.2 m UAV data used in this study. Additionally, the individual size and structure of grassland plants vary markedly between grassland types, which exacerbates species mixing. Therefore, the optimal spatial resolution for monitoring grassland diversity at the regional scale with UAV or airborne data still needs to be determined.
For the window scale of grassland diversity mapping, we used windows of 5 × 5 pixels in this study to match the 1 m × 1 m field sample plots (5 pixels × 0.2 m = 1 m). Within these windows, the simple linear relationships between spectral diversity and species diversity were demonstrated for species richness below 10 and a Shannon-Wiener index below 2. However, spectral diversity may tend toward saturation as species richness increases with window size, which might resemble the empirical power-law relationship between species richness and area [87]. For example, Zhao et al. [35] found that spectral characteristics tended to saturate gradually when grassland species richness exceeded 17 within an estimation scale of 1.2 m × 1.2 m. In contrast, another study based on airborne data found no saturation even when species richness reached 50 within 60 m × 60 m windows [32]. In addition, the appropriate window scale for diversity mapping depends not only on the number of species but also on environmental heterogeneity. Therefore, the optimal window scale for species diversity estimation in study areas with different grassland types and local environmental factors needs further study, for example by testing a gradient of window sizes, especially for grassland with very high species richness.
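The correspondence between window and plot size is simple arithmetic, and the per-window computation can be vectorised. The sketch below is a minimal example under stated assumptions: non-overlapping windows, a (rows, cols, bands) reflectance cube, CV Multi defined as the mean per-band coefficient of variation, and hypothetical function names. Testing a gradient of window sizes would amount to calling `cv_multi_map` with different values of `win`.

```python
import numpy as np


def window_size_px(plot_size_m, pixel_size_m):
    """Number of pixels spanning one side of a square field plot."""
    return int(round(plot_size_m / pixel_size_m))


def cv_multi_map(cube, win):
    """CV Multi for non-overlapping win x win windows of a (rows, cols, bands) cube.

    Edge pixels that do not fill a whole window are dropped; NaN pixels are ignored.
    """
    rows, cols, bands = cube.shape
    nr, nc = rows // win, cols // win
    tiles = cube[: nr * win, : nc * win].reshape(nr, win, nc, win, bands)
    # Gather the win * win pixels of each window along one axis.
    tiles = tiles.transpose(0, 2, 1, 3, 4).reshape(nr, nc, win * win, bands)
    mean = np.nanmean(tiles, axis=2)
    std = np.nanstd(tiles, axis=2)
    return np.mean(std / mean, axis=2)  # (nr, nc) map of CV Multi
```

For 0.2 m pixels and 1 m × 1 m plots, `window_size_px(1.0, 0.2)` gives the 5-pixel window described above.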

Conclusions
In this study, we compared the relationships between four spectral diversity metrics and two species diversity indices, and further assessed the impact of soil on species diversity estimation based on UAV imaging spectroscopy in a natural alpine steppe in the Sanjiangyuan National Nature Reserve of China. We showed that spectral diversity metrics can be used to map grassland species diversity, and that estimation accuracy improved effectively after soil information was removed with the linear spectral unmixing model. Moreover, the Shannon-Wiener index and CV Multi were the better proxies of species diversity and spectral diversity, respectively. These results help bridge the scale gap in monitoring grassland species diversity between ground surveys and near-ground UAV observations, and provide a reference for assessing the impact of soil on species diversity estimation.
UAV imaging spectroscopy and LiDAR technologies make it possible to monitor grassland species diversity from multiple perspectives and at multiple scales. They will also allow us to develop diverse methods and to understand the scale differences caused by grassland itself and by natural conditions. Further studies will explore the effectiveness of monitoring grassland species diversity based on additional vegetation characteristics that can be retrieved by remote sensing, such as biochemical and structural characteristics. Additionally, given the variety of grassland types and their species, we suggest that more attention be paid to the applicability of the window scale and spatial resolution for grassland species diversity estimation, which would be a meaningful step toward accurate monitoring of grassland species diversity at national or even global scales.

Conflicts of Interest:
The authors declare no conflict of interest.