Forest Tree Species Diversity Mapping Using ICESat-2/ATLAS with GF-1/PMS Imagery

: Forest ecosystems depend on species of tree variety. Remote sensing for obtaining large-scale spatial distribution information of tree species diversity is a geoscience research hotspot to overcome the limitations of conventional tree species diversity survey approaches. Airborne LiDAR or synergy with airborne optical imagery has been used to model and estimate tree species diversity for speciﬁc forest communities, with many revealing results. However, the data collection for such research is costly, the breadth of monitoring ﬁndings is limited, and obtaining information on the geographical pattern is challenging. To this end, we propose a method for mapping forest tree species diversity by synergy satellite optical remote sensing and satellite-based LiDAR based on the spectral heterogeneity hypothesis and structural variation hypothesis to improve the accuracy of the remote sensing monitoring of forest tree species diversity while considering data cost. The method integrates horizontal spectral variation from GF-1/PMS image data with vertical structural variation from ICESat-2 spot data to estimate the species diversity of trees. The ﬁndings reveal that synergistic horizontal spectral variation and vertical structural variation overall increase tree species diversity prediction accuracy compared to a single remote sensing variation model. The synergistic approach improved Shannon and Simpson indices prediction accuracy by 0.06 and 0.04, respectively, compared to the single horizontal spectral variation model. The synergistic model, single vertical structural variation model, and single horizontal spectral variation model were the best prediction models for Shannon, Simpson, and richness indices, with R 2 of 0.58, 0.62, and 0.64, respectively. This research indicates the potential of synergistic satellite-based LiDAR and optical remote sensing in large-scale forest tree species diversity mapping.


Introduction
Collecting data on the spatial patterns of forest tree species diversity is critical for the scientific delineation of priority biodiversity conservation areas, objective understanding of forest ecosystem productivity levels and stability, quantification of disturbance and drought resistance of forest ecosystems, and promotion of sustainable forest management [1][2][3][4].Currently, remote sensing technology, with its technical advantages of rapid acquisition of large-scale, continuous, and long-time series spatial data, is becoming an effective tool for monitoring forest species diversity and has great potential for the large-scale and continuous monitoring of forest biodiversity [4][5][6].How to use remote sensing to achieve the efficient large-scale and continuous monitoring of tree species diversity to overcome the inherent limitations of traditional monitoring methods is, therefore, an important objective for forest biodiversity research and conservation, and is of great importance for promoting biodiversity conservation and research and sustainable forest management [4,7].
For a long time, remote sensing modeling of tree species diversity has been based on the principle of spectral heterogeneity [8], which explains tree species diversity only Forests 2023, 14, 1537 2 of 20 from the spatial variability of spectral features in the forest canopy.However, such studies have not considered the relationship between the complexity of the vertical structure of the community and tree species diversity and then used remote sensing to quantify the variation features of the vertical structure of the forest to reflect tree species diversity from the dimension of vertical structure variability.This problem is particularly prominent in satellite remote sensing applications [9][10][11].Spectral variation information can explain the diversity of forest tree species through the spatial heterogeneity of spectral features in the vegetation canopy on a large scale.Spectral variation information is an indispensable source of information for the remote sensing modeling of vegetation diversity due to its high availability, support for continuous observation on a large scale, and wealth of information [8].However, spectral variability information at the canopy level can only reflect species variability in the horizontal orientation, not the vertical, and is susceptible to a light saturation effect.Vertical structure variation information can indirectly reflect tree species diversity through canopy height variability in the vertical direction, which is a useful complement to spectral variability information, but it cannot explain canopy-level tree species variation [12,13].Therefore, how to integrate the horizontal spectral variation and vertical structural variation based on satellite remote sensing data to objectively represent tree species diversity is the focus of current forest vegetation diversity modeling research, which is one of the important objectives of this study.
Studies have demonstrated that the complexity (diversity) of the vertical structure of forest communities forms and enhances the environmental heterogeneity of forest communities and is a significant factor influencing forest species diversity [14][15][16].In recent years, remote sensing has been used to extract or quantify the vertical structure of forest communities and its variability based on the relationship between the vertical structure and forest species diversity, and to model forest biodiversity combined with optical remote sensing, which has become a research hotspot for forest biodiversity monitoring [12,13,[17][18][19].To model and estimate tree species diversity indices in particular forest communities, previous studies have mostly employed airborne LiDAR (including UAV platforms) as the data source or combined aerial LiDAR and airborne optical images, and many illuminating study findings have been achieved.These studies have several clear limitations, however, including the following [19][20][21][22][23][24][25]: (1) expensive equipment and high data acquisition costs; (2) monitoring results limited to a small spatial range of "bands", making it difficult to obtain information on the spatial pattern of forest biodiversity and its variation patterns; (3) limited to specific communities, making it challenging to apply technical methods.The need for forest biodiversity surveys is now shifting toward large-scale, high timeliness, and multidimensionality, and there is an urgent need for biodiversity survey methodologies that can match this shifting demand.Furthermore, relevant research indicates that largescale, time-efficient, and multidimensional biodiversity monitoring is inextricably linked to satellite platforms and high-resolution remote sensing data [5][6][7].In recent years, ICESat-2/ATLAS, a satellite-borne lidar system of the latest iteration, has played an increasingly important role in a number of disciplines, including measurements of ice-cover thickness and surface elevation [26,27], the dynamic monitoring of lake water levels [28], and the imaging of forest physical parameters.In the estimation and mapping of forest physical parameters, ICESat-2 data are frequently used in synergy with satellite optical imagery for forest aboveground biomass inversion [29][30][31] and tree height mapping at the regional scale [32][33][34], and have become the crucial data source for large-scale forest aboveground biomass and tree height mapping.However, no studies on forest biodiversity monitoring using the novel satellite-based LIDAR ICESat-2/ATLAS in conjunction with satellite optical remote sensing have been reported to date, and there has been little exploration of the potential of combining horizontal spectral variation and vertical structural variation features of satellite remote sensing data for application in forest tree species diversity modeling [4].With the continuous innovation of remote sensing technology and the spectral variation hypothesis [8,35], the structural variation hypothesis [12], the high spatial resolution satellite optical images (GF-Series) [36], and especially the release of new satellite-based lidar data (e.g., ICESat-2/ATLAS) [37], this presents an effective and significant potential to integrate spectral variation information from the forest canopy with vertical structural information for forest biodiversity monitoring and mapping at the landscape and even larger scales.
This study aims to investigate the potential and feasibility of synergistic vertical structural and spectral variation features in modeling forest tree species diversity and to fill relevant research gaps to achieve high-precision remote sensing inversion of forest tree species diversity.To this end, the study attempts to combine vertical structural variability features derived from satellite-based LiDAR ICESat-2 data and spectral variability features derived from GF-1/PMS imagery as an alternative integrated feature set for estimating forest tree species diversity in the study area, so to discuss the potential of this synergistic approach in tree species diversity modeling and mapping.

Study Area
Haikou Forestry is located in Haikou Town, Xishan District, Kunming City, between 102 • 28 102 • 38 E and 24 • 43 24 • 56 N (see Figure 1); the altitude is 1800~2400 m, and the forest coverage is greater than 80%, encompassing approximately 7500 hectares.It is predominantly mountainous, with high terrain in the south and low terrain in the north, and the ridges run predominantly northeast-southwest.Haikou Forestry is a highland subtropical plateau mountain monsoon climate, dry and wet, with a small annual temperature difference and a large daily difference; the annual average temperature is 14.7 • C; a frost-free period can reach more than 240 days; the annual average sunshine time is approximately 2200 h; snowfall is uncommon; the annual average rainfall can reach between 900 and 1200 mm, and is primarily concentrated from May to July.The main tree species in the forest include Yunnan Qinggang (Cyclobalanopsis glaucoides Schottky.),Yunnan pine (Pinus yunnanensis Franch.),Huashan pine (Pinus armandi Franch.),Yunnan oil fir (Keteleeria evelyniana Mast.)Yuanjiang tannin (Castanopsis orthacantha Franch.),oak (Quercus), alder (Alnus cremastogyne Burk.), alpine tannin (Castanopsis delavayi), etc. [38][39][40].
Forests 2023, 14, x FOR PEER REVIEW 3 of 21 technology and the spectral variation hypothesis [8,35], the structural variation hypothesis [12], the high spatial resolution satellite optical images (GF-Series) [36], and especially the release of new satellite-based lidar data (e.g., ICESat-2/ATLAS) [37], this presents an effective and significant potential to integrate spectral variation information from the forest canopy with vertical structural information for forest biodiversity monitoring and mapping at the landscape and even larger scales.This study aims to investigate the potential and feasibility of synergistic vertical structural and spectral variation features in modeling forest tree species diversity and to fill relevant research gaps to achieve high-precision remote sensing inversion of forest tree species diversity.To this end, the study attempts to combine vertical structural variability features derived from satellite-based LiDAR ICESat-2 data and spectral variability features derived from GF-1/PMS imagery as an alternative integrated feature set for estimating forest tree species diversity in the study area, so to discuss the potential of this synergistic approach in tree species diversity modeling and mapping.

Study Area
Haikou Forestry is located in Haikou Town, Xishan District, Kunming City, between 102°28′102°38′ E and 24°43′24°56′ N (see Figure 1); the altitude is 1800~2400 m, and the forest coverage is greater than 80%, encompassing approximately 7500 hectares.It is predominantly mountainous, with high terrain in the south and low terrain in the north, and the ridges run predominantly northeast-southwest.Haikou Forestry is a highland subtropical plateau mountain monsoon climate, dry and wet, with a small annual temperature difference and a large daily difference; the annual average temperature is 14.7 °C; a frost-free period can reach more than 240 days; the annual average sunshine time is approximately 2200 h; snowfall is uncommon; the annual average rainfall can reach between 900 and 1200 mm, and is primarily concentrated from May to July.The main tree species in the forest include Yunnan Qinggang (Cyclobalanopsis glaucoides Schottky.),Yunnan pine (Pinus yunnanensis Franch.),Huashan pine (Pinus armandi Franch.),Yunnan oil fir (Keteleeria evelyniana Mast.)Yuanjiang tannin (Castanopsis orthacantha Franch.),oak (Quercus), alder (Alnus cremastogyne Burk.), alpine tannin (Castanopsis delavayi), etc. [38][39][40].

Sample Site Survey and Quantification of Tree Species Diversity
The period of the ground data survey was from September 2022 to November 2022.To match the spatial resolution of the adopted GF-1/PMS images, a 20 × 20 m sample Forests 2023, 14, 1537 4 of 20 square was laid out using a forest compass, a tape measure, and other instruments so that at least four GF-1/PMS images were located within it (a sample square in this study contains 100 PMS images).In addition, the handheld Trimble GPSGeoXT2008 was used to capture the sample corner's precise spatial coordinates, resulting in a positioning accuracy of greater than 0.5 m.Subsequently, the sample center coordinates were collected, and the initial condition was a tree height greater than 2 m.The species name and the number of seedlings belonging to each species were determined for each tree in the sample.During the survey, 45 samples were established along the elevation gradient of Haikou Forest, and the tree species diversity survey covered all forest community types (evergreen broad-leaved forest, deciduous broad-leaved forest, and mixed coniferous forest) in the study area.
This study utilized the Shannon diversity index (H) [41], the Simpson diversity index (D) [42], and the richness index (S) [43] to quantify the diversity level of tree species within the survey sample sites.In forest tree species diversity monitoring, richness is typically determined by the number of tree species within the survey sample; the Shannon index is currently the most widely used measure for evaluating forest species diversity, reflecting not only the abundance of species within the community but also the distributional diversity of species.On the other hand, the Simpson index is strongly correlated with the Shannon index, while being less correlated with richness and more sensitive to enriched species.The diversity indices listed above were calculated as follows: N denotes the number of tree species within the survey sample, i.e., tree species richness; P i is the probability of occurrence of tree species, i.e., the proportion of the number of individuals of the ith tree species to the total number of individuals of all tree species in the sample for a survey sample containing S tree species.

ICESat-2/ATLAS Dataset
Based on the literature research, this study proposes a forest tree species diversity inversion and mapping method based on ICESat-2/ATLAS data in collaboration with satellite multispectral images.In this study, ICESat-2/ATLA data are mainly used to construct the vertical remote sensing variation feature set based on the sample site scale and to better serve the remote sensing modeling of forest tree species diversity from the dimension of vertical structural variation.The ICESat-2/ATLAS-generated products ATL03 and ATL08 covering the subject area were chosen based on the requirements of the research.The ATL03 product is global positioning photon data that provides photon latitude and longitude coordinates and absolute elevation data, whereas the ATL08 product is land and vegetation height data that primarily provides vegetation canopy height and its statistical parameters and canopy height percentile within each 100 m segment [37,44].Both of these ICESat-2 data products were downloaded using Order Files from the National Snow and Ice Data Center website for this research.
This work employs an enhanced OPTICS photon denoising method [34] and an improved PTD photon classification algorithm [45].After ICESat-2/ATLAS data collection in the study area was completed, photon data were denoised and categorized.Based on the output of ATL03, ATLAS photon denoising, and classification produced tens of thousands of photon footpoints throughout the study area.After sampling the ATL03 data in 100 m segments, 520 residual noise records were removed and 3587 valid points were extracted, as shown in Figure 2. of photon footpoints throughout the study area.After sampling the ATL03 data in 100 m segments, 520 residual noise records were removed and 3587 valid points were extracted, as shown in Figure 2. ICESat-2 data has a high sampling density at larger spatial scales (only 0.7 m along the track), its data distribution is extremely dense, and the present study area is a subset of the larger scale region.Therefore, we hypothesize that there is a significant spatial correlation between ICESat-2 data within the study area and ICESat-2 data outside the adjacent study area, satisfying the intrinsic smoothness and second-order smoothness conditions, and conforming to the first law of geography, i.e., spatial autocorrelation.Of course, the density and spatial distribution pattern of ICESat-2 data may constrain the accuracy of remote sensing models to some degree.

GF-1/PMS Dataset
In this study, 2 m panchromatic images and 8 m multispectral images of the study area were acquired using the GF-1/PMS sensor as a data source for quantifying the spectral variability characteristics of forest tree species imaged in August 2020.The two-view PMS images of the study area acquired in 2020 were preprocessed by the ENVI5.3software platform, which included radiometric calibration, FLAASH atmospheric correction, orthorectification correction based on the RPC algorithm, image fusion, mosaic stitching, and cropping, and 2 m spatial resolution multiband real surface reflectance data of the study area were obtained [35,36].

Horizontal Remote Sensing Variation Feature Construction
This study extracted various spectral feature variables of the GF-1/PMS images after preprocessing based on the spectral heterogeneity hypothesis and with reference to related studies: B1-B4 band reflectance, vegetation index (Table 1), multispectral band first principal component (MSS_PC1), minimum noise separation feature (MSS_MNF) of the multispectral band, and a total of 21 variables as initial optical remote sensing features variables [10].Furthermore, using the ENVI 5.3 software, eight texture feature variables ICESat-2 data has a high sampling density at larger spatial scales (only 0.7 m along the track), its data distribution is extremely dense, and the present study area is a subset of the larger scale region.Therefore, we hypothesize that there is a significant spatial correlation between ICESat-2 data within the study area and ICESat-2 data outside the adjacent study area, satisfying the intrinsic smoothness and second-order smoothness conditions, and conforming to the first law of geography, i.e., spatial autocorrelation.Of course, the density and spatial distribution pattern of ICESat-2 data may constrain the accuracy of remote sensing models to some degree.

GF-1/PMS Dataset
In this study, 2 m panchromatic images and 8 m multispectral images of the study area were acquired using the GF-1/PMS sensor as a data source for quantifying the spectral variability characteristics of forest tree species imaged in August 2020.The two-view PMS images of the study area acquired in 2020 were preprocessed by the ENVI5.3software platform, which included radiometric calibration, FLAASH atmospheric correction, orthorectification correction based on the RPC algorithm, image fusion, mosaic stitching, and cropping, and 2 m spatial resolution multiband real surface reflectance data of the study area were obtained [35,36].

Two-Dimensional Remote Sensing Variation Feature Construction Method for Tree Species Diversity Modeling 2.4.1. Horizontal Remote Sensing Variation Feature Construction
This study extracted various spectral feature variables of the GF-1/PMS images after preprocessing based on the spectral heterogeneity hypothesis and with reference to related studies: B1-B4 band reflectance, vegetation index (Table 1), multispectral band first principal component (MSS_PC1), minimum noise separation feature (MSS_MNF) of the multispectral band, and a total of 21 variables as initial optical remote sensing features variables [10].Furthermore, using the ENVI 5.3 software, eight texture feature variables were extracted from the first main component of the GF-1 multispectral picture based on the grayscale co-occurrence matrix GLCM with a window size of 3 × 3 (Table 2).The coordinates (i, j) in image GLMC represent the pixel gray-level contrast relationship, where i and j represent the gray level between two pixels, with values ranging from 0 to N − 1, and N is the total number of gray levels.

GLCM Texture Features of GF-1/PMS Image
We calculated the dispersion statistics of the aforementioned categories of optical remote sensing feature variables at the sample scale (20 × 20 m) based on the theory of spectral variation and the assumption of habitat heterogeneity.In other words, the standard deviation (SD) [59], variance [9], mean, and coefficient of variation (CV) [11] statistics of the aforementioned optical remote sensing feature variables were calculated at a scale of 20 × 20 m as an alternative set of optical remote sensing variation features used in modeling to quantify the spatial heterogeneity of the forest canopy spectral features for the development of horizontal spectral variation features (HSVFs).
Before modeling tree species diversity, the aforementioned spectral variables derived from the GF-1/PMS images had to be decorrelated, i.e., feature variables with correlation coefficients r > 0.70 were removed, and a total of 16 spectral variables, as shown in Figure 3, were retained to form the final set of alternate spectral variation features.Before modeling tree species diversity, the aforementioned spectral variables derived from the GF-1/PMS images had to be decorrelated, i.e., feature variables with correlation coefficients r > 0.70 were removed, and a total of 16 spectral variables, as shown in Figure 3, were retained to form the final set of alternate spectral variation features.

Vertical Remote Sensing Variation Feature Construction
To realize the modeling of forest tree species diversity monitoring from two dimensions of horizontal spectral variation and vertical structural variation by combining satellite optical remote sensing and satellite-based LiDAR, this study proposes a method to construct the vertical heterogeneity of remote sensing features using the new satellitebased LiDAR ICESat-2/ATLAS data.The method is based on the ATL08 product of ICESat-2 data, and the dispersion index of distinct percentile canopy heights at each spot footpoint is calculated by extracting the parameters of each spot footpoint's percentile canopy height [12].As shown in Table 3, the standard deviation, variance, mean, and coefficient of variation of each percentile canopy height were extracted to construct the remote sensing vertical variation feature set (VSVF) for monitoring forest tree species diversity.

Vertical Remote Sensing Variation Feature Construction
To realize the modeling of forest tree species diversity monitoring from two dimensions of horizontal spectral variation and vertical structural variation by combining satellite optical remote sensing and satellite-based LiDAR, this study proposes a method to construct the vertical heterogeneity of remote sensing features using the new satellite-based LiDAR ICESat-2/ATLAS data.The method is based on the ATL08 product of ICESat-2 data, and the dispersion index of distinct percentile canopy heights at each spot footpoint is calculated by extracting the parameters of each spot footpoint's percentile canopy height [12].As shown in Table 3, the standard deviation, variance, mean, and coefficient of variation of each percentile canopy height were extracted to construct the remote sensing vertical variation feature set (VSVF) for monitoring forest tree species diversity.This study spatially extrapolates remotely sensed vertical variance feature variables from spatially discrete data to continuous raster data using kriging interpolation based on structural analysis of variance functions.The expressions for variance functions [60,61] and kriging interpolation [62] are provided below.where (h) is the variance function; h is the sampling spacing of paired sample points; n(h) is the total number of pairs of sample points when the spacing is h, z(X i + h) are the target variables at position X i , and (X i + h) are the attribute values of the target variable at position and location, respectively.
where Z 0 is the attribute value of the remote sensing vertical variation feature at the unknown point, i is the contribution weight of the sample points involved in interpolation to the attribute of the point to be estimated, X i is the sample point location, and Z(X i ) is the observed value of the remote sensing vertical variation feature at location X i .When estimating the contribution weights of the known points, not only the distance between the points to be estimated and the known points should be considered, but also the values of the known points' characteristics and their spatial distribution patterns.

Feature Preference and Importance Ranking
Before feature variable screening, only the initial spectral feature variables with correlation coefficients R < 0.7 are retained in this study to assure the feasibility and efficacy of the modeling.To achieve a preferential selection of the model features and reduce the redundancy of the feature variables, this study employs a recursive feature screening method based on the RF model, namely the RF-RFE algorithm, to eliminate feature variables that contribute less to model accuracy and retain feature variables that are vital for reducing the model's RMSE [63].The RF-RFE algorithm uses the change in the model RMSE as the basis for feature preference.If the model RMSE is significantly reduced due to the introduction of feature variables, such features are retained for their eventual participation in the modeling; otherwise, such variables are removed.RF-RFE quantifies the modeling relevance of each feature based on its classification contribution [63][64][65].

GBRT Regression Modeling
GBRT (gradient boosting regression tree, abbreviated as GBRT) is a combinatorial ensemble learning algorithm [66] made up of three basic elements: base learners, gradient boosting, and reduction algorithms.GBRT combines the results of all its base learners by a set of principles to produce the final prediction.As the base learner of GBRT, each new regression tree fit is taught by the residuals taught by the prior regression tree.It continuously reduces residuals in a gradient-boosting manner and adjusts the learning rate to enhance the learning effect [67,68].The gradient boosting regression tree GBRT algorithm can be broken down into several fundamental stages, as shown in Table 4 [67], based on a summary of related research.Table 4. Basic steps and process of the GBRT algorithm with the gradient boosting regression tree [67].

Input
Training samples T = (x i , y i ) = x i1 , . . ., x ip, y i

Model Performance Evaluation
In this paper, the model performance is evaluated using the leave-one-out method of cross-validation (LOOCV) as well as the coefficient of determination R 2 and root mean square error (RMSE) as the model accuracy evaluation index.

Results of Data Analysis of Tree Species Diversity Sample Plots
This study calculated the tree species diversity indices within the 45 sample plots using the diversity index calculation formulas of Equations ( 1)-( 3) and obtained a summary of the tree species diversity indices of the sample plots, as shown in Table 5.Moreover, the scatter diagram of the correlation between the Simpson index and the Shannon-Wiener index (Figure 4) visually demonstrates the significant correlation between the two.

Model Performance Evaluation
In this paper, the model performance is evaluated using the leave-one-out method of cross-validation (LOOCV) as well as the coefficient of determination R 2 and root mean square error (RMSE) as the model accuracy evaluation index.

Results of Data Analysis of Tree Species Diversity Sample Plots
This study calculated the tree species diversity indices within the 45 sample plots using the diversity index calculation formulas of Equations ( 1)-( 3) and obtained a summary of the tree species diversity indices of the sample plots, as shown in Table 5.Moreover, the scatter diagram of the correlation between the Simpson index and the Shannon-Wiener index (Figure 4) visually demonstrates the significant correlation between the two.

Interpolation Results of Vertical Structural Variation Features
Based on the results of the structural analysis of the vertical structural variation features' variation function (its optimal variation function theoretical model and its parameters) (Table 6), kriging interpolation was performed for each of the above vertical structural variation features, and then a continuous raster layer of the attribute values of the vertical structural variation features of the target in the study area was formed, as shown in Figure 1.Table 6 summarizes the optimal variance function theoretical models for each ICESat-2 light-spot factor-based vertical structural variation feature.The analysis of the structure of the variance functions shows that the exponential model fits the vertical structural variance features in the study area better overall.The optimal variation function theoretical models for features sd_CH, var_CH, and mean_CH are all exponential models, while the optimal variation function theoretical model for feature cv_RH is spherical.The variance degrees of the above four vertical remote sensing variance features are all <11%, the spatial autocorrelation is very significant, and all of them have the conditions for spatial interpolation (Figure 5).
ters) (Table 6), kriging interpolation was performed for each of the above vertical structural variation features, and then a continuous raster layer of the attribute values of the vertical structural variation features of the target in the study area was formed, as shown in Figure 1.  6 summarizes the optimal variance function theoretical models for each ICESat-2 light-spot factor-based vertical structural variation feature.The analysis of the structure of the variance functions shows that the exponential model fits the vertical structural variance features in the study area better overall.The optimal variation function theoretical models for features sd_CH, var_CH, and mean_CH are all exponential models, while the optimal variation function theoretical model for feature cv_RH is spherical.The variance degrees of the above four vertical remote sensing variance features are all <11%, the spatial autocorrelation is very significant, and all of them have the conditions for spatial interpolation (Figure 5).

Results of Remote Sensing Modeling Feature Preferences for Tree Species Diversity
The horizontal spectral variation features and vertical structural variation features, as well as the synergistic features of spectral variation and structural variation, were optimized using the RFE algorithm to improve the feasibility and efficiency of the modeling.Figure 6 demonstrates the outcomes.
After the preferred selection, the number of spectral variance features derived from the GF-1/PMS images that entered the modeling of the three diversity indices, such as Shannon, Simpson, and richness, was 3, 12, and 4, respectively.The spectral variation features contributing to the Shannon index include var_mNDVI, mean_B2, cv_Second-Moment (variation coefficient of second-order moment texture information under a 3 × 3 window), etc.There are 12 variables involved in Simpson index modeling, including var_mNDVI, mean_B2, and var_CRI.Moreover, the tree richness index modeling features include mean_B2, cv_SecondMoment, var_Correlation, and var_mNDVI.
For the vertical structural variation features constructed using ICESat-2/ATLAS data, the final number of vertical structural variation features involved in the modeling of the Shannon index, Simpson diversity index, and richness index of tree species in the study area were 1, 4, and 2, respectively.The optimal features for the Shannon index modeling were only cv_CH, while the optimal features for the Simpson and richness index models were cv_CH, sd_CH, var_CH, mean_CH, and cv_CH with sd_CH, respectively.The optimal features of the Shannon index modeling were only cv_CH, while the optimal features of the Simpson and richness index models were cv_CH, sd_CH, var_CH, mean_CH, and cv_CH with sd_CH, respectively, and there were common optimal features.

Results of Remote Sensing Modeling Feature Preferences for Tree Species Diversity
The horizontal spectral variation features and vertical structural variation features, as well as the synergistic features of spectral variation and structural variation, were optimized using the RFE algorithm to improve the feasibility and efficiency of the modeling.Figure 6 demonstrates the outcomes.For the synergistic features combining the above spectral variance features and structural variance features, the final number of features involved in tree species Shannon, Simpson, and richness index modeling was 4, 16, and 6, respectively.That is, the RMSE corresponding to the Shannon-Wiener, Simpson, and richness index models can be reduced to the minimum value when one, four, and two vertical variation features are taken, respectively.After the preferred selection, the number of spectral variance features derived from the GF-1/PMS images that entered the modeling of the three diversity indices, such as Shannon, Simpson, and richness, was 3, 12, and 4, respectively.The spectral variation features contributing to the Shannon index include var_mNDVI, mean_B2, cv_SecondMoment (variation coefficient of second-order moment texture information under a 3 × 3 window), etc.There are 12 variables involved in Simpson index modeling, including var_mNDVI, mean_B2, and var_CRI.Moreover, the tree richness index modeling features include mean_B2, cv_SecondMoment, var_Correlation, and var_mNDVI.

Ranking the Importance of Tree Species Diversity Remote Sensing Modeling Features
For the vertical structural variation features constructed using ICESat-2/ATLAS data, the final number of vertical structural variation features involved in the modeling of the Shannon index, Simpson diversity index, and richness index of tree species in the study area were 1, 4, and 2, respectively.The optimal features for the Shannon index modeling were only cv_CH, while the optimal features for the Simpson and richness index models were cv_CH, sd_CH, var_CH, mean_CH, and cv_CH with sd_CH, respectively.The optimal features of the Shannon index modeling were only cv_CH, while the optimal features of the Simpson and richness index models were cv_CH, sd_CH, var_CH, mean_CH, and cv_CH with sd_CH, respectively, and there were common optimal features.
For the synergistic features combining the above spectral variance features and structural variance features, the final number of features involved in tree species Shannon, Simpson, and richness index modeling was 4, 16, and 6, respectively.That is, the RMSE corresponding to the Shannon-Wiener, Simpson, and richness index models can be reduced to the minimum value when one, four, and two vertical variation features are taken, respectively.In the modeling of tree species diversity based on a single level of spectral variation characteristics, the variables mean_B2 and var_mNDVI play a crucial role in both Shannon and Simpson diversity modeling, indicating their great significance.The most significant contribution is made by the variable mean_B2, which rates first in the importance for Shannon diversity, Simpson diversity, and richness index modeling, and contributes the most to the accuracy of the model.Overall, the variables mean_B2 and var_mNDVI had the greatest influence on model accuracy and were highly indicative of tree species diversity.In the modeling of tree species diversity based on a single level of spectral variation characteristics, the variables mean_B2 and var_mNDVI play a crucial role in both Shannon and Simpson diversity modeling, indicating their great significance.The most significant contribution is made by the variable mean_B2, which rates first in the importance for Shannon diversity, Simpson diversity, and richness index modeling, and contributes the most to the accuracy of the model.Overall, the variables mean_B2 and var_mNDVI had the greatest influence on model accuracy and were highly indicative of tree species diversity.

Ranking the Importance of Tree Species Diversity Remote Sensing Modeling Features
The feature cv_CH (coefficient of variation of different percentile canopy heights) ranked first in importance in all three tree species diversity indices modeling and was the most critical vertical structural variation feature for forest tree species diversity modeling in the research of the tree species diversity modeling based on a single vertical structural variation feature.This was followed by sd_CH, which came in second in both Simpson's diversity and richness indices models, just below cv_CH.
The three features, cv_CH, mean_B2, and var_mNDVI, were the most important in the synergistic modeling of tree species diversity in the study area with GF-1 data and ICESat-2 data, followed by the feature cv_SecondMoment.All the above four remote sensing variables were involved in modeling the three tree species diversity indices.

Optimization Results of GBRT Model Parameters
The "gbm() function" from the R language caret package and the tidyverse package was used to train the GBRT model of forest species diversity in the study area.In addition, leave-one-out cross-validation (LOOCV) was used to find the optimal values of four key hyperparameters that affect the performance of the GBRT model: n.trees, shrinkage, interaction, depth, and n.minobsinnode.The parameter with the lowest RMSE was chosen as the modeling parameter [69].Table 7 shows its model parameters search and performance parameters.

Model Performance for Remote Sensing Estimation of Tree Species Diversity
Based on the results of model feature selection, the optimized GBRT model was used to estimate three forest species diversity indices in the study area based on GF-1 data, ICESat-2 data, and synergistic data of ICESat-2 and GF-1, respectively, to realize forest species diversity modeling by synergistic satellite optical remote sensing and satellite-based LiDAR.The estimation results are shown in the scatter plot in Figure 8.This study compared the horizontal spectral variance features based on GF-1 images, the vertical structural variance features based on ICESat-2 data, and the synergistic features combining horizontal spectral variance features and vertical structural variance features based on the estimation of forest species diversity indices and their effects on model accuracy.Table 8 displays the effects of GF-1 images, ICESat2 data, and synergistic features incorporating horizontal spectral variation and vertical structure variation on the estimation of forest tree species diversity in the study area.Among the above three types of forest tree species diversity monitoring models, the horizontal spectral variance feature based on GF-1 image data had the best prediction of the richness index of forest tree species in the study area, with an R2 of 0.56; the vertical structural variance feature based on ICESat-2 data had the highest estimation accuracy of the Simpson diversity index of tree species, with an R2 of 0.54; the model combining hor- This study compared the horizontal spectral variance features based on GF-1 images, the vertical structural variance features based on ICESat-2 data, and the synergistic features combining horizontal spectral variance features and vertical structural variance features based on the estimation of forest species diversity indices and their effects on model accuracy.Table 8 displays the effects of GF-1 images, ICESat2 data, and synergistic features incorporating horizontal spectral variation and vertical structure variation on the estimation of forest tree species diversity in the study area.Among the above three types of forest tree species diversity monitoring models, the horizontal spectral variance feature based on GF-1 image data had the best prediction of the richness index of forest tree species in the study area, with an R2 of 0.56; the vertical structural variance feature based on ICESat-2 data had the highest estimation accuracy of the Simpson diversity index of tree species, with an R2 of 0.54; the model combining horizontal spectral variance features and vertical structural variance features (hereafter referred to as the "synergistic model") showed the strongest estimation ability for the tree species Shannon index, with an R2 of 0.64.In addition, the study showed that the synergistic model performed best overall and could improve the prediction accuracy of tree species diversity indices in general.Among the three tree species diversity indices, the synergistic model outperformed the other two single-source models in estimating two of the diversity indices.Compared with the single spectral variation model, the synergistic model improved the estimation accuracy of the Shannon and Simpson diversity indices from 0.58 to 0.64 and from 0.43 to 0.47, respectively, and the inverse accuracy of the Shannon and richness indices from 0.62 to 0.64 and from 0.43 to 0.47, respectively, compared with the single vertical structure variation model improved from 0.62 to 0.64 and from 0.52 to 0.54, respectively.

Predictive Mapping of Forest Species Diversity in the Study Area
Based on the leave-one-out cross-validation accuracy performance of each model in Table 8, we used the GBRT model to map the Shannon, Simpson, and richness indices of tree species in the study area based on the synergistic feature of combining horizontal spectral variation with vertical structural variation, single vertical structural variation feature, and single horizontal spectral variation feature at a mapping scale of 20 × 20 m, respectively.Figure 9 shows the spatial distribution of the Shannon and Simpson diversity and richness indices of tree species in the study area predicted by the GBRT model.The spatial heterogeneity of forest tree species diversity in the study area is strong, and the diversity level generally decreases from north to south, showing an overall spatial distribution pattern of "high in the east and low in the west, high in the north and low in the south".

Predictive Mapping of Forest Species Diversity in the Study Area
Based on the leave-one-out cross-validation accuracy performance of each model in Table 8, we used the GBRT model to map the Shannon, Simpson, and richness indices of tree species in the study area based on the synergistic feature of combining horizontal spectral variation with vertical structural variation, single vertical structural variation feature, and single horizontal spectral variation feature at a mapping scale of 20 × 20 m, respectively.Figure 9 shows the spatial distribution of the Shannon and Simpson diversity and richness indices of tree species in the study area predicted by the GBRT model.The spatial heterogeneity of forest tree species diversity in the study area is strong, and the diversity level generally decreases from north to south, showing an overall spatial distribution pattern of "high in the east and low in the west, high in the north and low in the south".

Discussion
Based on the spectral variance hypothesis and structural variance hypothesis, this study proposes a method of forest species diversity inversion combining satellite optical remote sensing and satellite-based LiDAR, which improves the accuracy of forest species diversity inversion over large areas and achieves high-precision inversion and the mapping of forest species diversity (R 2 up to 0.64).The method uses GF-1/PMS data and ICE-

Discussion
Based on the spectral variance hypothesis and structural variance hypothesis, this study proposes a method of forest species diversity inversion combining satellite optical remote sensing and satellite-based LiDAR, which improves the accuracy of forest species diversity inversion over large areas and achieves high-precision inversion and the mapping of forest species diversity (R 2 up to 0.64).The method uses GF-1/PMS data and ICESat-2/ATLAS data to construct horizontal and vertical remote sensing variability features as modeling ensembles, respectively, and then discusses the feasibility and effectiveness of synergistic satellite-based LiDAR and satellite optical remote sensing in forest species diversity modeling inversion.

Analysis of the Contribution of Horizontal and Vertical Remote Sensing Variables in Modeling Tree Species Diversity
Among the spectral variation feature variables, three variables, var_mNDVI, mean_B2, and cv_SecondMoment, contributed most significantly to the accuracy of remote sensing modeling of tree species diversity the study area and were the key variables affecting the accuracy of inversion of the tree species diversity index.Among the vertical structural variation features, the feature cv_CH showed the highest importance, whether based on a single spectral variation feature or synergizing vertical structural variation features with spectral variation features for tree species diversity modeling.In the synergistic modeling of horizontal spectral variation and vertical structural variation features, the importance of vertical structural variation features was higher than that of spectral variation features overall.

Performance of ICESat-2/ATLAS and GF-1/PMS Data in Tree Species Diversity Estimation
Although it had only four bands, the GF-1/PMS optical image data had a great capacity to show the diversity of tree species that could be found in subtropical biological public forests.In addition to the Simpson index, both the Shannon and richness indices had model decidability coefficients greater than 0.5.The ability of the ICESat-2 data to make accurate predictions was generally superior to that of the PMS data.When it came to estimating the tree species diversity, the synergistic data obtained by combining ICESat-2 and PMS data performed better than the two individual datasets.

Effect of Synergistic Methods on the Accuracy of Inversion of Tree Diversity Indices
Using the leave-one-out cross-validation accuracy of the best model GBRT as the evaluation criterion, the synergistic strategy improved the prediction accuracy of tree species diversity indices in the study area to different degrees.Compared with modeling based on single GF-1/PMS data, the prediction accuracy R 2 of tree species Shannon index improved from 0.58 to 0.64, and the prediction accuracy R 2 of Simpson diversity improved from 0.43 to 0.47.Compared with modeling based on single ICESat-2 data, the prediction accuracy R 2 of tree species Shannon index improved from 0.62 to 0.64.In addition, the prediction accuracy of the synoptic method for the tree species richness index was lower than that of the model based on single PMS data only, but its R 2 still reached 0.54.Thus, synoptic satellite optical remote sensing and satellite-based LiDAR were generally effective in improving the modeling accuracy of tree species diversity, which is important for future research on large-scale forest biodiversity modeling and mapping.

Analysis of the Potential of ICESat-2/ATLAS in Forest Tree Species Diversity Mapping
The ICESat-2 data-based vertical structural variability characteristics showed the best predictive power for the Shannon diversity index, with a model decidability factor R 2 of 0.62, and the second-best prediction for the Simpson index, with an R 2 of 0.54, both of which were better than the GF-1/PMS data.Our study shows that ICESat-2 data have significant advantages of large-scale monitoring, high cost-effectiveness, and accessibility compared with airborne LiDAR, which has high data acquisition cost, expensive equipment, and small monitoring range, and provide new opportunities for efficient large-scale monitoring and mapping of forest biodiversity in the future.

Spatial Patterns of Tree Species Diversity in the Study Area
The spatial variation of forest species diversity in the study area was large, with an overall spatial pattern of "high in the north and low in the south, high in the east and low in the west".The high values of tree species diversity were concentrated in the northeastern and central-eastern parts of the study area, while the low values were mainly distributed in the western and southern fringes of the study area.The consistency of the spatial distribution of the Shannon and Simpson diversity indices reflects the intrinsic link between different metrics of species α-diversity.

Conclusions
In this study, horizontal spectral variability features (horizontal remote sensing variability features) and vertical structural variability features (vertical remote sensing variability features) were constructed based on the GF-1/PMS and ICESat-2/ATLAS data, respectively, to serve as the modeling feature set, and then explored the potential and validity of the synergistic onboard Lidar and satellite-based optical remote sensing in the mapping of the diversity of forest tree species.It was found that the introduction of vertical structure variability features obtained from satellite LiDAR data could effectively improve the accuracy of the prediction of tree species diversity models.This suggests that forest vertical structure information can effectively explain tree species diversity to a certain extent, and is an important complement to spectral variation information in tree species diversity modeling.This study suggests that satellite-based LiDAR data have greater potential than optical imagery for modeling tree species diversity.The synergistic modeling approach of horizontal spectral and vertical structural variation features improved the prediction accuracy of tree species diversity in general, and its prediction ability was better than that of the model containing only a single remotely sensed variation feature.The synergistic combination of optical satellite remote sensing and satellite-based LiDAR holds great promise for future large-scale forest tree species diversity mapping.

Figure 1 .
Figure 1.Location of the study area and distribution of tree species diversity sample plots.Figure 1. Location of the study area and distribution of tree species diversity sample plots.

Figure 1 .
Figure 1.Location of the study area and distribution of tree species diversity sample plots.Figure 1. Location of the study area and distribution of tree species diversity sample plots.

Figure 2 .
Figure 2. Distribution of ICESat-2/ATLAS light spots covering the study area obtained in this study.

Figure 2 .
Figure 2. Distribution of ICESat-2/ATLAS light spots covering the study area obtained in this study.

Forests 2023 ,
14, 1537 7 of 20 20 × 20 m as an alternative set of optical remote sensing variation features used in modeling to quantify the spatial heterogeneity of the forest canopy spectral features for the development of horizontal spectral variation features (HSVFs).

Figure 3 .
Figure 3. Variable correlation matrix of spectral variation feature variable set after correlation processing.

Ii=1
Prediction sample (x0, y 0 ) = (x 01 , x 02 , . . . ,x 0 , y 0 ) Number of residual tree training M, reduced step size λ, complexity parameter cp Training samples T i = T where ŷ = (y 1 , y 2 , . . . ,y n ) For j = 1, 2, . . ., M Training process (1) Train the jth residual tree on the training samples T j based on the complexity parameter (2) Give the predicted value of the training sample based on the jth residual tree (3) Update the output variables of the training sample T j to obtain T j+1 .END For Output y 0 = ŷM 0 + λ ∑ M−1

Figure 4 .
Figure 4. Scatter plot of Shannon-Wiener and Simpson diversity indices in the sample area.

Figure 4 .
Figure 4. Scatter plot of Shannon-Wiener and Simpson diversity indices in the sample area.

Forests 2023 , 21 Figure 5 .
Figure 5. Spatial interpolation results of ICESat-2 light-spot vertical structure variation characteristics in 4 study areas: (a) Standard deviation of percentile canopy height, (b) variance of percentile canopy height, (c) mean percentile canopy height, and (d) coefficient of variation of (a) standard deviation of percentile canopy height, (b) variance of percentile canopy height, (c) mean percentile canopy height, and (d) coefficient of variation of percentile canopy height.

Figure 5 .
Figure 5. Spatial interpolation results of ICESat-2 light-spot vertical structure variation characteristics in 4 study areas: (a) Standard deviation of percentile canopy height, (b) variance of percentile canopy height, (c) mean percentile canopy height, and (d) coefficient of variation of (a) standard deviation of percentile canopy height, (b) variance of percentile canopy height, (c) mean percentile canopy height, and (d) coefficient of variation of percentile canopy height.

Figure 6 .
Figure 6.Three types of remote sensing feature preferences (GF-1, ICESat-2, and ICESat-2 + GF-1) for the tree species diversity model based on the RF-RFE algorithm.(a) The number of variables selected for the Shannon diversity model; (b) The number of variables selected for the Simpson diversity model; (c) The number of variables selected for the richness model.

Figure 7
Figure7depicts the distinctions between the contributions of each variable in the modeling of the three tree diversity indices.The RF-RFE algorithm employs the parameters %IncMSE and %IncNodepority as quantitative measures of the variables' significance.

Figure 6 .
Figure 6.Three types of remote sensing feature preferences (GF-1, ICESat-2, and ICESat-2 + GF-1) for the tree species diversity model based on the RF-RFE algorithm.(a) The number of variables selected for the Shannon diversity model; (b) The number of variables selected for the Simpson diversity model; (c) The number of variables selected for the richness model.

Figure 7
Figure 7 depicts the distinctions between the contributions of each variable in the modeling of the three tree diversity indices.The RF-RFE algorithm employs the parameters %IncMSE and %IncNodepority as quantitative measures of the variables' significance.

Figure 7 .
Figure 7. Ranking of feature importance for tree species diversity models based on different remote sensing data.(a) Feature importance ranking for the Shannon diversity model; (b) Feature importance ranking for the Simpson diversity model; (c) Feature importance ranking for the richness model.

Figure 7 .
Figure 7. Ranking of feature importance for tree species diversity models based on different remote sensing data.(a) Feature importance ranking for the Shannon diversity model; (b) Feature importance ranking for the Simpson diversity model; (c) Feature importance ranking for the richness model.

Forests 6 .
Model Performance for Remote Sensing Estimation of Tree Species DiversityBased on the results of model feature selection, the optimized GBRT model was used to estimate three forest species diversity indices in the study area based on GF-1 data, ICESat-2 data, and synergistic data of ICESat-2 and GF-1, respectively, to realize forest species diversity modeling by synergistic satellite optical remote sensing and satellitebased LiDAR.The estimation results are shown in the scatter plot in Figure8.

Forests 2023 , 21 Figure 8 .
Figure 8. Scatter plots of estimated versus observed tree diversity indices were obtained using the horizontal spectral variance (GF-1/PMS), the vertical structural variance (ICESat-2/ATLAS), and the synergistic feature combining the horizontal spectral variance with the vertical structural variance (ICESat-2 + GF-1) modeled under leave-one-out cross-validation.(a) Performance of the Shannon diversity model; (b) Performance of the Simpson diversity model; (c) Performance of the richness model.

Figure 8 .
Figure 8. Scatter plots of estimated versus observed tree diversity indices were obtained using the horizontal spectral variance (GF-1/PMS), the vertical structural variance (ICESat-2/ATLAS), and the synergistic feature combining the horizontal spectral variance with the vertical structural variance (ICESat-2 + GF-1) modeled under leave-one-out cross-validation.(a) Performance of the Shannon diversity model; (b) Performance of the Simpson diversity model; (c) Performance of the richness model.

Figure 9 .
Figure 9. Distribution of inversion results of forest species diversity in the study area.(a) Predicted distribution of tree species Shannon diversity; (b) Predicted distribution of tree species Simpson diversity; (c) Predicted distribution of tree species richness.(a) Predicted distribution of tree species Shannon diversity; (b) Predicted distribution of tree species Simpson diversity; (c) Predicted distribution of tree species richness.

Figure 9 .
Figure 9. Distribution of inversion results of forest species diversity in the study area.(a) Predicted distribution of tree species Shannon diversity; (b) Predicted distribution of tree species Simpson diversity; (c) Predicted distribution of tree species richness.(a) Predicted distribution of tree species Shannon diversity; (b) Predicted distribution of tree species Simpson diversity; (c) Predicted distribution of tree species richness.

Table 3 .
Construction of vertical remote sensing variation features for tree species diversity modeling.
Figure 3. Variable correlation matrix of spectral variation feature variable set after correlation processing.

Table 3 .
Construction of vertical remote sensing variation features for tree species diversity modeling.

Table 5 .
Descriptive data of tree species diversity survey in the study area.Prediction sample ( ,  ) = ( ,  , … ,  ,  ) Number of residual tree training M, reduced step size λ, complexity parameter cp Training samples  =  where  = ( ,  , … ,  ) For j = 1, 2, ..., M Train the jth residual tree on the training samples  based on the complexity parameter (2) Give the predicted value of the training sample based on the jth residual tree (3) Update the output variables of the training sample  to obtain  .

Table 5 .
Descriptive data of tree species diversity survey in the study area.

Table 6 .
Summary of optimal variogram theoretical models for variation characteristics of each vertical structure.

Table 6 .
Summary of optimal variogram theoretical models for variation characteristics of each vertical structure.

Table 7 .
Table of GBRT model parameters for different tree species diversity.

Table 8 .
Comparison of the accuracy of tree species diversity estimation for GF-1, ICESat-2 data, and their synoptic data.

Table 8 .
Comparison of the accuracy of tree species diversity estimation for GF-1, ICESat-2 data, and their synoptic data.