Forest Canopy Cover Inversion Exploration Using Multi-Source Optical Data and Combined Methods

Abstract: An accurate estimate of canopy cover provides an important basis for forest ecological management by revealing forest status and change patterns. The aim of this paper is to investigate the four methods of the random forest (RF), support vector regression (SVR), k-nearest neighbor (KNN),


Forests 2023, 14, 1527

Introduction
Forest canopy cover is a good indicator of forest resources and a basic indicator of both the effectiveness of forests and the extent to which forest stands use space, so it can support forest management decisions. Understanding its characteristics and patterns of change lays the foundation for forest management. Traditional methods of estimating forest canopy cover include in situ measurements, point sampling, line sampling, and canopy projection. These methods are time-consuming and laborious [1,2], introduce subjective human factors into the measurement [3], and yield only a small amount of data over a small area, which makes them poorly suited to studying the distribution of and changes in forest canopy cover over larger areas [4].
With the development of remote sensing technology, the estimation of forest canopy cover has entered a new stage. In recent years, mature LiDAR technology has provided an efficient and highly accurate means of estimating forest canopy cover. Studies have shown that high-density laser point cloud data can serve as samples for forest canopy cover estimation after normalization [5,6], and that LiDAR yields larger and more accurate samples than in situ measurements, point sampling, line sampling, and canopy projection [7]. However, full-coverage laser point cloud data over a large area are expensive to obtain. In contrast, a large amount of freely available multispectral satellite data (e.g., the Landsat series and the Sentinel-2 series) has emerged in recent years. These data support low-cost, rapid, and continuous monitoring at regional and global scales, and they are an important source for dynamic and timely access to forest canopy cover information over large areas [8]. The successful launch of the Gaofen-1 (GF-1) satellite in 2013 opened a new direction for forest parameter extraction, but GF-1 multispectral imagery is currently used mostly for forest stand classification, with few studies and applications addressing forest canopy cover [3]. This paper therefore incorporates multispectral data from the GF-1 PMS sensor for forest canopy cover estimation.
Landsat data have been used for forest canopy cover inversion since the 1990s. When forest canopy cover is inverted from optical remote sensing data, the choice of data source, remote sensing features, and inversion model directly affects the accuracy of the estimate. At present, most data sources used for this purpose are Landsat series data, Sentinel-2 data, and some high-resolution satellite data, and the remote sensing features extracted in early studies were mainly spectral. In recent years, textural features have also been increasingly used [9,10]. Parametric models are mostly stepwise regression and linear models, while non-parametric models include the back-propagation neural network (BPNN), k-nearest neighbor (KNN) models, and various deep learning models [11-14].
Landsat data, freely available since 2008, and Sentinel data, freely downloadable since 2014, have made it possible to obtain dynamic spatial and temporal information on forest canopy cover, and many reliable inversion models have been developed from these open data. Although a number of effective remote sensing features and feasible inversion models have been explored, no systematic summary exists of how well each feature characterizes forest canopy cover or of how accurate each model is, and the published conclusions are not consistent. In addition, selecting remote sensing features that characterize the parameter to be estimated is one of the key steps in improving the accuracy of forest parameter inversion, yet few studies in this field consider feature selection. Pooling remote sensing features from different existing data sources, and stabilizing the model by adding training samples, may be a simple and effective way to improve the inversion of forest canopy cover.
To make full use of the free availability of Landsat and Sentinel data and the high spatial resolution of the high-resolution (GF-1) data, and to clarify the potential and accuracy of each model for forest canopy cover inversion with such data, this paper uses Landsat8 OLI, Sentinel-2A, and GF-1 multispectral data as data sources, with LiDAR-derived canopy cover as the test data. Remote sensing features of the three data sources, such as spectral information, vegetation indices, and texture, were extracted, and a non-parametric feature optimization method was used for feature selection. The feasibility of the commonly used non-parametric models, and of a non-parametric model with automatic feature optimization, was then explored for inverting the canopy cover of typical cold-temperate larch forest. The results can provide a reference for selecting regional-scale forest canopy cover inversion methods that rely on publicly available data sources.

Study Area
The study area is part of the Daxinganling Forest Ecosystem National Field Scientific Observation and Research Station in Genhe, Inner Mongolia, the highest-latitude forest ecosystem field research station in China (120°12′-122°55′ E, 50°20′-52°30′ N, Figure 1). It covers 102 km², with elevations of about 810-1116 m. The climate is cold-temperate continental monsoon, with an average annual temperature of −5.4 °C, extreme minimum temperatures reaching −50 °C, and maximum temperatures of 40 °C. The average annual rainfall is 450-550 mm, concentrated in July and August. Snow falls from September to early May of the following year, with an average snowfall of 20-40 cm, accounting for about 12% of the average annual precipitation. The average annual surface evapotranspiration is 800-1200 mm, and the frost-free period is 80 days. The study area is a typical cold-temperate boreal forest with a forest cover of 75%. The dominant tree species is Larix gmelinii (Rupr.) Kuzen, which covers 79% of the total observation area.

Data and Pre-Processing
(2) Sentinel-2A data acquisition and pre-processing
Sentinel-2A is a satellite launched by the European Space Agency in 2015; together with Sentinel-2B, it provides complete image coverage of the Earth's equatorial region every 5 days. In this study, Sentinel-2A data imaged on 28 August 2016 were downloaded from the Copernicus Science Data Center (https://scihub.copernicus.eu/, accessed on 28 August 2016). Scenes with more than 20% cloud cover were filtered out, and clouds in scenes with less than 20% cloud cover were removed using the quality check band. Atmospheric correction was performed with Sen2Cor to obtain the Level-2A product, topographic correction was performed with the C model, and the 20 m and 60 m bands were resampled to 10 m in SNAP using Sen2Res, so that all 12 bands could be used for forest parameter extraction and modeling.
(3) GF-1 data acquisition and pre-processing
Gaofen-1 (GF-1), a satellite developed independently by China under its major special project for a high-resolution Earth observation system, was launched successfully in 2013. It carries two panchromatic and multispectral (PMS) sensors and four wide field-of-view (WFV) sensors. The PMS multispectral images have a spatial resolution of 8 m and the panchromatic images 2 m, while the WFV sensors have a spatial resolution of 16 m [3]. This paper uses multispectral data acquired by the GF-1 PMS sensor on 5 September 2015 at a spatial resolution of 8 m, together with panchromatic images at a spatial resolution of 2 m. GF-1 pre-processing consists of three steps: radiometric calibration, atmospheric correction, and orthorectification. Orthorectification converts a geometrically distorted central-projection image into an orthographic image; in addition to correcting ordinary geometric distortions, it removes geometric errors caused by terrain relief [15]. Radiometric calibration and FLAASH atmospheric correction of the GF-1 data were completed in ENVI 5.3, followed by orthorectification using the RPC file supplied with the raw data and the DEM data.
(4) ASTER GDEM data acquisition and pre-processing
The DEM data were downloaded from the Geospatial Cloud (http://www.gscloud.cn, accessed on 1 January 2009); the product is ASTER GDEM with a spatial resolution of 30 m. The DEM was resampled by bilinear interpolation in ArcGIS 10.5 to the spatial resolutions of the Landsat8, Sentinel-2A, and GF-1 data, respectively, and slope and aspect were then extracted.
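The slope and aspect extraction can be illustrated with a minimal sketch. It assumes simple central differences on a north-up grid, whereas ArcGIS uses its own 3 × 3 neighbourhood method, so the values will not match the software output exactly. The function name `slope_aspect` and the convention that aspect is the downslope compass bearing are assumptions made for illustration.

```python
import math

def slope_aspect(dem, cell_size):
    """Return (slope_deg, aspect_deg) grids for the interior cells of a DEM.

    dem is a list of rows ordered north to south; columns run west to east.
    Border cells stay None because central differences need both neighbours.
    Aspect is the downslope compass bearing (0 = north, 90 = east);
    flat cells keep an aspect of None.
    """
    rows, cols = len(dem), len(dem[0])
    slope = [[None] * cols for _ in range(rows)]
    aspect = [[None] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            # Elevation change per metre toward the east and toward the north.
            dz_east = (dem[r][c + 1] - dem[r][c - 1]) / (2 * cell_size)
            dz_north = (dem[r - 1][c] - dem[r + 1][c]) / (2 * cell_size)
            slope[r][c] = math.degrees(math.atan(math.hypot(dz_east, dz_north)))
            if dz_east == 0 and dz_north == 0:
                continue
            # The downslope direction is opposite to the gradient.
            aspect[r][c] = math.degrees(math.atan2(-dz_east, -dz_north)) % 360.0
    return slope, aspect
```

For a surface that rises steadily toward the east, the sketch reports a west-facing aspect (270°), which is the behaviour expected of a downslope convention.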

Airborne LiDAR Data Acquisition and Pre-Processing
The manned airborne LiDAR data for this study were acquired in August and September 2012. The platform was a Yun-5 aircraft carrying a Leica airborne LiDAR system with an ALS60 laser scanner. The flight yielded 32 strips of data with a scanning angle of ±35°, a coverage area of 213 km², and an average point density of approximately 5.6 points/m². Studies indicate that high-density UAV LiDAR point cloud data can be used directly as ground truth for forest canopy cover [5,6]. In this paper, LiDAR360 (version 5.4.3.0) was first used to generate the canopy height model (CHM), from which forest canopy cover was extracted at the scale of the sample plots (uniformly 40 m × 40 m). Using the latitude and longitude coordinates of 55 sample plots surveyed in 2012 and 2013, the LiDAR canopy cover values corresponding to the plots were then extracted as validation samples for the canopy cover inversions in this paper (Figure 2).
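One common way to derive plot-level canopy cover from a CHM is to take the fraction of cells taller than a height threshold. The sketch below illustrates that idea only; the text does not state how LiDAR360 computes cover, and the 2 m threshold, the use of `None` for no-data cells, and the function name `canopy_cover` are assumptions.

```python
def canopy_cover(chm_plot, height_threshold=2.0):
    """Fraction of valid CHM cells in a plot whose height exceeds the threshold.

    chm_plot is a 2-D list of canopy heights in metres for one sample plot;
    None marks a no-data cell and is ignored.
    """
    valid = [h for row in chm_plot for h in row if h is not None]
    if not valid:
        raise ValueError("plot contains no valid CHM cells")
    canopy_cells = sum(1 for h in valid if h > height_threshold)
    return canopy_cells / len(valid)
```

For a 40 m × 40 m plot, `chm_plot` would hold the CHM cells that fall inside the plot boundary, and the returned value lies between 0 and 1, on the same scale as the canopy cover values reported in the Results.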

Methodology

Feature Extraction
In this paper, we extracted the remote sensing features commonly used in current remote sensing inversions of canopy cover, specifically spectral features, textural features, and statistical features derived from the tasseled cap transformation. In addition, we extracted two topographic features, slope and aspect, which influence forest canopy cover.
The spectral features most commonly applied are vegetation indices. This paper uses the normalized difference vegetation index (NDVI), the transformed normalized difference vegetation index (TNDVI), the square root of the ratio vegetation index (SQRT(IR/R)), and the difference vegetation index (DVI) [16]. NDVI is an important indicator of biomass, canopy cover, and leaf area index, and is widely used in vegetation monitoring. TNDVI overcomes the reduced sensitivity of NDVI when vegetation cover is low or high and better reflects the overall condition of vegetation. SQRT(IR/R) responds strongly to differences in the spectral response of green plants, in which the two bands behave inversely, and it indicates the degree of difference between the reflectance of the two bands. DVI is suitable for monitoring early to mid-stage vegetation development, or for low to medium canopy cover [17]. All four vegetation indices were computed with the band calculator in ENVI 5.3, and their formulae are shown in Table 1.
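The four indices can be sketched for a single pixel as follows. The formulas used here are the widely cited standard definitions (NDVI = (NIR − R)/(NIR + R), TNDVI = √(NDVI + 0.5), √(NIR/R), and DVI = NIR − R); they are assumed to correspond to Table 1 and are not copied from it. The function name `vegetation_indices` is hypothetical.

```python
import math

def vegetation_indices(nir, red):
    """Compute NDVI, TNDVI, SQRT(IR/R), and DVI from NIR and red reflectance."""
    if red <= 0 or nir < 0:
        raise ValueError("expected red > 0 and nir >= 0")
    ndvi = (nir - red) / (nir + red)
    # TNDVI is undefined when NDVI falls below -0.5; report NaN in that case.
    tndvi = math.sqrt(ndvi + 0.5) if ndvi >= -0.5 else float("nan")
    return {
        "NDVI": ndvi,
        "TNDVI": tndvi,
        "SQRT(IR/R)": math.sqrt(nir / red),
        "DVI": nir - red,
    }
```

Applied band by band, the same arithmetic is what a band calculator evaluates over the whole image.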

Table 1. Vegetation indices and formulas.
The texture features were extracted through a grey-level co-occurrence matrix (GLCM), with a total of eight basic features: mean (MEA), variance (VAR), contrast (CON), dissimilarity (DIS), second moment (SM), homogeneity (HOM), correlation (COR), and entropy (ENT) [10]. For the Landsat 8 and Sentinel-2A data, the features were extracted in a 3 × 3 window; because of the higher spatial resolution of the GF-1 data, its texture features were extracted in a 5 × 5 window. ENVI 5.3 generates the grey-level co-occurrence matrix and computes the texture features using the formulas shown in Table 2.

Table 2. Texture features, formulas, and implications.
Implications listed in the table: degree of deviation of image elements from the mean; local grey-level uniformity of the image; degree of change in local element values of the image; uniformity and coarseness of the distribution of image grey values; linearity of the image in grey scale.
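The eight texture statistics can be illustrated on a single window with a small sketch. It assumes the standard Haralick-style definitions and a symmetric, normalized co-occurrence matrix; ENVI's exact conventions (quantization, offset, symmetry) may differ, and the function name `glcm_features` is hypothetical.

```python
import math

def glcm_features(window, levels, offset=(0, 1)):
    """Compute eight GLCM texture statistics for one grey-level window.

    window: 2-D list of integer grey levels in range(levels).
    offset: (row, column) displacement between the paired pixels.
    The matrix is made symmetric and normalized to probabilities.
    """
    dr, dc = offset
    rows, cols = len(window), len(window[0])
    counts = [[0.0] * levels for _ in range(levels)]
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = window[r][c], window[r2][c2]
                counts[i][j] += 1.0
                counts[j][i] += 1.0  # count each pair in both directions
    total = sum(map(sum, counts))
    if total == 0:
        raise ValueError("window too small for the offset")
    cells = [(i, j, counts[i][j] / total)
             for i in range(levels) for j in range(levels) if counts[i][j] > 0]
    mean = sum(i * p for i, _, p in cells)
    var = sum((i - mean) ** 2 * p for i, _, p in cells)
    feats = {
        "MEA": mean,
        "VAR": var,
        "CON": sum((i - j) ** 2 * p for i, j, p in cells),
        "DIS": sum(abs(i - j) * p for i, j, p in cells),
        "SM": sum(p ** 2 for _, _, p in cells),
        "HOM": sum(p / (1 + (i - j) ** 2) for i, j, p in cells),
        "ENT": -sum(p * math.log(p) for _, _, p in cells),
    }
    # With a symmetric matrix the row and column means and variances coincide.
    # Correlation is undefined for a uniform window, so NaN is reported there.
    feats["COR"] = (
        sum((i - mean) * (j - mean) * p for i, j, p in cells) / var
        if var > 0 else float("nan")
    )
    return feats
```

In a moving-window implementation, this function would be evaluated for the 3 × 3 or 5 × 5 neighbourhood around each pixel and the result written to the corresponding texture band.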
The tasseled cap transformation (Kauth-Thomas transform, K-T transform) is a linear transformation of a multi-band remotely sensed image that yields several physically meaningful components. Three statistical features, soil brightness, greenness, and wetness, were extracted using the K-T transform [18], which is available as a built-in tool in ENVI 5.3.

Feature Selection Methods
In this paper, random forest feature optimization is used to select the features for the random forest (RF), support vector regression (SVR), and k-nearest neighbor (KNN) models. RF is a machine learning algorithm proposed by Breiman and Cutler in 2001 for classification, regression, and survival analysis [2]. Feature selection with RF ranks the features by importance, using the increase in mean squared error (%IncMSE) and the increase in node purity of the model trees (IncNodePurity). The least important features are removed and the model is rebuilt; this step is repeated, and the feature subset with the lowest model error is finally selected as the preferred feature set, which is then used in the RF, SVR, and KNN models, respectively, to estimate forest canopy cover.
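The permutation idea behind %IncMSE can be sketched independently of any particular forest implementation: shuffle one feature column, re-evaluate the error, and record how much the MSE grows. The sketch below is an illustration under assumptions, not the procedure used in the paper. It takes any prediction function `predict` (a hypothetical stand-in for a fitted model), permutes over the supplied samples rather than per-tree out-of-bag samples, and returns the raw mean increase in MSE without the scaling that R's randomForest applies.

```python
import random

def mse(predict, rows, targets):
    """Mean squared error of predict over the given samples."""
    return sum((predict(x) - y) ** 2 for x, y in zip(rows, targets)) / len(rows)

def permutation_importance(predict, rows, targets, n_repeats=20, seed=0):
    """Mean increase in MSE after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = mse(predict, rows, targets)
    importances = []
    for f in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            column = [x[f] for x in rows]
            rng.shuffle(column)
            # Rebuild the samples with only feature f replaced by shuffled values.
            shuffled = [x[:f] + [v] + x[f + 1:] for x, v in zip(rows, column)]
            increases.append(mse(predict, shuffled, targets) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances
```

Features whose shuffling barely changes the error are the candidates for removal in each elimination round; the model is then refitted on the remaining features and the ranking is recomputed.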
The KNN method does not require the data to follow a known distribution function and is well suited to feature fusion in multi-modal remote sensing and to the estimation of missing values; it is used mostly for the quantitative estimation of forest parameters. The KNN-FIFS algorithm used in this paper is an inversion model built on the KNN algorithm by Han Zongtao et al. It optimizes the remote sensing features iteratively and improves the accuracy of forest parameter inversion by screening for the best combination of feature factors [19]. On top of the existing KNN method, a forward-search feature selection algorithm recombines the remote sensing feature factors to improve the estimation accuracy. During feature selection, the k-value is varied continuously, and an estimation model and its root mean square error (RMSE) are obtained for each setting. Validation uses the leave-one-out (LOO) method; the feature combination and settings that give the minimum RMSE (RMSEb) are taken as the optimized result.
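The forward search described above can be sketched as a greedy loop: at each step, every remaining feature is tried with every k, the pair with the lowest leave-one-out RMSE is kept, and the search stops when no addition lowers the error. This is a simplified illustration and not the published KNN-FIFS implementation; in particular, the window-size parameter of KNN-FIFS is not modeled, unweighted neighbour averaging is assumed, and the function names are hypothetical.

```python
import math

def loo_rmse(rows, targets, features, k):
    """Leave-one-out RMSE of a Euclidean KNN regressor restricted to `features`."""
    squared_errors = []
    for i, (xi, yi) in enumerate(zip(rows, targets)):
        # Distances from the held-out sample to every other sample.
        neighbours = sorted(
            (math.dist([xi[f] for f in features], [xj[f] for f in features]), yj)
            for j, (xj, yj) in enumerate(zip(rows, targets)) if j != i
        )[:k]
        estimate = sum(y for _, y in neighbours) / len(neighbours)
        squared_errors.append((estimate - yi) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def forward_select(rows, targets, k_values):
    """Greedy forward search over features and k; returns (features, k, rmse)."""
    remaining = list(range(len(rows[0])))
    selected, best = [], (None, None, math.inf)
    while remaining:
        trials = [
            (loo_rmse(rows, targets, selected + [f], k), f, k)
            for f in remaining for k in k_values
        ]
        rmse, f, k = min(trials)
        if rmse >= best[2]:
            break  # no candidate lowers the LOO error any further
        selected.append(f)
        remaining.remove(f)
        best = (list(selected), k, rmse)
    return best
```

Because every trial is scored by leave-one-out validation, the selected feature set does not depend on a random split of the samples into training and testing subsets.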

Forest Inversion Model for Canopy Cover
The inversion models used in this paper are the non-parametric models random forest (RF), support vector regression (SVR), and k-nearest neighbor (KNN), together with k-nearest neighbor with fast iterative features selection (KNN-FIFS), which selects and combines remote sensing features automatically.
RF handles large datasets efficiently, is insensitive to anomalous data, does not require the data to satisfy an a priori distribution, and is robust in inversion problems; it is widely used in forest parameter inversion. RF regression consists of three main steps: drawing samples with the bootstrap method, building a regression tree on each sample, and aggregating the regression trees into a forest whose combined output is the prediction [2,20,21]. SVR derives from the regression module of support vector machines (SVM); it constructs an optimal hyperplane in a high-dimensional space that minimizes the total deviation of all samples from the hyperplane, which allows regression problems to be estimated [22]. The core idea of the KNN method is to use a distance measure to find the k samples closest to the sample to be estimated in feature space and to derive the estimate for that sample from those k samples. The KNN method does not need to follow an existing distribution function and is widely used for the quantitative estimation of forest parameters, as it performs well in the feature fusion of multi-modal remote sensing and the estimation of missing values. However, when the dimensionality of the input remote sensing features is high, the efficiency and accuracy of the model prediction decrease [23]. To overcome this drawback, we proposed the KNN method based on fast iterative feature selection (KNN-FIFS) in 2018 and applied it to the inversion of forest above-ground biomass. KNN-FIFS optimizes the remote sensing features iteratively on the basis of the KNN algorithm and improves the accuracy and efficiency of forest parameter inversion by screening for the best combination of feature factors [19].
In this paper, the RF and SVR estimation models for forest canopy cover are implemented in R. For RF, the number of decision trees (ntree) is set to 2500, and the number of features tried at each split (mtry) is one-third of the number of remote sensing features. For SVR, four kernel functions are used: linear, polynomial, sigmoid, and radial basis function (RBF). For the linear kernel, the penalty parameter (C) is 0.5 and epsilon is 0.40625; for the polynomial kernel, C is 0.5, the degree is 2, gamma is 0.02, and epsilon is 0.96875; for the sigmoid kernel, C is 0.5, gamma is 0.02, and epsilon is 1; for the RBF kernel, C is 2.5, gamma is 0.02, and epsilon is 0.03125. The KNN method is implemented in MATLAB with Euclidean distance and k-values from 1 to 15. The KNN-FIFS method is cross-validated with the LOO method: each validation run trains the model on n − 1 samples (n is the number of original samples). This excludes the random error caused by the allocation of training and testing samples, which stabilizes the KNN-FIFS feature selection results and makes the estimates more reliable. For KNN-FIFS, the k-values range from 1 to 11 and the window sizes from 1 to 11.

Accuracy Evaluation Methods
In this paper, the accuracy of forest canopy cover estimation by the four methods, RF, SVR, KNN, and KNN-FIFS, is evaluated with three indicators: the coefficient of determination (R²), the root mean square error (RMSE), and the relative root mean square error (RMSEr). R² measures model accuracy, representing the degree of agreement between the predicted and measured canopy cover; it ranges from 0 to 1, and the larger the value, the higher the model accuracy. RMSE and RMSEr represent the degree of difference between the predicted and the measured canopy cover, and the smaller the value, the more accurate the model. The equations for R², RMSE, and RMSEr are Equations (1), (2), and (3), respectively, where $y_i$ is the measured canopy cover of sample plot $i$, $\hat{y}_i$ is the estimated canopy cover of sample plot $i$, $\bar{y}$ is the mean of the measured canopy cover, and $\bar{\hat{y}}$ is the mean of the estimated canopy cover.
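The three indicators can be computed as in the sketch below. It assumes the common textbook forms, namely R² = 1 − SSE/SST, RMSE as the square root of the mean squared error, and RMSEr as RMSE divided by the mean measured canopy cover, expressed in percent. These forms are assumptions; since the symbol list above also names the mean of the estimated canopy cover, the authors' Equations (1) to (3) may be defined differently (for example, R² as a squared correlation). The function name `evaluate` is hypothetical.

```python
import math

def evaluate(measured, estimated):
    """Return (R2, RMSE, RMSEr in percent) under the assumed textbook definitions."""
    n = len(measured)
    mean_measured = sum(measured) / n
    # Sum of squared errors and total sum of squares about the measured mean.
    sse = sum((y - e) ** 2 for y, e in zip(measured, estimated))
    sst = sum((y - mean_measured) ** 2 for y in measured)
    rmse = math.sqrt(sse / n)
    return 1.0 - sse / sst, rmse, 100.0 * rmse / mean_measured
```

With canopy cover expressed as a fraction between 0 and 1, RMSE is on the same scale as the values reported in the Results, and RMSEr is a percentage.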

Results
In this paper, four methods, RF, SVR, KNN, and KNN-FIFS, are used to build forest canopy cover inversion models. Different feature screening methods are used for different models: for the RF, SVR, and KNN models, the random forest algorithm ranks the feature variables by importance and filters them, whereas KNN-FIFS optimizes and combines the features itself. The selected features are then modeled with the four machine learning methods to yield the forest canopy cover inversion models.
By comparing the accuracy and error of the different modeling methods and the predicted canopy cover for each sample, the method best suited to inverting canopy cover in the study area was selected for forest canopy cover mapping. Finally, the results from the different remote sensing images are compared to analyze the advantages and disadvantages of the different optical remote sensing images.

Landsat8 OLI Data Results and Comparative Analysis
In this paper, a total of 75 feature variables, such as spectral features and textural features, were extracted from the first seven bands of the Landsat8 OLI data. Table 3 shows the accuracy and error evaluation of the four canopy cover estimation models for the Landsat8 OLI data; KNN-FIFS shows better estimation accuracy than RF, SVR, and KNN. The RF, SVR, and KNN models were built after the variables were filtered by random forest feature screening. The RF model gave R² = 0.59, RMSE = 0.12, and RMSEr = 17.81; the SVR model gave R² = 0.20, RMSE = 0.16, and RMSEr = 20.32; the KNN model gave R² = 0.15, RMSE = 0.16, and RMSEr = 21.28; and the KNN-FIFS model gave R² = 0.60, RMSE = 0.11, and RMSEr = 14.64. All three evaluation metrics of KNN-FIFS are better than those of RF, SVR, and KNN.
Figure 3 shows the scatter plots of the accuracy validation of the four models for estimating forest canopy cover from the Landsat8 OLI data; (a), (b), (c), and (d) are the scatter plots of the estimated versus measured cover values for the RF, SVR, KNN, and KNN-FIFS methods, respectively. Among the four machine learning methods, SVR and KNN fit poorly and their scatter plots are more dispersed, whereas RF and KNN-FIFS fit better, with KNN-FIFS exhibiting the highest inversion accuracy. All four methods underestimate high values and overestimate low values, and KNN-FIFS gives the best estimation accuracy in the overall comparison of the four models.

Sentinel-2A Data Results and Comparative Analysis
In this paper, a total of 120 feature variables, such as spectral features and textural features, were extracted from 12 bands of the Sentinel-2A data. Table 4 shows the accuracy and error evaluation of the four canopy cover estimation models for the Sentinel-2A data. KNN-FIFS shows better estimation accuracy than RF, SVR, and KNN. The RF, SVR, and KNN models were built after random forest feature screening, with R² = 0.38 for the RF model; the KNN model gave R² = 0.18, RMSE = 0.16, and RMSEr = 20.30, and the KNN-FIFS model gave R² = 0.69, RMSE = 0.09, and RMSEr = 13.26.
Figure 4 shows the accuracy validation scatter plots of the four models for estimating forest canopy cover from the Sentinel-2A data; (a), (b), (c), and (d) are scatter plots of the estimated versus measured canopy cover values for the RF, SVR, KNN, and KNN-FIFS methods, respectively. The RF method generally underestimates high values, and SVR and KNN generally overestimate low values. Compared with these three machine learning methods, the KNN-FIFS method exhibits the best fit and the highest inversion accuracy. Moreover, the canopy cover estimates of the RF, SVR, and KNN models are dispersed and poorly fitted, while the KNN-FIFS estimates cluster more tightly, and high-value underestimation and low-value overestimation are not prominent.

GF-1 PMS Data Results and Comparative Analysis
In this paper, a total of 45 feature variables, such as spectral features and textural features, were extracted from four bands of the GF-1 PMS data. Table 5 shows the accuracy and error evaluation of the four canopy cover estimation models for the GF-1 PMS data. KNN-FIFS shows better estimation accuracy than RF, SVR, and KNN. The RF, SVR, and KNN models were built after random forest feature screening. The RF model gave R² = 0.32, RMSE = 0.15, and RMSEr = 18.93; the SVR model gave R² = 0.43, RMSE = 0.13, and RMSEr = 16.92; the KNN model gave R² = 0.34, RMSE = 0.14, and RMSEr = 18.66; and the KNN-FIFS model gave R² = 0.55, RMSE = 0.12, and RMSEr = 15.04. Among them, the KNN-FIFS model exhibited the highest estimation accuracy.
Figure 5 shows the scatter plots of the accuracy validation of the four models for estimating forest canopy cover from the GF-1 data; (a), (b), (c), and (d) are scatter plots of the estimated versus measured canopy cover values for the RF, SVR, KNN, and KNN-FIFS methods, respectively. The RF method generally underestimates high values, and SVR and KNN generally overestimate low values. Compared with these three machine learning methods, the KNN-FIFS method exhibits the best fit and the highest inversion accuracy. Moreover, the canopy cover estimates of the RF, SVR, and KNN models are dispersed and poorly fitted, while the KNN-FIFS estimates cluster more tightly, and high-value underestimation and low-value overestimation are not prominent.

Results and Comparative Analysis of the Three Data Features Combined
The 75 feature parameters extracted from the Landsat8 images, the 120 feature parameters extracted from the Sentinel-2A images, and the 45 feature parameters extracted from the GF-1 PMS images were combined into a total of 240 feature variables, which were then evaluated for accuracy in forest canopy cover modeling using each of the four machine learning methods mentioned above. Table 6 shows the accuracy and error evaluation of the four canopy cover estimation models under the combination of the three data features. KNN-FIFS remains the method with the best estimation accuracy, with the RF, SVR, and KNN methods participating in the modeling after random forest feature screening: R² = 0.33, RMSE = 0.14, and RMSEr = 18.52 for the RF model; R² = 0.53, RMSE = 0.12, and RMSEr = 15.65 for the SVR model; R² = 0.50, RMSE = 0.13, and RMSEr = 16.44 for the KNN model; and R² = 0.82, RMSE = 0.08, and RMSEr = 10.40 for the KNN-FIFS model. The KNN-FIFS model thus has the highest estimation accuracy and is also more accurate than the KNN-FIFS models of the three single multispectral image inversions. Figure 6 shows scatter plots of the accuracy validation of the four models for estimating forest canopy cover after the combination of the three data features. Panels (a), (b), (c), and (d) are scatter plots of the estimated versus measured canopy cover values for RF, SVR, KNN, and KNN-FIFS, respectively. With the three data features combined, the results are generally better than those of the estimation models built from single multispectral data. Model accuracy improves because more feature variables are available for the optimization and modeling of the optimal model, KNN-FIFS, which is more suitable for forest canopy cover estimation. Meanwhile, the accuracy of the other three machine learning methods was also improved.
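The combination step described above can be pictured as a column-wise join of the per-sensor feature tables on shared sample plots. The following Python sketch is only an illustration under that assumption; the function name, sensor prefixes, plot identifiers, and feature values are hypothetical and are not taken from this study.

```python
def combine_features(*sources):
    """Join per-sensor feature tables column-wise on shared sample IDs.

    Each source is a (prefix, table) pair, where table maps
    sample_id -> {feature_name: value}. Only samples present in every
    source are kept, and feature names are prefixed by sensor so that
    identically named features from different sensors do not collide.
    """
    shared = set.intersection(*(set(table) for _, table in sources))
    combined = {}
    for sample_id in sorted(shared):
        row = {}
        for prefix, table in sources:
            for name, value in table[sample_id].items():
                row[f"{prefix}_{name}"] = value
        combined[sample_id] = row
    return combined

# Hypothetical feature tables with a few features per sensor.
landsat = {"p1": {"NIR": 0.31, "TC_green": 0.12},
           "p2": {"NIR": 0.28, "TC_green": 0.10}}
sentinel = {"p1": {"B6": 0.25}, "p2": {"B6": 0.22}, "p3": {"B6": 0.30}}
gf1 = {"p1": {"B3": 0.05}, "p2": {"B3": 0.06}}

table = combine_features(("L8", landsat), ("S2", sentinel), ("GF1", gf1))
print(sorted(table), len(table["p1"]))

# Feature counts reported in the text: 75 + 120 + 45 = 240.
total_features = 75 + 120 + 45
```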

Optimal Model Inversion of Canopy Cover and Comparative Analysis
Feature variables were screened and combined from three different multispectral remote sensing data sources: Landsat8 OLI, Sentinel-2A, and GF-1. Across the four modeling methods, the KNN-FIFS method gave the best forest canopy cover inversion for both the single-source inversions and the joint inversion with features from multiple data sources. Table 7 shows the feature combinations screened under the KNN-FIFS method for the single and joint inversions of the three datasets. The table shows that texture features are involved in the modeling inversions, whether from a single data source or a combination of multiple data sources, which indicates their importance in forest canopy cover estimation. The NIR band and the red-edge band (B5 for OLI, Band 6 for 2A, and B3 and B4 for GF-1) of all three data sources are involved in the modeling inversion, indicating that these two bands significantly reflect vegetation and provide a better response to forest structure. In the KNN-FIFS method, the dependent variable of a test sample is predicted as a weighted average over the k training samples whose independent-variable values are closest to those of the test sample, so the k-value is the number of nearest samples used. As can be seen from Table 7, the k-value remains basically constant, at 2 or 3. The window size, however, is related to the sample size and the spatial resolution of the image; since the sample size is matched to the LiDAR canopy cover pixel, images with different spatial resolutions require different window sizes.
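The weighted k-nearest-neighbor prediction described above, together with a simple feature screening loop, can be sketched in Python as follows. This is a minimal illustration, not the KNN-FIFS implementation used in this paper: the inverse-distance weighting, the greedy forward selection scored by leave-one-out RMSE, and all function names and sample values are assumptions, and the window-size optimization is not modeled.

```python
import math

def knn_predict(train_X, train_y, x, k):
    """Inverse-distance-weighted mean of the k nearest training targets."""
    nearest = sorted((math.dist(row, x), y) for row, y in zip(train_X, train_y))[:k]
    weights = [1.0 / (d + 1e-9) for d, _ in nearest]  # small constant avoids 1/0
    return sum(w * y for w, (_, y) in zip(weights, nearest)) / sum(weights)

def loo_rmse(X, y, cols, k):
    """Leave-one-out RMSE of KNN restricted to the feature columns in cols."""
    sq_err = 0.0
    for i in range(len(X)):
        train_X = [[r[c] for c in cols] for j, r in enumerate(X) if j != i]
        train_y = [t for j, t in enumerate(y) if j != i]
        pred = knn_predict(train_X, train_y, [X[i][c] for c in cols], k)
        sq_err += (pred - y[i]) ** 2
    return math.sqrt(sq_err / len(X))

def forward_select(X, y, k=2, max_features=3):
    """Greedily add the feature that lowers LOO RMSE most; stop when none does."""
    selected, best = [], float("inf")
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < max_features:
        score, col = min((loo_rmse(X, y, selected + [c], k), c) for c in remaining)
        if score >= best:
            break
        selected.append(col)
        remaining.remove(col)
        best = score
    return selected, best

# Hypothetical samples: column 0 tracks canopy cover, column 1 does not.
X = [[0.1, 0.9], [0.2, 0.1], [0.3, 0.8], [0.4, 0.2],
     [0.5, 0.7], [0.6, 0.3], [0.7, 0.6], [0.8, 0.4]]
y = [row[0] for row in X]
selected, best_rmse = forward_select(X, y, k=2)
print(selected, round(best_rmse, 3))
```

With k fixed at 2 or 3, as reported in Table 7, the search space is dominated by the choice of features, which is why a screening step of this kind matters more than tuning k.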
The features chosen vary for different data sources. For the Landsat 8 OLI data, the greenness and brightness components generated by the tasseled cap transformation and the shortwave infrared band (SWIR1) have an important influence on the inversion of the canopy cover. For the Sentinel-2A data, the red-edge band plays an important role in the inversion of the canopy cover. For the GF-1 data, the red band plays an important role in the inversion of the canopy cover, as it lies in the chlorophyll absorption range. In the combination of the three data features, the different feature parameters of the three datasets are used simultaneously to improve the accuracy of the inversion of the canopy cover. Textural features were involved in the inversion for all of the data configurations, demonstrating their indispensable role in the inversion of canopy cover.
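Because textural features recur in every feature combination, a short sketch of how such features are commonly derived may be useful. The Python example below computes a symmetric, normalized gray-level co-occurrence matrix (GLCM) for one pixel offset and three common statistics. It is an illustration only: the offset, the number of gray levels, the choice of statistics, and the small test image are assumptions, and the texture measures actually used in this paper are those defined in Table 2.

```python
import math

def glcm(image, levels, dx=1, dy=0):
    """Symmetric, normalized GLCM of a quantized image for offset (dx, dy)."""
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1  # count the pair in both directions
                counts[j][i] += 1  # so that the matrix is symmetric
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]

def texture_stats(p):
    """Contrast, homogeneity, and entropy of a normalized GLCM."""
    n = len(p)
    cells = [(i, j) for i in range(n) for j in range(n)]
    contrast = sum(p[i][j] * (i - j) ** 2 for i, j in cells)
    homogeneity = sum(p[i][j] / (1 + (i - j) ** 2) for i, j in cells)
    entropy = -sum(p[i][j] * math.log(p[i][j]) for i, j in cells if p[i][j] > 0)
    return {"contrast": contrast, "homogeneity": homogeneity, "entropy": entropy}

# Hypothetical 4 x 4 window quantized to 4 gray levels.
window = [[0, 0, 1, 1],
          [0, 0, 1, 1],
          [0, 2, 2, 2],
          [2, 2, 3, 3]]
p = glcm(window, levels=4)
stats = texture_stats(p)
print({name: round(value, 4) for name, value in stats.items()})
```

In practice, such statistics are computed in a moving window over each band, so the window size interacts with the spatial resolution of the image.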
Figure 7 shows the forest canopy cover maps inverted from the four data configurations. The best inversion results were obtained with the combined data. In particular, the Landsat8 OLI data overestimated the canopy cover of the main forest roads and underestimated the canopy cover of the deeper forest areas at higher elevations. The inversion results based on the Sentinel-2A data basically reflect the general situation of the main forest area, but there is still some underestimation of forest canopy cover. The GF-1 data reflect the outline of the main forest area well, but the estimated canopy cover is low overall and does not reflect the high canopy cover at high altitudes. The inversion results based on the combination of the three data sources are the best and basically reflect the distribution of the forest canopy cover in the study area. Some overfitting remains, with the canopy cover of the main forest area overestimated. Overall, however, combining the advantages of the three kinds of data greatly reduces the overestimation of low values and the underestimation of high values and gives a more realistic distribution of forest canopy cover in the study area.
From the forest canopy cover map of the joint data, which gave the best inversion, it can be seen that the forest area with a canopy cover in the range of 0 to 0.2 represents approximately 5% of the study area; the forest area with a canopy cover in the range of 0.2 to 0.4 represents approximately 10% of the study area; the forest area with a canopy cover in the range of 0.4 to 0.6 represents approximately 40% of the study area; the forest area with a canopy cover in the range of 0.6 to 0.8 represents approximately 20% of the study area; and the forest area with a canopy cover in the range of 0.8 to 1.0 represents approximately 25% of the study area. It can be concluded that the forest canopy cover in the study area is relatively high, and the ecological environment in the deep forest areas is good, except for the main forest area, which is vulnerable to low canopy cover due to human activities.
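The area shares quoted above correspond to a simple tabulation of the canopy cover map by class interval. A minimal Python sketch is given below; the interval edges follow the text, whereas the function name, the treatment of out-of-range pixels, and the pixel values are assumptions made for illustration.

```python
def class_shares(values, edges=(0.0, 0.2, 0.4, 0.6, 0.8, 1.0)):
    """Percentage of pixels in each canopy cover interval.

    Intervals are [lower, upper), except the last one, which also
    includes its upper edge. Values outside the edges are ignored.
    """
    counts = [0] * (len(edges) - 1)
    last = len(counts) - 1
    for v in values:
        for i in range(len(counts)):
            below_upper = v < edges[i + 1] or (i == last and v <= edges[-1])
            if edges[i] <= v and below_upper:
                counts[i] += 1
                break
    total = sum(counts)
    if total == 0:
        raise ValueError("no values fall inside the class intervals")
    return [100.0 * c / total for c in counts]

# Hypothetical pixel values from a canopy cover map.
pixels = [0.1, 0.3, 0.5, 0.5, 0.5, 0.5, 0.7, 0.7, 0.9, 1.0]
shares = class_shares(pixels)
print(shares)

# Shares reported in the text for the combined-data map.
reported = [5, 10, 40, 20, 25]
```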

Discussion
The results show that each of the three types of multispectral remote sensing data has advantages and disadvantages for estimating forest canopy cover with the same remote sensing parameter extraction method, and that the level of accuracy is significantly influenced by the image sensor and spatial resolution. According to the single-source forest canopy cover mapping, the inversion results for the Sentinel-2A data were slightly better than those for the Landsat8 OLI data. Similarly, Korhonen et al. [24] used a generalized additive model and concluded that the Sentinel data model provided slightly better results than did the Landsat8 OLI model. The red-edge bands are sensitive to the growth of green vegetation [25], as can be seen in Table 5, where the red-edge bands (bands 5, 6, and 7), the near-infrared band (band 8), and the short-wave infrared band (band 11) are selected several times in the combination of features in the Sentinel-2A data. Hua and Zhao [26] used red-edge bands based on Sentinel-2 satellite images to estimate FCC; the results showed that red-edge bands can effectively improve the accuracy of FCC estimation models for different FCC classes, which is consistent with the conclusions reached in this paper. The importance of the texture feature factor in the construction of the forest parameter model is greater than the influence of the band information and vegetation index factor, whether for a single data source or for combined multi-source data, which is roughly the same as the conclusion reached by Pan et al. [27]. For all four models developed for the three datasets, there is some overestimation of low canopy cover values and underestimation of high canopy cover values, which is a common and unavoidable problem with current remote sensing inversions of forest parameters [16].
Karlson et al. [28] used spectral and textural features of WorldView-2 and Landsat-8 images to invert the forest canopy cover; according to their findings, the most salient features in forest canopy cover mapping were the gray-level co-occurrence matrix (GLCM) and NDVI. In contrast, the most significant features in this paper are the tasseled cap transformation and textural features. Wu [18] concluded that, using Landsat8 OLI estimation, band 7 and the TNDVI made the highest contribution to forest canopy cover. In this paper, the inversion of Landsat8 OLI data showed the frequent occurrence of the three tasseled cap components of greenness, wetness, and brightness, which differs from Wu's conclusion. This may reflect structural differences between northern and southern forest stands, as the study area in this paper is in Genhe City, Inner Mongolia, whereas Wu's study area is in Sandwich City, Fujian Province. However, the findings are similar to those of Liu [29], who used Landsat TM data to extract the tasseled cap transformation components for the estimation of canopy cover. Comparing the results of the above three researchers with those in this study shows that the tasseled cap transformation occupies an important position in the inversion of forest canopy cover and is a reliable inversion feature for Landsat 8 data.
All four data configurations in this paper (the three single data sources and their combination) achieved good accuracy in estimating forest canopy cover. This may be related to the fact that the study area is located in the Daxinganling Ecological Station, where there are relatively few human activities and the forest stand structure has certain advantages compared with the complex understory structure in the southern tropics. In addition, the measured canopy cover values used as model validation data were derived from a CHM constructed from high-precision LiDAR point cloud data, which indirectly reduces human error and can improve the model accuracy to a certain extent [16]. Combining multi-source optical data can effectively improve the inversion accuracy of forest canopy cover, which is consistent with the findings of Popescu [30], Brovkina [31], and Hyde [32]. This study shows that a multi-source data inversion of forest parameters is more effective than using a single data source and can effectively improve the inversion accuracy of forest parameters.
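The tendency to overestimate low values and underestimate high values, noted above, can be quantified as the mean signed error within low and high ranges of measured canopy cover. The following Python sketch illustrates the idea; the 0.5 threshold, the function name, and the sample values are assumptions chosen for illustration and are not results from this study.

```python
def bias_by_range(measured, estimated, threshold=0.5):
    """Mean signed error (estimated - measured) below and at/above threshold.

    A positive value indicates overestimation and a negative value
    indicates underestimation. Returns (low_bias, high_bias); a range
    with no samples yields NaN.
    """
    low = [e - m for m, e in zip(measured, estimated) if m < threshold]
    high = [e - m for m, e in zip(measured, estimated) if m >= threshold]

    def mean(xs):
        return sum(xs) / len(xs) if xs else float("nan")

    return mean(low), mean(high)

# Hypothetical validation samples showing the typical pattern.
measured = [0.1, 0.2, 0.8, 0.9]
estimated = [0.2, 0.3, 0.7, 0.8]
low_bias, high_bias = bias_by_range(measured, estimated)
print(f"low-range bias={low_bias:+.2f}, high-range bias={high_bias:+.2f}")
```

A model that compresses its estimates toward the mean, as averaging-based methods tend to do, shows a positive low-range bias and a negative high-range bias under this measure.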

Conclusions
In this paper, using multispectral remote sensing data from Landsat8 OLI, Sentinel-2A, and GF-1, a large number of remote sensing features derived from spectral features, textural features, topographic features, color transformations, and tasseled cap transformations were extracted and filtered. The optimal feature combinations were then used with non-parametric methods to model forest canopy cover, in order to compare the advantages and disadvantages of the different modeling methods for estimating canopy cover from the different data sources and to analyze and compare the potential of the three satellite datasets for estimating forest canopy cover. It has been shown that parameter selection can filter out most irrelevant variables and improve the accuracy of the corresponding model. The highest accuracy and relative stability were obtained for the KNN-FIFS estimation of forest canopy cover based on the different data sources; RF and SVR show only average inversion accuracy in this study because they require a large number of training samples; KNN with random forest feature screening does not yield results as satisfactory as those of KNN-FIFS with its improved feature selection; GF-1 has the highest spatial resolution, but it has fewer bands and some outliers were removed during the extraction of the textural features, which led to poorer results, so improved methods of extracting textural features will be considered in future work. In addition, the spatial resolution of the obtained DEM data is only 30 m, which does not reflect the distribution of forest canopy cover at different elevations well; combining higher-accuracy DEM data with the inversion, or deriving DEM data from higher-accuracy LiDAR data, will be considered subsequently. As the experiment was conducted in the northern study area only, the generalizability of the findings to forest canopy cover estimation requires further research. Accurate and appropriate canopy classification requires sufficient training and

Figure 1. Geographical location of the study area of the Daxinganling Ecological Station.

Data and Pre-Processing
Optical Remote Sensing Data, DEM Data Acquisition, and Pre-Processing
(1) Landsat8 OLI data acquisition and pre-processing
The optical remote sensing data acquired in this paper are Landsat8 OLI data imaged on 19 October 2013, without significant cloud cover, with path/row number 123/24, referenced to the Universal Transverse Mercator projection (UTM zone 51N) and the WGS-84 ellipsoid, and downloaded for free through the Google Earth Engine platform. The data are Level 1T products, which are geometrically corrected. After radiometric calibration and FLAASH atmospheric correction in ENVI (The Environment for Visualizing Images, version 5.3), the first seven bands were fused with the panchromatic band (B8) by Gram-Schmidt pan sharpening and then used for remote sensing parameter extraction.

Figure 2. LiDAR canopy cover of the Daxinganling Ecological Station.

methods are better fitted and KNN-FIFS exhibits the highest inversion accuracy under comparison.All four machine learning methods show high value underestimation and low value overestimation, and the KNN-FIFS estimation accuracy is the best under the comprehensive comparison of the four models.

Figure 3. Scatter plot of the accuracy verification results of Landsat8 OLI: (a) estimated canopy cover of RF; (b) estimated canopy cover of SVR; (c) estimated canopy cover of KNN; (d) estimated canopy cover of KNN-FIFS.

Figure 4 shows the accuracy validation scatter plots of the four models for estimating the forest canopy cover for Sentinel-2A data. Panels (a), (b), (c), and (d) are scatter plots of the estimated canopy cover values versus the measured canopy cover values for the RF, SVR, KNN, and KNN-FIFS methods, respectively. It can be seen that the RF method generally

Figure 4. Scatter plot of the accuracy verification results of Sentinel-2A: (a) estimated canopy cover of RF; (b) estimated canopy cover of SVR; (c) estimated canopy cover of KNN; (d) estimated canopy cover of KNN-FIFS.

Figure 5. Scatter plot of the accuracy verification results of GF-1: (a) estimated canopy cover of RF; (b) estimated canopy cover of SVR; (c) estimated canopy cover of KNN; (d) estimated canopy cover of KNN-FIFS.

Figure 6. Scatter plot of the accuracy verification results of three data combinations: (a) estimated canopy cover of RF; (b) estimated canopy cover of SVR; (c) estimated canopy cover of KNN; (d) estimated canopy cover of KNN-FIFS.

Figure 7. Canopy cover map of Daxinganling Ecological Station: (a) canopy cover map from Landsat8 OLI; (b) canopy cover map from Sentinel-2A; (c) canopy cover map from GF-1; (d) canopy cover map showing data combinations.

Table 2. Texture feature calculation equations and implications.

Table 3. Accuracy evaluation of the four estimation models for Landsat8 OLI.

Table 4. Accuracy evaluation of the four estimation models for Sentinel-2A.

Table 5. Accuracy evaluation of the four estimation models for GF-1.

Table 6. Accuracy evaluation of the four estimation models for the three data combinations.
