Tree Species (Genera) Identification with GF-1 Time-series in A Forested Landscape, Northeast China

: Forests are the most important component of terrestrial ecosystem; the accurate mapping of tree species is helpful for the management of forestry resources. Moderate- and high-resolution multispectral images have been commonly utilized to identify regional tree species in forest ecosystem, but the accuracy of recognition is still unsatisfactory. To enhance the forest mapping accuracy, this study integrated the land surface phenological metrics and text features of forest canopy on tree species identification based on Gaofen-1 (GF-1) wide field of view (WFV) and time-series images (36 10-day NDVI data), conducted at a forested landscape in Harqin Banner, Northeast China in 2017. The dominant tree species include Pinus tabulaeformis , Larix gmelinii , Populus davidiana , Betula platyphylla , and Quercus mongolica in the study region. The result of forest mapping derived from a 10-day dataset was also compared with the outcome based upon a commonly utilized 30-day dataset in tree species identification. The results indicate that tree species identification accuracy is significantly (p < 0.05) improved with higher temporal resolution (10-day, 79.4%) of images than commonly used monthly data (30-day, 76.14%), and the accuracy can be further increased to 85.13% with a combination of the information derived from principal component analysis (PCA) transformation, phenological metrics (standing for the information of growing season) and texture features. The integration of higher dimensional NDVI data, vegetation growth dynamics and feature of canopy simultaneously will be beneficial to map tree species at the landscape scale.


Introduction
Forest is one of the most important components of the global terrestrial ecosystem, which covers about one third of the earth's land surface [1]. An accurate understanding the patterns of tree species is required to reasonably utilize and manage regional forest resources [2,3]. Remotely sensed data are considered an effective method to identify tree species or forest types at broader scale [4,5], and the higher resolution of satellite images can avoid the spectral distortion of vegetation caused by mixed pixels and is helpful to identify forest patterns [6][7][8]. Compared to airborne hyperspectral and light detection and ranging (LiDAR) data, the multispectral satellite images with higher

Data from in-Situ Forest Inventory
There were, in total, 1650 plots (30×30 m) set up during the period 2017-2018, considering various forest stand ages, coverages and density for dominant tree species across the study region, with approximately 330 plots for each tree species (Figure 1c). Field surveys of tree species identification were carried out in late September 2017 and early July 2018, respectively, and each of the sampled dominant tree species were geo-located separately with high-precision GPS, 1 m in precision (Mobile Mapper 6, Magellan Corporation, USA). The stand density in plots ranged from 650 to 2750 ha -1 , and the top of forest canopy and vertical structure of five dominant tree species were delineated as in Figure 2. The field data served as input for classification in remotely sensed data analysis, randomly divided into training (30%) and validation (70%) data to reduce random errors, as suggested in many related studies [38][39][40][41][42].

Remotely Sensed Data and Pre-processing
Chinese GF-1 satellite is equipped with four WFV multispectral sensors (blue: 0.45-0.52 μm, green: 0.52-0.59 μm, red: 0.63-0.69 μm, and near-infrared: 0.77-0.89 μm), which can provide images with 800 km width, 10-day in temporal scale and 16 m in spatial resolution. A total of 36 WFV images covering the study area were used in this study; there were 33 images acquired in 2017 while the other three images are replaced by the concurrent data in 2016 due to atmospheric and cloud contamination (12%-35% of cloud coverage) ( Table 1 and Figure 3). All GF-1 data were obtained from the China center for resources satellite data and application (CCRSDA; http:// www.cresda.com/CN/).  1  20170110  13  20170507  25  20170921  2  20170118  14  20170520  26  20170930  3  20170202  15  20170529  27  20171014  4  20170211  16  20170610  28  20161019  5  20170219  17  20170626  29  20161023  6  20170228  18  20170708  30  20171103  7  20170308  19  20170717  31  20171116  8  20170312  20  20170810  32  20171124  9  20170325  21  20170822  33  20171202  10  20170409  22  20170830  34  20171215  11  20160422  23  20170907  35  20171223  12  20170430  24  20170912  36 20171231 First, all GF-1 images were orthographic corrected based on 30 m DEM with 30 ground control points recorded in the field which were geo-located using a global positioning system with an error less than 1 m in precision. We then transformed the images' digital value (DN) into the reflectance by radiometric calibration, including an illumination and atmospheric correction model (IACM) atmospheric correction and pseudo-invariant feature (PIF) radiation normalization [43], and the calibration parameters corresponding to the dates of images could be obtained from the official websites of the CCRSDA. Finally, according to the ground control points, geometric correction is carried out on the image in late September, 2017 (the reference image, number 25 in Table 1), after which the image-to-image approach with a second-order polynomial transformation, and nearest neighbor resampling was applied to co-register other images, and ensured that the root mean squared error (RMSE) of the corrected images were less than 0.5 pixels [36]. The NDVI (normalized difference vegetation index) images were then calculated as Equation (1) = (1) where the NIR and R are the surface reflectance of near infrared (NIR) and red bands, respectively.
The NDVI values ranged from -1 to 1, and the observed values generally ranged from below 0 for desert, open water, and snow, to nearly 1 in dense forests [44]. Many studies demonstrated that the time series of NDVI data can reflect the seasonal dynamics of different tree species [13,45]. However, the underestimation of NDVI is common due to clouds and atmospheric contamination, especially in mountainous regions in which the noise needs to be smoothed by filtering [46]. In this study, the Savitzky-Golay (SG) algorithm was selected to perform polynomial filtering on the time-series NDVI data, because it can better maintain the vegetation temporal dynamics and minimize the atmospheric effects [47,48], compared to double logistic (DL) and asymmetric gaussian (AG) [47,49]. The algorithm parameters of smooth window and convolution dimension were set to 4 and 1, to reconstruct and compare the time-series images ranging from 10-day (the first, middle, and last ten days of the month), 20-day (the first and last ten days of the month) to 30-day (the middle ten days of the month) intervals, which the time-scale of datasets had commonly considered in analysis.
In order to reduce the redundant information and detect useful patterns from NDVI time-series images, the principal component analysis (PCA; Figure 3) was utilized to transform the original correlated data into another set of uncorrelated variables [50]. The PCA will only keep a few spatiotemporal patterns which are independent and distinct from each other, that will help to separate the seasonal variability of different vegetation dynamics [51,52].

Temporal Texture Feature of Forest
To extract the texture features of tree species from images, the common gray level co-occurrence matrices (GLCM) was applied to identify the spatial relationship and structural characteristics of image's gray value, and its statistical distribution of image texture can be realized as well [53,54]. Studies have demonstrated that several textures derived from images using GLCM would be supportive for regional forest tree species identification and be capable of preferable results [55][56][57]. Four widely used texture parameters were selected for application, including contrast (CON), entropy (ENT), second moment (SM) and correlation (COR), that can better characterize forest canopy, as suggested [35,56,58,59]. The calculations of the four parameters can be express as below (Equations (2)-(5)) where Pi,j is the values of row i and column j in the GLCM, which is the probability of two pixel values corresponding to row i and column j in the image, appearing simultaneously at a certain distance and direction, and N is the dimension of GLCM ( Figure 3). The mean and VAR indicate the mean and variance of the adjacent pixel values, respectively. The window size was set to 3×3. For the details, please refer to Haralick et al. [60]. In order to eliminate the data redundancy, the Jeffries-Matusita (J-M) distance is used as a criterion to select the best texture feature variables which have been proven to accurately represent the separability of different categories [61,62]. The J-M distance ranges from 0 to 2; the higher the value, the higher the separability. In general, when the J-M value reaches over 1.9, there is a significant separation between the selected samples [63]. . The working flow of major images acquired, data processing, texture feature extractions, phenologiocal metrics calculation, and landscape mapping and accuracy assessment in ENVI and ArcGIS software.

Phenological Metrics Calculation
TIMESAT is a powerful tool to derive specific vegetation growth phase from time-series satellite images based upon the best mathematical fitting function [64,65]. In this study, five main phenological metrics, including the start of growing season (SOS, Julian date), end of growing season (EOS, Julian date), length of growing season (LOS, days), peak of season (POS, Julian date), and the length of peak-time (LOP, days) were obtained from the thirty-six GF-1 NDVI images ( Figure 3) using SG smoothing function in TIMESAT v.3.3 as suggested [47,66]. The SOS and EOS was identified from the fitted function as they reached the time point setting that was previously fixed, such as 10%−30% of the distance between the left/right minimum and the maximum [67,68]. Nevertheless, the threshold of the time setting varied from 20% to 30% due to the characteristics of locations and forest types or tree species selected [69][70][71]. To find a suitable timepoint, we first tested the SOS and EOS using the time setting of 20%, 25% and 30% derived from NDVI time series respectively situated at four dominant deciduous tree species, Lg, Qm, Bp, and Pd, except for Pt due to lack of field record. Then, the derived SOS and EOS were compared with the in-situ observation data acquired from the China phenology observation network (CPON) in the study area. The tests showed that the SOS and EOS of four tree species based on the 20% threshold from images analysis were more identical to ground truth than 25% and 30% of time settings (p < 0.05; Figure 4), hence the time setting of 20% was decided to obtain all phenological metrics. The LOS could be determined as the difference between the EOS and SOS, the date of the maximum value of the fitted temporal dynamic curve was defined as the POS, and the LOP was the length between 80% of left/right maximum of temporal curve.

Forest Mapping on Landscape and Accuracy Assessment
The landscape patterns of tree species were calculated based on Support Vector Machine (SVM), which is a non-parametric semi-supervised classifier derived from statistical learning theory and does not require input data with normal distribution. It maps the training data onto a higher-dimensional space and segregates a few new features from the original data [72]. Previous studies showed that SVM performs better than other usual approaches, such as the maximum-likelihood classification (MLC) [73,74], artificial neural networks (ANNs) [75,76] and random forest (RF) [77,78]. Therefore, we selected the SVM classifier with radial basis function (RBF), and set its kernel function to 1 and the penalty coefficient to 100, as suggested [35]. An appropriate classifier is essential to identify tree species and the supervised classification is clearly preferable to accuracy assessment based on the accessible detailed information and high-precision training data [79,80]. In our study, a total of 36 NDVI time-series images, the numbers of best obtainments of PCA transformed patterns, five selected phenological metrics and the determined texture features were provided as input layers for forest mapping on landscape and the schemes of various combinations of these data layers.
The overall accuracy of tree species identification was evaluated by kappa statistics, and user's and producer's accuracies [81]. The random sampling of data for classification and verification was repeated five times to generate a set of accuracies and kappa values for each temporal resolution image (10-, 20-, and 30-day). Differences in classification results (i.e., overall accuracy and kappa coefficient) among various resolutions and schemes of data combination were analyzed by using a one-way analysis of variance (ANOVA), and the least-significant difference (LSD) test with a significant p-value < 0.05.

Forest Species Identification Based on NDVI Time-Series Data
The kappa coefficients and overall accuracies (OA) of identification for different forest species decrease significantly from 79.4% (0.74) at 10-day, 78.35% (0.73) at 20-day, to 76.14% (0.70) using 30-day NDVI time-series, respectively (p < 0.05, Figure 5a). The kappa coefficient (OA) of forest tree species identification based upon PCA transformed patterns derived from 10-day NDVI time series increased from 0.41 (53.32%) only using the pattern of first component to 0.72 (78.27%), including the first five components in the analysis, and after that with more components included in the analysis the kappa coefficient (OA) kept on a stable condition around 0.76 (80%), and reaching the highest at 22 PCs (0.78 of kappa coefficient and 82.18% of OA) (Figure 5b). Therefore, the 22 PCs were provided as input for final landscape forest mapping.

Phenoloigcal Patterns and Forest Species Identification with Phenological Metrics
The phenological metrics of five dominant tree species in the study area were significant different ( Figure 6). The patterns of five phenological metrics obviously changed with elevational gradient, for example the SOS (EOS) delayed (advanced) with elevation, which resulted in a decrease in LOS with increasing elevation. Both POSs (LOP) advanced (decreased) with increasing elevation (Figure 1c,  Figure 6a). The evergreen tree species, Pinus tabulaeformis (Pt), has a significant earliest SOS ( Figure 6b). The differences among four deciduous dominant species, such as the earlier SOS of Lg, the later EOS of Qm, the shorter LOS of Bp, and the later POS of Lg, and the shorter LOP of Pd, shows the potential to be used to distinguish the different tree species.

Forest Mapping with Temporal Texture Features
The J-M value of four texture parameters, CON, ENT, SM and COR, in most comparisons among species were higher than 1.95, except for the pair-comparisons of Pt-Lg in ENT, Pt-Bp in COR, and Lg-Bp in ENT, SM and COR, which were lower 1.80 ( Table 2). The texture feature CON was finally selected for the following identification of tree species due to the highest J-M value (1.99). Therefore, the integration of 10-day NDVI time series, phenological metrics (SOS, EOS, LOS, POS and LOP) and CON texture feature together showed the highest kappa coefficient (OA), 0.814 (85.1%), on mapping tree species compared to other schemes (p < 0.05; Figure 8), however the combination of NDVI time series and four texture parameters for classification showed the lowest accuracy, 0.629 (70.7%) (Figure 8).

Forest Mapping on Landscape with Different Datasets
The landscape patterns based on SVM of different forest tree species in the study region combining with 10-day time series NDVI, the information of 22 PCs derived from PCA transformation based on 10-day NDVI, five phenological metrics, and contrast texture information showed that high spatial agreement occurred at intact forest area (Figure 9). The low spatial agreement was mainly dispersed at small and fragmented patches of tree species and the boundaries among different species. For the accuracy of forest mapping at landscape, there are only two tree species, Pt and Qm, with producer and user accuracies higher than 80% based on the original 10-day NDVI images, while the producer accuracy of Pd was lower than 53% ( Table 3). The overall accuracy will reach over 82% when applied with NPCA and the producer and user accuracies of each tree species were higher than 70% (Table 3). The combination of NPCA and phenological metrics (P) would increase the overall accuracy to 83% and the accuracy of tree species identification had some enhancements. When the contrast was integrated with NPCA and phenological metrics, the accuracy of forested landscape identification increased to 85% and the producer (user) accuracies of each tree species identification could reach over 77.3% (81.5%) for Pd and 100% (92.4%) for Pt ( Table 3). Regardless of the used data schemes, the producer (user) accuracies of each tree species revealed similar patterns, increasing from Pd to Bp, Lg, Qm to the best result of Pt (Table 3).  Table 3. Producer's accuracy (PA) and user's accuracy (UA) of forest mapping for individual tree species and overall accuracy based on different schemes. N indicates the original 10-day NDVI time series, NPCA stands for transformed patterns of 22 PCs from PCA analysis based on 10-day time series of NDVI, P indicates the phenological metrics derived from 10-day NDVI images, and C means the contrast of texture feature.

Discussion
The higher accuracies of forest mapping can usually be achieved by utilizing multi-temporal images instead of using a single one [82,83], and the seasonality of vegetation growth may be of critical assistance for separating land cover types [84]. Therefore, the high dimension of remotely sensed data will provide a better opportunity to capture the distinct cycles of annual growth of diverse vegetation [85,86]. In this study, we compared the effects of using different time-scale compositions derived from original 10-day GF-1 NDVI data on mapping tree species, and the results showed that the accuracy of tree species' identification significantly increased from 30-day to 10-day time-series (p < 0.05). The accuracy of results was bettered when based on the PCA-transformed 10-day NDVI time series, by approximately 3%, because the PCA analysis can effectively remove redundant information from the time-series data and only retains a few independent spatiotemporal patterns that benefit forest identification [87,88].
The phenological metrics calculated from annual vegetation index has been proved to increase the capability of distinguishing the variance in different tree species [31], as well as the paddy rice area extraction [33]. Several studies revealed that there were still inconsistencies between land surface phenology obtained from remotely sensed data and in situ observations [30,89]. In our study, the SOS and EOS derived from time-series images of main studied tree species were identical to field data ( Figure 4). This also explained why the accuracy of tree identification had significantly improved by integrating five phenological indicators derived from 10-day NDVI data instead of 30-day data. This finding also suggested that the relative lower resolution of data might miss the rapid seasonal shifts of vegetation growth [90,91].
Texture information, as a function of spatial patterns of pixel-spectral response in remotely sensed data, has been proved to improve forest mapping accuracy [53]. The study conducted in hinoki cypress and cool-temperate mixed forest based on six texture measures (homogeneity, contrast, dissimilarity, entropy, second moment and correlation) by GLCM derived from IKONOS-2 image in 4-20 m resolution indicated that the tree identification improved prominently, and kappa coefficient increased by 0.1 [34]. The relative work in the temperate forest in northeast China showed that the combination of four texture measures (contrast, entropy, second moment and correlation) by GLCM obtained from Gaofen images (1-8 m resolution) increased the power of discrimination of different tree species, Pinus tabulaeformis, Larix gmelinii, Populus davidiana, Betula platyphylla, and Quercus mongolica, by 2.0%, to 3.6% [36]. Although previous studies have confirmed the assistance of texture features to individual images, the effect of seasonal texture characteristics acquired from multi-temporal images remains highly uncertain. In this study, the comparisons indicated that the inclusion of CON, NDVI time series and phenological metrics significantly enhanced the accuracy of regional tree species identification. In the final landscape patterns of tree species identification, the highest accuracy of the scheme of NPCA+P+C also proved that, using this approach, the dispersal patterns of Populus davidiana (Pd) on flat valley in the study region and non-forest regions had been further decreased and eliminated (Figures 1 and 9). However, we also found that the ENT, SM and COR served as input, in contrast, reduced the overall accuracy of the results by 12.7% which is the same as previous outcomes [55][56][57]. The commonly applied texture features, including contrast, correlation, and entropy, are demonstrated to be superior to using the entire texture features, in which the contrast (CON) was the most recommended [56]. Over-enhancement of the texture features would be contrary to leading to effective information loss and might consequently lower the capability of recognizing forest patterns [92].
In addition, the spatial resolution of the image has an important influence on the applicability of texture features [34,93]. Studies conducted on a single image suggested that including texture information would improve the accuracy of tree species identification when the spatial resolution was better than 36 m [34,94]. The spatial resolution (16 m) of GF-1 NDVI data utilized in this study is higher than studies suggested and has potential to obtain good quality detailed texture information [36,95]. Our study indicates that the higher spatiotemporal resolution utilized to map temperate forest tree species in northeast China performed competently. The tree species can be best recognized with a combination of 10-day NDVI time series, texture features, and phenological metrics, including multiple vegetation growth phases simultaneously, as mentioned [32,33]. With the advancement in higher spatiotemporal resolution of satellite images, it can be expected that the available data in the future will be more capable of mapping the forest tree species on landscape in more detail, with more conformity with ground truth. Furthermore, the development of a better strategy to precisely delineate the forest covers at landscape level is crucial to provide an accurate input to biogeochemical model for carbon, nutrient budgets and productivity estimations, and the following upscale simulations [96][97][98].
The further studies of tree identification are imperative for projected forest management in tropics and subtropics which provide diverse habitats for the wildlife and ecosystem services, and are also a vital carbon pool worldwide. It will be more challenging to map expected tree species in the humid tropical and subtropical regions where the more diverse evergreen broadleaved forests and mixed phenological cycles will make the identification of tree species complicated compared to temperate and arid regions. However, it will be valuable work for sustainable development.

Conclusions
With the increasing requirement to manage regional forest resources, it will be more fundamental to acquire the distribution of tree species in real-time. The time-series of satellite images provides spectral information of vegetation growth dynamics, which have a key role in identifying regional forest tree species. Compared to previous studies conducted with individual images, we proposed a method for identifying five dominant tree species on a forested landscape in northeast China based on time-series of remotely sensed data. The combination with a time-series of GF-1 NDVI images (10-day and 16 m in spatiotemporal resolution), the phenological metrics and temporal texture features of forest canopy, the kappa coefficient (overall accuracy) for the forest tree species can reach 0.814 (85.13%), which performed better than conventional monthly images. The free availability, better spatial resolution and wider coverage of GF-1 data compared to other environmental satellite images will have the potential for mapping tree species at a regional scale, and its application is also urgent in different forests with various environment settings.  Acknowledgments: Thanks to Huaifei Shen, Nianxu Xu and Shaofei Tang for the field work and data collection.

Conflicts of Interest:
The authors declare no conflict of interest.