remote sensing A Planted Forest Mapping Method Based on Long-Term Change Trend Features Derived from Dense Landsat Time Series in an Ecological Restoration Region

: Planted forests provide a variety of meaningful ecological functions and services, which is a major approach for ecological restoration, especially in arid areas. However, mapping planted forests with remote-sensed data remains challenging due to the similarities in canopy spectral and structure characteristics and associated phenology features between planted forests and other vegetation types. In this study, taking advantage of the Google Earth Engine (GEE) platform and taking the Ningxia Hui Autonomous Region in northwestern China as an example, we developed an approach to map planted forests in an arid region by applying long-term features of the NDVI derived from dense Landsat time series. Our land cover map achieved a satisfactory accuracy and relatively low uncertainty, with an overall accuracy of 93.65% and a kappa value of 0.92. Speciﬁcally, the producer (PA) and user accuracies (UA) were 92.48% and 91.79% for the planted forest class, and 93.88% and 95.83% for the natural forest class, respectively. The total planted forest area was estimated as 3608.72 km 2 in 2020, accounting for 20.60% of the study area. The proposed mapping approach can facilitate assessment of the restoration effects of ecological engineering and research on ecosystem services and stability of planted forests.


Introduction
Afforestation and reforestation are widely applied to restore disturbed ecosystems, to combat desertification and soil erosion, and to mitigate carbon emissions [1,2]. To restore the disturbed ecosystems, China has implemented a series of ecological programs, such as the Three-North Shelterbelt Program (since 1978), the Grain for Green Program (since 1999), and the Natural Forest Conservation Program (since 2000) [3,4]. Owing to the ecological programs, greening has been observed in northern China [5]. However, the contribution of afforestation is ambiguous, as planted forests are distributed dispersedly and cannot be separated from natural forests at large scales [6]. Natural and planted forests provide different aspects of ecosystem services [7]. A precise map of both forest types can explicitly locate the planted forest expansion and natural forest loss [8]. In contrast, forest assessments without distinguishing planted from natural forests may underestimate forest change [9,10]. Therefore, fine-resolution mapping of planted forests is essential to not only assess ecosystem service and stability, but also evaluate the restoration effects of ecological programs [11].
Remote sensing is widely employed in land cover mapping, especially for forest monitoring [12,13]. Many studies have explored the feasibility of Landsat images in planted forest mapping [10,14], mostly based on the following three types of characteristics: multispectral features [15,16], canopy texture [17,18], and phenological insights [19][20][21]. From spectral, structural, and temporal perspectives, planted forests are characterized by faster growth, single-species monocultures, and shorter harvest rotation periods, compared to the counterparts of natural forests, respectively [22,23]. These characteristics function well in separating planted economic trees from natural forests. However, two challenges remain for planted forests identification in ecological restoration regions. First, the spectral and structural signatures of planted forests are similar to those of other vegetation types at different growing stages. For example, young or sparse forests are similar to cropland and grasslands regarding to spectral signatures, and mature plantations are similar to natural forests [6,22,24]. Second, phenology-based features cannot discriminate plantations from natural forests when plantations contain similar species to those in natural forests. The implementation modes of ecological engineering and characteristics, such as the species, soil backgrounds, and disturbances result in high diversity and variability of planted forests, which cannot be characterized based on universally identifiable phenological features [25,26].
The long-term growth metrics and change event characteristics of time series are promising for planted forest identification, which has been highlighted in extracting pine and rubber plantation [10,23]. Tree plantation is a process ranging from non-forested to forested areas, followed by continuously growing process in subsequent years, during which, their vegetation activity, as reflected by the vegetation index (VI), steadily increases [27]. In this sense, the subsequent VI of newly planted forests are slightly higher than those in previous years. In contrast, the VI of natural vegetation types fluctuate throughout the different seasons over time [22,24]. Including the long-term trend characteristics into planted forest identification is helpful to tackle above challenges.
In this paper, we took a semiarid ecological restoration region in the Ningxia Hui Autonomous Region in northwestern China as an example, and employed long-term change trend features derived from dense Landsat time series and random forest classification on the Google Earth Engine (GEE) platform to provide new insights into planted forest mapping.

Materials and Methods
There are four major steps in our mapping approach: (a) data preprocessing, (b) multifaceted feature extraction, (c) reference sample collection, and (d) classification, validation and accuracy assessment ( Figure 1).

Study Area
The study was conducted in a semiarid region in South Ningxia, with a longitude of 105 • 12 -106 • 57 E and a latitude of 35 • 15 -36 • 57 N, and a total area of 17,452 km 2 . The area is at the southern mountainous and loess hilly area in the central-western part of the Loess Plateau ( Figure 2). The area is of a typical temperate continental monsoon climate, with annual mean temperature ranging from −9 • C to 24 • C, annual precipitation ranging from 180 mm in the north to 800 mm in the south (80% of which was precipitated in summer and autumn) and annual surface evaporation of 1250 mm. The Chinese government implemented a series of policies and programs to manage the serious ecological problems encountered within this region [13]. However, high aridity, extensive human activities, and long-term soil erosion have resulted in ecological vulnerability and sensitivity, which make the survival and growth of planted forests difficult [28].  Table A1 (Appendix A).

Study Area
The study was conducted in a semiarid region in South Ningxia, with a longitude of 105°12′-106°57′E and a latitude of 35°15′-36°57′N, and a total area of 17,452 km 2 . The area is at the southern mountainous and loess hilly area in the central-western part of the Loess Plateau ( Figure 2). The area is of a typical temperate continental monsoon climate, with annual mean temperature ranging from −9 °C to 24 °C, annual precipitation ranging from  Table A1 (Appendix A).

Landsat Time Series Processing
All available Landsat 5, 7, and 8 surface reflectance data from 1986 to 2020 were compiled and processed on the GEE platform. Systematic atmospheric and terrain corrections were conducted to produce a Level-1 precision product [29]. In total, 460 Landsat scenes with less than 30% cloud cover were assembled, with an average of 13 images per year ( Figure 3a). Observations of individual pixels were processed with the CFmask algorithm, which can identify clouds and cloud shadows [30]. To get the consistent data among Landsat sensors, we harmonized Landsat TM and ETM+ data into OLI data according to Roy et al. (2016) on the GEE platform [31] (https://developers.google.com/earth-engine/ tutorials/community/landsat-etm-to-oli-harmonization, accessed on 10 August 2021). Several vegetation indices (VIs) were calculated and applied as input features for classification (see the full list in Table A1, Appendix A). Among these indices, the normalized difference vegetation index (NDVI) was employed to calculate the long-term change trend and textural metrics [32]. 180 mm in the north to 800 mm in the south (80% of which was precipitated in summer and autumn) and annual surface evaporation of 1250 mm. The Chinese government implemented a series of policies and programs to manage the serious ecological problems encountered within this region [13]. However, high aridity, extensive human activities, and long-term soil erosion have resulted in ecological vulnerability and sensitivity, which make the survival and growth of planted forests difficult [28].

Landsat Time Series Processing
All available Landsat 5, 7, and 8 surface reflectance data from 1986 to 2020 were compiled and processed on the GEE platform. Systematic atmospheric and terrain corrections were conducted to produce a Level-1 precision product [29]. In total, 460 Landsat scenes with less than 30% cloud cover were assembled, with an average of 13 images per year ( Figure 3a). Observations of individual pixels were processed with the CFmask algorithm, which can identify clouds and cloud shadows [30]. To get the consistent data among Landsat sensors, we harmonized Landsat TM and ETM+ data into OLI data according to Roy et al. (2016) on the GEE platform [31] (https://developers.google.com/earth-engine/tutorials/community/landsat-etm-to-oli-harmonization, accessed on 10 August 2021). Several vegetation indices (VIs) were calculated and applied as input features for classification (see the full list in Table A1, Appendix A). Among these indices, the normalized difference vegetation index (NDVI) was employed to calculate the long-term change trend and textural metrics [32].

Multifaceted Feature Extraction
The change rate, change magnitude, mean (μ), standard deviation (σ), and coefficient of variation (cv) of the NDVI over a long time series (from 1986 to 2020) were employed to illustrate the long-term change trend features (Figure 1b). Although planted forests ex-

Multifaceted Feature Extraction
The change rate, change magnitude, mean (µ), standard deviation (σ), and coefficient of variation (cv) of the NDVI over a long time series (from 1986 to 2020) were employed to illustrate the long-term change trend features ( Figure 1b). Although planted forests exhibit similar features to those of natural forests and other vegetation types, the long-term change trend characteristics provide the potential to distinguish planted forests from other vegetations. Changes in land cover, such as the conversion of barren land to grassland and farmland reclamation, may exhibit different dynamics in NDVI from that of tree growth of the planted forests. The NDVIs change sharply in the second year after the conversion of barren land to grassland or farmland reclamation and then remain stable, whilst the NDVIs of the newly planted forests change continuously [22,24]. Due to the specific growth process, planted forests exhibited notable variation in the change rate, mean (µ), standard deviation (σ) of long-term NDVI time series (Figure 4a,c,d). Additionally, mainly due to the recent national forestation campaigns implemented, planted forests are characterized by relatively young age and sparse distribution [33]. In contrast, natural forests attain higher mean NDVI values than other vegetation types (i.e., newly planted forests and croplands) over long periods [20]. Consequently, standard deviation (σ) and coefficient of variation (cv) of the NDVI could be determined to discriminate planted forests from other vegetation types, as their values should be higher for planted forests. Planted forests and other three vegetation types could be easily distinguished according to the long-term trend features of the reference samples ( Figure 4). In addition to long-term change trend features, we calculated other widely considered features, including textural and terrain metrics, in the following sections.

Long-Term Change Trend Features
Based on all available Landsat data on the GEE platform, we analyzed long-term trend features with the harmonic analysis of time series (HANTS) algorithm. This approach is useful to harmonize Landsat time series with the phenological cycles. The HANTS algorithm not only de-clouds and reconstructs remote sensing time series but also is advantageous for placing fewer requirements on the input image data, of which the

Long-Term Change Trend Features
Based on all available Landsat data on the GEE platform, we analyzed long-term trend features with the harmonic analysis of time series (HANTS) algorithm. This approach is useful to harmonize Landsat time series with the phenological cycles. The HANTS algorithm not only de-clouds and reconstructs remote sensing time series but also is advantageous for placing fewer requirements on the input image data, of which the time interval can be unequal [34]. We assembled about 10-20 (or more) observations in most years, which were sufficient to fit the harmonic model (Figure 3a). The HANTS algorithm considers both the seasonal change and characteristics of data to truly reflect the periodic changes in vegetation during time series reconstruction. We first decomposed the complex time series signal into a function of individual sine or cosine waves based on the Fourier series [35]: where t is the timestamp value composited by harmonic regression, m is the length of the seasonal change, and n is the order of the polynomial and approximately equal to the number of harmonics. Second, ordinary least squares (OLS) regression was employed to estimate the observations, and a composite Fourier series was adopted to determine the best-fit model against the original time series values. Then, the estimated coefficients (c 0 , a i , and b i ) were stored to obtain a fitted harmonic series, seasonal change, and residual errors: where F t is the harmonic fit, β 0 and β 1 are the linear regression coefficients, β 2 and β 3 are coefficients based on the Fourier series, ω is the frequency, and t is the timestep. Finally, the harmonic series data were substituted into the linear regression model to obtain the change rate and magnitude. The mean (µ) and standard deviation (σ) of the original long-term time series of the NDVI were computed pixel by pixel, and the coefficient of variation (cv) could then be obtained with the following equation:

Textural Metrics
We obtained the median cloud-free NDVI imagery for the year 2020, and calculated the joint probability distribution of pairs of gray levels (C ij ) as a function of the gray-level cooccurrence matrix (GLCM) [36,37], which is defined by the NDVI composite collection for 2020 using median values: where P ij is the count of NDVI occurrences, δ is the distance between two pixels (here, δ = 1), G is the quantized number of gray levels (here, G = 256), and θ is the orientation (θ = 0 • , 45 • , 90 • , and 135 • ). We calculated eight widely used texture features based on the GLCM method (Table A2). We tested the eight metrics under different window sizes, and the top six high-importance features (considering the sum of the averages under the five window sizes and the correlation at 64 × 64 pixels) of the textural metrics were employed in our method ( Figure A1).

Terrain Metrics
The Shuttle Radar Topography Mission V3 product (SRTM Plus) provided by NASA JPL yielded digital elevation models (DEMs) at a resolution of 1 arc-second (approximately 30 m) [38]. We calculated images containing bands of the elevation, slope, and aspect according to the terrain DEM as input features in this study.

Field Sampling Data
The required but rarely recorded forest origin information (natural or plantation) in field measurements is central to the difficulty in planted forest identification [24]. We determined the forest types that could not be identified from the obtained Landsat and high-resolution images based on field-sampled data.
One source was the Seventh National Forest Inventory (NFI 7th) surveyed from 2004-2008, in which attribute information such as land types and forest origins were adopted. According to the Technical Regulation for the national forest inventory, the position errors of coordinates of the NFI 7th plot data were less than 15 m at the GPS location. We first selected natural forests by evaluating the forest origin. After eliminating non-forest pixels in Landsat and Google Earth images for 2020, 135 natural forest pixels were labeled.
In addition, we conducted a field survey of the forest types on the Loess Plateau from July to August 2021. LocaSpace Viewer (http://uatwww.locaspace.cn/ (accessed on 10 August 2021)) was applied to collect location information on the sampled planted forests (695 pixels) and natural forests (120 pixels).

Reference Data Based on Images
We visually assessed the other landcover types via Landsat and high-resolution Google Earth images for 2020, which is a practical sampling method that can distinguish samples with a satisfactory accuracy [13]. There were 4500 pixels labeled as reference data for training and validation purposes in this study (Figure 3b and Table 1).

Classification, Validation and Accuracy Assessment
The long-term change trend, VI, texture, and terrain features extracted above were sequentially fed into random forest (RF) models to classify land cover into seven types, i.e., planted forest, natural forest, cropland, grassland, unused land, built-up area, and waterbody (Table A3). To assess the effectiveness of long-term change trend characteristics and optimize the classification results, we selected VIs and terrain features as input features first, then sequentially included texture and the long-term change trend and finally jointly input all these features (Figure 1d).
We divided the 4500 reference pixels into training (80%) and validation (20%) sets through random splitting. After employing a confusion matrix to compute the user accuracy (UA), producer accuracy (PA), and overall accuracy (OA) to determine the input feature performance [39], we obtained land cover results with the optimal input features.
The uncertainty analysis is coupled with an assessment of the classification accuracy at the overall, class-, and pixel-levels. For probabilistic classifiers, the classification uncertainty is characterized by the posterior probabilities that a pixel belongs to different land cover classes [40]. The RF classifier provided information on the pixel-level uncertainty in this study, which was calculated by 1-Pmax, with Pmax the maximum probability of being classified as each land cover class [41]. In addition, we applied the ee.pixelArea() function to compute the land cover area on the GEE platform, which considered the difference in pixel size change with the latitude and longitude (due to projection).

Results
After considering long-term change trend metrics, the PA and UA of the planted forest class improved from both approximately 76% to 86% and 92%, respectively. Similarly, the PA and UA of the natural forest class improved from approximately 69% and 88% to 100% and 92%, respectively. When combining the long-term change trend and texture metrics as input features, the accuracy of the planted forest class improved from approximately 76% to over 90%. Contrast analysis confirmed that trend features were crucial for planted forest identification, which improved the separability and guaranteed the classification accuracy across the different vegetation types (Table 2 and Figures 5a and A2). Table 2. User accuracy (UA), producer accuracy (PA), and overall accuracy (OA) of the seven land cover types using the vegetation index (VI), using VI and GLCM-based textural metrics, using VI and long-term change trend metrics, and using VI, textural metrics, and long-term change trend metrics. The PA and UA for the planted and natural forest class, OA, and kappa coefficient were illustrated in bold.

Land Cover Types
Planted Forest

Grass-Land
Unused Land

Built-Up Area
Water-Body Total PA  After using VI, textural metrics, and long-term change trend metrics, the OA reached 93.65%. The PA and UA were 92.48% and 91.79% for the planted forest, 93.88% and 95.83% for the natural forest, and 86.21% and 89.29% for the unused land, respectively. The other classes attained PA and UA values over 90%. In addition, the kappa coefficient reached 0.92, indicating that the classification method performed well (Table 2).

Using the vegetation index (VI)
Natural forests were mainly concentrated in the Liupan Mountain District. Planted forests were mostly densely distributed around the natural forests in the southern region and loess hilly region in the central and eastern regions. Croplands were widely distributed in the central and western regions. Unused land and grasslands were distributed in the northern part of the study area (Figure 5a). Croplands attained the highest proportion (5923.99 km 2 or 33.89% of the study area), followed by grasslands (4192.97 km 2 , 23.99%), planted forests (3608.72 km 2 , 20.60%), and unused land (2149.20 km 2 , 12.29%). Natural forests occupied only 6.58% (1149.43 km 2 ) of the study area (Figure 5b).

Uncertainty Analysis
At pixel level, the classification uncertainty of a given pixel belonging to a certain land cover type was generally low based on the above accuracy evaluation results ( Figure 6). The uncertainty value was less than 0.1 for at least 75% natural forest, unused land, built-up area, and waterbody pixels, was less than 0.2 and 0.3 for most regions for planted forests and grasslands. The relative high uncertainty for croplands might have resulted from orchards or unused land around the croplands [25]. After using VI, textural metrics, and long-term change trend metrics, the OA reached 93.65%. The PA and UA were 92.48% and 91.79% for the planted forest, 93.88% and 95.83% for the natural forest, and 86.21% and 89.29% for the unused land, respectively. The other classes attained PA and UA values over 90%. In addition, the kappa coefficient reached 0.92, indicating that the classification method performed well (Table 2).
Natural forests were mainly concentrated in the Liupan Mountain District. Planted forests were mostly densely distributed around the natural forests in the southern region and loess hilly region in the central and eastern regions. Croplands were widely distributed in the central and western regions. Unused land and grasslands were distributed in the northern part of the study area (Figure 5a). Croplands attained the highest proportion (5923.99 km 2 or 33.89% of the study area), followed by grasslands (4192.97 km 2 , 23.99%), planted forests (3608.72 km 2 , 20.60%), and unused land (2149.20 km 2 , 12.29%). Natural forests occupied only 6.58% (1149.43 km 2 ) of the study area (Figure 5b).

Uncertainty Analysis
At pixel level, the classification uncertainty of a given pixel belonging to a certain land cover type was generally low based on the above accuracy evaluation results ( Figure 6). The uncertainty value was less than 0.1 for at least 75% natural forest, unused land, built-up area, and waterbody pixels, was less than 0.2 and 0.3 for most regions for planted forests and grasslands. The relative high uncertainty for croplands might have resulted from orchards or unused land around the croplands [25]. Other possible sources of uncertainty include sensor inconsistency, the number and quality of valid observations, and the effect of NDVI reconstruction model. Different satellite sensors may perform inconsistently for the land cover types. However, previous studies have already indicated high similarity of the Landsat sensors, and that plantation mapping can be performed well from the long time series [42,43]. Moreover, the harmonic regression of dense vegetation index time series could expand the utility of the Landsat [44]. In this study, the interannual change trend of NDVI based on the sensors of TM, ETM+ and OLI has been harmonized on GEE using the harmonic model HANTS. The verification analysis showed that 95% of root mean square error (RMSE) values of the HANTS fit were less than 0.1, indicating that the inconsistencies between different satellite sensors could be negligible.

Limitations and Direction of Future Studies
Our land cover map achieved a satisfactory accuracy and a relatively low uncertainty. Planted forests could be distinguished using the long-term change trend derived Other possible sources of uncertainty include sensor inconsistency, the number and quality of valid observations, and the effect of NDVI reconstruction model. Different satellite sensors may perform inconsistently for the land cover types. However, previous studies have already indicated high similarity of the Landsat sensors, and that plantation mapping can be performed well from the long time series [42,43]. Moreover, the harmonic regression of dense vegetation index time series could expand the utility of the Landsat [44]. In this study, the interannual change trend of NDVI based on the sensors of TM, ETM+ and OLI has been harmonized on GEE using the harmonic model HANTS. The verification analysis showed that 95% of root mean square error (RMSE) values of the HANTS fit were less than 0.1, indicating that the inconsistencies between different satellite sensors could be negligible.

Limitations and Direction of Future Studies
Our land cover map achieved a satisfactory accuracy and a relatively low uncertainty. Planted forests could be distinguished using the long-term change trend derived from dense Landsat time series and machine learning classification. However, there are limitations and difficulties in the planted forest classification that should be addressed in future studies.
First, sufficient and representative reference samples are often decisive and remain a challenge in accurate supervised classification [45]. Nevertheless, spatiotemporally consistent collection (e.g., yearly or over a large area) of reference samples is expensive in terms of labor or time, infeasible in practice, and impossible over historical periods [46,47]. Therefore, it is necessary to develop an automatic land cover classification method for various time periods based on a small number of reference samples [48].
Second, ecological programs involving afforestation usually convert barren land, grasslands, or sloping farmlands into forests, with the aim to reduce soil erosion, restore ecosystem health, and improve local ecosystem quality [49,50]. Meanwhile, various shrub species, such as Caragana microphylla, are also planted because of their adaptive capacity to semiarid habitats. However, we did not distinguish planted shrublands from natural shrublands in our study, which may underestimate the effects of the ecological programs. It is valuable to provide scientific information on which species to plant by quantifying the characteristics of different vegetation types (trees or shrubs) [51].
In addition, certain planted forests share a few apparent similarities with grasslands in the extracted change magnitude and coefficient of variation features (Figure 4b,e) because relatively young planted forests can be easily confused with grasslands [33]. In this case, segmentation temporal indicators derived from a subset of time windows and combination of change detection could be considered in future research.
Our algorithm is aimed at mapping planted forest based on long-term change trend features in an ecological restoration region, and the method can be reliable in arid and semi-arid regions with similar climates. As Landsat observations are insufficient in other regions, we intend to further pursue the time series reconstruction method considering spatial and temporal fusion. In addition, the effective long-term change trend features can provide a scientific foundation for automatic mapping of planted forests in other regions. To be sure, the algorithm is applicable in arid and semi-arid regions, but in humid subtropical or tropical regions, such as southern China, there are usually herbs or shrubs before forestation. The NDVI value of these vegetation are usually high, which may affect the applicability of NDVI change trend features.

Conclusions
In this study, by combining long-term change trend features derived from remote sensing time series and the random forest machine learning model, we developed an effective approach to identify planted forests. Long-term change trend features are crucial for planted forest identification, which could improve the accuracy from approximately 76% by using vegetation index only to above 90%. The landcover map achieved a high classification accuracy, with the overall accuracy of 93.65%, the kappa coefficient of 0.92, and high producer and user accuracies for both planted and natural forest classes (all above 91.0%). This novel mapping method for planted forest and effective features provide a scientific foundation for automatic mapping of planted forests at large scales and over long periods in other regions, for the assessment of multiple ecosystem services and ecosystem stability of planted forests, and for the evaluation of the restoration effects of ecological programs.

Texture Features Equations
Angular second moment (ASM) x i −x j C ij where x i and x j denote the NDVI values of pixel i and its neighbor pixel j, respectively, µ is the mean of the GLCM matrix, µ x and µ y and σ x and σ y are the means and standard deviations, respectively, of the matrix rows and columns, respectively, and C ij is the probability distribution of pairs of gray levels, as defined in Equation (4). Table A3. Land cover types and class definitions according to the land use/cover classification (LUCC) system.

Land Cover Types Description
Planted forest Planted trees, shrubs, bamboo and other forest vegetation. Natural forest Natural trees, shrubs, bamboo and other forest vegetation.

Cropland
Land for crop planting, including cultivated land, newly reclaimed land, fallow land, rotation and rest land, and cereal croplands. Grassland Dominated by herbaceous vegetation. Unused land Nonvegetated barren (sand, rock, and soil) areas.
Built-up area Land for industrial activities, mining and vehicles in urban and rural residential areas.

Waterbody
Covered by permanent water bodies and water conservancy facilities.